Thursday, February 02, 2017

The Poker AI That Out-Bluffed the Best Humans (English)

For almost three weeks, Dong Kim sat in a casino in Pittsburgh and played poker against a machine. But Kim wasn’t just any poker player. This wasn’t just any machine. And it wasn’t just any game of poker.

Poker is a solved game.

There is more below.

Kim, 28, is among the best players in the world. The machine, built by two computer science researchers at Carnegie Mellon, is an artificially intelligent system called Libratus that runs on a Pittsburgh supercomputer. And for twenty straight days, they played no-limit Texas Hold ‘Em, an especially complex form of poker in which betting strategies play out over dozens of hands.

About halfway through the competition, which ended this week, Kim started to feel like Libratus could see his cards. “I’m not accusing it of cheating,” he said. “It was just that good.” So good, in fact, that it beat Kim and three more of the world’s top human players—a first for artificial intelligence.

During the competition, the creators of Libratus were coy about how the system worked—how it managed to be so successful, how it mimicked human intuition in a way no other machine ever had. But as it turns out, this AI reached such heights because it wasn’t just one AI.

Libratus relied on three different systems that worked together, a reminder that modern AI is driven not by one technology but many. Deep neural networks get most of the attention these days, and for good reason: They power everything from image recognition to translation to search at some of the world’s biggest tech companies. But the success of neural nets has also pumped new life into so many other AI techniques that help machines mimic and even surpass human talents.

Libratus, for one, did not use neural networks. Mainly, it relied on a form of AI known as reinforcement learning, a method of extreme trial-and-error. In essence, it played game after game against itself. Google’s DeepMind lab used reinforcement learning in building AlphaGo, the system that cracked the ancient game of Go ten years ahead of schedule, but there’s a key difference between the two systems. AlphaGo learned the game by analyzing 30 million Go moves from human players, before refining its skills by playing against itself. By contrast, Libratus learned from scratch.

Through an algorithm called counterfactual regret minimization, it began by playing at random. Eventually, after several months of training and trillions of hands of poker, it too reached a level where it could not just challenge the best humans but play in ways they couldn’t, betting across a much wider range of sizes and randomizing those bets so that rivals would have more trouble guessing what cards it held. “We give the AI a description of the game. We don’t tell it how to play,” says Noam Brown, a CMU grad student who built the system alongside his professor, Tuomas Sandholm. “It develops a strategy completely independently from human play, and it can be very different from the way humans play the game.”
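Counterfactual regret minimization is involved in full, but its core building block, regret matching, fits in a few lines. Here is a minimal sketch using rock-paper-scissors as a toy stand-in; this is an illustration, not Libratus's code, which had to handle a vastly larger game:

```python
import numpy as np

# Rows/columns: rock, paper, scissors. PAYOFF[i][j] is the payoff to a
# player choosing action i against an opponent choosing action j.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])
N = 3

def strategy(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    pos = np.maximum(regrets, 0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(N, 1 / N)

def self_play(iterations=100_000):
    r1, r2 = np.zeros(N), np.zeros(N)        # cumulative regrets per player
    avg1, avg2 = np.zeros(N), np.zeros(N)    # cumulative (averaged) strategies
    for _ in range(iterations):
        s1, s2 = strategy(r1), strategy(r2)
        avg1 += s1
        avg2 += s2
        u1 = PAYOFF @ s2       # expected payoff of each pure action vs opponent
        u2 = PAYOFF @ s1       # same matrix works: the game is symmetric zero-sum
        r1 += u1 - s1 @ u1     # regret: what each action would have earned,
        r2 += u2 - s2 @ u2     # minus what the current strategy actually earned
    return avg1 / avg1.sum()   # the average strategy approaches equilibrium

print(self_play())  # approaches [1/3, 1/3, 1/3], the unexploitable mix
```

The key move is the averaging: the per-iteration strategies oscillate, but their running average converges toward an equilibrium. That randomized, unguessable mix is exactly the kind of play Brown describes.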

But that was just the first stage. During the games in Pittsburgh, a second system would analyze the state of play and focus the attention of the first. With help from the second—an “end-game solver” detailed in a research paper Sandholm and Brown published late Monday—the first system didn’t have to run through all the possible scenarios it had explored in the past. It could run through just some of them. Libratus didn’t just learn before the match. It learned while it was playing.
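The division of labor is easier to see as a skeleton. Everything below is a hypothetical sketch, with each class and function a stand-in for machinery described in the paper: follow the precomputed “blueprint” strategy on early betting rounds, then, once the endgame arrives, re-solve just the remaining subtree in real time, seeded with the hand ranges the play so far implies:

```python
import random
from dataclasses import dataclass

ACTIONS = ["fold", "call", "raise"]

@dataclass
class State:
    street: int  # 0=preflop, 1=flop, 2=turn, 3=river

class Blueprint:
    """Stand-in for the precomputed whole-game strategy."""
    def sample_action(self, state):
        return random.choice(ACTIONS)
    def hand_ranges(self, state):
        # Which hands each player could plausibly hold, given the play so far.
        return "ranges implied by earlier play"

def solve_subgame(state, ranges):
    """Stand-in for re-solving only the remaining subtree in real time."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}  # a (fake) fresh strategy

def act(state, blueprint, resolve_from_street=2):
    if state.street < resolve_from_street:
        return blueprint.sample_action(state)   # early rounds: follow the blueprint
    ranges = blueprint.hand_ranges(state)       # beliefs on reaching this endgame
    strategy = solve_subgame(state, ranges)     # solve just what remains
    return random.choices(ACTIONS, weights=[strategy[a] for a in ACTIONS])[0]

print(act(State(street=3), Blueprint()))  # on the river, uses the re-solved strategy
```

The payoff of this design is that the real-time solver only ever faces a small slice of the game tree, which is what made solving on the fly tractable at all.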

These two systems alone would have been effective. But Kim and the other players could still find patterns in the machine’s play and exploit them. That’s why Brown and Sandholm built a third system. Each evening, Brown would run an algorithm that could identify those patterns and remove them. “It could compute this overnight and have everything in place the next day,” he says.
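One plausible reading of that nightly pass: find the situations, say unanticipated bet sizes, that opponents probed most often during the day, and compute a finer-grained response for each before the next session. The sketch below is a hypothetical illustration, with refine() standing in for re-solving a branch of the game tree overnight:

```python
from collections import Counter

def overnight_patch(off_tree_bets, blueprint, refine, k=3):
    """Find the unanticipated bet sizes opponents probed most often today,
    and compute a finer-grained response for each before tomorrow's session.
    `refine` stands in for re-solving that branch of the game tree."""
    for bet_size, _count in Counter(off_tree_bets).most_common(k):
        blueprint[bet_size] = refine(bet_size)
    return blueprint

# Toy usage: bet sizes (as pot fractions) that fell outside the day's strategy.
bets_seen = [0.3, 0.3, 2.5, 0.3, 2.5, 1.7]
print(overnight_patch(bets_seen, blueprint={}, refine=lambda b: f"re-solved({b})"))
```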

If that seems unfair, well, it’s how AI works. It’s not just that AI spans many technologies. Humans are so often in the mix, too, actively improving, running, or augmenting the AI. Libratus is indeed a milestone, displaying a breed of AI that could play a role in everything from Wall Street trading to cybersecurity to auctions and political negotiations. “Poker has been one of the hardest games for AI to crack, because you see only partial information about the game state,” says Andrew Ng, who helped found Google’s central AI lab and is now chief scientist at Baidu. “There is no single optimal move. Instead, an AI player has to randomize its actions so as to make opponents uncertain when it is bluffing.”

Libratus did this in the extreme. It would randomize its bets in ways that are well beyond even the best players. And if that didn’t work, Brown’s nighttime algorithm would fill the hole. A financial trader could work the same way. So could a diplomat. It’s a powerful and rather unsettling proposition: a machine that can out-bluff a human.
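A classic end-of-hand calculation shows why the randomization has to be precise: a bettor should bluff just often enough that the opponent profits neither by always calling nor by always folding. A worked toy example, standard poker theory rather than Libratus's numbers:

```python
def bluff_fraction(pot, bet):
    """Fraction of betting hands that should be bluffs so a caller is
    indifferent: calling wins pot + bet against a bluff and loses bet
    against a value hand. Solve a*(pot + bet) = (1 - a)*bet for a."""
    return bet / (pot + 2 * bet)

print(bluff_fraction(pot=100, bet=100))  # 1/3: one pot-sized bet in three is a bluff
```

Bluff any more often and the opponent calls every time; any less often and every bet can safely be folded to. Holding that balance across thousands of distinct situations is what human players approximate and what Libratus computed.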
https://www.wired.com/2017/02/libratus/
