DeepMind’s AI became a superhuman chess player in just a few hours.
The descendant of DeepMind’s world champion Go program stretches its chess muscles
The goal for Google’s AI subsidiary DeepMind was never beating people at board games. It’s always been about creating something akin to a combustion engine for intelligence — a generic thinking machine that can be applied to a broad range of challenges. The company is still a long way off achieving this goal, but fascinating new research published by its scientists this week suggests they’re at least headed down the right path.
In the paper, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," DeepMind describes how a descendant of the AI program that first conquered the board game Go has taught itself to play a number of other games at a superhuman level. After just eight hours of self-play, the program bested the AI that first beat the human world Go champion; after four hours of training, it beat the current world champion chess-playing program, Stockfish. Then, for a victory lap, it trained for just two hours and polished off one of the world's best shogi-playing programs, Elmo (shogi being a Japanese version of chess that's played on a bigger board).
One of the key advances here is that the new AI program, named AlphaZero, wasn’t specifically designed to play any of these games. In each case, it was given some basic rules (like how knights move in chess, and so on) but was programmed with no other strategies or tactics. It simply got better by playing itself over and over again at an accelerated pace — a method of training AI known as “reinforcement learning.”
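The self-play loop described above can be sketched in miniature. To be clear, this is not AlphaZero, which pairs a deep neural network with Monte Carlo tree search; it's a toy tabular learner for tic-tac-toe, standing in for chess, that knows only the rules and improves purely by playing itself. All names and constants here are illustrative.

```python
import random

# The eight winning lines of tic-tac-toe: the only "rules" the agent is given.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i in range(9) if board[i] == "."]

values = {}  # position -> estimated value for the player who just moved

def value(pos):
    return values.get(pos, 0.0)

def choose(board, player, epsilon):
    """Mostly pick the move leading to the highest-valued position;
    occasionally pick at random so new positions keep getting explored."""
    legal = moves(board)
    if random.random() < epsilon:
        return random.choice(legal)
    return max(legal, key=lambda m: value(board[:m] + player + board[m+1:]))

def self_play_game(epsilon=0.1, alpha=0.5):
    """Play one game against itself, then nudge the value of every
    visited position toward the final outcome (win +1, loss -1, draw 0)."""
    board, player, history = "." * 9, "X", []
    while True:
        m = choose(board, player, epsilon)
        board = board[:m] + player + board[m+1:]
        history.append((board, player))
        w = winner(board)
        if w or not moves(board):
            for pos, p in history:
                target = 0.0 if w is None else (1.0 if p == w else -1.0)
                values[pos] = value(pos) + alpha * (target - value(pos))
            return w
        player = "O" if player == "X" else "X"

random.seed(0)
for _ in range(20000):
    self_play_game()
```

The agent starts from random play and, like AlphaZero, is given nothing but the rules; everything it "knows" about good and bad positions ends up in the value table, learned entirely from its own games.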
Using reinforcement learning in this way isn't new in and of itself. DeepMind's engineers used the same method to create AlphaGo Zero, the AI program that was unveiled in October. But, as this week's paper describes, the new AlphaZero is a "more generic version" of the same software, meaning it can be applied to a broader range of tasks without being primed beforehand.
As the paper's abstract puts it: "The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case." That's a first for AI.
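To make the abstract's contrast concrete, here is a toy version of the classical-engine approach it describes: an exhaustive negamax search guided by a handcrafted evaluation function, again on tic-tac-toe rather than chess. The heuristic and score scale are invented for illustration and are nothing like a real chess engine's.

```python
# The eight winning lines of tic-tac-toe.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i in range(9) if board[i] == "."]

def evaluate(board, player):
    """Handcrafted heuristic: lines still winnable for `player`
    minus lines still winnable for the opponent."""
    opp = "O" if player == "X" else "X"
    score = 0
    for a, b, c in LINES:
        cells = {board[a], board[b], board[c]}
        if opp not in cells:
            score += 1
        if player not in cells:
            score -= 1
    return score

def negamax(board, player, depth):
    """Search every line of play to `depth` plies; score leaf positions
    with the handcrafted evaluation. Returns (score, best_move)."""
    opp = "O" if player == "X" else "X"
    if winner(board) == opp:
        return -100, None      # the previous move won for the opponent
    legal = moves(board)
    if not legal:
        return 0, None         # draw
    if depth == 0:
        return evaluate(board, player), None
    best, best_move = -1000, None
    for m in legal:
        score, _ = negamax(board[:m] + player + board[m+1:], opp, depth - 1)
        if -score > best:
            best, best_move = -score, m
    return best, best_move

# With depth 9 the search is exhaustive: perfect play is a draw (score 0).
score, move = negamax("." * 9, "X", 9)
```

The division of labor is the point: here humans supply the evaluation function and the program only searches, whereas AlphaZero learns its evaluation from self-play and needs no such hand-tuned knowledge.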
This takes DeepMind just that little bit closer to building the generic thinking machine the company dreams of, but major challenges lie ahead. When DeepMind CEO Demis Hassabis showed off AlphaGo Zero earlier this year, he suggested that a future version of the program could help with a range of scientific problems, from designing new drugs to discovering new materials. But these problems are qualitatively very different from playing board games, and a whole lot of work needs to be done to find out how exactly algorithms can tackle them. All we can say for certain now is that artificial intelligence has definitely moved on from just playing chess.