Impact

DeepMind AlphaGo





AlphaGo proved that AI systems can learn to solve the most difficult of problems in highly complex domains




AlphaGo

Google DeepMind's AI system, AlphaGo, learned to master the ancient Chinese game of Go, a complex board game of strategy and creativity. Using, in part, reinforcement learning algorithms created by OptimalAI's Chris Watkins, AlphaGo defeated a human Go world champion a decade before experts thought possible, proving that AI systems can learn to solve the most challenging problems in highly complex domains.



Q-Learning

OptimalAI's Chris Watkins developed Q-learning, a foundational algorithm in model-free reinforcement learning. It enables agents to learn optimal actions through trial and error without needing to know the environment's dynamics. While AlphaGo uses Monte Carlo Tree Search (MCTS) and policy gradients for its decision-making and strategy, the broader principles of Q-learning remain vital. Q-learning's focus on maximizing long-term rewards and learning from interactions with the environment significantly influenced the evolution of more advanced reinforcement learning techniques, which underpin AlphaGo.

Q-learning offers an incremental approach to dynamic programming that imposes minimal computational demands. It gradually improves the evaluation of action quality in specific states and converges to optimal action-values, provided all actions are sampled sufficiently. The DeepMind team notably applied Q-learning in conjunction with deep learning to master Atari 2600 games, a breakthrough published in Nature in 2015 as 'deep Q-networks' or 'deep reinforcement learning'. This success paved the way for the groundbreaking achievements of AlphaGo.
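The core of Q-learning described above can be illustrated in a few lines. The sketch below is our own minimal tabular example on a toy "move right to the goal" chain environment (the environment, function names, and parameters are illustrative, not DeepMind's code); it shows the incremental update rule converging to optimal action-values through trial and error, with no model of the environment's dynamics.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Minimal tabular Q-learning sketch (illustrative toy example)."""
    # Q[s][a] estimates the long-term reward of taking action a in state s.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # Epsilon-greedy: mostly exploit the best known action,
            # occasionally explore at random.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            # Toy dynamics: action 1 moves right, action 0 stays put;
            # reward 1 only on reaching the terminal state.
            s_next = s + 1 if a == 1 else s
            r = 1.0 if s_next == n_states - 1 else 0.0
            # The Q-learning update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

After enough episodes, every non-terminal state prefers "move right" (Q[s][1] > Q[s][0]), recovering the optimal policy. Deep Q-networks replace the table Q[s][a] with a neural network, which is what made Atari, and ultimately systems like AlphaGo, tractable.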




DeepMind's Agent 57 playing Solaris

The Challenge

Go was long considered a grand challenge for AI. The game is a googol times more complex than chess, with 10 to the power of 170 possible board configurations. That’s more than the number of atoms in the known universe. Despite decades of work, the strongest Go computer programs had only achieved the level of human amateurs. Standard AI methods struggled to assess the sheer number of possible moves and lacked the creativity and intuition of human players.


The Go Matches

In October 2015, AlphaGo made history by playing its first match against Fan Hui, the reigning three-time European Go Champion. AlphaGo secured a 5-0 victory, becoming the first AI system to defeat a professional Go player.

Following this success, AlphaGo took on legendary Go master Lee Sedol, an 18-time world champion considered the greatest player of his era. In March 2016, AlphaGo won the highly anticipated match in Seoul, South Korea, with a 4-1 score. Over 200 million people worldwide tuned in to witness this groundbreaking achievement, which was seen as being at least a decade ahead of its time.

AlphaGo’s stellar performance in these matches earned it the prestigious 9-dan professional ranking, marking the first time a computer Go player had attained this top level of certification.

Throughout the competition, AlphaGo displayed remarkable creativity and innovation. In the second game, it made the now-legendary Move 37, a highly unconventional play with only a 1 in 10,000 probability of being chosen. This surprising move led to a victory, challenging traditional Go strategies that had been followed for centuries.

However, in game four, Lee Sedol responded with an equally extraordinary Move 78, also considered to have a 1 in 10,000 chance of being played. Dubbed "God's Touch," this brilliant move allowed Sedol to win the game, echoing the inventive spirit AlphaGo had demonstrated earlier.

Both of these iconic moves have since been studied by Go players at all levels, reshaping strategies and enhancing the understanding of the game.


Publication
GitHub

"I thought AlphaGo was based on probability calculation and that it was merely a machine. But when I saw this move, I changed my mind. Surely, AlphaGo is creative."

Lee Sedol
Winner of 18 World Go Titles


Image
Lee Sedol, a top-ranked Go player, loses the last of five games to AlphaGo, Seoul, March 15, 2016

Conclusive Proof

AlphaGo’s triumph provided definitive evidence that neural networks could be applied to highly complex domains, while its use of reinforcement learning demonstrated how machines can learn to solve exceptionally difficult problems independently through trial and error.

The techniques AlphaGo used, such as lookahead search and strategic planning, remain integral to modern AI systems. DeepMind's successors — AlphaZero, MuZero, and AlphaDev — continue to build on AlphaGo’s foundation, tackling increasingly complex challenges.


AlphaGo: The Movie  1:30:27