Translated from the original article in Chinese: 【火线速递】——李世石的策略与AlphaGo的弱点 (“Breaking News: Lee Sedol’s Strategy and AlphaGo’s Weaknesses”)
— Li, Zhe (6p)
(translated by: Yi Tong 3d, Chun Sun 5d, Michael Chen 7d)
Today is a memorable day in human history.
In the very first game of Go between the human champion and the computer program AlphaGo, the human champion lost.
Pre-match predictions were predominantly in favor of the human. Go players unanimously picked the human champion, Lee Sedol. The scientific community had more mixed opinions.
Those who bet on Lee did not do so entirely out of ignorance. Most of them just couldn’t believe it would happen so quickly. Some computer scientists thought the algorithm couldn’t compete against a human even after they had read and understood the Nature article.
From last year’s five game records to the publication of the paper by DeepMind, there was plenty of evidence of weaknesses in Go AI. Many people believed these could not be resolved in such a short time (October to March), and that they would cause the AI’s downfall against the human champion Lee Sedol.
Yet, Lee Sedol lost the first game.
There will be many interpretations of this game. It will be studied over and over, today and decades later, from the perspective of the moves themselves, or maybe from the perspective of the difference between the reasoning patterns of humans and machines. It doesn’t matter who looks back at the game, whether us or the robots of the future. They will feel differently than we do today, so it is our responsibility to record precisely what we think and feel now, to provide our descendants with some record of our ridiculous ideas, misunderstandings, and maybe a few true insights.
Let’s take a closer look at what really happened in today’s game, which might be the game on which opinions from professional players are most divided. What I write below is from my personal perspective. I try my best to provide rational analysis and thought at this moment, when the era of Go AI has struck us most unexpectedly.
Lee Sedol’s Strategy
I think overall, Lee used one probe, and two strategies.
1) The probe: New opening
For move 7, Lee didn’t form a connection by playing on the top side. Instead, he played on the right side. This had never been played in any professional game.
This irregular opening was Lee’s probe of AlphaGo. In the 5 games between AlphaGo and Fan Hui, all the openings were “normal” openings. These were all popular at one time, even if not much in vogue today.
By deviating from popular openings, even those that are out of date, Lee tested AlphaGo’s versatility in the opening part of the game. We know that AlphaGo has learned from a massive collection of existing Go game records. What would it do against an opening that has never occurred historically?
AlphaGo gave a perfect answer:
White’s approach at move 8 is normal. Black’s pincer at move 9 will lead to intense fighting. Now, white’s move 10 is spectacular.
In normal situations, when white’s move 8 is attacked by a two-space high pincer (black 9), move 10 at (3,3) is not an established move and is considered suboptimal. You can’t find it in any joseki. There are many choices in joseki books for answering black 9, but none of them will tell you to play move 10.
Yet I think move 10 is a good move.
It reduces black move 7’s efficiency. This compensates for white’s slightly unfavorable position in the upper-right corner. Overall white loses nothing.
So yes, the AI does think in a global manner! In the past, AIs memorized joseki from existing game databases. A joseki is a locally optimized move sequence acceptable to both sides. Joseki have been developed over centuries. However, joseki should not be used blindly; they should be adapted to the surrounding circumstances. In this game, after the black 9 pincer, if white had taken the usual “magic sword” joseki and played at black 15’s position, the outcome would have left black 7 in a perfect position with good efficiency.
Instead, white chose a move that can’t be found in joseki books, with a locally sub-optimal result, yet reduced the efficiency of a far-away opponent stone black 7. This clearly proved two things:
- The computer doesn’t play by “memorize and copy” game records.
- The computer thinks globally instead of just locally.
Of course, we already knew the first point from the game records with Fan Hui. In the regular openings with Fan Hui, the AI deviated from the joseki book a few times. Once, in the “large avalanche” joseki, the order of the sequence was different from the book. Also, the AI’s playing order was logically weaker than the book’s. Notice that this is a logically weaker move, not a heuristically weaker one: the order the AI played was worse than the book’s in an absolute sense. This difference could only put the AI at a disadvantage. Though the final board configuration ended up correct, the order in which it was played out was “wrong” from a human point of view. This showed that AlphaGo didn’t memorize joseki or a database. However, it also revealed the AI’s lack of “logic”, which resulted in a wrong order of moves on the board. This was one of the reasons that professional players didn’t favor AlphaGo before the match.
However, in this game, we didn’t see this ordering issue, at least not obviously. (There was a slightly similar problem that we will mention later.)
The second point shows the true power of AI in the opening. It dodged Lee’s joseki trap easily, and made a choice outside of the book with a global vision.
The first point is a known feature from the Fan Hui games, and the second point is a perfect answer to the probe from a top human player. If this AI were built only from “local pattern recognition” over a massive game record database, it couldn’t have accomplished this.
2) Strategy 1: Open and complex board position
After the AI’s perfect answer to Lee’s first probe, the human champion turned to one of his main strategies: leading the game into an open and complex board position. Looking back, it was exactly this strategy that put Lee behind. However, nobody expected this turn of events.
We don’t know if Lee Sedol has ever taken advice from computer experts. However, after seeing this strategy, it’s obvious that he knew what a good response to the AI would be.
As we know, the power of deep learning here is to prune the search tree with a policy network and a value network, reducing the size of the search space. This corresponds to what we call “human intuition”. On top of that, Monte-Carlo tree search is performed and finally the best candidate move is chosen. In Go, we sometimes have a “closed” area for calculation, as in tsumego problems. For these, the AI can exhaust all possible moves to complete the calculation and know where to play; this is the “brute force” method. However, in “open and complex” positions, every move produces a butterfly effect with wide and far-reaching impact. We call this “pull one hair and the whole body moves”. In these situations there are many, many more candidates, and stupendously wider and deeper search spaces behind each of them. A tiny mistake in calculation, or even just a misevaluation of a position downstream, can quickly result in the loss of the whole game. This is the most difficult situation for professional players, and this ability to evaluate positions is the most difficult part of the game to master.
Before the match, someone predicted that AlphaGo’s pruning strategy would be less effective when given open and complex positions, in which case deep, wide and accurate searches remain necessary. They thought this might be a weak point of AI.
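To make the pruning idea concrete, here is a minimal sketch in Python. It is not AlphaGo’s code. The move names, the prior probabilities, the figure of roughly 250 legal moves per position, the 6-move lookahead, and the cut to 5 candidates are all assumptions chosen for illustration. The sketch only shows how keeping the few moves that a policy rates highest shrinks the number of sequences a search has to examine.

```python
from typing import Dict, List


def count_sequences(branching: int, depth: int) -> int:
    """Count move sequences of length `depth` when each position offers `branching` moves."""
    return branching ** depth


def prune_candidates(priors: Dict[str, float], keep: int) -> List[str]:
    """Keep only the `keep` moves with the highest prior probability."""
    ranked = sorted(priors, key=lambda move: priors[move], reverse=True)
    return ranked[:keep]


# Hypothetical priors for five candidate moves (illustrative numbers only).
priors = {"attach": 0.40, "extend": 0.30, "block": 0.15, "peep": 0.10, "tenuki": 0.05}
kept = prune_candidates(priors, keep=2)

# Assumed sizes: about 250 legal moves per position, a 6-move lookahead,
# and a pruned search that considers only 5 candidates per position.
full = count_sequences(branching=250, depth=6)
pruned = count_sequences(branching=5, depth=6)

print(kept)
print(full, pruned, full // pruned)
```

In an open and complex position, more candidates deserve serious attention, so `keep` would have to grow and the saving would shrink. That is the concern described above.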
I’m not sure Lee Sedol already knew this, but he decided to lead the game into just such a position:
Without going into details here, black 23 “attach” and 27 “block” led the game into an open and complex position. Lee had the chance to play in a more “peaceful” way, but he didn’t. White 24, 26, and 28 showed very sharp fighting intuition. White didn’t hesitate a bit to engage in a fierce fight.
In my opinion, black initiated this fight at an inopportune time. In a balanced game, we try to start a battle if the chance of winning it is over 50%. Professional players who seem more aggressive than others are just more optimistic when predicting their chance of winning.
Lee Sedol is quite an aggressive player. However, in this game the fight was still initiated too early. He started the attack on shaky ground. I don’t believe Lee would choose the same play against a human player. He would have engaged in the fight with better timing. That would never be a problem for him, because he is a master of finding weaknesses.
In this game, Lee chose to start the fight without finishing the opening.
Up to this point, we have six or seven groups entangled together, with many threads to follow. This is a typical open and complex board position.
If we take a closer look, AI had its chance to avoid this:
White 42 could have atari’ed here (position 1), taken the middle two stones, and traded with black by sacrificing the three stones at the top. After the exchange, the game would still be balanced, and peaceful too.
In the real game, white saved the three stones at the top and let the game evolve into a big fight. This is a more aggressive approach.
So how did AlphaGo do once it entered open and complex positions?
The answer is: Perfect.
I was in a studio doing an online broadcast with Yu Bin (9p, head coach of the Chinese national Go team). At this point, Yu was worried that white would extend at the marked position to gain territory by taking advantage of the fact that the top black group was not fully alive. AlphaGo played exactly that move seconds after Mr. Yu spoke.
White’s attach here is obviously not going to cut black into two groups or live by itself, because black can easily capture the attached stone in a ladder. This is a move that is very difficult for amateur-level players to think of. The white group on the right needs some defense, and it would be intuitive for white to push down once more on the right. However, the AI’s move at the marked point is a counter-attack made while defending: it intends to sacrifice the stone to fix white’s shape and leave black unable to attack the whole group. All of this reflects advanced concepts that human players have developed.
Of course, black can easily capture the stone, but the price is having to atari at position 1 and getting peeped at position 4. Professional players can judge that white gains by sacrificing the single stone. What surprised me is that the AI can make the same judgement and “sense” the chance to counter-attack in the course of running and defending.
There is still discussion among professional players about whether the attach is the best move. However, what I’d like to emphasize is that the AI playing this move is a true showcase of its reading ability. Those who thought that the AI wouldn’t sacrifice stones or couldn’t grasp advanced concepts of Go now know, after this counter-attack, that they were wrong.
Up to this point, without going into details, white has been playing very well in an open and complex board position. White showed no hesitation in sacrificing stones or gaining points. Lee Sedol’s first strategy had failed.
This shows that the AI does not struggle when facing open and complex positions. The AI defended well and took the lead when Lee initiated an unfavorable fight that backfired.
Maybe Lee Sedol realized that this strategy didn’t work, so he adjusted and started the second strategy.
3) Strategy 2: Close game
Since AlphaGo was not afraid of complex fighting, Lee stopped and started to steer the game into a close game.
In a close game, both players need to be very careful in each and every local territorial battle. One or two mistakes won’t lose many points, but they can be enough to decide the outcome of the game.
Move 77 marks the start of the second strategy. It is the cease-fire declaration. Lee wanted to use his strength in the second half of a close game. This was intentional. Lee likely planned to approach the lower-left corner and then induce the middle two white stones to escape and continue the battle.
However, Lee judged that strategy one had run its course and would not work, hence strategy two.
AlphaGo’s Questionable Moves
Professional players agree that AlphaGo played well from the opening through the middle game. The main disagreements concern AlphaGo’s moves in the second half.
1) Too slow?
The first questionable move is move 80. Black had just approached the lower-left corner with the previous move. The “normal” way is to answer in the lower left. However, white chose to tenuki and spent the move protecting the upper left instead.
Black didn’t choose to start the fight directly after white’s triangle move in the lower right. Lee didn’t have the confidence to start the fight in the upper left immediately. In the game, black approached the lower left first, with the intention of taking some territory before any other initiative.
To answer the approach (black triangle), a normal way would be to answer at move 1. However, after black secures the lower territory, Lee could use move 2 to threaten the upper-left corner. If white refuses to retreat in the corner, then after moves 4 and 6 the battle favors black. If white retreats at move 3, then black takes the upper-left corner territory. It’s not easy to reach a concrete conclusion here, as so many choices branch out.
White chose to protect the upper left and let black take the double approach to the lower right. At this point, the opinions of professional players were divided: some thought black was now leading, others thought white was still OK, although the previous move (white 80) was indeed slow.
Is move 80 really slow? I’d like to defer the conclusion until later and move forward in the game.
2) Bad move?
The next questionable move is move 86.
White 86, the cut, is something you won’t ever find in a game database. However, at this point I am no longer surprised by whatever AlphaGo plays. To human players, the intention of the cut is easy to understand: black is thick in the middle-right, so white wants to sacrifice a stone to get a better shape on the left, making black double its thickness and thereby creating a low-efficiency configuration for black. This is the most obvious reading of white’s intention.
The actual game result is as follows. Most professional players thought white failed here by quite a large margin, because black gained a huge block of territory.
This is the usual variation. White reduces black’s territory while keeping itself alive. Some thought white gains 6-7 points compared to the actual game. If that’s true, then white lost a lot by its choice in the actual game.
However, this variation has some risk. I’ll defer a discussion about the risk until later.
A historic move!
After finishing with the lower-left, most human players felt optimistic: Lee Sedol leads now, and AI is so-so.
However, the next move by AlphaGo was the most spectacular and brilliant move of the whole game:
White 102, peep at the third line on the right!
This move will be remembered in the history of Go, together with many famous moves in the history and beyond.
In the game records of future AI, we might see many more marvelous moves: They might display subtler or deeper understanding. Yet they are not going to challenge the status of this move in the history of Go!
This move was beyond Lee Sedol’s consideration. The response to this move took him the longest time in the entire game, yet still he couldn’t get away without being heavily hit.
Some professional players predicted this move during the live broadcast. Here I want to bring up something interesting and disturbing: a player can consider far less than numerous observers can collectively.
The reason is that observers can easily switch sides while thinking about the game, looking for the strongest move for both black and white. Players, on the contrary, spend most of their time thinking about their own moves and are more likely to overlook the opponent’s strongest attacks. If Lee Sedol had predicted this move, maybe he would have peeped first for protection. However, as a player, it’s difficult to be careful all the time, especially when facing an AI opponent for the first time.
What’s more interesting is that this move requires a huge amount of calculation with many branches. Even if a human player comes up with this point as a candidate move, a huge amount of calculation would still be required to verify that the move really works. However, the computer played this move without using more time than on any other move, maybe even less than on moves that look simpler to human eyes.
I can’t help wondering, has the AI read through?
HAS THE AI READ THROUGH?
Here is the result: white captures three stones on the right in sente, then goes back to the upper left to protect the corner. By the way, notice how it protects. Many game records and most players’ intuition would suggest blocking one space to the right of the (3,3) point. But if you read a little further, you will realize that AlphaGo has chosen a better point.
At this moment, I thought white was leading by a little bit, but there were others thinking it was still a close game.
The sequence from move 123 onward was a mistake by black. If black had kicked at 1 in the picture above, it might have ended up better. Still, it requires more study to tell who would eventually win in this picture. My personal opinion is that white would lead by a little, and my guess is that AlphaGo would still win.
Black’s move 123 lost 6-7 points and immediately decided the game. In the end both sides had equal points on the board. Lee Sedol could not make up the komi and resigned.
We were stunned by AlphaGo’s performance, surprised by Lee’s loss. For the following games, the question is: Did AlphaGo make mistakes?
Has AlphaGo made any mistakes in this game?
Fortunately, as human players, we do see obvious mistakes by AlphaGo. By “mistakes” here, I mean mistakes that can be proven by logic, not “mistakes” by human heuristics or experience.
a) “Mistake” 1:
White’s 136 to capture the top right stones should be played on the first line. The difference is 1 point.
b) “Mistake” 2:
White’s block at 142: for professional players, this is a clear loss.
The better play is to jump at position 1; later, moves 5, 7, and 9 kill 2 black stones and rescue 3 white stones. This would be better than the actual game result by 1-2 points. Even if white connects at 8 instead of playing move 5, it’s still slightly better than the actual game.
All these “mistakes” are local and do not affect the whole board. They are “closed” mistakes and can easily be proven by logic and math. Compared with the AI’s strength everywhere else, these two “mistakes” seem strange.
Some professional players think: “Oh it can’t even read this, AI is so-so.”
I mentioned white’s sequence in the lower-left corner before; some thought that was a third mistake. However, for this one, the judgement is largely based on heuristics and experience, maybe with some reading, but not complete reading. As I see it, “this is suspect”, as Descartes says.
However, the two mistakes I raised above are beyond doubt. So why do I put quotation marks around “mistake”?
So here comes the question: What is a “mistake” in a Go game?
c) The definition of “mistake”
To us, the human professional players, what is a mistake? If we exclude everything based on heuristics and experience, and only talk about indisputable calculation and logic, then we can define “mistake” as: If played out, A is more optimal (more points) than B, and I chose B.
In this sense, if we find a “proven more optimal sequence”, we say we made a mistake.
However, to AI, does “mistake” mean the same thing? How do we understand that AI is making mistakes that are so obvious and so much below its reading skill shown everywhere else?
This is about the algorithm. If one day AI solves Go with a complete exhaustive search, then any move that falls outside the set of optimal sequences is a mistake. But AI today can’t solve Go by exhaustive search.
Today’s AlphaGo uses DCNN and MCTS; it’s not looking for the “optimal” solution in human eyes.
MCTS gives a winning-rate evaluation after its search, and the AI chooses where to play with that information. That means the AI doesn’t choose the point-optimal solution; it only chooses the solution with the best winning rate.
So when we talk about “mistakes”, I have to put the word in quotes, because in the AI’s eyes they are not mistakes at all!
Human players would see that, logically, A is more optimal than B, but the AI thought the winning rates of A and B were similar. For the purpose of winning the game, A and B are no different. Maybe it’s easier to make mistakes after playing at A than after playing at B, so B might even have a higher winning rate than A!
If both A and B lead to victory, from the AI’s perspective, do you still think it’s a mistake?
You have to beat the AI by exploiting this kind of “mistake”; otherwise, we can’t argue that they are mistakes at all.
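The difference between “more points” and “higher winning rate” can be shown with a small Python sketch. The outcome tables below are invented for illustration and are not taken from the game or from AlphaGo. Move A is assumed to win by more on average but to carry a larger risk of going wrong, while move B is assumed to win by less but almost always.

```python
from typing import Dict, List, Tuple

# Each move maps to (final margin for white, probability) pairs.
# These numbers are hypothetical; the probabilities of each move sum to 1.
Outcomes = List[Tuple[float, float]]

candidates: Dict[str, Outcomes] = {
    "A": [(4.5, 0.85), (-10.5, 0.15)],  # bigger win, but riskier follow-ups
    "B": [(1.5, 0.97), (-0.5, 0.03)],   # smaller win, but very safe
}


def expected_margin(outcomes: Outcomes) -> float:
    """Average final margin, the quantity a points-minded player looks at."""
    return sum(margin * prob for margin, prob in outcomes)


def win_rate(outcomes: Outcomes) -> float:
    """Probability that the final margin is positive."""
    return sum(prob for margin, prob in outcomes if margin > 0)


best_by_margin = max(candidates, key=lambda m: expected_margin(candidates[m]))
best_by_win_rate = max(candidates, key=lambda m: win_rate(candidates[m]))

print("points-minded choice:", best_by_margin)
print("win-rate choice:", best_by_win_rate)
```

Under these assumed numbers, a player counting points prefers A and a player maximizing winning rate prefers B. Neither choice is a “mistake” by its own criterion.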
Let’s take a look at the game one more time. White protected the upper left (move 80), and white gained sente by losing some points in the lower left. Are these bad moves?
The protection on the upper left means that the AI thought protecting had a higher winning rate than answering in the lower left. This judgement is quite plausible if we consider the AI’s understanding of the peep on the right (move 102). Even the loss in the lower left might not be a mistake, by the same judgement. All the AI wanted was sente from the lower left, and it thought the loss was totally fine with respect to winning rate.
Apparently, Lee Sedol’s evaluation of the game overlooked the peep at 102 on the right.
The weakness of AlphaGo
So is it true that AlphaGo is really invincible? Maybe not. From this game, we can see some weaknesses of AlphaGo. The question is, are these enough to influence the outcome of the match with Lee Sedol?
a) Absence of reasoning and derivation.
As I said previously, mistakes to humans are not mistakes to the AI. That being said, they still suggest some weaknesses in AlphaGo.
With DCNN + MCTS, AlphaGo has proven its ability in pruning and searching. Human players also need pruning and searching, and can’t beat the AI on those two things.
However, AlphaGo lacks logic. This was shown in the games with Fan Hui, and the two “mistakes” in this game highlighted it even more.
MCTS plays based on winrate, not on “logically, A is better than B”. In locally exhaustive situations, reasoning and derivation are sometimes better than probability.
So AI has these “mistakes” because it can be inadequate in logical reasoning. Could human players take advantage of that?
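A toy Python example can show how an estimate based on probability may disagree with exact reasoning in a small closed position. The game tree below is invented, and the estimator is plain averaging over uniformly random continuations. That is a deliberate simplification and not AlphaGo’s actual search, which concentrates its simulations on promising replies.

```python
from typing import List, Union

# A leaf is 1 (we win) or 0 (we lose); an inner node lists the positions
# reachable by the side to move. The tree is hypothetical.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, our_turn: bool) -> int:
    """Exact result with best play from both sides."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not our_turn) for child in node]
    return max(values) if our_turn else min(values)


def uniform_playout_value(node: Tree) -> float:
    """Expected result when both sides pick every move uniformly at random."""
    if isinstance(node, int):
        return float(node)
    return sum(uniform_playout_value(child) for child in node) / len(node)


# After move A the opponent has four replies; only one of them refutes us.
move_a: Tree = [1, 1, 1, 0]
# After move B, whatever the opponent replies, exactly one of our four answers wins.
move_b: Tree = [[1, 0, 0, 0], [1, 0, 0, 0]]

for name, tree in (("A", move_a), ("B", move_b)):
    # The opponent moves next, so our_turn is False at the root of each subtree.
    print(name, "exact:", minimax(tree, our_turn=False),
          "random-playout estimate:", uniform_playout_value(tree))
```

In this made-up position the random-playout estimate rates A at 0.75 and B at 0.25, while exact reasoning says A loses and B wins. A real tree search narrows this gap by spending more simulations on the strongest replies, so the sketch marks the kind of situation where pure probability can mislead and does not claim that AlphaGo fails in this way.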
b) Avoiding ko?
It’s a well-known fact that AIs don’t do well in ko. Last year in Beijing, the champion AI (DolBaram) didn’t understand double ko and got stuck in a local double-ko situation. This might be because, if it relies on probability only, an AI always calculates a fairly large chance of winning the double ko. It might be a natural consequence of the first weakness: if it relied on reasoning, the AI would know that winning a double ko is impossible.
Now that it is equipped with the additional weapon of DCNN, can AI avoid this problem? So far I haven’t seen examples. But it looks like AlphaGo intentionally tries to avoid ko fights.
Let’s go back to the lower-left corner. This was the strongest sequence for white suggested by professional players. But AlphaGo might worry about ko:
After black 6, a ko would be inevitable. Black could peep at right for a ko threat, then some exchange might happen. Can white benefit from that? I’m not sure without deeper reading.
And here is an interesting endgame situation:
Now white is going to win for sure. The only things left are the final local boundary settlements. Black started an invasion in the lower left, and white needs to make this group live.
This is the real game, not the strongest sequence for white.
The strongest sequence is like this. Double-ko and white is alive. White gains more points.
However, regardless of the reason, whether white was trying to avoid ko or was trying to maximize the winrate, the AI didn’t play the second variation.
At the least, AlphaGo has not yet shown the ability to handle complex ko situations.
Is that going to be the Achilles’ heel of AlphaGo?
So far, these are the only possible weaknesses I can see for AlphaGo.
Possible Strategy for Human
Based on my analysis of AlphaGo’s weakness, I don’t think Lee Sedol has many options.
- My most anticipated strategy is for Lee to start with an opening well studied by human players, because AI doesn’t memorize. Even without gaining an advantage from this, Lee can at least keep the game balanced. In that situation, maybe the AI would make some local mistakes because of its lack of reasoning/logic, so the human may be able to keep a small profit all the way to the end and finally win by a marginal number of points. However, I don’t think this fits Lee Sedol’s style.
- Another strategy is to create as many kos as possible in the game, i.e. create situations where the opponent has to fight a ko. Of course, this doesn’t mean that AlphaGo lacks the ability to handle ko; that is not known yet. So this is a risky approach. We have to maintain the balance of the board naturally, without deliberately creating ko.
If we keep using the narrow and constrictive part of established human understanding to evaluate AlphaGo, we will never even figure out how we lost.