The last couple of weeks have been focused on transitioning P4 from a single-threaded engine that plays to a fixed search depth to a multi-threaded engine that can respond to commands while searching and can search for a fixed period of time.
As soon as the code to use threads for searching was complete, I pulled out an old laptop, got everything up to date, and turned it loose on a series of fixed-depth self-play matches using the fantastic Cute Chess: https://github.com/cutechess/cutechess .
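To make the idea concrete, here is a minimal sketch of that arrangement in Python. It is not P4's actual code: the names `search` and `search_for`, the `result` dictionary, and the `time.sleep` stand-in for real search work are all hypothetical. The search runs on a worker thread and checks a shared stop flag between iterations, while the calling thread stays free to enforce a time limit (or, in a real engine, to keep reading commands).

```python
import threading
import time


def search(stop: threading.Event, result: dict, max_depth: int = 64) -> None:
    """Iterative-deepening stand-in: records the last depth it completed."""
    for depth in range(1, max_depth + 1):
        if stop.is_set():
            break  # the controlling thread asked us to stop
        time.sleep(0.01)  # placeholder for the real work of searching one depth
        result["depth"] = depth


def search_for(seconds: float) -> int:
    """Run the search on a worker thread and stop it after `seconds`."""
    stop = threading.Event()
    result = {"depth": 0}
    worker = threading.Thread(target=search, args=(stop, result))
    worker.start()
    # The calling thread is not blocked by the search itself; a real engine
    # would read commands here and set `stop` when told to stop as well.
    worker.join(timeout=seconds)
    stop.set()
    worker.join()
    return result["depth"]


if __name__ == "__main__":
    print("completed depth:", search_for(0.1))
```

A real engine would also poll the stop flag inside the search, not only between depths, so that it can abandon an iteration that is taking too long.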
The first test was to run depth 1 vs depth 1, then 2 vs 2, up to 5 vs 5. Each match was 20,000 games using over 1,000 starting positions. The results are below. The score itself isn’t interesting; what matters is that the end result was a perfect 0.500 in each case, and that there were no crashes, stalls, etc.
| Depth | Wins | Losses | Draws | Score |
|-------|------|--------|-------|-------|
| 1 | 2255 | 2255 | 15490 | 0.500 |
| 2 | 7147 | 7147 | 5706 | 0.500 |
| 3 | 8913 | 8913 | 2174 | 0.500 |
| 4 | 7804 | 7804 | 4392 | 0.500 |
| 5 | 9123 | 9123 | 1754 | 0.500 |
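The Score column follows the usual chess convention of one point per win and half a point per draw, divided by the number of games. A small sketch of that arithmetic (the function name `match_score` is just for illustration):

```python
def match_score(wins: int, losses: int, draws: int) -> float:
    """Fraction of available points scored: a win counts 1, a draw 0.5."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


if __name__ == "__main__":
    # Depth 1 vs depth 1 row from the table above.
    print(match_score(2255, 2255, 15490))
```

Whenever wins equal losses the score is exactly 0.5, whatever the number of draws, which is why every row above lands on 0.500.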
100,000 self-play games (actually 200,000 if you consider that the program played both sides) is a pretty good indication that it’s stable, but just out of curiosity I also played some fixed-depth self-play matches where one side was able to search deeper than the other. Naturally you would expect the side that can “see further” to have a large advantage, and indeed that is the case. The table below shows the results of a series of 20,000-game matches.
| Depth (rows) vs. opponent depth (columns) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.5 | 0.012 | 0.001 | | |
| 2 | 0.988 | 0.5 | 0.038 | 0.045 | |
| 3 | 0.999 | 0.962 | 0.5 | 0.1 | 0.029 |
| 4 | | 0.955 | 0.9 | 0.5 | 0.095 |
| 5 | | | 0.972 | 0.905 | 0.5 |
As you can see, even a 1-ply advantage is enormous, at least at very shallow depths. I believe that advantage would diminish at larger depths, but our branching factor isn’t good enough to test deeper searches without it taking forever. Anyway, the point of this exercise wasn’t really to quantify the advantage of a larger search depth, but to test P4’s stability, and I’m happy to report it’s rock solid. P4 has played around 1 million fixed-depth games at this point with 0 crashes.
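One way to put a number on "enormous" is to convert a match score into a rating difference with the standard logistic Elo formula, D = -400 · log10(1/s - 1). This conversion is not part of the original testing; it is an illustration under the usual Elo model, and the function name `elo_diff` is hypothetical.

```python
import math


def elo_diff(score: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # One-ply advantages taken from the cross table: (deeper, shallower, score).
    for deeper, shallower, score in [(2, 1, 0.988), (3, 2, 0.962),
                                     (4, 3, 0.9), (5, 4, 0.905)]:
        print(f"depth {deeper} vs depth {shallower}: {elo_diff(score):+.0f} Elo")
```

Under that model, depth 2 over depth 1 works out to roughly 770 Elo and depth 4 over depth 3 to roughly 380. Scores this lopsided sit at the extreme end of the curve, so treat the numbers as rough.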
I’m starting another round of testing now, using incremental clocks: 10 seconds per side with a 0.5-second increment. I don’t expect any stability issues, but I’d like to see that the engine never runs out of time (or at least very, very rarely).
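For readers unfamiliar with incremental clocks, here is a rough sketch of how an engine might budget time per move under such a control. This is a common simple heuristic, not P4's actual time management; the function `move_budget` and its `moves_to_go` and `safety_margin` parameters are assumptions made for illustration.

```python
def move_budget(remaining: float, increment: float,
                moves_to_go: int = 30, safety_margin: float = 0.05) -> float:
    """Seconds to spend on the next move (illustrative heuristic only).

    remaining     -- seconds left on our clock
    increment     -- seconds added to the clock after each move
    moves_to_go   -- assumed number of moves still to be played
    safety_margin -- time held back to cover communication overhead
    """
    usable = max(remaining - safety_margin, 0.0)
    budget = usable / moves_to_go + increment
    # The increment only arrives after the move is made, so never plan
    # to spend more than what is actually on the clock right now.
    return min(budget, usable)


if __name__ == "__main__":
    print(move_budget(10.0, 0.5))  # start of a 10s + 0.5s game
    print(move_budget(0.2, 0.5))   # nearly out of time
```

The final `min` is the part that guards against flagging: with only a fraction of a second left, the budget is capped by the clock and not by the increment that has yet to be credited.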