The last time I wrote anything about chess4j, way back in November, I reported that chess4j was using a small in-memory database of opening moves. The program was reading a library of about 500 games from GM Kasparov, which it would do every time the program initialized. Before each move, chess4j would consult this in-memory database to see what moves have been played by Kasparov (or his opponent), and choose between them using a weighted random selection algorithm. The effect of this was that chess4j no longer played the same opening moves, so it added some variability to its online play. It also produced games that were a little more “aesthetic”, or pleasing to look at.
That was a huge step forward, but it had some limitations. Reading all those games takes a little time, so it isn’t really practical to read in more than about 500. It’s also expensive in terms of memory since they are all held in-memory, so there are some limitations there. Finally, it doesn’t leave any room for any learning algorithms since the program is initialized to the same state each time it starts.
To address those issues I needed a solution that persisted data to the disk. In past programs written in C/C++ I’ve created custom file formats and used fseek() and the like to probe it. This time I decided to use a proper database. I had considered MongoDB, mainly just to play around with it, but in the end I decided on SQLite. SQLite is lightweight, reliable, and fast – a perfect fit. (Side note: the primary author of SQLite, Dr. Richard Hipp, was roommates with my graduate school advisor, Dr. Ronnie Smith when they were both working on their PhDs at Duke University. I’ve met him… very nice guy.)
The database table to hold these opening book moves is very simple:
sqlite> .schema book_moves
CREATE TABLE book_moves (key int not null,fromsq int not null,tosq int not null,frequency int default 1,wins int,losses int,draws int);
CREATE INDEX idx_book_moves_key on book_moves(key);
Really, the only fields that are necessary are the first three: key, fromsq, and tosq. ‘key’ is used as a signature for a position. That is, every possible chess position has its own unique value for ‘key’. (I’m not going to get into the algorithm to do that in this post, just accept that it’s true!) ‘fromsq’ and ‘tosq’ are probably self explanatory – they represent the square the piece is moving from and the square it’s moving to. So, if we want to probe the opening book to see what moves it contains from any given position, we just produce the value of “key”, then perform the query:
select fromsq,tosq,frequency,wins,losses,draws from book_moves where key=?
‘frequency’ is a count of how many times the move was encountered as the opening book was constructed. This is what makes the random weighted selection possible. The idea here is that, if move A was encountered 10 times as often as move B, then chances are it’s a better move and I should play it roughly 10 times as often. This approach helps keep a stray “bad move” that may have found its way into the book database from getting on equal footing with good moves.
Finally, we have ‘wins’, ‘losses’, and ‘draws’. Each time chess4j completes a game, it increments the appropriate counter for the moves it played from the opening book (not moves the opponent played). It doesn’t actually do anything with this information yet, but future versions will probably use the information to avoid any moves it consistently loses with.
So, that’s a brief description of the implementation. To seed the database I found an online collection of about 50,000 games played by Master level players or above. The database I’m using currently has nearly 150,000 moves.
Watching chess4j play with this opening book is really fun. Since it doesn’t have to “think,” it plays almost instantaneously when it has a response in the opening book. Below is a segment from logfile. Notice how it begins with “c4”, then parses the opponent’s response “g8f6” and responds with “b1c3”. The frequency counts on the moves it encounters in its opening book gradually diminishes until it has no book response for “c7c6” and has to start thinking on its own.
# parsing: level 0 5 3
# level: 0, 5, 3
# setting increment to 3000 ms.
# parsing: name Pepo2424
# opponent is: Pepo2424
# parsing: rating 1989 1611
# parsing: time 30000
#time : 300000
# parsing: otim 30000
# parsing: go
# book move: BookMove [move=c2c4, frequency=8939]
move c2c4
# parsing: time 30300
#time : 303000
# parsing: otim 30100
# parsing: usermove g8f6
# book move: BookMove [move=b1c3, frequency=1520]
move b1c3
# parsing: time 30600
#time : 306000
# parsing: otim 30100
# parsing: usermove g7g6
# book move: BookMove [move=g2g3, frequency=182]
move g2g3
# parsing: time 30900
#time : 309000
# parsing: otim 30300
# parsing: usermove f8g7
# book move: BookMove [move=f1g2, frequency=225]
move f1g2
# parsing: time 31200
#time : 312000
# parsing: otim 30400
# parsing: usermove d7d6
# book move: BookMove [move=e2e4, frequency=24]
move e2e4
# parsing: time 31400
#time : 314000
# parsing: otim 30500
# parsing: usermove e7e5
# book move: BookMove [move=g1e2, frequency=5]
move g1e2
# parsing: time 31700
#time : 317000
# parsing: otim 30600
# parsing: usermove e8g8
# book move: BookMove [move=e1g1, frequency=12]
move e1g1
# parsing: time 32000
#time : 320000
# parsing: otim 30200
# parsing: usermove c8e6
# book move: BookMove [move=d2d3, frequency=7]
move d2d3
# parsing: time 32300
#time : 323000
# parsing: otim 30300
# parsing: usermove c7c6
# transposition table initialized with 1048576 entries.
# time remaining: 323000, increment: 3000, allotted time: 15920
1 -183 1 2 c3d5
1 -1 3 4 c3b1
1 4 4 6 c3a4
1 17 4 10 f1e1
1 20 4 12 h2h3
1 22 4 14 h2h4
1 22 6 39 h2h4
2 7 6 78 h2h4 b8d7
2 12 12 230 d1b3 b7b5
2 12 12 246 d1b3 b7b5
3 17 21 368 d1b3 b7b5 h2h4
3 17 28 1500 d1b3 b7b5 h2h4
4 7 45 3541 d1b3 d8c7 h2h4 b8d7
4 7 79 5825 d1b3 d8c7 h2h4 b8d7
5 12 121 9522 d1b3 d8c7 h2h4 b8d7 a2a4
5 12 139 15133 d1b3 d8c7 h2h4 b8d7 a2a4
6 7 190 23677 d1b3 d8c7 h2h4 b8d7 a2a4 d7c5
6 9 293 38345 h2h4 b8d7 a2a4 a7a6 f2f4 h7h5
6 9 339 64469 h2h4 b8d7 a2a4 a7a6 f2f4 h7h5
7 10 440 97805 h2h4 b8d7 a2a4 a7a5 b2b3 d7c5 f2f4
7 10 640 172694 h2h4 b8d7 a2a4 a7a5 b2b3 d7c5 f2f4
8 7 817 243868 h2h4 b8d7 a2a4 a7a5 f2f4 d7c5 f4e5 d6e5
# hash probes: 628514
# hash hits: 58960 (9.38%)
# hash collisions: 31636 (5.03%)
# fail highs: 32795 (5.22%)
# fail lows: 2422 (0.39%)
# exact scores: 37 (0.01%)
move h2h4
-----SNIP-----
# parsing: result 1-0 {Pepo2424 resigns}
This improvement was a lot of fun to implement and it’s a lot of fun to watch. chess4j’s online opponents seem to like it too:
Ara61 says: Thank you for the great game!
I’ve also made some great improvements to the engine’s evaluation routine, but that will be the subject of another post very soon! And, as always the latest development code is available from the project website. These changes are not included in the last official “release” but will be in the 3.0 release coming within the next few weeks.