Over the last month I’ve continued to experiment with different network structures, with little to show for it. It’s been a little frustrating. I can’t even get a model that fits the Hand Crafted Evaluation. 🙁 This just reinforces what I concluded in my last update, though: I just don’t have enough training data. I’m still using the ~10m position dataset by Andrew Grant. From reading online (such as this post on talkchess.com), training sets of hundreds of millions or even a billion positions are not uncommon. So, how do I get my hands on that many training positions?
I could most likely find a source online if I looked around; perhaps some other authors would share. But, that doesn’t seem very satisfying to me. I’d rather generate them myself, if that’s practical. So, just to do some napkin math: let’s say I want a set of 100m training positions. Let’s also assume that each training game will yield 5 training positions. That means I’d need on the order of 20m unique games! If I wanted a set of 250m training positions (which doesn’t seem to be an unreasonable number at all), I’d need something like 50m unique games. That’s a lot of games!
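To make the napkin math easy to redo with different assumptions, here is a minimal sketch in Java. The 5 positions per game figure is the assumption from above; the `gamesPerSecond` throughput is purely a placeholder, since I haven't measured what my hardware can actually do yet.

```java
final class NapkinMath {

    /** Number of games needed to reach a target position count, rounding up. */
    static long gamesNeeded(long targetPositions, int positionsPerGame) {
        if (positionsPerGame <= 0) {
            throw new IllegalArgumentException("positionsPerGame must be positive");
        }
        return (targetPositions + positionsPerGame - 1) / positionsPerGame;
    }

    /** Wall-clock days to play that many games at a given (hypothetical) throughput. */
    static double daysToGenerate(long games, double gamesPerSecond) {
        if (gamesPerSecond <= 0.0) {
            throw new IllegalArgumentException("gamesPerSecond must be positive");
        }
        return games / gamesPerSecond / 86_400.0; // 86,400 seconds per day
    }

    public static void main(String[] args) {
        int positionsPerGame = 5;       // assumption from the text above
        double gamesPerSecond = 100.0;  // placeholder, NOT a measured number

        for (long target : new long[] {100_000_000L, 250_000_000L}) {
            long games = gamesNeeded(target, positionsPerGame);
            System.out.printf("%,d positions -> %,d games, about %.1f days at %.0f games/sec%n",
                    target, games, daysToGenerate(games, gamesPerSecond), gamesPerSecond);
        }
    }
}
```

Once I have a real games-per-second number from some experiments, plugging it in should tell me quickly whether generating the data myself is practical.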
I’m not philosophically opposed to scraping *games* from online sources, because the games themselves still have to be translated into training positions, and even in that there seems to be some room for experimentation. Another possibility is to play very fast games from random starting positions. I’m not entirely sure how fast I can produce these on my hardware without some experimentation though. I think I will do some experiments before deciding on my “game gathering” strategy.
Another area that needs some work: my home-grown machine learning library is far too slow for this. Judging by its performance on the MNIST image classification problem, the library works just fine, but chess seems to be a very extreme use case. I don’t think there’s a way around using hardware-level SIMD instructions (such as AVX) to get speeds that are acceptable for a chess engine. I think I’m a pretty good bit twiddler, but hardware-level stuff isn’t my specialty. If it were just a matter of training faster, I could look into using some other training library (maybe nnue-pytorch, I’m not sure). But the engine still needs to be able to run a forward pass through the network to make predictions during the game, so it seems there’s really no way around it. I need to figure this out. On the Java front, there is JEP 438 (the Vector API, still an incubator feature), which is part of the latest JDK 20 release. That seems like a promising direction for chess4j. Or perhaps there is already some Java-based linear algebra library that would handle that job for me. I don’t think I’ll ever port the training code to Prophet, but I would like to port the forward pass code, so I’ll have to figure that out in C as well.
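For reference, this is roughly what the forward pass boils down to for a single fully connected layer. It is a plain scalar sketch, not the actual chess4j code: the class name, the flat row-major weight layout, and the choice of ReLU are all assumptions for illustration. The inner multiply-and-accumulate loop is the part a SIMD approach would need to speed up.

```java
final class ForwardPass {

    /**
     * One fully connected layer: out = W * in + b, optionally followed by ReLU.
     * Weights are stored row-major in a flat array: one row of in.length
     * weights per output neuron (a hypothetical layout for this sketch).
     */
    static float[] dense(float[] weights, float[] bias, float[] in, boolean relu) {
        int nOut = bias.length;
        int nIn = in.length;
        if (weights.length != nOut * nIn) {
            throw new IllegalArgumentException("weights must have bias.length * in.length entries");
        }
        float[] out = new float[nOut];
        for (int o = 0; o < nOut; o++) {
            float sum = bias[o];
            int row = o * nIn;
            // Dot product of one weight row with the input vector.
            for (int i = 0; i < nIn; i++) {
                sum += weights[row + i] * in[i];
            }
            out[o] = relu ? Math.max(0f, sum) : sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny 2 -> 2 -> 1 network with made-up weights, just to exercise the code.
        float[] input = {1f, 1f};
        float[] hidden = dense(new float[] {1f, 2f, 3f, 4f}, new float[] {0.5f, -20f}, input, true);
        float[] output = dense(new float[] {2f, 1f}, new float[] {1f}, hidden, false);
        System.out.println("network output: " + output[0]);
    }
}
```

A real engine evaluates something like this at every node of the search, which is why a loop that is perfectly fine for MNIST becomes a bottleneck here.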
Lots to do! In summary, I think the plan for now is to get my machine working on building training data, and while it works I’ll do my homework on making my Java-based machine learning library more accurate and faster.