Prophet’s sparring partners

Several months ago I decided to get a little more rigorous about how I test changes to Prophet. With previous versions of Prophet, I would make a change, run some test suites consisting of a few hundred or a thousand tactical positions, play a few games online to convince myself the change was good, and that was it. That doesn’t really cut it anymore, though. Fruit and the engines that have followed suit have shown the importance of having a solid testing methodology that accurately measures how effective changes are. So, to that end, I found some sparring partners and took my first measurements:

Rank Name Elo games score oppo. draws
1 matheus 101 7360 67% -20 16%
2 prophet2 86 7360 64% -17 14%
3 lime 35 7360 56% -7 12%
4 gerbil -25 7360 46% 5 15%
5 prophet3-20160903 -54 8000 41% 12 17%
6 elephant -138 7360 28% 27 11%

Then Christmas happened, and I found myself with a little spare time to work on Prophet. I implemented bitboards, and then magic bitboards, and a few other speed optimizations. Suddenly it looked like I needed some new sparring partners!

Rank Name Elo games score oppo. draws
1 prophet3-20170118 91 24436 68% -42 17%
2 matheus 65 10648 58% 5 20%
3 prophet2 23 10649 51% 10 16%
4 lime -24 10649 44% 17 13%
5 gerbil -81 10643 35% 24 14%
6 elephant -191 10648 21% 39 10%

Here is what I ended up with. I like this a lot, as Prophet3 is right in the middle with some engines below and above. That seems like a pretty good cross section. As Prophet3 continues to improve, I’ll just add newer strong engines to the mix and drop the bottom ones off.

Rank Name Elo games score oppo. draws
1 myrddin 84 4811 63% -7 18%
2 tcb 46 4810 57% -4 18%
3 Horizon 30 4812 55% -2 17%
4 jumbo 19 4972 53% -2 18%
5 prophet3 7 12733 51% -2 22%
6 madeleine -26 4972 46% 4 20%
7 matheus -36 4812 44% 4 20%
8 Beowulf -67 4812 39% 7 20%
9 prophet2 -71 4812 39% 7 18%
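
For anyone reading these tables for the first time, the score column can be roughly sanity-checked against the Elo and oppo. columns. The sketch below assumes the standard logistic Elo formula and assumes oppo. is the average rating of an engine's opponents. The tool that produced the tables may use a different model (for example, one that accounts for draws), so treat this as an approximation. The function names are mine, for illustration only.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated elo_diff above the opposition,
    using the standard logistic Elo formula (an assumption, see above)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: the rating gap implied by a score in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# (name, Elo, oppo.) taken from three rows of the last table above.
rows = [
    ("myrddin", 84, -7),
    ("prophet3", 7, -2),
    ("prophet2", -71, 7),
]

for name, elo, oppo in rows:
    predicted = expected_score(elo - oppo)
    print(f"{name}: predicted score {predicted:.1%}")
```

For these three rows the formula lands on the reported 63%, 51% and 39% once rounded to whole percentages, which suggests the columns are at least consistent with a plain logistic model.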