Friday, June 19, 2009

23 lines

I was thinking about how little opening theory can be applied to my games, so I conducted a little exercise with Chessbase and a database of about 7000 of my games (most from ICC). I constructed an opening tree consisting only of positions that I would expect to face as White in 1% or more of my games.

The result is shockingly small: 23 variations, with none longer than 6 moves.
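For anyone who wants to try the same exercise without Chessbase, here is a rough sketch of the counting idea in Python. It assumes each game has already been reduced to a list of SAN move strings (one entry per ply), and the function names `frequent_lines` and `leaves` are made up for illustration. It counts move sequences rather than positions, so transpositions are ignored; a real opening tree would merge them.

```python
from collections import Counter


def frequent_lines(games, threshold=0.01, max_plies=12):
    """Map each move prefix to its share of all games, keeping only
    prefixes that occur in at least `threshold` of the games."""
    total = len(games)
    if total == 0:
        return {}
    counts = Counter()
    for moves in games:
        # Count every prefix of the game up to max_plies half-moves.
        for n in range(1, min(len(moves), max_plies) + 1):
            counts[tuple(moves[:n])] += 1
    return {p: c / total for p, c in counts.items() if c / total >= threshold}


def leaves(lines):
    """Keep only the prefixes that no other frequent prefix extends,
    i.e. the tips of the pruned tree."""
    return sorted(
        p for p in lines
        if not any(len(q) == len(p) + 1 and q[:len(p)] == p for q in lines)
    )


# Toy data, not my real games: 200 games, one of them a rare sideline.
games = (
    [["e4", "e5", "Nf3", "Nc6"]] * 120
    + [["e4", "c5"]] * 79
    + [["e4", "b5"]] * 1
)
lines = frequent_lines(games)
tips = leaves(lines)
```

In the toy data the 1...b5 game falls below the 1% cutoff and drops out, leaving two tips. Counting the tips on a real database is what gives a number like my 23.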

Because I'm in the middle of an online tournament, I don't want to reveal too much of my repertoire, but just breaking down the percentages after 1.e4:

1...e5 44.6% (My mainest of main lines continues past move 5 just under 1% of the time!)
1...c5 16.0%
1...e6 9.0%
1...d5 5.9%
1...c6 4.6%
1...d6 3.3%
1...g6 1.7%
1...Nf6 1.6%
1...b6 1.4%
1...Nc6 1.1%

The data is somewhat skewed. My database of roughly 7000 games goes back about 12 years, and I only counted positions that I would face in my current repertoire. I've been quite faithful to my repertoire over those years, but there are notable exceptions (I used to play the Exchange Spanish, and I avoided the Open Sicilian with c3 or Bg5 systems for much of that period). In my current repertoire, no Sicilian line has been seen past move 5 more than 1% of the time! I need to see if I can do some sophisticated filtering to separate out the games that agree with my current repertoire.
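One way that filtering might look, as a sketch only: describe the current repertoire as a lookup from "moves played so far" to "the move I play now as White", then keep a game only if every White move the lookup covers matches. The `repertoire` dictionary and the function name `agrees_with_repertoire` are hypothetical, the moves in it are placeholders rather than my actual repertoire, and matching is again by move order, not by position.

```python
def agrees_with_repertoire(moves, repertoire):
    """True if every White move that the repertoire covers matches it.

    `moves` is a list of SAN strings, one per ply, with White moving first.
    `repertoire` maps a tuple of the moves played so far to White's choice.
    Positions the repertoire does not cover are not held against the game.
    """
    for ply in range(0, len(moves), 2):  # even plies are White's moves
        expected = repertoire.get(tuple(moves[:ply]))
        if expected is not None and moves[ply] != expected:
            return False
    return True


# Placeholder repertoire for illustration only.
repertoire = {
    (): "e4",
    ("e4", "c5"): "Nf3",
    ("e4", "c5", "Nf3", "d6"): "d4",
}

old_games = [
    ["e4", "c5", "c3", "d5"],          # old c3 Sicilian: filtered out
    ["e4", "c5", "Nf3", "d6", "d4"],   # matches the lookup: kept
    ["e4", "e5", "Nf3", "Nc6"],        # not covered after 1...e5: kept
    ["d4", "d5"],                      # different first move: filtered out
]
current = [g for g in old_games if agrees_with_repertoire(g, repertoire)]
```

Games that leave the lookup are kept by design, since the point is only to throw out games where I demonstrably played something I no longer play.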

The data also shows that at lower levels of play, "minor" variations actually become major. For example, in the Ruy Lopez the Steinitz, Cozio and Bird variations are now "major" lines to contend with.

My repertoire database has been getting a bit weedy, so I may start a leaner, meaner repertoire database using just these 23 lines, and making sure that I map each main line out several moves further. I can repeat the exercise for my Black games, although there I expect an even stubbier tree of variations because White gets to vary with the first move. The goal is to declare a chunk of theoretical turf where I know anything outside its boundaries is encountered less than 1% of the time.

10 comments:

Blue Devil Knight said...

This is a great post making key points. It is amazing what the books written by GMs based on top GM games.

Chess publishers need to acquire a much vaster database, one that includes players with the ratings that the books are actually written for. For instance, in probably a third of my Caro-Kann games, White plays 1 e4 c6 2 Nf3?!

This is not treated in any books, except for one weird line that I have literally never seen.

What counts as a 'main line' depends on the level of play, and too many chess book authors are either ignorant of this or (more likely) too lazy to take it into account in their books. It sure makes it easier for them when they only look at what the top GMs play.

Exception: Greet's book on the Ruy Lopez. Everyone should use his book as the model IMO (unless the book is truly written for GMs).

Blue Devil Knight said...

Oops. Sentence fragment. I said:
"It is amazing what the books written by GMs based on top GM games. "

I meant to say:
It is amazing what the books written by GMs based on top GM games leave out.

Lauri said...

I don't see what is dubious about 2.Nf3. Can you explain? 2.Nf3 is actually one of the lines I've looked at playing myself. Usually it goes 2...d5 3.exd5 cxd5 4.d4 Nf6 5.c4 Nc6 and we have transposed to the Panov-Botvinnik Nf3 line.

Lauri said...

I got curious and I checked my database and 1.e4 c6 2.Nf3 has been played by Anand, Short, Timman, Morozevich...

Blue Devil Knight said...

Lauri: My point was that I see this all the time but it is in none of my caro books, so you make my point for me. My ?! annotation was tongue-in-cheek.

And often it doesn't transpose to the PB attack.

Andrew Greet is an exception. He looks at lines people actually play.

Grandpatzer said...

Greet's book is a great example.

I was thinking that if the ICC kept a log of games played and their results, it'd be a goldmine for authoring opening references targeted at different rating tiers. I found in my "1 percent" repertoire that a lot of unworthy moves were main variations. For example, the Steinitz variation of the Spanish is the main line, with a lot of Cozio and Bird lines as well.

Blue Devil Knight said...

GP: yes, I emailed them about this. They save all the games from high rated players, and you can download them (something like help database or some such at ICC), but one game at a time via mouse click! (ridiculous).

If they saved all the games and made them available it would be a freaking gold mine. It's not like pgn files are particularly large or hard to store, and we certainly pay ICC enough money! They could even have a separate server for the database if they are worried about compromising the efficiency of the chess play.

Blue Devil Knight said...

PS I think they even have an opening stats page you can click through, for those higher-rated games they have. It's better than nothing, but if you could go through an opening tree with frequencies, as a function of rating (e.g., enter a rating window filter), it would be a huge benefit to club players.

Blue Devil Knight said...

ICC database link, and opening survey (not searchable or anything).

The thing is, how helpful is it for them to compile something that we can easily find in our database? Much more useful for the vast majority of ICC users would be such stats for normal (i.e., non-titled) players!

OK, enough beating of a dead horse. You've inspired me: we should think about the best way to convince ICC to do something like this. They make a LOT of money from us every year, so it's not like they can't afford to buy a new server or something (I once calculated it at over a million bucks in subscription fees alone, not including money they make from their chekel system).

Grandpatzer said...

I'll send some inquiries towards ICC as well then :D