August 06, 2012
In the last couple of blogs I have mentioned Confidence Intervals, and at the time I don’t think I properly understood how to interpret them. When we collect a sample of, say, 25k cash hands we can calculate its sample mean and sample stdev; usually we batch the data into blocks of 100 hands, so we have 25000/100 = 250 individual measurements in this data.
It seems that for a good player at 6-max NL cash, a score of 10bb/100 with a 90bb/100 stdev is a likely and believable result.
I am now fairly confident that once you have 100+ of these 'per 100 hands' results, the mean of any 100+ sample will follow a very, very nice Gaussian-type curve. It worked for very spiky tournament results, and here the basic individual case (of 100 hands) will be close to Gaussian-shaped anyway, even before collecting hundreds more of them. So if I produce samples drawn from a Gaussian distribution with parameters mu = 10 and sigma = 90, this will be a very good simulation of a good and very consistent poker player playing thousands of games. This is an idealised case, and here we actually do know the ‘true’ winrate of this idealised player: 10bb/100. Pretty much in the blink of an eye I can generate the equivalent of 25000 hands, so I’ll do this hundreds of times, and each time I’ll produce a CI range the way we would if we didn’t actually know the ‘true’ values – as we would have to when using our trackers.
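A minimal sketch of what one simulated run looks like (the seed is an arbitrary choice of mine, just to make the run reproducible):

```python
import random

random.seed(1)  # arbitrary seed, only for reproducibility

MU, SIGMA = 10.0, 90.0   # the 'true' winrate and stdev, in bb/100
N_BLOCKS = 250           # 25000 hands / 100 hands per block

# one simulated 25000-hand session: 250 per-100-hand results
blocks = [random.gauss(MU, SIGMA) for _ in range(N_BLOCKS)]

sample_mean = sum(blocks) / N_BLOCKS
print(f"sample mean = {sample_mean:.2f} bb/100")
```

Each element of `blocks` plays the role of one 100-hand score, and the whole list is one 25000-hand sample from our idealised player.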
The above is one run of 25000 hands; the sample mean of this run is 11.43bb/100 and the sample stdev is 90.4881. If we used this data we would obtain a 90% Confidence Interval of 1.9829bb/100 to 20.8801bb/100.
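The interval above can be reproduced (to within rounding of the sample mean) from the standard formula, mean ± t × stdev/√n. For 249 degrees of freedom the two-sided 90% Student-t critical value is about 1.651, barely different from the normal value of 1.645 at this sample size:

```python
import math

# sample statistics from the 25000-hand run above
sample_mean = 11.43
sample_stdev = 90.4881
n = 250  # number of 100-hand blocks

# two-sided 90% critical value; ~1.651 for Student-t with 249 df
t_crit = 1.651

half_width = t_crit * sample_stdev / math.sqrt(n)
lo, hi = sample_mean - half_width, sample_mean + half_width
print(f"90% CI: {lo:.4f} to {hi:.4f} bb/100")
```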
So I can just repeat the above a hundred times and produce a nice graph of the 100 CIs generated. This is easy in a simulation, but in the real poker world each bar would represent a block of 25000 hands.
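A sketch of that repetition (again with an arbitrary seed of my own; the exact coverage count will vary from run to run, and won’t necessarily match the run shown in the graph):

```python
import math
import random

random.seed(7)  # arbitrary seed, only for reproducibility

MU, SIGMA = 10.0, 90.0  # 'true' parameters, in bb/100
N_BLOCKS, RUNS = 250, 100
T_CRIT = 1.651  # ~90% two-sided Student-t value for 249 df

captured = 0
for _ in range(RUNS):
    # one simulated 25000-hand sample
    blocks = [random.gauss(MU, SIGMA) for _ in range(N_BLOCKS)]
    m = sum(blocks) / N_BLOCKS
    s = math.sqrt(sum((b - m) ** 2 for b in blocks) / (N_BLOCKS - 1))
    half = T_CRIT * s / math.sqrt(N_BLOCKS)
    # did this interval capture the 'true' mean?
    if m - half <= MU <= m + half:
        captured += 1

print(f"{captured} of {RUNS} intervals captured the true mean")
```

We expect the count to come out near 90, but any individual batch of 100 runs will wobble around that.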
In this run 92 out of the 100 intervals did capture the ‘true’ mean of 10bb/100 (the horizontal line), but you can also see how wildly different the sample means of these samples are. Each line on the above graph is the equivalent of what we do when we analyse our own block of 25000 hands. Whether we have been on a heater or a downswing determines the range we obtain. On a heater we might get a range of 10 to 27, yet on a downswing we could get a range as low as –16 to +5. All we know is that an interval produced this way will capture the actual ‘true’ line about 9 times out of 10 (in this case it happened 92 times out of 100). To show this better, here is the graph above sorted from low to high, with each individual sample mean marked as a central dot.
So I think you can see that when we get a 90% confidence interval of 10bb/100 to 27bb/100, we should not infer that our ‘true’ winrate is above 27 with 5% probability. All we really know is that intervals built this way capture the ‘true’ rate most of the time (90%), and this is a subtle difference in meaning.
To get a probability range for the ‘true’ winrate lying behind a sample we have to use a different approach: either start calculating confidence intervals of the confidence intervals, or do some Bayesian-type analysis.
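A minimal sketch of the Bayesian direction, under two loud assumptions: a flat prior on the winrate, and the sample stdev treated as if it were the known population value (an exact treatment would give a Student-t posterior instead). With those assumptions the posterior for the ‘true’ winrate is approximately Normal(sample mean, stdev/√n), and direct probability statements become legitimate:

```python
import math
from statistics import NormalDist

# sample statistics from the 25000-hand run earlier
sample_mean = 11.43
sample_stdev = 90.4881
n = 250

# Assumption: flat prior and known stdev, so the posterior for the
# 'true' winrate is approximately Normal(mean, stdev / sqrt(n)).
posterior = NormalDist(sample_mean, sample_stdev / math.sqrt(n))

# now we may make probability statements about the 'true' winrate:
p_above_zero = 1 - posterior.cdf(0)
p_above_20 = 1 - posterior.cdf(20)
print(f"P(true winrate > 0)  = {p_above_zero:.3f}")
print(f"P(true winrate > 20) = {p_above_20:.3f}")
```

Numerically the central 90% of this posterior coincides with the confidence interval above; the difference is purely one of interpretation, which is the whole point of this post.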
The usual warning stands - I am not a statistician, I have just read a few things and I might be wrong. Please feel free to let me know if I am.
(Credit where credit is due – when I started to think about how to display this using a maths package I found the perfect examples in a book by Heikki Ruskeepää, ‘Mathematica Navigator’)