General Poker Discussion Poker Forums

Page 3: HUD Statistics: Probability Questions


pasita

1105 posts
Joined 09/2009

Trying to figure out whether the effect is worth the trouble here:

If you have a 100-hand (opportunity) sample on VPIP (or any stat), how far would the Bayesian CI be from the statistical CI? (I guess you need some a priori knowledge here; please feel free to pick priors of your choosing.) Or, put another way, how far would the approximations I made in my stat be from the Bayesian equivalent of such a stat?
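One rough way to get a feel for the gap is to compute both intervals for the same sample. The sketch below is only an illustration: the 25-out-of-100 sample, the Beta(5, 15) "informed" prior, and the function names are all assumptions, and the Bayesian interval is estimated by sampling the Beta posterior with the standard library rather than computed exactly. For numbers like these the two intervals tend to land close together unless the prior is strong relative to 100 hands.

```python
import math
import random

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation (frequentist) 95% interval for a proportion."""
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - half), min(1.0, p + half)

def beta_credible_interval(successes, trials, prior_a, prior_b,
                           draws=50_000, seed=1):
    """Central 95% credible interval under a Beta(prior_a, prior_b) prior,
    estimated by sampling the Beta posterior."""
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + trials - successes
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    return samples[int(0.025 * draws)], samples[int(0.975 * draws) - 1]

# Hypothetical sample: 25 VPIP hands out of 100 opportunities.
# Beta(5, 15) (mean 25%, worth 20 hands) is an assumed prior, not a fitted one.
k, n = 25, 100
wald = wald_interval(k, n)
flat = beta_credible_interval(k, n, 1, 1)
informed = beta_credible_interval(k, n, 5, 15)
print("Wald:      ", wald)
print("Flat prior:", flat)
print("Informed:  ", informed)
```

Swapping in a prior taken from your own database changes only the last two arguments.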

Posted 10 months ago

sthief09

2339 posts
Joined 07/2007

if someone is feeling particularly inspired or just bored at work one day, you can have HEM or PT export hands into excel. you can add up VPIP true or false (1 or 0). you can try to break it into 100, 500, 1000, etc. hand samples. calculate VPIP for each 100 (or 500 or 1000) hand block. make a scatter plot for each. if it's a normal distribution, you should be able to calculate the standard deviation. with that, you can compare that to the standard deviation of a binomial distribution over the same sample(s). that would at least give us some sort of idea as to whether this converges to a binomial distribution over time.
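The same block comparison can be sketched without Excel. This is a minimal sketch, not a tested tool: the hand flags are simulated at an assumed 22% VPIP as a stand-in for a real HEM or PT export, and the function names are made up for the example. With a real export you would replace `flags` with the 1/0 column.

```python
import math
import random
import statistics

def block_vpips(flags, block_size):
    """Split 0/1 VPIP flags into consecutive blocks and return each block's VPIP."""
    n_blocks = len(flags) // block_size
    return [
        sum(flags[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]

def binomial_sd(p, block_size):
    """SD of a proportion measured over block_size independent trials."""
    return math.sqrt(p * (1 - p) / block_size)

# Stand-in for exported hands: 50,000 simulated hands at an assumed 22% VPIP.
rng = random.Random(42)
flags = [1 if rng.random() < 0.22 else 0 for _ in range(50_000)]

overall = sum(flags) / len(flags)
for size in (100, 500, 1000):
    observed = statistics.stdev(block_vpips(flags, size))
    expected = binomial_sd(overall, size)
    print(f"{size:>5} hands: observed SD {observed:.4f}, binomial SD {expected:.4f}")
```

On simulated independent hands the two columns should agree closely; if real data shows a clearly larger observed SD, that would suggest the stat drifts over time rather than behaving like a fixed-probability binomial.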

I think that would be the correct way to go about it at least. there are obviously some more advanced mathy ppl ITT so maybe you all can refine that process.

Posted 10 months ago

improva

3833 posts
Joined 02/2008

I'm seriously considering writing some software to simulate this and try to find out how quick these stats converge.

Out of curiosity have you tried doing this? If you have, was there anything in particular I should be aware of before I write any code?



You need to find a way to pick priors. I simply used my database and calculated the mean for each stat I was interested in. I have not given it a lot of thought and I can be convinced that is not the best approach.

Posted 10 months ago

billrata

126 posts
Joined 01/2011

hayes13

857 posts
Joined 12/2008

You need to find a way to pick priors. I simply used my database and calculated the mean for each stat I was interested in. I have not given it a lot of thought and I can be convinced that is not the best approach.


Think sthief09 is more on the spot with this. People describe distributions with both a mean and variance.

As a disclaimer, I am going into fourth-year engineering and have done statistics work in a lab all summer. I am also doing a fourth-year thesis in statistics.

As an aside, Whitelime I believe has a PhD or master's in stats. He simply knows the statistical answer to this.

Ok in full disclosure, completely destroyed right now, vacation starts tomorrow and am hammered from playing/gambling on beer pong.

Now, a small sample size can always be used to generate statistics. Your trust that they are representative of the underlying population is a judgement call. See Norman (http://xa.yimg.com/kq/groups/18751725/1039265037/name/Likert%2Bscales,%2Blevels%2Bof%2Bmeasurement%2Band%2Bthe%2B%25E2%2580%2598%25E2%2580%2598laws%25E2%2580%2599%25E2%2580%2599.pdf)

This paper has a little bit on it. Though it is specific to ordinal data, you can ignore that part.

Secondly, my summer research assistant job involved data generation and testing of linear models.
We are looking into the correctness of human judgement in areas of uncertainty.
It is possible to get NO statistically significant result and still have players improve their performance.
This can happen for a number of reasons.

Firstly: TLDR, but in skimming I haven't seen ANY time series analysis. This can be used to help take psychological effects into account. Of course, a small sample size can also just mean someone is running hot.

Furthermore, Nowhereman defends the position that 80/20 doesn't ever converge to 15/13 because the gap is so large. The stats can't really tell this; it is more about all the players in the game and their mental model.

If anyone wants to learn stats, download R. It is open source and amazing for statistical analysis.

Cheers

Posted 10 months ago

improva

3833 posts
Joined 02/2008

Think sthief09 is more on the spot with this. People describe distributions with both a mean and variance.

If anyone wants to learn stats, download R. It is open source and amazing for statistical analysis.

Cheers



I don't think I understand your post. I agree that people should use R.

Posted 10 months ago

threads13

1811 posts
Joined 03/2008

I'm sad that I wasn't invited to this party. ;) This is right up my alley.

I started to do a video on showing how to use Bayes' Theorem to answer precisely these questions about a year ago. I got distracted so I never finished it (live play videos, moving to Vegas, cheeseburgers, beer, sleep, etc).

I have already done a few spreadsheets on it, based on asking questions like "ok, he's folded to 3/4 3-bets, what is the probability that this TAG has a high fold to 3-bet?" It applies to VPIP just as well. Sounds like that is exactly what you are asking. It's actually not too hard to do once you have a feel for Bayes. I have several spreadsheets lying around collecting dust that I've shown to students as it has come up.
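For anyone who wants to see the shape of that calculation without the spreadsheets, here is a minimal sketch of Bayes' theorem over a few player types. The three fold-to-3-bet rates and the prior weights are invented for the example (they are not taken from anyone's database or from the spreadsheets mentioned), and the function names are hypothetical.

```python
def posterior(prior, likelihoods):
    """Bayes' theorem over a discrete set of hypotheses."""
    unnormalised = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: w / total for h, w in unnormalised.items()}

def binomial_likelihood(p, folds, chances):
    """Likelihood of `folds` folds in `chances` tries at fold rate p.
    The binomial coefficient is left out because it cancels in the posterior."""
    return p ** folds * (1 - p) ** (chances - folds)

# Assumed player types and prior weights (illustrative values only).
fold_rates = {"low": 0.45, "medium": 0.60, "high": 0.75}
prior = {"low": 0.25, "medium": 0.50, "high": 0.25}

# Observation from the post: he has folded to 3 of 4 3-bets.
likes = {h: binomial_likelihood(p, 3, 4) for h, p in fold_rates.items()}
post = posterior(prior, likes)
for h in fold_rates:
    print(f"{h:>6}: prior {prior[h]:.2f} -> posterior {post[h]:.3f}")
```

With only four observations the posterior moves toward "high" but stays fairly close to the prior, which is the point of doing it this way instead of reading 75% straight off the HUD.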

I could put the sheets in here, but without context it probably would just look like a maze of numbers. I think it would be best if I just laid it all out in a few shorts. Bayes' Theorem is easy to understand, but I haven't found good explanations on google. It's pretty intuitive. Josh's "Applied Math" series helped me understand it and then I applied it to sort my population samples.

Rob, I emailed you about maybe doing some videos on this stuff.

Posted 10 months ago

BaseMetal

2060 posts
Joined 01/2010

I'm sad that I wasn't invited to this party. ;) This is right up my alley.


You are very cordially invited - don't forget to bring a bottle.
What do you folks think about priming the data with, say, the equivalent of one 'true' sample length? I think this will, on average, pull the result toward the 'true' mean, similar to Bayes, but it is really very easy to achieve.

Here is an example of a running (cumulative) success count for a stat that was generated randomly with a 'true' 10% frequency.
{0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,5,5}
This would produce a running standard HUD stat of:
{ 0.0, 50.0, 33.3, 25.0, 20.0, 16.7, 14.3, 12.5, 11.1, 10.0, 9.1, 8.3, 7.7, 7.1, 6.7, 12.5, 11.8, 11.1, 10.5, 10.0, 9.5, 9.1, 8.7, 12.5, 12.0, 11.5, 11.1, 10.7, 10.3, 10.0, 9.7, 9.4, 9.1, 8.8, 8.6, 8.3, 10.8, 10.5, 12.8, 12.5}
If we prime the HUD stat with the equivalent of one sample length of data with one success (10 opportunities, 1 success), we get:
{ 9.1, 16.7, 15.4, 14.3, 13.3, 12.5, 11.8, 11.1, 10.5, 10.0, 9.5, 9.1, 8.7, 8.3, 8.0, 11.5, 11.1, 10.7, 10.3, 10.0, 9.7, 9.4, 9.1, 11.8, 11.4, 11.1, 10.8, 10.5, 10.3, 10.0, 9.8, 9.5, 9.3, 9.1, 8.9, 8.7, 10.6, 10.4, 12.2, 12.0}
This looks much better to me and I would prefer this.
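The priming step is small enough to write out. This is a sketch with made-up function names; the default of 1 success in 10 opportunities is inferred from the primed numbers above (for example (0 + 1) / (1 + 10) = 9.1%). Adding pseudo-observations like this gives the same number as the posterior mean under a Beta(1, 9) prior, so it is a simple special case of the Bayes approach.

```python
def running_stat(counts):
    """Raw HUD percentage after each opportunity (cumulative successes / n)."""
    return [100 * k / n for n, k in enumerate(counts, start=1)]

def primed_stat(counts, prior_successes=1, prior_trials=10):
    """Same stat with pseudo-observations added before any real data."""
    return [
        100 * (k + prior_successes) / (n + prior_trials)
        for n, k in enumerate(counts, start=1)
    ]

# First ten cumulative counts from the 10% example above.
counts = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print([f"{x:.1f}" for x in running_stat(counts)])
print([f"{x:.1f}" for x in primed_stat(counts)])
```

Changing `prior_trials` controls how hard the stat is pulled toward the population value: a larger number means slower reaction to the player's own data.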

In the above, the 'player' happened to have the true mean for this stat. If I generate a 'player' with twice this frequency, i.e. a player with a 20% stat, but still prime it with the same 'true' population 10%, I get results that look like this:
{1,2,2,2,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,8,8,8,9,10} << about 20% now
This would produce a running standard HUD stat of:
{ 100.0, 100.0, 66.7, 50.0, 60.0, 50.0, 57.1, 50.0, 44.4, 40.0, 36.4, 41.7, 38.5, 35.7, 33.3, 31.3, 35.3, 33.3, 31.6, 30.0, 28.6, 27.3, 26.1, 25.0, 24.0, 26.9, 25.9, 25.0, 24.1, 23.3, 22.6, 21.9, 21.2, 20.6, 20.0, 22.2, 21.6, 21.1, 23.1, 25.0}
If we prime the HUD stat with the equivalent of one success in one sample length of the population 'true' data, exactly the same priming as above, we get:
{ 18.2, 25.0, 23.1, 21.4, 26.7, 25.0, 29.4, 27.8, 26.3, 25.0, 23.8, 27.3, 26.1, 25.0, 24.0, 23.1, 25.9, 25.0, 24.1, 23.3, 22.6, 21.9, 21.2, 20.6, 20.0, 22.2, 21.6, 21.1, 20.5, 20.0, 19.5, 19.0, 18.6, 18.2, 17.8, 19.6, 19.1, 18.8, 20.4, 22.0}

This still looks much better to me and I would prefer this.

All you would need to do this is to 'know' the mean of the population (if you use your own db, the median may be better). To do Bayes well you would really need to know more about the underlying distribution, and it would get a lot more complicated.
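One hedged way to bridge the two approaches: if you also have a rough variance for the population, a method-of-moments fit gives a Beta prior whose total weight plays the role of the priming length. The sketch below assumes made-up long-run frequencies for eight players; the function names are hypothetical, and in a real database the observed per-player variance also contains sampling noise, so the fitted prior would come out weaker than it should unless that is corrected for.

```python
import statistics

def beta_from_moments(mean, variance):
    """Method-of-moments Beta(a, b) matching a given mean and variance."""
    if not 0 < mean < 1 or not 0 < variance < mean * (1 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < variance < mean * (1 - mean)")
    strength = mean * (1 - mean) / variance - 1  # a + b, the prior 'sample length'
    return mean * strength, (1 - mean) * strength

def shrunk_estimate(successes, trials, a, b):
    """Posterior mean of the frequency under a Beta(a, b) prior."""
    return (successes + a) / (trials + a + b)

# Hypothetical long-run frequencies for a handful of players (made-up values).
population = [0.06, 0.08, 0.09, 0.10, 0.10, 0.11, 0.12, 0.14]
m = statistics.mean(population)
v = statistics.pvariance(population)
a, b = beta_from_moments(m, v)
print(f"prior Beta({a:.1f}, {b:.1f}), worth about {a + b:.0f} observations")
print(f"5 of 40 observed -> {100 * shrunk_estimate(5, 40, a, b):.1f}%")
```

A tightly clustered population gives a heavy prior (many pseudo-observations), while a widely spread one gives a light prior closer to the one-sample-length priming.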

I haven't seen this done before for poker, so can I call it the "BaseMetal Approach" and sell it to PT or HEM? ;) Doh! I suppose DC owns it now I've posted.

Posted 10 months ago

udownwithvpp

1143 posts
Joined 04/2008



