Tuesday, November 6, 2012

Polls, Statistics and Nate Silver

There are so many stupid, or at best ignorant, things being said about the statistical analysis of polls in this election, and its use in predicting the outcome, that I thought I'd make a quick post about a few mistakes.

1) Don't claim analysts using statistical analysis are gaming the polls if you don't understand how statistics works (and I mean really understand, as in you have taken multiple college and grad school level courses in statistical analysis. Your high school math class where you learned about rolling dice does not count). Maybe someone is gaming things, but if you don't understand enough to comment, then don't waste people's time talking about it.

2) If you do know stats, or even if you don't, Nate and others explain the details of their methodology so anyone can reproduce it. You can disagree with some details of his choices, but basically it's all just straight math. The details are here http://fivethirtyeight.blogs.nytimes.com/methodology/ and here http://www.fivethirtyeight.com/2008/03/frequently-asked-questions-last-revised.html .

3) I hear a lot of idiot pundits saying things like "Nate's calling a blowout of 93% when the polls show everything is super close". Well, pundit, you are stupid. Polls can be very close, but if they are also very accurate (which they should be when you have many well designed polls and combine their results) then you can still call the race as 93% likely to go to Obama. If you're extremely confident (say 93% confident) that Obama will win Ohio, then it doesn't matter if he wins 51-49 or 60-40. He still wins. Nor do 93% odds of an Obama win mean a blowout in terms of actual vote count or electoral vote count. In fact, the nature of the statistical model means that the probabilities of every possible way Obama could win (including 270-268) add up to his total win probability.
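
To put rough numbers on point 3, here is a minimal Python sketch. It is not Nate's model. It just assumes the error on the polling margin is normally distributed and that the polls are independent, and the 3-point lead, the 4-point error on a single poll, and the count of 4 polls are all made-up illustrative values. The function names are hypothetical too.

```python
import math

def win_probability(margin, std_error):
    """P(true margin > 0), ASSUMING normally distributed error on the margin."""
    return 0.5 * (1.0 + math.erf(margin / (std_error * math.sqrt(2.0))))

def averaged_std_error(single_poll_error, n_polls):
    """Standard error of the average of n polls, ASSUMING independent polls."""
    return single_poll_error / math.sqrt(n_polls)

# Illustrative numbers only, not taken from any real poll.
margin = 3.0             # candidate leads by 3 points, e.g. 51.5 to 48.5
single_poll_error = 4.0  # assumed standard error on the margin of one poll
n_polls = 4              # assumed number of independent polls combined

single = win_probability(margin, single_poll_error)
combined = win_probability(margin, averaged_std_error(single_poll_error, n_polls))

print(f"one poll:       {single:.0%} chance the leader is really ahead")
print(f"{n_polls} polls combined: {combined:.0%} chance the leader is really ahead")
```

With these made-up inputs, the same 3-point lead goes from about 77% to about 93% just by combining four polls, and a 3-point race is nobody's idea of a blowout.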

4) Obama's high probability of winning does not mean it's guaranteed, nor does a Romney win mean Nate was "wrong". Again, the nature of statistics means that unlikely things can and will occur. If you flip a coin 1000 times, you will probably get a long run of heads or tails somewhere along the line. It's unlikely to get 10 heads in a row in any given stretch of truly random flips, but IT WILL HAPPEN with enough trials. If you've heard of "the long tail" or black swan events, this is kind of what that means. Extremely unlikely events do occur, and that does not mean the statistics are wrong. It's exactly what the statistics say should happen.
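
If you want to check the coin flip claim in point 4 yourself, here is a small Python sketch that computes the exact chance of seeing at least one run of 10 heads somewhere in a sequence of fair flips. The function name and the flip counts are my own choices for illustration.

```python
def prob_run_of_heads(n_flips, run_length, p_heads=0.5):
    """Exact probability of at least one run of `run_length` heads in `n_flips` flips."""
    # state[k] = probability that no full run has happened yet
    # and the current streak of heads has length k.
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        # Tails resets the streak to zero from any state.
        new_state[0] = sum(state) * (1.0 - p_heads)
        # Heads extends the streak by one.
        for k in range(run_length - 1):
            new_state[k + 1] = state[k] * p_heads
        # Heads from a streak of run_length - 1 completes the run,
        # so that probability leaves the "no run yet" states.
        state = new_state
    return 1.0 - sum(state)

for n in (10, 1000, 100000):
    print(f"{n:>6} flips: P(at least one run of 10 heads) = {prob_run_of_heads(n, 10):.4f}")
```

In exactly 10 flips the chance is 1 in 1024. In 1000 flips it comes out at a bit under 40 percent, and with 100,000 flips it is so close to certain that it prints as 1.0000.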

5) If Nate and the other stats guys' predictions are VERY far off, then it once again doesn't mean their methods are incorrect or they are "wrong". The statistical methodology they use is standard math and proven correct when the data is good. So if their predictions are very far off, it likely means the polling itself is methodologically flawed, either in key swing states or across the board. This may seem like a cop out to say "even if we're wrong we're right", but if you understand the math, you know it's true. Ask a math nerd you know. Or ask 100 and you'll get a more likely to be correct answer (FYI, that's actually not true statistically: since the data set would not be random, statistical analysis doesn't apply. They'll actually probably just all agree).
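
Here is a rough Python sketch of why point 5 holds. It assumes each poll's error has two parts: random sampling noise that is independent from poll to poll, and a bias shared by every poll (say, everyone's likely voter model is off in the same direction). The function name and the numbers are hypothetical and are not from any real pollster or from Nate's model.

```python
import math

def total_error(sampling_error, n_polls, shared_bias_sd):
    """Standard error of a poll average under the ASSUMED two-part error model."""
    # Independent sampling noise averages away as polls are added;
    # the shared bias does not, because every poll has the same one.
    return math.sqrt(sampling_error ** 2 / n_polls + shared_bias_sd ** 2)

# Illustrative values only: 4 points of sampling error per poll,
# 2 points of bias common to all polls.
for n in (1, 10, 100, 1000):
    print(f"{n:>4} polls: error on the average = {total_error(4.0, n, 2.0):.2f} points")
```

Under these assumptions, piling on more polls drives the error down toward 2 points and no further. The averaging math is doing its job; the floor comes entirely from the flaw baked into the polls themselves.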
