Tuesday, November 27, 2012

Murder, Single Mothers and Data

So there's an article in The Atlantic today pretty much debunking the myth that a rise in children of single mothers caused the rise in murder rates and other violent crimes around the '80s.  This same idea was famously propounded by Steven Levitt of Freakonomics fame, who argued the drop in crime rates in the '90s was due to more abortions of unwanted children (debunked in part here).

Now, it's not only problematic that both these arguments were wrong, but that there was very good data explaining the real cause of the spike and subsequent drop in crime rates.  Look at this chart of single mothers and crime rates vs. time.


Now look at this chart.

You can see the crime rate and the lead usage rate are very similar in shape, just time shifted by about 20 years.  Lead usage peaks around 1975, while crime rates peak around 1993.  In other words, you can see a correlation between infant lead exposure and violent crime rates when those children reach their late teens and 20's.  Now, this could of course be coincidental.  Except you can see almost exactly the same correlation in every country that instituted lead abatement (at different times around the world), which basically filters out any other confounding factors.  See http://www.washingtonpost.com/wp-dyn/content/article/2007/07/07/AR2007070701073.html , http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050101 and http://www.bridges4kids.org/news/10-02/Sun5-9-02.html .
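If you want to see what "time shifted by about 20 years" means concretely, here's a quick sketch (synthetic stand-in curves, not the real lead or crime data) that scans lags and finds where the two series line up best:

```python
import numpy as np

# Illustrative only: synthetic annual series standing in for real data.
# Lead exposure is made to peak around 1975 and violent crime around 1993;
# the claim is that crime tracks lead exposure with a roughly 20-year lag.
years = np.arange(1940, 2011)
lead = np.exp(-((years - 1975) / 12.0) ** 2)    # stand-in for gasoline lead usage
crime = np.exp(-((years - 1993) / 12.0) ** 2)   # stand-in for violent crime rate

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]

# Scan lags from 0 to 30 years and pick the one where the curves match best.
best = max(range(0, 31), key=lambda k: lagged_corr(lead, crime, k))
print(best, lagged_corr(lead, crime, best))  # prints the best-fitting lag (18 for this toy data)
```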

The effects of lead on brain development and its links to lower IQ, antisocial behavior and poor impulse control are well documented, and the link to criminal behavior has been known for well over two decades.  This makes it particularly embarrassing for Levitt, who published his abortion research in 2001.  He admits now that his statistical methods were flawed, but not that he basically ignored a much more robust data set and theory that pointed to lead as the key factor.  Though Levitt has clearly done some good work, I've found his popular writing to be driven by sensationalism, ego and money.  "Freakonomics" is a brand more than any kind of science.

But the real message here is that if we want to decrease crime rates and increase academic achievement, continued effort on urban lead abatement is probably one of the most effective ways to do it.  And also that a clever narrative explanation can be very convincing and quickly become conventional wisdom accepted by all the Very Serious People while still being completely wrong.

Wednesday, November 7, 2012

Nate (and of course I) was right

I'm not going to crow about Obama winning (I really do just want the country to come together and the political parties to reach some compromises), but I will crow about Nate Silver's PERFECT election predictions and call out all the idiots who attacked him and his methodology.  This is a win for science and reason over mindless punditry and sophistry.  Nate's model called all 50 states and the popular vote with, shocker, polling data and math.   So fuck you, unskewed polls guy, fuck you, Joe Scarborough, fuck you, George Will, and fuck you, all the morons who reject science and reason in favor of "their gut".  Sadly, Scarborough and the rest won't pay any price for their idiocy.  They'll just keep babbling and keep getting paid.

Tuesday, November 6, 2012

Polls, Statistics and Nate Silver

There are so many stupid, or at best ignorant, things being said about statistical analysis of polls in this election, and their use in predicting the outcome, that I thought I'd make a quick post about a few of the mistakes.

1) Don't claim analysts using statistical analysis are gaming the polls if you don't understand how statistics work (and I mean really understand, having taken multiple college- and grad-school-level courses in statistical analysis.  Your high school math class where you learned about rolling dice does not count).  Maybe someone is gaming things, but if you don't understand enough to comment, then don't waste people's time talking about it.

2) If you do know stats, or even if you don't, Nate and others explain the details of their methodology so anyone can reproduce it.  You can disagree with some of his choices, but basically it's all just straight math.  The details are here http://fivethirtyeight.blogs.nytimes.com/methodology/ and here http://www.fivethirtyeight.com/2008/03/frequently-asked-questions-last-revised.html .
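If you want the flavor of it without reading the methodology posts, here's a stripped-down toy aggregator (made-up polls and a made-up weighting scheme, definitely not Nate's actual model) that weights polls for one state by recency and sample size:

```python
import math

# Toy polls of one state: (margin in points, sample size, days before the election).
polls = [(+3.0, 800, 2), (+1.0, 1200, 5), (+4.0, 600, 10)]

def weight(sample_size, days_old, half_life=7.0):
    # Weight grows with sample size and decays as the poll gets older.
    return math.sqrt(sample_size) * 0.5 ** (days_old / half_life)

total_w = sum(weight(n, d) for _, n, d in polls)
avg_margin = sum(m * weight(n, d) for m, n, d in polls) / total_w
print(avg_margin)  # a single weighted estimate of the state's margin
```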

3) I hear a lot of idiot pundits saying things like "Nate's calling a blowout at 93% when the polls show everything is super close".  Well, pundit, you are being stupid.  Polls can be very close, but if they are also very accurate (which they should be when you have many well designed polls and combine their results), then you can still call the race as 93% likely to go Obama.  If you're extremely confident (say 93% confident) that Obama will win Ohio, then it doesn't matter if he wins 51-49 or 60-40.  He still wins.  This also means that 93% odds of an Obama win does not imply a blowout in terms of actual vote count or electoral vote count.  In fact, the nature of the statistical model means that the probabilities of every possible way Obama could win (including 270-268) add up to his total win probability.
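To make that concrete, here's a toy simulation (the state probabilities and the 237 "safe" electoral votes are made-up numbers for illustration, not FiveThirtyEight's actual inputs) showing how the overall win probability is just the combined probability of every path to 270:

```python
import random

# Hypothetical per-state probability that Obama wins, and that state's electoral votes.
states = {
    "OH": (0.93, 18),
    "FL": (0.50, 29),
    "VA": (0.79, 13),
    "CO": (0.80, 9),
    # remaining swing states omitted for brevity
}
safe_obama_ev = 237  # electoral votes assumed already safe in this toy example

def simulate_once():
    ev = safe_obama_ev
    for win_prob, votes in states.values():
        if random.random() < win_prob:
            ev += votes
    return ev

trials = 100_000
wins = sum(simulate_once() >= 270 for _ in range(trials))
print(wins / trials)  # overall win probability: every winning combination of states counts
```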

4) Obama's high probability of winning does not mean it's guaranteed, nor would a Romney win mean Nate was "wrong".  Again, the nature of statistics means that unlikely things can and will occur.  If you flip a coin 1000 times, you will probably get a long run of heads or tails somewhere along the line.  It's unlikely to get 10 heads in a row at any one spot of a truly random sequence, but IT WILL HAPPEN with enough trials.  If you've heard of "the long tail" or black swan events, this is kind of what that means.  Extremely unlikely events do occur, and that does not mean the statistics are wrong; it means they are correct.
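You can check the coin flip claim yourself with a few lines of code (a quick Monte Carlo sketch, nothing fancy):

```python
import random

def has_run(n_flips, run_len, p=0.5):
    """Return True if one sequence of n_flips contains a run of run_len heads."""
    streak = 0
    for _ in range(n_flips):
        if random.random() < p:
            streak += 1
            if streak >= run_len:
                return True
        else:
            streak = 0
    return False

trials = 10_000
hits = sum(has_run(1000, 10) for _ in range(trials))
print(hits / trials)  # comes out a bit under 0.4: a 10-heads run in 1000 flips is not rare at all
```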

5) If Nate and the other stats guys' predictions are VERY far off, then once again it doesn't mean their methods are incorrect or they are "wrong".  The statistical methodology they use is standard math and provably correct when the data is good.  So if their predictions are very far off, it likely means the polling itself was methodologically flawed, either in key swing states or across the board.  This may seem like a cop out, like saying "even if we're wrong we're right", but if you understand the math, you know it's true.  Ask a math nerd you know.  Or ask 100 and you'll get an answer more likely to be correct (FYI, that's actually not true statistically, since the data set would not be random, so statistical analysis doesn't apply.  They'll actually probably just all agree).
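To illustrate the point (with made-up numbers, not anything from the actual 2012 polling), here's a sketch of why averaging more polls kills random noise but can't fix a bias shared by all the polls:

```python
import random

# Toy illustration of point 5: if every poll is skewed the same way,
# no amount of averaging will recover the true margin.
true_margin = 1.0   # hypothetical true Obama margin in a state, in points
shared_bias = -3.0  # hypothetical industry-wide polling error, in points
noise_sd = 4.0      # per-poll sampling noise

def poll():
    return true_margin + shared_bias + random.gauss(0, noise_sd)

average = sum(poll() for _ in range(1000)) / 1000
print(average)  # converges to true_margin + shared_bias (about -2), not to the true margin
```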