Unfortunate Obituaries: The Case of David Freedman

Tuesday, December 2nd, 2008

One of my colleagues at Berkeley didn’t return library books. He kept them in his office, as if he owned them. He didn’t pay bills, either: He stuck them in his desk drawer. He was smart and interesting, but after he failed to show up for a lunch date (no explanation, no apology) I stopped having lunch with him. He died several years ago. At his memorial service, at the Berkeley Faculty Club, one of the speakers mentioned his non-return of library books and non-payment of bills as if they were amusing eccentricities! I’m sure they were signs of a bigger problem. He did no research, no scholarly work of any sort. Talking about science with him, a Berkeley professor in a science department, was like talking to a non-scientist.

David Freedman, a Berkeley statistics professor who died recently, was more influential. He is best known for a popular introductory textbook. The work of his I found most interesting was his comments on census adjustment: He was against adjusting the census to remove bias caused by undercount. This was only slightly less ridiculous than not returning library books, and far more harmful, because his arguments were used by Republicans to block census adjustment. The undercounted tended to vote Democratic. The similarity with my delinquent colleague is the very first line in Freedman’s obituary: He “fought for three decades to keep the United States census on a firm statistical foundation.” Please. A Berkeley statistics professor, I have no idea who, must have written or approved that statement!

The obituary elaborates on this supposed contribution:

“The census turns out to be remarkably good, despite the generally bad press reviews,” Freedman and Wachter wrote in a 2001 paper published in the journal Society. “Statistical adjustment is unlikely to improve the accuracy, because adjustment can easily put in more error than it takes out.”

There are two kinds of error: variance and bias. The adjustment would surely increase variance and almost surely decrease bias. The quoted comments ignore this. They are a modern Let Them Eat Cake.
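
The distinction between the two kinds of error can be made concrete with a small simulation. This is only a sketch with invented numbers, not census data: it assumes a raw count that is biased low by a hypothetical 5% undercount but has little noise, and an adjusted count that is assumed to be unbiased but noisier. The names `summarize` and `simulate` and every parameter value are made up for illustration.

```python
import random
import statistics


def summarize(estimates, true_value):
    """Return (bias, variance, mean squared error) of a list of estimates."""
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse


def simulate(true_count=1000.0, undercount=0.05, raw_sd=2.0,
             adjusted_sd=10.0, trials=20000, seed=0):
    """Compare a biased, low-noise count with an unbiased, noisier one.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # Raw count: systematically low (bias), but with little random noise.
    raw = [true_count * (1 - undercount) + rng.gauss(0, raw_sd)
           for _ in range(trials)]
    # Adjusted count: assumed centered on the truth, but with more noise.
    adjusted = [true_count + rng.gauss(0, adjusted_sd)
                for _ in range(trials)]
    return summarize(raw, true_count), summarize(adjusted, true_count)


raw_stats, adjusted_stats = simulate()
for label, stats in (("raw", raw_stats), ("adjusted", adjusted_stats)):
    bias, variance, mse = stats
    print(f"{label:9s} bias={bias:8.2f} variance={variance:8.2f} mse={mse:10.2f}")
```

In each row the mean squared error equals the squared bias plus the variance. Which estimate comes out ahead depends entirely on the assumed magnitudes; the point of the sketch is only that the two kinds of error have to be weighed separately.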

Few people hoard library books, but Freedman’s misbehavior is common. I blogged earlier about a blue-ribbon nutrition committee that ignored evidence that didn’t come from a double-blind trial. Late in his career, Freedman spent a great deal of time criticizing other people’s work. Maybe his critiques did some good, but I thought they were obvious (the assumptions of the statistical method weren’t clearly satisfied; who knew?) and that it was lazy the way he would merely show that the criticized work (e.g., earthquake prediction) fell short of perfection and fail to show how it related to other work in its field, that is, whether it was an improvement or not. As they say, he could see the cost of everything and the value of nothing. That he felt comfortable spending most of his time doing this, and that his obituary would praise it (“the skeptical conscience of statistics”), says something highly unflattering about modern scientific culture.

For reasonable comments about census adjustment, see Ericksen, Eugene P., Kadane, Joseph B., and Tukey, John W. (1989). Adjusting the 1980 census of population and housing. JASA, 84, 927-943.

Is Your Milk Safe? A Statistical Fable

Monday, October 20th, 2008

This recently happened in a class at the Beijing Language and Culture University:

TEACHER Your milk is safe if you buy it at a supermarket.

STUDENT What do you mean, “supermarket”? Where else could you buy it?

TEACHER That’s a good question, I don’t know the answer. They told us to say that.

When analyzing their data, a vast number of scientists more or less blindly do what a statistics book told them to do, just as this teacher said what she’d been told to say. Even worse, a vast number of statistics textbook writers simply copy other textbooks (not word for word, just the ideas and recommendations). The scientists and the textbook writers take refuge in false certainty. They fail to grasp that although the recommendations are black and white, the world is not, just as it isn’t black and white which milk is safe. Unlike in this particular classroom, no one questions this.

Thanks to Sally McGregor.

How to Spot Incompetence

Tuesday, October 14th, 2008

Nassim Taleb says, “When someone says he’s busy, he means that he’s incompetent.” I think he also distrusts anyone wearing a tie. In college, I wrote an essay called “The Scientific _______” in which I argued that any writer who uses the term scientific without explaining what it means is incompetent and you should stop reading immediately.

I still believe that. Now, for the first time, I am going to update my list of incompetence giveaways: Plotting something on a raw scale that should be on a log scale. Size-versus-time data should usually have the size axis on a log scale.

This presentation by someone at Sequoia Capital, the Silicon Valley venture capital firm, is full of examples. The Dow Jones Industrial Average (from the 1960s to now) is on a raw scale (where the distance from 5 to 10 equals the distance from 10 to 15) but should be on a log scale (where the distance from 5 to 10 equals the distance from 10 to 20). The same goes for an index of housing prices and for the Nikkei, and there are many other examples. You can still believe the data, of course; just don’t trust what’s concluded from the data. Given the ubiquity of this practice (plotting on a raw scale what should be on a log scale), especially among financial supposed-experts, Taleb and I are not far apart.
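
The difference between the two scales comes down to simple arithmetic, sketched here. The helper names `raw_distance` and `log_distance` are made up for illustration: on a raw axis the distance between two values is their difference, while on a log axis it is the difference of their logarithms, which depends only on their ratio.

```python
import math


def raw_distance(a, b):
    """Distance between a and b on a raw (linear) axis."""
    return b - a


def log_distance(a, b):
    """Distance between a and b on a log axis: log(b) - log(a) = log(b / a)."""
    return math.log(b) - math.log(a)


# Raw scale: equal differences are drawn as equal distances.
print(raw_distance(5, 10), raw_distance(10, 15))

# Log scale: equal ratios (here, doublings) are drawn as equal distances.
print(log_distance(5, 10), log_distance(10, 20))
```

For size-versus-time data, equal percentage changes are usually the meaningful comparison, and only the log axis draws them at equal size.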

More: Taleb makes a similar point in his online notebook, writing about a debate with Charles Murray:

Finally I showed a graph of the rise of the US stock market since 1900, on a regular (non-log) plot. Without logarithmic scaling we see a huge move in the period after 1982 – the bulk of the variation comes from that segment, which dwarfs the previous rises. It resembles Murray’s graph about the timeline of the quantitative contributions of civilization, which exhibits a marked jump in 1500. Geometric (i.e. multiplicative) growth overestimates the contribution of the ending portion of a graph.
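
A rough way to see why the ending portion dominates on a raw scale: take a series that grows at a perfectly constant rate and ask what share of the total rise falls in the final stretch. The sketch below uses a hypothetical 7% annual rate over 100 years. These numbers are invented for illustration and are not stock market data, and the function name is made up.

```python
import math


def share_of_rise_in_final_years(growth_rate, years, final_years, log_scale=False):
    """Fraction of the total rise that occurs in the last `final_years`.

    The series grows at a constant rate every year, so nothing special
    happens at the end; any apparent late jump is an effect of the scale.
    """
    values = [(1 + growth_rate) ** t for t in range(years + 1)]
    if log_scale:
        values = [math.log(v) for v in values]
    total_rise = values[-1] - values[0]
    final_rise = values[-1] - values[-1 - final_years]
    return final_rise / total_rise


raw_share = share_of_rise_in_final_years(0.07, 100, 25)
log_share = share_of_rise_in_final_years(0.07, 100, 25, log_scale=True)
print(f"raw scale: {raw_share:.2f} of the rise is in the last quarter")
print(f"log scale: {log_share:.2f} of the rise is in the last quarter")
```

Under these assumptions the last quarter of the period accounts for more than three quarters of the rise on the raw scale, but exactly its proportional share, one quarter, on the log scale.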