Archive for the 'statistics' Category

The Problem with Evidence-Based Medicine

Sunday, July 11th, 2010

In a recent post I said that med school professors cared about process (doing things a “correct” way) rather than result (doing things in a way that produces the best possible outcomes). Feynman called this sort of thing “cargo-cult science”. The problem is that there is little reason to think the med-school profs’ “correct” way (evidence-based medicine) works better than the “wrong” way it replaced (reliance on clinical experience) and considerable reason to think it isn’t obvious which way is better.

After I wrote the previous post, I came across an example of the thinking I criticized. On bloggingheads.tv, during a conversation between Peter Lipson (a practicing doctor) and Isis The Scientist (a “physiologist at a major research university” who blogs at ScienceBlogs), Isis said this:

I had an experience a couple days ago with a clinician that was very valuable. He said to me, “In my experience this is the phenomenon that we see after this happens.” And I said, “Really? I never thought of that as a possibility but that totally fits in the scheme of my model.” On the one hand I’ve accepted his experience as evidence. On the other hand I’ve totally written it off as bullshit because there isn’t a p value attached to it.

Isis doesn’t understand that this “p value” she wants so much comes with a sensitivity filter attached. It is not neutral. To get it you do extensive calculations. The end result (the p value) is more sensitive to some treatment effects than others in the sense that some treatment effects will generate smaller (better) p values than other treatment effects of the same strength, just as our ears are more sensitive to some frequencies than others.

Our ears are most sensitive around the frequency of voices. They do a good job of detecting what we want to detect. What neither Isis nor any other evidence-based-medicine proponent knows is whether the particular filter they endorse is sensitive to the treatment effects that actually exist. It’s entirely possible and even plausible that the filter that they believe in is insensitive to actual treatment effects. They may be listening at the wrong frequency, in other words. The useful information may be at a different frequency.

The usual statistics (mean, etc.) are most sensitive to treatment effects that change each person in the population by the same amount. They are much less sensitive to treatment effects that change only a small fraction of the population. In contrast, the “clinical judgment” that Isis and other evidence-based-medicine advocates deride is highly sensitive to treatments that change only a small fraction of the population — what some call anecdotal evidence. Evidence-based medicine is presented as science replacing nonsense but in fact it is one filter replacing another.
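To make the filter idea concrete, here is a rough simulation sketch. Every number in it is an assumption chosen for illustration (200 people per group, an average gain of 0.3 standard deviations, 5% responders), and the 1.96 cutoff is a large-sample stand-in for a t test. It compares how often a comparison of means flags two treatments with the same average effect: one that helps everyone a little and one that helps a few people a lot.

```python
import math
import random
import statistics


def welch_z(control, treated):
    """Difference in means divided by its standard error (Welch form)."""
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(treated) / len(treated))
    return (statistics.mean(treated) - statistics.mean(control)) / se


def detection_rate(make_treated, n=200, trials=400, seed=1):
    """Fraction of simulated trials in which the mean comparison is 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = make_treated(rng, n)
        # With 200 per group the statistic is close to normal, so 1.96 stands in
        # for the two-sided 5% cutoff (an approximation, not an exact t test).
        if abs(welch_z(control, treated)) > 1.96:
            hits += 1
    return hits / trials


def uniform_effect(rng, n):
    # Assumed scenario: everyone improves by 0.3 standard deviations.
    return [rng.gauss(0, 1) + 0.3 for _ in range(n)]


def sparse_effect(rng, n):
    # Assumed scenario: about 5% improve by 6 standard deviations,
    # so the average gain is still 0.3.
    return [rng.gauss(0, 1) + (6.0 if rng.random() < 0.05 else 0.0)
            for _ in range(n)]


rate_uniform = detection_rate(uniform_effect)
rate_sparse = detection_rate(sparse_effect)
print(f"uniform effect flagged in {rate_uniform:.0%} of simulated trials")
print(f"sparse effect flagged in {rate_sparse:.0%} of simulated trials")
```

Under these assumptions the sparse effect should be flagged noticeably less often, because the few large responders inflate the variance that the mean difference is judged against. It says nothing about which kind of effect real treatments have.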

I suspect that actual treatment effects have a power-law distribution (a few helped a lot, a large fraction helped little or not at all) and that a filter resembling “clinical judgment” does a better job with such distributions. But that remains to be seen. My point here is just that it is an empirical question which filter works best. An empirical question that hasn’t been answered.

Does Lithium Slow ALS?

Friday, July 9th, 2010

In 2008, an article in Proceedings of the National Academy of Sciences (PNAS) reported that lithium had slowed the progression of amyotrophic lateral sclerosis (ALS), which is always fatal. This article describes several attempts to confirm that effect of lithium. Three studies were launched by med school professors. In addition, patients at PatientsLikeMe also organized a test.

One of Nassim Taleb’s complaints about finance professors is their use of VAR (value at risk) to measure the riskiness of investments. It’s still being taught at business schools, he says. VAR assumes that fluctuations have a certain distribution. The distributions actually assumed turned out to grossly underestimate risk. VAR has helped many finance professionals take risks they shouldn’t have taken. It would have been wise for finance professors to ask how well VAR does in practice and so judge the plausibility of the assumed distribution. This might seem obvious. Likewise, the response to the PNAS paper revealed two problems that might seem obvious:

1. Unthinking focus on placebo controls. It would have been progress to find anything that slows ALS. Anything includes placebos. Placebos vary. From the standpoint of those with ALS, it would have been better to compare lithium to nothing than to some sort of placebo. As far as I can tell from the article, no med school professor realized this. No doubt someone has said that the world can be divided into people focused on process (on doing things a certain “right” way) and those focused on results (on outcomes). It should horrify all of us that med school professors appear focused on process.

2. Use of standard statistics (e.g., mean) to measure drug effects. I have not seen the ALS studies, but if they are like all other clinical trials I’ve seen, they tested for an effect by comparing means using a parametric test (e.g., a t test). However, effects of treatment are unlikely to have normal distributions, nor are they likely to be the same for each person. The usual tests are most sensitive when each member of the treatment group improves the same amount and the underlying variation is normally distributed. If 95% of the treatment group is unaffected and 5% show improvement, for example, the usual tests wouldn’t do the best job of noticing this. If medicine A helps 5% of patients, that’s an important improvement over 0%, especially with a fatal disease. And if you take it and it doesn’t help, you stop taking it and look elsewhere. So it would be a good idea to find drugs that help only a fraction of patients, perhaps a small fraction. The usual analyses may have caused drugs that help a small fraction of patients to be judged worthless when a more suitable analysis could have detected their benefit.
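As a sketch of point 2, the simulation below uses the 95%/5% split from the example and compares a test on means with a test that only counts large improvements. The size of the improvement (5 standard deviations), the group size, and the cutoff of 3 are all assumptions picked for illustration, and the tail-count test is just one possible alternative, not a recommended standard analysis.

```python
import math
import random
import statistics


def mean_test_rejects(control, treated):
    """One-sided comparison of means, normal approximation at the 5% level."""
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(treated) / len(treated))
    z = (statistics.mean(treated) - statistics.mean(control)) / se
    return z > 1.645


def tail_test_rejects(control, treated, cutoff=3.0):
    """One-sided test on how many patients exceed a cutoff fixed in advance.

    If the drug does nothing, each large improvement is equally likely to come
    from either (equal-sized) group, so the treated count is Binomial(k, 1/2).
    """
    a = sum(x > cutoff for x in treated)
    b = sum(x > cutoff for x in control)
    k = a + b
    p_value = sum(math.comb(k, i) for i in range(a, k + 1)) / 2 ** k
    return p_value < 0.05


rng = random.Random(2)
trials, n = 400, 200
mean_hits = 0
tail_hits = 0
for _ in range(trials):
    control = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed scenario: 95% unaffected, about 5% improve by 5 standard deviations.
    treated = [rng.gauss(0, 1) + (5.0 if rng.random() < 0.05 else 0.0)
               for _ in range(n)]
    mean_hits += mean_test_rejects(control, treated)
    tail_hits += tail_test_rejects(control, treated)

print(f"comparison of means flags the drug in {mean_hits / trials:.0%} of trials")
print(f"tail-count test flags the drug in {tail_hits / trials:.0%} of trials")
```

With these made-up numbers the tail count should catch the drug more often than the comparison of means. A real trial would have to choose the cutoff before seeing the data, and different assumed effects would give different results.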

All the tests of lithium, including the PatientsLikeMe test, turned out negative. The PatientsLikeMe trial didn’t worry about placebo effects, so my point #1 isn’t a problem. However, my point #2 probably applies to all four trials.

Thanks to JR Minkel and Melissa Francis.

Unlikely Data

Thursday, July 8th, 2010

Connoisseurs of scientific fraud may enjoy David Grann’s terrific article about an art authenticator in the current New Yorker and this post about polling irregularities. What are the odds that two such articles would appear at almost the same time?

I suppose I’m an expert, having published several papers about data that was too unlikely. With Saul Sternberg and Kenneth Carpenter, I’ve written about problems with Ranjit Chandra’s work. I also wrote about problems with some learning experiments.

Beijing Street Vendors: What Color Market?

Wednesday, June 9th, 2010

Black market = illegal. Grey market = “the trade of a commodity through distribution channels . . . unofficial, unauthorized, or unintended.”

In the evening, near the Wudaokou subway station in Beijing (where lots of students live), dozens of street vendors sell paperbacks ($1 each), jewelry, dresses, socks, scarves, electronic accessories, fruit, toys, shoes, cooked food, stuffed animals, and many other things. No doubt it’s illegal. When a police car approaches, they pick up and leave. Once I saw a group of policemen confiscate a woman’s goods.

What’s curious is how far vendors move when police approach. Once I saw the vendors on a corner, all 12 of them, each with a cart, move to the middle of the intersection — the middle of traffic — where they clustered. At the time I thought the traffic somehow protected them. Now I think they wanted to move back fast when the police car went away. Tonight, like last night, there’s a police car at that corner, the northeast corner of the intersection. No vendors there. The vendors who’d usually be there were now at the northwest corner. In other words, if a policeman got out of his car and walked across the street, he’d encounter all the vendors that he’d displaced.

Can John Gottman Predict Divorce With Great Accuracy?

Sunday, June 6th, 2010

Andrew Gelman blogged about the research of John Gottman, an emeritus professor at the University of Washington, who claimed to be able to predict whether newlyweds would divorce within 5 years with greater than 90% accuracy. These predictions were based on brief interviews near the time of marriage. Andrew agreed with another critic who said these claims were overstated. He modified Gottman’s Wikipedia page to reflect those criticisms. Andrew’s modifications were removed by someone who works for the Gottman Institute.

Were the criticisms right or wrong? The person who removed reference to them in Wikipedia referred to a FAQ page on the Gottman Institute site. Supposedly they’d been answered there. The criticism is that the “predictions” weren’t predictions: they were descriptions of how closely a model, fitted after the data were collected, could match those same data. If the model were complicated enough (had enough adjustable parameters), it could fit the data perfectly, but that would be no support for the model, and not “100% accurate prediction” as most people understand it.
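A toy illustration of the criticism, which is not Gottman’s method: the data below are pure noise (random “interview scores” with no connection to the random outcomes), and the “model” is a deliberately over-flexible one that stores every couple it was fitted on. All names and numbers are hypothetical; the 95 simply echoes the sample size of the study discussed later.

```python
import random

rng = random.Random(3)


def make_couples(n):
    """Random interview scores paired with random outcomes (no real connection)."""
    return [([rng.random() for _ in range(5)], rng.random() < 0.5)
            for _ in range(n)]


def fit(data):
    """An over-flexible model: it simply stores every couple it has seen."""
    return list(data)


def classify(model, scores):
    """Return the outcome of the stored couple with the most similar scores."""
    nearest = min(model,
                  key=lambda item: sum((a - b) ** 2
                                       for a, b in zip(item[0], scores)))
    return nearest[1]


def accuracy(model, data):
    """Share of couples in `data` whose outcome the model gets right."""
    return sum(classify(model, s) == outcome for s, outcome in data) / len(data)


fitted_on = make_couples(95)      # couples the model was fitted to
new_couples = make_couples(200)   # couples the model has never seen
model = fit(fitted_on)

in_sample = accuracy(model, fitted_on)
out_of_sample = accuracy(model, new_couples)
print(f"accuracy on the couples used for fitting: {in_sample:.0%}")
print(f"accuracy on new couples: {out_of_sample:.0%}")
```

The model reproduces the outcomes it was fitted to exactly, yet on new couples it can do no better than coin-flipping, since there is nothing in the scores to learn. Only the second number is a prediction accuracy in the everyday sense.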

The FAQ page says this:

Six of the seven studies have been predictive—each began with a hypothesis about factors leading to divorce. [I think the meaning is this: The first study figured out how to predict. The later six tested that method.] Based on these factors, Dr. Gottman predicted who would divorce, then followed the couples for a pre-determined length of time. Finally, he drew conclusions about the accuracy of his predictions. . . . This is true prediction.

This is changing the subject. The question is not whether Gottman’s research is any help at all, which is the question answered here; the question is whether he can predict at extremely high levels (> 90% accuracy), as claimed. Do the later six studies provide reasonable estimates of prediction accuracy? Presumably the latest ones are better than the earlier ones. The latest one (2002) was obviously not about accurate prediction estimates (its title used the term “exploratory”) so I looked at the next newest, published in 2000. Here’s what its abstract says:

A longitudinal study with 95 newlywed couples examined the power of the Oral History Interview to predict stable marital relationships and divorce. A principal components analysis of the interview with the couples (Time 1) identified a latent variable, perceived marital bond, that was significant in predicting which couples would remain married or divorce within the first 5 years of their marriage. A discriminant function analysis of the newlywed oral history data predicted, with 87.4% accuracy, those couples whose marriages remained intact or broke up at the Time 2 data collection point.

The critics were right. To say a discriminant function “predicted” something is to mislead those who don’t know what a discriminant function is. Discriminant functions don’t predict; they fit a model to data, after the fact. To call this “true prediction” is false.

To me, the “87.4%” suggests something seriously off. It is too precise; I would have written “about 90%”. It is as if you asked someone their age and they said they were “24.37 years old.”
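A quick back-of-the-envelope check on that precision. This assumes the 95 couples can be treated as independent yes/no outcomes, and it infers the count of 83 correct classifications from the reported 87.4% (the abstract does not state the count):

```python
import math

n_couples = 95                        # sample size given in the abstract
correct = round(0.874 * n_couples)    # inferred count; not stated in the abstract
accuracy = correct / n_couples

# Binomial standard error of a proportion, and a rough 95% interval around it.
std_error = math.sqrt(accuracy * (1 - accuracy) / n_couples)
low = accuracy - 1.96 * std_error
high = accuracy + 1.96 * std_error

print(f"{correct} of {n_couples} correct = {accuracy:.1%}")
print(f"standard error of about {std_error:.1%}")
print(f"rough 95% interval: {low:.0%} to {high:.0%}")
```

With a sample this small the standard error is a few percentage points, so on this rough reckoning the digit after the decimal point carries essentially no information.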

Speaking of overstating your results: reporting bias in medical research. Thanks to Anne Weiss.
