Tuesday, June 6, 2017

Stats.con



Buy this book!


The current craze known as evidence-based medicine denigrates widespread clinical experience as a synonym for "anecdotal" and therefore not really scientific. Anecdotal evidence in this sense is an individual practitioner's experience using a certain drug or procedure on a certain patient.

If the drug seems to work or not work in an individual instance, that does not mean that the drug led to the observed results.  The results could be a placebo effect, due to the tincture of time, a fantasy of the observer, or caused by some other factor besides the drug.  A single clinical anecdote may suggest experiments, but by itself cannot be considered good evidence of the efficacy of the drug or procedure.

Of course, if a lot of different practitioners tend to get the same sort of results in a wide variety of different clinical populations, then the difference between that phenomenon and a single anecdote is obvious.

Still, in the evidence-based medicine craze, the gold standard is the randomized, double-blind, placebo-controlled clinical trial, or RCT.  That is supposed to be the mark of true science.  But not so fast.

A book called Stats.con by James Penston, which I highly recommend reading, makes the case that, while RCTs are important, they are hardly the be-all and end-all of science.  They have a lot of issues.  And sometimes widespread clinical experience is far better evidence for the efficacy of a drug or procedure than an RCT.

Lies, damn lies, and statistics.

Penston points out that surgeons have for decades operated on various organs in the body with the assistance of effective anesthetics and muscle relaxants - "all without a statistician in sight."

In this post I would like to cite several lines from the book that I think are important in evaluating medical "evidence."

"An individual instance may not logically refute a statistical study, but it cannot be dismissed as being irrelevant to the matter."  (p. 4).

"No wonder politicians are rarely fazed by statistical data.  Presented with a study that challenges their position, they simply bring into question the authors or the data." (p. 7).  Because it is very easy to introduce bias into an RCT, and because any study will have weaknesses, they can always be challenged.  And studies often contradict one another.  I notice some of my critics apply their criteria for judging a study far differently for studies they agree with than for those they do not.  You can always cherry pick studies in this manner to prove whatever argument you are making.

"The simplicity of [randomizing patients to active treatment and placebo groups in an RCT] hides the practical difficulties of selecting two groups of patients equally matched in terms of all relevant factors." (p. 13).

"Numerous errors [in technical aspects of a study] may occur that threaten the integrity of the findings yet these are far from easy to detect."  (p. 19). In other words, practitioners reading the study might easily be oblivious to major weaknesses of the study.

The size of a study sample is "inversely proportional to knowledge of the subject matter, the size of the treatment effects, the value of the results to individual patients, and the overall importance of the study." (p. 23).  This has to do with the difference between statistical significance and clinical significance. 

Say that a given drug reduced the incidence of gallstones by only one percent, or in only one percent of patients.  A study may show that this difference is statistically significant, but in terms of its relative value to you when balanced against the cost and/or the side effects, it may be next to worthless.  In order to find such a small statistically significant difference between one drug and another, or between a drug and a placebo, one needs a very large sample.  Significant results in studies with smaller samples tend to be much more dramatic from a clinical standpoint.
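
To get a feel for just how large "very large" is, here is a rough sketch using the standard normal-approximation formula for comparing two proportions.  This is my illustration, not Penston's, and the gallstone figures are the made-up ones from the paragraph above:

```python
# Rough sample-size calculation (my sketch, standard two-proportion
# normal approximation): patients per group needed to detect a drop in
# gallstone incidence from 5% to 4%, at alpha = 0.05 with 80% power.
from scipy.stats import norm

p1, p2 = 0.05, 0.04              # incidence on placebo vs. on the drug
alpha, power = 0.05, 0.80

z_a = norm.ppf(1 - alpha / 2)    # two-sided critical value (~1.96)
z_b = norm.ppf(power)            # power quantile (~0.84)

n = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
print(round(n))                  # ~6,700 patients in EACH group
```

Roughly 6,700 patients per arm, just to reliably detect a difference of one percentage point.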

"Confounding is present when both the supposed cause and the supposed effect [of a clinical result] are associated with a third factor which is responsible - in whole or in part - for the difference in outcome detected between the groups...no methods are available for correcting for the presence of unknown confounding variables." (p. 30).

"Yet, surprisingly, trials that are being reported as being randomized yield groups with equal numbers of patients more often than would be expected by chance." (p. 63).  This is strong evidence that the randomization process in a lot of studies is being manipulated to influence the results.

In re subjective symptoms - like almost all symptoms seen in psychiatry: "Such changes, even with the assistance of validated scales of symptoms, depend entirely on the account given by each individual patient." (p. 69).  There is strong evidence that patients frequently lie to doctors for a variety of reasons.  I will address this issue in more detail in a future post.

On subgroup analysis:  Critics of the literature refer to a process that pharmaceutical companies use called data mining.  If the main result of the study is not to their liking, they will divide the subjects into several different subgroups based on some criterion like gender or age.  The more of these subgroup analyses they do, the more likely they are to find a significant result.  Unfortunately, the statistical odds in this situation indicate that the result is most likely just a coincidence:

"[The pitfalls or running analyses on subgroups of subjects in a randomized controlled study was illustrated by one study in which the] ...overall results showed a reduction in mortality from myocardial infarction with aspirin yet subgroup analysis suggested that the drug was of no benefit to those born under the sign of Gemini or Libra." (p. 75).

"Strictly speaking, statistical data apply to groups, not to individual patients.  But clinicians treat individual patients....In every case, we can legitimately ask whether the findings [of an RCT] are applicable to that particular individual." (p. 89).

"Estimates suggest that less than 1% of patients will be recruited to trials. Thus, from this measure alone, it's highly unlikely that participants will be representative of the broader population of patients with the disease." (p.91).

"Patients excluded from [RCT's] tend to have a worse prognosis than those who are recruited." (p. 95) and "Most clinical research is carried out in teaching hospitals by medical staff with both a particular interest in the disease and considerable expertise, supported by nurses with specialist skills and well qualified junior doctors.  Under these circumstances, the standard of care is expected to be ...superior to the average general hospital." (p.96).   [This second point may in many cases not be true in the United States due to the proliferation of Contract Research Organizations or CRO's].  The chances study patients will get better can be much better than for your average Joe, making the results of the study less generalizable to the population of all patients with a disease.

Another big point made by Penston concerns what he calls the relative risk deception.  Let us say that a cancer drug reduces the percentage of recurrences from 5% to 4%.  The authors may then claim they have reduced the chances of recurrences by 1 in 5, or 20%.  That is highly misleading.  This cited rate of risk reduction, as old Albert Einstein might say, is relative.  The absolute risk reduction is only 1% - which may even be within the margin of error of the study - since 95% of the sample would not have had a recurrence regardless of whether or not they took the drug!
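
Spelled out with the same numbers (the number-needed-to-treat figure is my addition; it is a standard way of expressing the same arithmetic):

```python
# Relative vs. absolute risk reduction for a drug that cuts recurrence
# from 5% to 4%, per the example in the text.
risk_placebo, risk_drug = 0.05, 0.04

arr = risk_placebo - risk_drug      # absolute risk reduction
rrr = arr / risk_placebo            # relative risk reduction
print(f"Absolute risk reduction: {arr:.0%}")                         # 1%
print(f"Relative risk reduction: {rrr:.0%}  <- the headline claim")  # 20%
print(f"Number needed to treat:  {1 / arr:.0f}")                     # ~100
```

One hundred patients have to take the drug, with all its costs and side effects, for a single recurrence to be prevented.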

And finally:

‎"We may say, for example, that the probability of the clutch [of a certain car] surviving 50,000 miles based on the analysis of a large sample is 0.6, but this hardly applies to the one owned by a driving instructor who, day in and day out, witnesses assaults on the clutch of his car." (p. 120).
