Protocol and Interpretation Issues and IBD Trials (Scientific Cooking Part 2)
The last post dealt with some of the statistical issues that can be associated with IBD clinical trials. Statistics are not the only place things can go wrong, however: protocol errors can produce misleading results, study designs can be poor, and conclusions can be stretched well beyond what the data support. As with the statistical anomalies, these flaws can be either intentional or inadvertent. Sometimes the error is not in the protocol itself, but in drawing inappropriately broad or unjustified conclusions from a study. This post looks at a few common things you can use to evaluate research studies.
Small Sample Size
Though this is also a statistical problem, small sample size studies are always suspect. Generally, smaller studies are used as pilot proofs-of-concept that a particular treatment can be effective. Unfortunately, they are not good at doing even that. Let’s assume a new drug cures 20% of the IBD population, and has no effect on the remaining 80%. If we do a study in 5 individuals, there is a 1 in 3 chance that none of them get better (which would be a false negative). Flipping it around, if 1 in 5 people enter spontaneous remission in a 1 year period, then in a 5-person trial of any drug there is a 2 in 3 chance that at least one participant shows improvement, whether or not the drug works. Any study that reports an improvement that cannot be shown to be statistically significant is really saying that the data are too poor to draw any conclusions.*
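The two probabilities above fall straight out of the binomial distribution. A minimal sketch, using the post's assumed numbers (a 20% response rate and a 5-person trial):

```python
# Probabilities from the text, computed exactly.
# Assumed numbers (from the post): a drug that helps 20% of patients,
# a 5-person trial, and a 20% spontaneous remission rate per year.

def prob_no_responders(p_respond, n):
    """Chance that none of n participants respond, each responding with prob p."""
    return (1 - p_respond) ** n

def prob_at_least_one(p_event, n):
    """Chance that at least one of n participants shows the event."""
    return 1 - (1 - p_event) ** n

# A drug that truly helps 20% of patients still shows zero responders
# in a 5-person trial about a third of the time (a false negative):
print(prob_no_responders(0.20, 5))  # ~0.33

# With 20% spontaneous remission, an inert drug will appear to "work"
# for at least one of 5 participants about two-thirds of the time:
print(prob_at_least_one(0.20, 5))   # ~0.67
```

Both numbers are just 0.8 raised to the fifth power (and its complement), which is why tiny trials can so easily miss a real effect or flatter a useless one.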
No Control Group
Double blind studies, by definition, have a control group. Frequently, purveyors of less than rigorous science will do a study without one. Uncontrolled designs are sometimes used to tease out environmental factors in retrospective studies, but they have no place in prospective studies.** In a prospective study, the absence of a control gives researchers a way to claim unbalanced results: any individual in the study who improves can be attributed to the treatment being proposed. In reality, with a large enough group, some individuals will enter remission from a flare with no treatment at all. Others will show clinical improvement on the CDAI or other measures, often as a small peak in an otherwise downward trend (why we all have “good days and bad days”). A large cohort with infrequent testing is likely to turn up some number of individuals who show positive movement; without a control group, there is no way to know whether that improvement is due to the drug, due to external factors, or simply the improvement that could be expected from a group of individuals with no treatment at all.
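A quick simulation makes the point concrete. This is a sketch with assumed numbers, not data from any real trial: give an inert "treatment" to 50 patients, of whom 20% would enter remission on their own during the study window.

```python
import random

random.seed(0)  # deterministic for illustration

def uncontrolled_study(n_patients=50, spontaneous_rate=0.20):
    """Count how many patients 'improve' on a treatment that does nothing.

    Each patient independently enters spontaneous remission with
    probability spontaneous_rate (an assumed, illustrative figure).
    """
    return sum(random.random() < spontaneous_rate for _ in range(n_patients))

improved = uncontrolled_study()
print(f"{improved}/50 patients improved on a drug that does nothing")
# Without a concurrent control arm, every one of these spontaneous
# remissions would be credited to the treatment.
```

Run it a few times with different seeds and roughly a fifth of the cohort "responds" every time, which is exactly the mirage an uncontrolled study sells.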
The Wrong Control Group
Just like having no control group, having the wrong control group may unfairly bias the view of a new treatment. In the United States, one of the things that the Food and Drug Administration looks at when evaluating a new therapy is the efficacy compared to existing therapies. Generally, a new drug will only be approved if:
· It shows a higher degree of efficacy than existing treatments; or
· It has lesser (but still clinically significant) efficacy with greatly reduced side effects; or
· It treats a segment of the population that the extant treatments do not cover (a particular genetic subset of individuals, a particular age group, pregnant women, etc.); or
· Its efficacy is the same as existing treatments, but other factors (like cost and availability) make it advantageous to the public.
Because of the FDA’s scrutiny, new treatments are generally tested against a comparable, existing treatment. A new drug may still show some effect when tested against a group of untreated individuals, yet be orders of magnitude less effective than current regimens or carry greater side effects. Tested only against an untreated control group, the new treatment may look very promising while in reality being a much weaker option than the current ones.
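The comparator problem can be sketched with made-up response rates (illustrative assumptions only, not real trial data): suppose placebo helps 20% of patients, the existing standard therapy helps 60%, and a new drug helps 35%.

```python
import random

random.seed(2)  # deterministic for illustration

# Assumed response rates for the sketch -- not real data.
RATES = {"placebo": 0.20, "standard": 0.60, "new_drug": 0.35}

def run_arm(rate, n=100):
    """Number of responders in a trial arm of n patients."""
    return sum(random.random() < rate for _ in range(n))

results = {arm: run_arm(rate) for arm, rate in RATES.items()}
print(results)
# Compared only to placebo, the new drug's ~35% response rate looks
# like a clear win; compared to the standard therapy's ~60%, it is
# plainly the weaker option.
```

The same drug produces two very different headlines depending on which arm it is measured against, which is why the choice of control group matters as much as having one.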
Cherry Picking Evidence
This is generally done in retrospective studies where an individual starts with the premise that XYZ causes Ulcerative Colitis or Crohn’s Disease. They then look back through epidemiological data and cherry pick the specific instances where the data match their assertion, discarding those where it doesn’t.
Another place this crops up is with individuals looking to cite the efficacy of a particular treatment. This is especially troubling when researchers survey a large number of studies and, instead of looking at the commonality among the higher quality ones, pick the worst-controlled studies that show what they are looking for. This is a very common practice among purveyors of pseudoscience – though 100 studies might show that their method is ineffective, they highlight the single, biased study that has never been replicated but shows the results they were hoping for.
Mistaking Correlation for Causation
Another flaw of epidemiological studies is mistaking correlation for causation. If a researcher chooses enough variables, eventually there will be a statistical correlation between some pair (the birthday paradox makes this more likely than it would appear at first glance). Just because two trends are correlated, however, doesn’t mean there is a causal relationship. Unfortunately, we are all quick to ascribe causation. Read any IBD message board and search for threads that start with “XYZ caused my Ulcerative Colitis”. You’ll find everything from sugar to antibiotics to mercury to MSG blamed. In many cases, the individuals are recalling an event that happened near the onset of their diagnosis – “I had just been given doxycycline for an infection, and a week later my stomach started hurting”. This is a poor application of logic, and researchers are not immune to this way of thinking. A strong correlation between two variables, A and B, could mean:
· A causes B
· B causes A
· A and B are both caused by an unknown variable, C
· The correlation is pure coincidence (ever more likely the more variables are tested)
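The "enough variables" point above can be demonstrated with pure noise. A minimal sketch (all numbers assumed for illustration): generate 20 completely unrelated variables for 30 "patients" and look for the strongest pairwise correlation.

```python
import itertools
import random

random.seed(1)  # deterministic for illustration

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# 20 unrelated variables measured on 30 hypothetical patients: pure noise.
variables = [[random.gauss(0, 1) for _ in range(30)] for _ in range(20)]

# 20 variables yield 190 pairs to test -- the birthday-paradox effect.
best = max(abs(pearson(a, b))
           for a, b in itertools.combinations(variables, 2))
print(f"strongest 'correlation' found in pure noise: {best:.2f}")
```

With 190 pairs in play, the strongest correlation found in random noise is typically sizable enough to look publishable, even though by construction nothing causes anything.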
A simple example of correlation not equaling causation is diarrhea. Individuals with IBD show a strong correlation between the diagnosis of their disease and diarrhea. That obviously does not mean diarrhea causes IBD, yet that same logic is routinely applied to epidemiological data.
An example frequently cited with IBD is the high correlation between a Western diet and Crohn’s disease, but that does not mean a Western diet causes Crohn’s (there are also counterexamples where the correlation does not hold). People with Crohn’s might be drawn to a Western diet unknowingly. Western diets may be eaten by individuals with better access to diagnostics. Environmental factors associated with Western diets may affect Crohn’s. Individuals who do not eat a Western diet may have higher infant mortality among those who would later develop Crohn’s. Teasing out causation is non-trivial, and implying that the correlation is causative is disingenuous.
Applicability of Animal Models
This tends to be the fault of the media rather than researchers, but it is related to how researchers present their findings. A preliminary study in rats may show mucosal healing from the ingestion of some new wonder drug. Even more tenuously, cells removed from the intestines may show changes when a new drug is introduced to them. Inevitably, the headlines read “Cure for Ulcerative Colitis in the Works”.
In reality, most treatments that show some efficacy in animals never translate into feasible options for humans. The reasons are numerous, ranging from differences in animal anatomy (even in our closest analogues) to bioavailability to toxicity (to quote XKCD, a handgun kills cancer in a petri dish too).(1)
These are just a few of the ways that good researchers can go off track. A myriad of other things can go wrong in research, and viewing any new press release with a critical eye will help the interested reader spot poor assumptions, statistical misuse, or outright fraud.
Even honest researchers can fall victim to the desire for a successful treatment, designing poor studies that show what they expect and then overstating the results.
* One place that small samples are sometimes used effectively is not efficacy trials but toxicity trials. An investigational new drug is first given to a small sample of healthy individuals to see if there are side effects; the sample size is increased in later rounds of testing so that any negative effect, if present, impacts the fewest individuals.
** The possible exception is with end-stage terminal patients, where withholding a drug that will likely cure them may be considered unethical.