Why Testimonials Aren’t Enough

The business of “natural health” rests heavily on the use of testimonials. They are used in advertisements by people selling therapeutic products and services, and you’ll hear them as anecdotes from people you know, telling you what worked for them. Intuitively, it makes sense to trust this sort of experience, but unfortunately testimonials and personal experience are not good ways of evaluating a treatment option.

I don’t expect you to take my word for this. Maybe you were told by a doctor that you’d need an operation, then you had reiki therapy and after that your doctor said the problem was no longer there. Perhaps your first child had terrible teething troubles, but with your second child you used a Baltic amber teething necklace and they didn’t have the same problems, and you swear that if you forget to put it on them they become agitated. Or maybe you’ve been spraying a colloidal silver solution onto the back of your throat whenever you feel a cold coming on and you haven’t been sick in years. Who am I to doubt or deny your experience?

These are all testimonials that I have heard personally, not from advertisements but from individual people relating their own experiences to me. But still, I remain unconvinced that reiki is any more than an exotic twist on faith healing (that is just as ineffective), that Baltic amber teething necklaces are anything but expensive yet inert jewellery, and that colloidal silver is much good for anything other than causing argyria.

In this series of blog posts, I intend to explain to you why I don’t consider anecdotes like these to be useful in drawing any conclusions about therapeutic interventions. But first, I’d like to point out that I am not trying to be dismissive of personal experience. I don’t think anecdotes are all lies, or anything of that nature, and personal experience can certainly be useful in drawing all sorts of conclusions in everyday life. The only conclusion I am arguing for here is that anecdotes are not useful for evaluating the efficacy of therapeutic interventions.


In searching for any truth, we have to be very careful not to jump to conclusions. There will always be a vast number of potential explanations for any observation, and if we really care about the truth then we can’t just pick the explanation that we like the most, or even the one that we think is most likely. Some possible explanations can be ruled out right from the start, if they’re impossible to test, but the explanations that can be tested are known as hypotheses. If we want to determine whether or not one particular hypothesis is correct, we should design and carry out a test that will rule out every other potential cause of our observation.

Note that this method of testing does not prove anything. Instead, it focuses on ruling out everything else, until only one idea is left standing. The key to designing a good test of an intervention is to make sure anything you observe is as unlikely as possible to be due to anything other than the intervention. This means that, in order to design a good test of an intervention, it is important to have a good understanding of what these other potential causes are.


After This, Therefore Because of This

There’s a logical fallacy that’s usually known by its Latin name, post hoc ergo propter hoc, which translates to “After this, therefore because of this”. The fallacy is of the form:

  1. A happened, then B happened
  2. Therefore A caused B

Of course, the reason why this is a logical fallacy is that it’s entirely possible that something other than A was the cause of B. This doesn’t mean that the conclusion is false, but it does mean that it is not necessarily true.

Anecdotes take the same form as the above example: “I tried treatment X and I got better”. Although experiences like this can result in strong beliefs, the fact that the improvement happened after the treatment does not mean the treatment necessarily helped at all. Instead, the improvement could have been due to a few different things.

Self-Limiting Conditions

Many common health conditions are self-limiting. This means that, left to their own devices, they will almost always go away in time. The common cold is an example: unless you are seriously immunocompromised, if you catch a cold you will be fine again after a few days. Other self-limiting conditions include things like the flu, teething, colic, and acne. Pretty much everything that isn’t a chronic illness and won’t kill you is self-limiting.

Regression to the Mean

Even when nothing external seems to be changing, your health is not constant. Instead, it fluctuates over time around a baseline level of health that itself changes over longer amounts of time. This baseline is basically your average health over a certain period of time; the mean. The tendency for your wellbeing to return to this mean after a fluctuation is known as regression to the mean.

This is a picture of 300 random data points generated in Microsoft Excel. Starting with 0, I added a random number between -0.5 and 0.5 to the running total 310 times, and then took a 10 point running average to smooth the resulting curve.

[Figure: Regression to the mean]

As you can see, even though the changes are all random, trends do form and the data oscillate around a particular mean. Especially over longer periods of time, the data will tend to return to that mean.
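A series like the one pictured above can be reproduced outside Excel. The following is a minimal Python sketch of the procedure as described, not the original spreadsheet. The function name, the seed, and the choice of a trailing (rather than centred) running average are assumptions made for illustration.

```python
import random


def smoothed_random_walk(steps=310, window=10, seed=None):
    """Running total of uniform random steps, smoothed with a running average."""
    rng = random.Random(seed)  # a fixed seed makes the series repeatable
    total = 0.0
    walk = []
    for _ in range(steps):
        # Add a random number between -0.5 and 0.5 to the running total.
        total += rng.uniform(-0.5, 0.5)
        walk.append(total)
    # Trailing running average: each point is the mean of `window` consecutive totals.
    return [
        sum(walk[i - window + 1 : i + 1]) / window
        for i in range(window - 1, steps)
    ]


series = smoothed_random_walk(seed=1)
print(len(series))  # 301 smoothed points from 310 running totals
```

Changing the seed gives a different curve each time, and most of them contain stretches that look like meaningful upward or downward trends even though every step is random.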

I’ve indicated the two most prominent downward trends with arrows. As you might imagine, such low points in a person’s health could motivate them to take a therapeutic intervention in order to reverse the trend. After the intervention, they’ll likely start to feel better, but as you can see from this graph such variations can happen randomly, and it can be very hard to say whether an improvement was caused by something in particular or if it was just the result of regression to the mean.

For example, I get frequent headaches. However, the frequency and intensity of those headaches vary from day to day, just due to random chance. I’d be more likely to decide to seek a therapeutic intervention on a particularly bad day. However, considering that my wellbeing is fluctuating around a mean value, I’d expect my headaches to return to their “normal” level, unless of course something has changed to make them worse on average. If I take an intervention and then the next day my headaches are better, how can I know whether it’s due to the intervention or regression to the mean?
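The headache example can be illustrated with a small simulation in which no treatment exists at all. This is only a sketch under made-up assumptions: daily severity is modelled as an independent random draw around a fixed baseline, and the baseline, spread, and "bad day" threshold are hypothetical numbers chosen for illustration, not real data.

```python
import random
import statistics


def simulate_bad_days(days=10_000, baseline=5.0, spread=2.0, threshold=8.0, seed=0):
    """Compare severity on unusually bad days with severity on the day after."""
    rng = random.Random(seed)
    # Hypothetical model: each day's severity is random noise around the baseline.
    severity = [rng.gauss(baseline, spread) for _ in range(days)]

    bad_days, days_after = [], []
    for today, tomorrow in zip(severity, severity[1:]):
        if today >= threshold:  # the kind of day on which I'd seek a remedy
            bad_days.append(today)
            days_after.append(tomorrow)
    return statistics.mean(bad_days), statistics.mean(days_after)


bad_day_mean, next_day_mean = simulate_bad_days()
print(f"average severity on bad days:      {bad_day_mean:.1f}")
print(f"average severity on the day after: {next_day_mean:.1f}")
```

In this model the day after a bad day is, on average, close to the baseline, so anything taken on the bad day would appear to have worked even though nothing in the simulation does anything.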

Spontaneous Remission

Even with illnesses that are not self-limiting, spontaneous remission with no obvious cause does happen occasionally. I’m not familiar with the data on this, so I won’t go into it in too much depth, but it is worth knowing that even some serious illnesses can suddenly get better on their own, whether or not an intervention has recently been used.


As you may have noticed, these things all have a common theme. They describe ways in which health can improve on its own, which makes it difficult to tell whether a particular improvement is due to an intervention or if it would have happened anyway. Ideally, in order to tell the difference, we’d travel back in time, skip the intervention, and see what would have happened in that case, but unfortunately that’s not an option. The next best method is to have what is known as a control: someone who has the same problem but doesn’t get the treatment.

However, as I discussed earlier, health fluctuates on its own. If the person receiving the intervention improves and the person acting as the control stays the same or gets worse, we still can’t be too sure that the intervention was helping. Variations between different people can make outcomes difficult to interpret as well. Just as random fluctuations will tend to return to the mean over longer periods of time, testing more people will smooth over these random variations. The more people we include in both the treatment group and the control group, the better, as having more observations will help us to tell whether any effect we observe is due to random variation or due to the intervention itself.
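The effect of group size can be sketched with another small simulation. Here the "treatment" does nothing by construction, so any difference between the groups is pure chance. The model (each person's change in health is an independent random draw with the same spread) and all of the numbers are assumptions made for illustration.

```python
import random
import statistics


def spread_of_group_difference(group_size, trials=2000, seed=0):
    """How much an ineffective treatment appears to 'work' or 'fail' by chance."""
    rng = random.Random(seed)
    differences = []
    for _ in range(trials):
        # Both groups are drawn from the same distribution: the treatment is inert.
        treated = [rng.gauss(0, 1) for _ in range(group_size)]
        control = [rng.gauss(0, 1) for _ in range(group_size)]
        differences.append(statistics.mean(treated) - statistics.mean(control))
    # Standard deviation of the apparent effect across many repeated trials.
    return statistics.stdev(differences)


for n in (1, 10, 100):
    print(f"{n:>3} people per group: chance differences vary by about "
          f"{spread_of_group_difference(n):.2f}")
```

With one person per group, chance alone regularly produces large apparent effects. With a hundred per group the chance differences shrink to a small fraction of that, which is why a real effect becomes easier to distinguish from noise.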

Having a control group and a large sample size are two aspects of a good test of a therapeutic intervention, but that’s not all there is to it. In my next post, I’ll discuss some other potential confounding factors, and how we can modify our test in order to account for them.
