Hi readers! This is the second of my promised “the devil is in the detail” posts focusing on MND/ALS research.
Hot on the heels of placebo ethics, I now take on the highly complementary subject of statistics: how they are presented, their meaning and finally how they can/can’t be used in research. This is a big post, and as it’s holiday time, you can take your time reading and get real vacation value! Make sure you pack my read along with your bucket and spade!
Before I get started though, what exactly are statistics? Statistics are often thought to be facts. They are not. They are, strictly speaking, an interpretation of factual information.
So this post is going to discuss:
Interpreting an interpretation of facts!
Jeesh! My job just got harder!
Statistics are obviously important to me, with my blog being titled on the lifetime risk statistic that about 1 in 300 of all of us will develop MND. It is a powerful figure that brings home the urgency of finding a therapy.
Take a read of my Facts page to get a bit more about this use of a most appropriate statistic. And that is lesson 1, appropriate statistics are vital!
Hopefully after today’s ‘gripping’ read you will feel the burning desire to dig deeper into media headlines, and, perhaps, even on subjects beyond disease research! For those of us with MND this précis might help you assess any ongoing drug trial results and what they really might mean to our community.
So what do the numbers often quoted as statistics mean and are they always what they seem?
To get started, let’s take a gander at, and examine, my post’s title!
This little unassuming and familiar icon appears to inform us that there is a 50%, 50/50 or a 1 in 2 chance that the dreaded wet stuff will fall upon us from the sky today. Or does it?
This at first sight is a totally useless piece of information. It appears to imply it will either rain or not? Weather forecasters get paid for this remarkable piece of earth shattering wisdom, readers!!!
I am, of course, being a bit pedantic, but let’s examine this example further to see the true value, but also the frailty, of numbers.
What the reader might not know is that the percentage displayed is actually more typically a prediction of precipitation conditions developing over a given area and not if it will rain. Yes it’s a technical difference, but critical. Without going into detail it means that the actual chance of rain is probably less than indicated! Those cunning meteorologists!
But let’s put this slight complication aside for the moment and examine how the meaning might alter as the value changes. If there was a 51% chance of precipitation, for example, that would mean something potentially more positive. It would indicate that, based on available data, conditions that might lead to rain are more likely than not. Not much more likely, only just. I wouldn’t bet my house on it! A 95% chance would mean it was almost certainly going to rain, but still not definitely. If it didn’t rain, we would treat that as a bit odd, but understand it was just pure chance! If, however, that happened every day for a week, I suspect you, me and the dogs would get a bit suspicious of the data and/or the analysis!
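To put a rough number on that week of suspicious sunshine, here is a minimal Python sketch. It assumes, purely for illustration, that each day is independent of the others and that the 95% figure really is the chance of rain (which, as noted above, it often is not). The function name is hypothetical.

```python
def prob_every_day_dry(chance_of_rain: float, days: int) -> float:
    """Chance that it stays dry on all `days` days, assuming independent days."""
    return (1.0 - chance_of_rain) ** days


if __name__ == "__main__":
    for days in (1, 2, 7):
        p = prob_every_day_dry(0.95, days)
        print(f"{days} dry day(s) in a row despite a 95% forecast: {p:.2e}")
```

One dry day is a 1 in 20 event. Seven in a row works out at under one in a billion, which is why a run like that points to the data or the analysis rather than to luck.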
What my silly little example is trying to say is that statistics often have their specific context that is not obvious. We, the public, might potentially assume the meaning of the stat based on our pre-conceptions. At extreme values, such nuances become less important, but in the marginal cases, the detail is vital if the statistic is to be correctly interpreted.
I don’t mention the magic 95% chance figure accidentally by the way. This is the, almost, arbitrary figure that is commonly used to indicate significance. You will see it quoted a lot, and often it is used to validate trials. However, if it was only 90%, and it was a repeated experiment then we might still conclude there was some effect. So we have to beware of rejecting things too easily as well as trying to prove.
“My goodness just how do we do this?”
Well, fortunately, big minds over the last 300 years or so have contributed to what is known as the scientific method. In this post I will touch upon some of the key tenets of this ever-evolving process.
Now on with the post! And I don’t think it is raining yet, folks!
How are statistics used in medical drug trials?
Statistics are calculated to prove, or more precisely lend weight to, a drug’s safety and efficacy. Typically you might hear that the improvement with drug x was 20% better than with drug y, and that there was a 95% chance the result was not down to luck. But do note, there is still a 5% possibility that the results were just a pure fluke, ie a random event.
Reducing this fluke case is a key principle of science and something researchers are constantly focused on.
How can the chance of a fluke be reduced?
There are two key approaches:
1) the increasing of what is known as the power of data.
2) the repetition of key results.
I will discuss repetition later, but first, what is this magical power you speak of Lee?
Typically, power is raised by increasing the number of patients/subjects included in a trial. Size really does matter folks!!
When I wrote about the beauty of mathematics in an old blog, Saturn’s Rings and MND, I rattled on for a whole article about how numbers are central to the universe and a truly wonderful endeavour. It is actually possible to accurately estimate the minimum necessary size of a trial based on various criteria and the effect you are trying to detect/measure. The number of participants, believe it or not, is not simply conjured out of thin air! Of course, the finer details are beyond this short blog, but I include this interesting reference from the TRICALS team if you want to read more about the very specific requirements for sample sizes in MND trials. You might wish to take a look at the online tool at reactive.tricals.org, referenced from this study, that really highlights the issues with identifying ‘effects’ in ALS/MND therapies and why trial sample sizes are higher than in other diseases.
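For the curious, the flavour of such a calculation can be sketched in a few lines of Python. This is the standard textbook normal approximation for comparing the averages of two equal-sized groups, not the TRICALS method, and every name and number in it is an assumption chosen for illustration. The effect size is expressed as the expected difference divided by the standard deviation of the measurement.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def participants_per_group(effect_size: float, alpha: float = 0.05,
                           power: float = 0.80) -> int:
    """Approximate group size needed to detect a standardised effect.

    Uses n = 2 * ((z_alpha + z_power) / effect_size) ** 2, the usual
    two-sided, two-group normal approximation.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def approximate_power(n_per_group: int, effect_size: float,
                      alpha: float = 0.05) -> float:
    """Approximate chance of detecting the effect with n people per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)


if __name__ == "__main__":
    for d in (1.0, 0.5, 0.25):  # hypothetical large, medium and small effects
        n = participants_per_group(d)
        print(f"effect size {d}: about {n} per group "
              f"(power {approximate_power(n, d):.2f})")
```

Halving the effect you hope to detect roughly quadruples the number of people needed, which is the heart of why trials of subtle effects in a highly variable disease get so large.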
Let me elucidate with a totally unrelated and very simple example (with no hidden complexities) to show you just how important sample size is and what you are trying to measure or conclude with it.
Imagine a common-or-garden dice, with its six sides. Let’s then imagine a new varnish for dice, where each batch of dice needs to be tested to check whether the new coating has affected the randomness of a roll. The following question will spring to mind….
What is the smallest number of rolls of each dice required to ensure with 95% statistical significance (only 5% chance) that each dice is unaffected by my new super dooper varnish?
Obviously 4 rolls wouldn’t be enough as there is no chance for all six numbers to appear. How about 12 times? Hummm… How about 10,000? Yes that would be enough, but it’s certainly not practical and cost efficient for my team to do before packaging the dice!
Using mathematics we can find the most efficient number to bring the desired conclusion. The answer is about 52.
There are several other factors about this example that make it not only a very simple test to carry out but make you highly confident of the results and the predicted sample size needed. We know the absolute range of possible results, ie a number of 1 to 6, there are no partial numbers, and it is very easy to measure, ie the number is the number that is displayed visibly on the upturned dice.
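One common way to check a batch of rolls is the chi-squared goodness-of-fit test, sketched below in Python. This is not necessarily how the figure of about 52 was arrived at; it is simply one standard technique, and the function names are hypothetical. The cut-off of 11.07 is the usual table value for five degrees of freedom at the 5% level.

```python
import random
from collections import Counter

# Standard chi-squared table value: 5 degrees of freedom, 5% significance.
CHI2_CRITICAL_DF5_ALPHA05 = 11.070


def chi_square_statistic(rolls):
    """Sum of (observed - expected)^2 / expected over the six faces."""
    expected = len(rolls) / 6
    counts = Counter(rolls)
    return sum((counts.get(face, 0) - expected) ** 2 / expected
               for face in range(1, 7))


def looks_fair(rolls):
    """True if the rolls give no evidence of bias at the 5% level."""
    return chi_square_statistic(rolls) < CHI2_CRITICAL_DF5_ALPHA05


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the demo is repeatable
    rolls = [rng.randint(1, 6) for _ in range(60)]
    print(f"statistic = {chi_square_statistic(rolls):.2f}, "
          f"looks fair: {looks_fair(rolls)}")
```

A common rule of thumb is that this test wants at least five expected appearances of each face, ie 30 or more rolls, which chimes with 4 or 12 rolls being too few.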
Now let’s turn to our disease! This is when it starts to get tricky.
We have no biological measure of the disease yet, unlike cancers or HIV, and no effective treatment. It is a disease which, although typically brutal in its progression, has enormous variability, with periods of plateaus that range from days to years!
As a trial designer, this is an absolute nightmare scenario.
- How many participants do I need to show any discernible effect of a drug?
- What exactly is that discernible effect?
- How do I select the participants?
- How long should I run the tests for considering the difficulty in measuring the disease?
- When and how often should I take measurements?
- What actually is a significant improvement?
- How can I remove bias?
- What do I actually measure?
- How do I present the results?
Not so easy is it now?
But it is actually feasible, although extremely complicated, again with mathematics, making assumptions for some elements of the data, and taking into account the horrendous variability to make good estimates of trial sizes, placebo cases, length of trial and specific outcomes.
Based on current technology, disease understanding and accepted and proven methodologies for MND Research, the following is roughly the requirement:
A trial to prove efficacy (a phase 2 or 3, as it is known) of a typical drug/treatment for MND requires about 200 participants and needs to run for about 18 months.
This is difficult to hear if you have MND, but an unfortunate reality. Typically an MND trial will currently require more participants than a trial in a disease with more precise measurements. Anything below this can lead to very misleading or false conclusions, unless the drug was a miracle drug. As discussed in my previous post, most drugs are not such wondrous items. It would be rather like claiming proof of a dice’s reliability with just 4 throws!
Now, this doesn’t mean it will always be this way, but at this current time and moment in history, it is. Researchers and advocates are promoting and actively researching new methods. It is VITAL to be aware that these will not happen overnight, and of course in the meantime we must keep on testing. I often observe angry people on social media lambasting authorities, researchers and others to introduce these methods sooner. I urge you, when reading such comments, to please consider the wider picture. We need researchers entirely focused.
So how do we interpret statistics in light of this?
Let’s analyse a real example to reveal some of the pitfalls
I will use a recent MND trial that made some significant media headlines: the trial of a drug known as cuatsm. It shows how easily a reader can be led to misinterpret the findings.
This phase 1 trial produced two major headlines. First it stated that there was a 70% improvement in disease progression, and secondly significant improvements in respiratory function. You know what’s coming, though, don’t you!? Let’s take a look at the publicly available data, trial size and the precise outcomes measured.
It was a phase 1 trial, and only had 32 participants. It had no placebo arm, and everyone knew they were getting the drug (known as open label). And finally, the trial was only 6 months long! To put it further into context, the singular purpose of a phase 1 trial is to prove the initial safety of a drug over that period and simply to justify a full trial, a phase 2.
But what about the 70% improvements described, and how it was presented, Lee?
Using percentages can be useful, but they can also unwittingly mask the real story. 70% sounds a lot doesn’t it? But let’s look at what the “improvement” actually was.
It meant that on average (within 32 patients) a patient lost 0.29 ALSFRS-R points per month as opposed to 1.02 points per month in the ‘historical’ placebo cases. ie 1.8 versus 6.2 over the 6 months. A drug participant still lost significant function, there was NO improvement, just a hint at a potential slow down.
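The arithmetic behind the headline is easy to reproduce. The Python sketch below uses only the two monthly rates quoted above; the variable names are hypothetical. Straight multiplication gives six-month totals of roughly 1.7 and 6.1 points, a shade under the 1.8 and 6.2 quoted, presumably because the published rates were rounded.

```python
drug_loss_per_month = 0.29        # ALSFRS-R points lost per month on the drug
historical_loss_per_month = 1.02  # points lost per month, historical placebo
months = 6

# Relative slowing: how much smaller the drug-group loss is, as a fraction.
relative_slowing = 1 - drug_loss_per_month / historical_loss_per_month

drug_total = drug_loss_per_month * months
historical_total = historical_loss_per_month * months

print(f"Relative slowing: {relative_slowing:.0%}")
print(f"Points lost over {months} months: "
      f"{drug_total:.1f} vs {historical_total:.1f}")
```

Either way, the drug group’s total is still a positive number of points lost: a slower decline, not an improvement.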
Ok, you may feel that 1.8 versus 6.2 is indeed quite significant, and 70% certainly gives you that gut feel doesn’t it? However, the small sample size, the lack of a placebo arm, the huge variability in rates of progression and the fact that it was early on in the disease meant that this statement was very bold indeed, and in fact simply can’t be justified. The standard deviation* of both groups would almost certainly be large, although I cannot comment as it is NOT published in the results.
* Standard deviation gives an indication of the spread of data around the average. Put simply, individual values in the treatment group can be both lower and higher than those of members of the placebo group. The average can hide the true variability.
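A tiny Python illustration of how an average can hide variability, using two made-up sets of numbers (they are not trial data): both have exactly the same average, yet one is tightly bunched and the other is all over the place.

```python
from statistics import mean, stdev

# Hypothetical monthly point losses for two imaginary groups of four people.
steady_group = [1.0, 1.0, 1.0, 1.0]
varied_group = [0.0, 0.0, 2.0, 2.0]

for name, losses in (("steady", steady_group), ("varied", varied_group)):
    print(f"{name}: average {mean(losses):.2f}, "
          f"standard deviation {stdev(losses):.2f}")
```

Reporting only the average would make these two groups look identical.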
The only proper conclusion from the study is that there was a potential hint at slow down.
“But Lee, why did a scientist write such things?”
Yes I must admit it might be difficult to marry up scientific precision with such presentation. However, there does need to be some motivation for moving to the next stage, ie phase 2. This will jump to about 100/200 participants, have a placebo arm, and run for longer.
There are indications that cuatsm might be the real deal, but at the same time it could still be nothing at all. And of course, long term safety can always scupper a treatment.
I am going to end on a very positive note for this drug. It is extremely promising because of another key reason, and that is the extensive pre-clinical (ie non-human tests) results. Very few drugs have got to this stage with such good evidence before. I am most certainly watching this like a hawk.
I now move on to using statistics in a bid to find further clues.
Using innovative statistics to justify a new human trial of a drug/treatment
This is a subject that has developed over the decades and a recent new addition to the toolkit is the extreme analysis of earlier trials. This is often known as data dredging. Now this sounds awful doesn’t it, but it is a very cunning method of using this already existing data to search for hidden clues.
Typically, trials target a measurement of a positive change of disease in a group of about 200 people who meet certain criteria to take part. Even if the final result of the group appears to show “no effect”, the data is often now “dredged” in an attempt to try and find some interesting themes. This is made possible by the large variety of information now collected, including genetic profiles and computer analysis.
As an example: perhaps one re-analysis of an apparent failed trial’s data might hint at there being an apparent positive effect for women between ages 45 and 55, but no one else. So is the drug useful for this group, and should it be approved? Maybe, but cast your mind back to my discussion regarding 95% chance of being correct? It is just possible that such trends are found within the data by pure chance.
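The risk grows quickly with the number of subgroups examined. As a rough sketch in Python, assume each subgroup comparison is independent and each carries the usual 5% chance of a fluke “effect” when the drug truly does nothing. Both are simplifying assumptions, and the function name is hypothetical.

```python
def chance_of_spurious_finding(subgroups: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `subgroups` independent looks at the data
    throws up a false 'effect', when the drug has no effect at all."""
    return 1 - (1 - alpha) ** subgroups


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} subgroups examined: "
              f"{chance_of_spurious_finding(k):.0%} chance of a fluke finding")
```

Slice the data twenty ways and, under these assumptions, a fluke “finding” is more likely than not.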
To overcome this limitation, such data dredging is considered poor science without the second key element of the scientific method, which I hinted at earlier, and that is repetition. Remember, the rain stat over several days?
You might often see this called a confirmatory study in the media or scientific press. Such a study is strictly required. It is designed with the new hypothesis in mind, and compares a new, larger collection of the “selected group” against a similar control sample of the same group.
So always beware of data dredging trial results, and be prepared to wait for the results of the confirmatory study. There are numerous examples in all avenues of research that have both succeeded and failed after such seemingly found effects.
Repetition of an original study is always required, and is probably the single most important aspect of trials, ie can someone else, an independent group, reproduce the results?
Another way to use statistics that can make things appear not what they seem, or to highlight an issue
I will change diseases for this short section, and refer to a recent rather interesting headline on the BBC concerned with statins for heart disease.
Take a read of BBC News – Statins don’t work well for 1 in 2.
This headline caught my eye before I even read the detail. “Don’t work well!” Just what is this supposed to mean? It’s most certainly not a scientific term, is it?
First of all let us just establish a fact. Oooops! There I go and start to make an unjustifiable statement.
Let me start again! Statins are, based on all probability, some of the most proven and powerfully effective drugs in history. With over 15 years of data from hundreds of thousands of patients, we can see the real quantitative evidence of heart attacks that have been avoided. To add to this, statins have a very minor adverse effects risk profile, which is almost totally containable through patients switching between different types/doses of statins. They really are near miracle drugs.
Digging into the headline further, what did “not work well” actually mean? Hold your breath folks!
The researchers categorised “works well” as reducing LDL Cholesterol by 40% or more!! 39% I will tell you is a great reduction! The thing we have to be careful of here is the framing (or selection) of the cases, ie under 40% reduction highlighted about half of the population, but what would under 37% reveal, only a third of the population? Can you see how this technique can be used to great effect?
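To see how much the choice of cut-off matters, here is a toy model in Python. It assumes, purely for illustration, that LDL reductions across patients follow a bell curve averaging 40% with a standard deviation of 7 percentage points. Those two numbers are invented so that half fall under 40%; they are not taken from the study.

```python
from statistics import NormalDist

# Hypothetical spread of LDL cholesterol reductions (in %), NOT study data.
reductions = NormalDist(mu=40, sigma=7)


def share_below(threshold_percent: float) -> float:
    """Fraction of the imaginary population labelled 'not working well'."""
    return reductions.cdf(threshold_percent)


if __name__ == "__main__":
    for threshold in (40, 37, 30):
        print(f"cut-off {threshold}%: "
              f"{share_below(threshold):.0%} fall below it")
```

In this toy model, nudging the cut-off down by three points moves the headline from about half to about a third, without a single patient’s result changing.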
When I read yet further into the report I found that these lower responders were taking lower doses in many cases!! In fact the original paper was focused on “why” there was a reduced effect in some lower responders, rather than do statins work or not! So in my view the media headline, not the research, was misleading. The last thing we need is such potentially harmful headlines. Here is a proper and precise account of the research for those of you, like me, who prefer detail, produced by the UK’s NHS.
Sorry readers that sort of reporting just gets my hackles up! If you read the BBC Website report, and the comments by the experts, you can just sense their anger at being questioned in the tone of their responses. It’s almost, “Oh my goodness another reporter has misread something!”
Ok, so the statistics show a drug is worth looking at further, and now what?
Imagine a trial that indicates that a drug has shown a statistically significant therapeutic effect. What next? Hopefully, there would be moves to take it further and potentially make it widely accessible. There are, however, still many further hurdles to get over before a drug will finally make it to market. I won’t cover these in this post, but suffice it to say they are at least as complex as the trials themselves.
However, I will now leave you to get on with sunbathing with this conundrum to further show that statistics are not the be all and end all.
Even though an improvement in a patient can be statistically significant within a trial, is this improvement of truly effective significance for a patient!? These two statements are different, and again need to be understood in the context of trial results. We often hear about this with the side effects of cancer treatments and their effect on quality of life for minimal extension of life. The same can be said of any treatment for MND/ALS. What is the burden of the treatment? A pill taken with coffee and cornflakes in the morning is slightly different to 2 weeks a month on a drip, or worse. So an effect has to be measured against burden as well.
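The gap between the two kinds of significance can be sketched with a simple two-group z-test in Python. The effect size (a twentieth of a standard deviation) and the group sizes are hypothetical, and the known-spread z-test is a simplification chosen to keep the example short.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(effect_size: float, n_per_group: int) -> float:
    """p-value when the observed difference between two equal groups is
    `effect_size` standard deviations (z-test, spread assumed known)."""
    z = effect_size * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    tiny_effect = 0.05  # hypothetical: 1/20 of a standard deviation
    for n in (100, 10_000):
        p = two_sided_p_value(tiny_effect, n)
        print(f"{n} per group: p = {p:.4f}")
```

The same tiny difference goes from unremarkable to “statistically significant” simply by enrolling more people. Whether a patient would notice it, or accept the burden for it, is a separate question.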
That’s it for this update folks.
I will return soon with another research topic, as yet undecided, but most probably on the inevitable push for access to unapproved drugs or the interesting subject of Stratification as a critical future improvement in MND trials.