I was part of a team that conducted a clinical trial, and we submitted the final report to a well-known journal. The intervention looked as though it had little effect on the nominated primary outcome, but had beneficial effects on important secondary outcomes. Here are some of the comments that came back from the journal’s editor:
“The study interpretation needs to be based on the primary analysis of the primary endpoint, which was null. Secondary endpoints need to be interpreted as only exploratory and hypothesis generating.”
“…the study conclusions should be focused on the null primary result…”
“To whatever extent the secondary endpoint findings are addressed in the Discussion, it should be brief, and should highlight that the interpretation of positive findings should be considered exploratory and hypothesis generating.”
Let’s consider this a bit. The editor’s comments reflect a common view of trials:
- There is a binary classification of completed trials, into those that are “positive” (they show that one treatment is better than another) and those that are “null” (or “negative” – they don’t show a difference).
- This classification should be based only on one specified primary outcome.
- Whether a trial is “positive” or “null” should be determined by a significance test.
Not surprisingly, I take issue with all of this. First, dichotomisation of results. It’s clearly nonsensical to insist that all results should have to be either “positive” or “null,” and statisticians have been arguing against this practice for years. No, decades. But anyway, results can be strongly positive, a bit positive, unsure but looking promising, definitely useless, or anything else. It does nobody any favours to shoehorn everything into the same two boxes.
And of course, in lots of situations there will be several outcomes that should play a role in our interpretation of the trial, because they are important to patients and clinicians. Most treatments don’t have just one effect, and patients and clinicians are rightly interested in the effects on all aspects of their lives. So it makes sense for the interpretation of the results, and their implications for clinical practice, to draw on all of the relevant information. Hanging the entire interpretation of the trial on one outcome seems … potentially misleading?
And should interpretation of secondary outcomes just be exploratory and hypothesis-generating? That doesn’t make a lot of sense either. Imagine you run a trial with three important outcomes, one of which you nominate as “primary” in the traditional way. Now suppose you don’t find any difference in that outcome, but for the other two outcomes, the treatment is clearly beneficial. Most people would say that this provides pretty good evidence that the intervention under test is helpful to patients. But if you follow the editor’s logic, the evidence of benefit from the “secondary” outcomes is just exploratory, so you now need to do another trial, nominating a different outcome as primary, so that you can test that one and definitively allocate the result to the “positive” or “null” bin. You might have to do a third trial too, to test the third outcome, before you can conclude that the treatment is beneficial. This is obviously absurd.
Behind all of this is significance testing. The whole idea of having a single primary outcome arose from concerns about multiple testing and lack of prespecification. If you’re declaring your trial positive based on achieving statistical significance, it’s much easier to do that if you test lots of outcomes (each extra test is another chance of a false-positive result), or better still, decide which one you’re basing your conclusions on after getting the results. Prespecification of outcomes guards against the second problem, and having a single primary outcome is intended to avoid misleading answers from the first.
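To put a number on the multiple-testing worry, here is a minimal sketch of the standard arithmetic. It assumes the conventional 5% significance level, a treatment with no real effect on any outcome, and outcomes that are statistically independent (real trial outcomes are usually correlated, so the true inflation is typically smaller than this). The function names are illustrative only. It also shows the Bonferroni correction, one common way of "adjusting for multiple testing": divide the significance level by the number of tests.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one 'significant' result among k independent
    tests when the treatment truly has no effect on any outcome."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test significance threshold under the Bonferroni correction,
    which keeps the family-wise error rate at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    alpha = 0.05  # assumed conventional significance level
    for k in (1, 3, 10):
        unadjusted = familywise_error_rate(alpha, k)
        adjusted = familywise_error_rate(bonferroni_threshold(alpha, k), k)
        print(f"{k:>2} outcomes: unadjusted {unadjusted:.3f}, Bonferroni {adjusted:.3f}")
```

With three independent outcomes and no real effect, the chance of at least one p < 0.05 is about 0.143 rather than 0.05, and with ten outcomes it is about 0.401. The Bonferroni threshold pulls the overall rate back to just under 0.05, at the cost of making each individual outcome harder to declare "significant."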
Of course, lots of people have recently pointed out that basing decisions on a significance test is a bad idea (notably the American Statistical Association). It seems to me that this is an example of the statistical method ending up dictating the answers to the clinical questions. Because the statistical methods seem to demand a single primary outcome, and/or adjustment for multiple testing to control error rates, we have tended to alter the clinical questions to fit this framework, rather than ensuring that we use the right statistical methods to answer the important clinical questions.