This is about the THAPCA trial, published in NEJM in 2015 (paper here). It’s a trial of hypothermia after paediatric cardiac arrest, randomising children aged between 2 days and 18 years to either normothermia or hypothermia, started within 6 hours of return of circulation. First of all, let me say that it was a challenging trial, and randomising 295 patients in a bit over 3 years was no mean achievement.
I wouldn’t be writing about it if I didn’t think there were some interesting issues, though.
The first is the rationale for the trial. Adult hypothermia trials have been conducted and the treatment is included in clinical guidelines (I’m avoiding getting into discussion of the adult evidence). The THAPCA investigators say:
“There are significant differences between adult and pediatric populations with out-of-hospital cardiac arrest, and results cannot be generalized between age groups”.
Really? Two points. First, the population eligible for THAPCA was heterogeneous; what they called “children” ranged from 18-year-olds, who are probably indistinguishable from adults, to 2-day-old babies, who might be quite different. So the similarity of paediatric and adult populations might depend largely on who you recruit. I’m not really sure about this, but it would be good if trial populations were defined on the basis of physiology rather than administrative convenience. Perhaps they are – I’d like to see this explained by someone who knows about this. Second, even if adults and children aren’t the same, their responses are surely likely to be related? It seems unlikely that they would have very different treatment effects. Unlikely to me, anyway; I may be wrong. So that argues that information from adults might be informative about children to some extent, and some form of hierarchical modelling or incorporation of existing knowledge would be appropriate.
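To make the “borrowing strength” idea concrete, here is a minimal sketch of normal–normal shrinkage on the log relative-risk scale: a noisy paediatric estimate is partially pooled toward a (discounted) adult summary, with the weights set by precision. All the numbers are entirely made up for illustration; a real analysis would need a proper hierarchical model and a considered discount for the adult evidence.

```python
import math

def shrink(est, se, prior_mean, prior_sd):
    """Normal-normal shrinkage: precision-weighted combination of a noisy
    study estimate with external (e.g. adult-trial) information."""
    w = (1 / se**2) / (1 / se**2 + 1 / prior_sd**2)
    post_mean = w * est + (1 - w) * prior_mean
    post_sd = math.sqrt(1 / (1 / se**2 + 1 / prior_sd**2))
    return post_mean, post_sd

# Entirely hypothetical inputs on the log relative-risk scale:
ped_logrr, ped_se = 0.43, 0.30      # a wide, noisy paediatric estimate
adult_logrr, adult_sd = 0.0, 0.20   # adult summary ~no effect, deliberately discounted
m, s = shrink(ped_logrr, ped_se, adult_logrr, adult_sd)
print(f"pooled log-RR {m:.2f} (sd {s:.2f})")  # → pooled log-RR 0.13 (sd 0.17)
```

The hypothetical `prior_sd` controls how much borrowing is allowed: inflate it and the paediatric data dominate; shrink it and the adult evidence pulls harder.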
Another issue was the choice of primary outcome. This was defined as being alive at 12 months with a score >70 on the Vineland Adaptive Behavior Scales. The problem is that some of the participants scored below this at baseline, so they were excluded from the analysis (“modified intention to treat”). It might not have made a huge difference to the results, but it’s a bit of a design failure.
And the results:
…there was no significant difference in the primary outcome between the hypothermia group and the normothermia group (20% vs. 12%; relative likelihood, 1.54; 95% confidence interval [CI], 0.86 to 2.76; P=0.14).
Maybe not “significant,” but 20% versus 12% is a pretty substantial difference.
1-year survival was similar (38% in the hypothermia group vs. 29% in the normothermia group; relative likelihood, 1.29; 95% CI, 0.93 to 1.79; P=0.13)
Personally, I wouldn’t say 38% was similar to 29% at all! I guess it’s a way of not saying “not significant.”
…therapeutic hypothermia, as compared with therapeutic normothermia, did not confer a significant benefit in survival with a good functional outcome at 1 year.
So the familiar story of not finding “significant” differences and concluding no benefit. Yes, the results for 1-year survival and for being alive with Vineland score > 70 were compatible with the true effect being zero – but they were also compatible with the true effect being lots of other things, some of them representing very substantial benefit.
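The width of that compatibility interval is easy to see by reconstructing an unadjusted relative risk from the reported percentages. The counts below are hypothetical round numbers matching 20% vs 12% (not the trial’s actual denominators, and the published RR of 1.54 was presumably adjusted), using the standard Katz log-scale confidence interval:

```python
import math

def rr_ci(a, n1, b, n2, z=1.96):
    """Unadjusted relative risk with a Katz log-scale 95% CI."""
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)  # SE of log(RR)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Hypothetical counts consistent with the reported 20% vs 12%:
rr, lo, hi = rr_ci(20, 100, 12, 100)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
# → RR = 1.67, 95% CI 0.86 to 3.22
```

Even with these toy numbers, the interval runs from mild harm to a roughly threefold benefit – which is exactly the point: “not significant” here rules out very little.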
If you look at what the sample size was based on, it’s really not at all surprising that the difference was not significant. It was based on 85% power to achieve statistical significance given an absolute 20-percentage-point increase in the primary outcome (from a control-group rate of 15% to 35% – the paper isn’t explicit about what was used). That is an absolutely massive effect, and very unlikely to be true. So even if that assumption were correct, the trial’s result would be non-significant 15% of the time, and with a smaller true benefit, non-significance is much more likely. So it seems a bit mad to rely on statistical significance as the sole guide to whether this might be a useful treatment.
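The arithmetic behind this is worth spelling out. Using a standard normal-approximation two-sided two-proportion test (my assumption – the paper doesn’t say exactly how the calculation was done), roughly 83 per arm gives 85% power for a 15% → 35% jump; at that same sample size, a smaller but still very worthwhile effect is mostly missed:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_props(p1, p2, n, z_a=1.959964):
    """Approximate power of a two-sided two-proportion z-test
    (alpha = 0.05), with n participants per arm."""
    pbar = (p1 + p2) / 2
    num = abs(p2 - p1) * math.sqrt(n) - z_a * math.sqrt(2 * pbar * (1 - pbar))
    den = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return phi(num / den)

# The design assumption: 15% -> 35%, an absolute 20-point jump
print(round(power_two_props(0.15, 0.35, 83), 2))  # → 0.85
# A smaller but still clinically worthwhile effect: 15% -> 23%
print(round(power_two_props(0.15, 0.23, 83), 2))  # → 0.26
```

So even under the design’s own assumption, 1 in 7 trials comes out non-significant; and if the true benefit were 8 percentage points rather than 20, roughly three out of four trials of this size would be “negative.”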
I think a big part of the misunderstanding of statistics is that people expect that if there is a “real” difference (and I’m not sure what that means) then their experiment will yield “significance.” It’s part of the expectation that RCTs should deliver certainty.