Via Twitter, I came across a blog post by Dr John Mandrola (here) on the VEST trial, whose results were recently presented at the American College of Cardiology Annual Scientific Session. The trial evaluated a wearable cardioverter-defibrillator in patients after myocardial infarction (the conference abstract, without results, is here).
Dr Mandrola seems not to like the whole approach of wearable cardioverter-defibrillators, and he raises a number of objections to the way the trial has been interpreted and the spin put on the results in the press release. I haven’t seen the press release, but it would be no surprise if it distorted the results to tell a nice and positive story. I don’t have any opinions about wearable medical technology, but I was interested in the issues of interpretation of trial results that the blog post raised.
Mandrola’s criticism is that the trial did not find a significant difference in the primary outcome, sudden death, which is what the wearable cardioverter-defibrillator (WCD) seeks to prevent (1.6% in the WCD group and 2.4% in the control group, p=0.18), but it did find a significant difference in overall deaths (3.1% versus 4.9%, p=0.04). This seems to have led to over-positive conclusions in presentations and the press release, and unwarranted speculation about misclassification and other things that might have caused a difference in overall death but not sudden death – the outcome the intervention was intended to improve.
So what am I objecting to here?
Well, it gets off to a bad start in the first sentence: “The VEST trial was negative.” The habit of dividing trials neatly into “negative” and “positive” based on statistical significance (and usually statistical significance of just one outcome) has been criticised over and over but just won’t go away. So, one more time: in the real world, some trials will have a clear result that the treatment is beneficial or harmful (rare). Others will clearly show that there is no practically important difference (even rarer). The majority are to some extent in a grey area where there might be some benefit and possibly some harm, but definitely some uncertainty. It doesn’t do us any favours to pretend that we can divide things up neatly into positive and negative trials. We can’t. It’s the old issue of expecting certainty.
The interpretation of the results seems to look just at the statistical significance and nothing else, leading to statements like “the WCD did not reduce its primary endpoint.” Yes it did; 1.6% is lower than 2.4%. Whether that’s a good estimate of any underlying true effect, how uncertain we are about it, whether it could just be chance, and whether it’s clinically important are separate questions. The habit of equating non-significance with no difference has been criticised for decades, but is deeply ingrained and shows no sign of going away. Non-significant results just don’t mean that.
A fairly obvious point is that the incidence of outcomes is really low (which is a good thing for the patients), but it means that you’re much more likely to see a statistically significant difference in overall deaths than in sudden deaths, because there are just more of them. In VEST, the risk ratios for sudden death and overall death were both about 0.63, but one had p=0.18 and one had p=0.04, because there were more events. So it’s misleading to say there was an effect on one outcome but not the other. In fact, it probably makes more sense to look at non-sudden deaths rather than overall deaths, because overall deaths include sudden deaths, so the two outcomes are inevitably related. The numbers of sudden and non-sudden deaths were:
Sudden death: Vest 24/1524 (1.6%), control 19/778 (2.4%)
Non-sudden death: Vest 23/1524 (1.5%), control 19/778 (2.4%).
[overall deaths 47/1524 (3.1%) versus 38/778 (4.9%)]
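The point about event numbers driving significance is easy to check on the published counts. Here is a quick back-of-the-envelope sketch in Python: it computes the risk ratios and a crude two-proportion z-test for each outcome. (The trial’s own p-values come from its pre-specified analysis, so a simple test like this won’t reproduce them exactly, but the pattern is the same: similar risk ratios, very different p-values.)

```python
from math import sqrt, erfc

def risk_ratio_and_p(events_a, n_a, events_b, n_b):
    """Risk ratio (group a vs group b) and a crude two-sided
    two-proportion z-test p-value, using the pooled proportion."""
    p_a, p_b = events_a / n_a, events_b / n_b
    rr = p_a / p_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return rr, p_value

# Published VEST counts: WCD group n=1524, control group n=778
rr_sudden, p_sudden = risk_ratio_and_p(24, 1524, 19, 778)
rr_overall, p_overall = risk_ratio_and_p(47, 1524, 38, 778)

print(f"sudden death:  RR={rr_sudden:.2f}, p≈{p_sudden:.2f}")
print(f"overall death: RR={rr_overall:.2f}, p≈{p_overall:.2f}")
```

Both risk ratios come out in the region of 0.63–0.64; the overall-death comparison crosses the conventional 0.05 threshold only because it has roughly twice as many events behind it.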
So virtually the same. It’s true that you probably wouldn’t expect the intervention to reduce non-sudden deaths, but maybe it isn’t beyond the bounds of possibility that it could? The numbers are small and uncertainty large, so it’s hard to be very definitive about what is going on here.
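To put a number on that uncertainty, here is a sketch of the standard large-sample confidence interval for a risk ratio (on the log scale) applied to each outcome. This is my own illustrative calculation from the published counts, not the trial’s analysis:

```python
from math import log, exp, sqrt

def rr_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio with a large-sample 95% confidence interval,
    computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lo, hi = exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)
    return rr, lo, hi

results = {}
for label, a, b in [("sudden", 24, 19), ("non-sudden", 23, 19), ("overall", 47, 38)]:
    results[label] = rr_ci(a, 1524, b, 778)
    rr, lo, hi = results[label]
    print(f"{label:>10} death: RR={rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The intervals are wide: the sudden-death interval runs from roughly a two-thirds reduction to a modest increase, and the overall-death interval only just excludes 1. That width, rather than which side of 0.05 the p-value lands on, is the honest summary of what the trial can tell us.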
The point is really that by focussing just on whether results are significant or not, we create a misleading impression that one outcome is reduced but the other isn’t.