Some time ago I wrote a post complaining that the New England Journal of Medicine was promoting testing of baseline characteristics in randomised trials, in their Instructions for Authors (post is here). After I wrote that I wondered if I was being fair; the instruction could conceivably have referred to comparisons of outcomes rather than baseline characteristics.
The good news is that NEJM have updated their instructions and clarified this. Here’s what it says now:
For tables comparing treatment groups at baseline in a randomized trial (usually the first table in the manuscript), significant differences between or among groups (i.e., P<0.05) should be identified in a table footnote and the P value should be provided in the format specified above.
So the bad news is that they are unambiguously telling people to include significance tests for baseline characteristics. Perhaps they should read CONSORT:
“Unfortunately significance tests of baseline differences are still common…. Such significance tests assess the probability that observed baseline differences could have occurred by chance; however, we already know that any differences are caused by chance. Tests of baseline differences are not necessarily wrong, just illogical. Such hypothesis testing is superfluous and can mislead investigators and their readers.”
It's pretty amazing that they really don't get this.
PS I should have included a link to the NEJM author instructions here. It’s in Guidelines for Statistical Methods, 12th (second last) bullet point. Amusingly, the last point says you have to provide all the information specified in the CONSORT checklist.
First of all, I am not a statistician but am forever trying to learn more. And I have done just what you write about, namely significance tests of baseline characteristics. I also get that we test whether differences have occurred by chance. However, how do we know that differences did not occur by chance? Maybe faults in the design? Should differences not be of interest? Thank you.
Hi Ann, thanks for commenting. I’m certainly not suggesting that we shouldn’t look at the balance of baseline characteristics in randomised trials. Just that, if the randomisation works, then a null hypothesis test is pretty pointless, because we already know the answer. So the common procedure of using hypothesis tests to detect important differences in baseline characteristics makes no sense.
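To make that concrete, here is a minimal sketch (my own illustration, not anything NEJM or CONSORT specify; the sample size and the normally distributed "characteristic" are just assumptions for the example). If randomisation works, both arms are samples from the same population, so roughly 5% of baseline tests will come out "significant" at p<0.05 by chance alone:

```python
# Minimal sketch: simulate a properly randomised trial and count how often
# a baseline characteristic differs "significantly" between arms by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_arm = 100      # hypothetical sample size per arm
n_trials = 10_000    # number of simulated trials

significant = 0
for _ in range(n_trials):
    # Both arms drawn from the same population, so any difference is pure chance.
    arm_a = rng.normal(loc=50, scale=10, size=n_per_arm)  # e.g. age in years
    arm_b = rng.normal(loc=50, scale=10, size=n_per_arm)
    _, p = stats.ttest_ind(arm_a, arm_b)
    significant += p < 0.05

print(f"'Significant' baseline differences: {significant / n_trials:.1%}")
# Prints roughly 5% -- exactly the false-positive rate the test is built to give,
# which is why the test tells us nothing we didn't already know.
```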
If you suspect the randomisation isn’t working, though, it’s more interesting. You’d then expect to see data (i.e. baseline characteristics) that depart from randomness more than expected. How to detect this is not straightforward; p-values might be a part of it, as they give you a measure of the unusualness of the data if the null hypothesis is true, so you’d expect to see more low p-values if the randomisation was being subverted. You’d have to think about correlations between baseline variables though, e.g. if one group got older patients, they might also be sicker, because of a correlation between age and sickness, so it wouldn’t be right to think about age and sickness as independent baseline characteristics.
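One rough way to frame that (again just a sketch of my own with invented numbers, not a recommended procedure) is to look at the whole set of baseline p-values and ask whether they look roughly uniform, while remembering that correlated baseline characteristics make those p-values dependent on each other:

```python
# Minimal sketch: under genuine randomisation the baseline p-values are roughly
# uniform on [0, 1]; an excess of small p-values might hint at subverted randomisation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_arm, n_vars = 100, 20

# Hypothetical correlated baseline variables (e.g. age and sickness move together).
cov = np.full((n_vars, n_vars), 0.4) + 0.6 * np.eye(n_vars)
arm_a = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_per_arm)
arm_b = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_per_arm)

p_values = np.array([stats.ttest_ind(arm_a[:, j], arm_b[:, j]).pvalue
                     for j in range(n_vars)])

# Compare the observed p-values with a uniform distribution. Descriptive only:
# the correlation between baseline variables means the p-values are not
# independent, so a formal test of uniformity would overstate its own precision.
ks = stats.kstest(p_values, "uniform")
print(f"Smallest baseline p-value: {p_values.min():.3f}")
print(f"KS distance from uniform:  {ks.statistic:.3f} (p = {ks.pvalue:.3f})")
```

The point of the comments in the sketch is the same caveat as above: because the baseline variables are correlated, you can’t treat the p-values as twenty independent pieces of evidence.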