One of the (wrong) explanations you often see of what a p-value means is “the probability that the data have arisen by chance.” I think people may struggle to see why this is wrong, as I did for a long time. A p-value is the probability of getting the data (or more extreme data) if the null hypothesis (no difference) is correct, right? And if the null hypothesis is correct, the specific result you got must have been due to chance variation, mustn’t it? So why isn’t the p-value the probability that the result was due to chance?
The problem is that there are two ways of interpreting “the probability that a result is due to chance.”
1. The probability that chance or random variation was the process that produced the result;
2. The probability of getting the specific data (or more extreme data) that you got in your experiment, if chance was the only process operating.
The second of these is what the p-value tells you, but the first is the interpretation that most people give it. The p-value cannot tell you how probable it is that chance was the process that produced the result, because it is calculated on the assumption that the null hypothesis is correct; that is, it already takes for granted that chance was the only process operating.
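A small simulation may help make the difference between the two interpretations concrete. This is only a sketch under assumptions that are mine, not from any real study: each experiment is a two-sided z-test on 20 observations with known standard deviation 1, the real effect (when there is one) is 0.5, and the function names are made up for illustration. The idea is to run many experiments where we know which process produced each result, and then ask how often chance alone was behind the "significant" ones.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(sample, sigma=1.0):
    """Interpretation 2: probability of data at least this extreme,
    calculated assuming the null hypothesis (true mean 0) is correct."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate(n_experiments=20000, prior_null=0.5, effect=0.5, n=20, seed=1):
    """Run many experiments; in a fraction prior_null of them the null
    really is true. All of these values are illustrative assumptions."""
    rng = random.Random(seed)
    results = []  # pairs of (null_was_true, p_value)
    for _ in range(n_experiments):
        null_true = rng.random() < prior_null
        mu = 0.0 if null_true else effect
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        results.append((null_true, two_sided_p(sample)))
    return results

def share_null_among_significant(results, alpha=0.05):
    """Interpretation 1: among results with p < alpha, the fraction
    that were in fact produced by chance alone."""
    flags = [null_true for null_true, p in results if p < alpha]
    return sum(flags) / len(flags)

for prior in (0.5, 0.9):
    share = share_null_among_significant(simulate(prior_null=prior))
    print(f"null true in {prior:.0%} of experiments: "
          f"{share:.1%} of p < 0.05 results were due to chance alone")
```

The p-value calculation is identical in both runs, yet the share of significant results that were produced by chance alone changes with how often the null hypothesis is true to begin with. That share is the first interpretation, and it depends on information the p-value never sees.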
Original post: http://blogs.warwick.ac.uk/simongates/entry/8220the_probability_that/ 12 November 2016