Critical evaluation: Validity and chance

Last updated: Tuesday, July 14, 2015

In critically evaluating a paper, you also need to ask whether the results of a trial could have occurred by chance. Statistical tests are used to assess this, and the terms you will encounter most often are P values and confidence intervals.

A P value is the probability of seeing a difference between two interventions at least as large as the one observed in a trial if, in fact, there is no actual difference between the two interventions.

Probability is measured on a scale of 0 to 1, where an impossible event is given 0 and an event that is certain to happen is given 1. In drug trials, by convention, P<0.05 is regarded as statistically significant. It means that, if there were no actual difference between your study drug and placebo, a difference as large as the one you observed would be expected less than 1 time in 20.
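To make the "1 in 20" interpretation concrete, the sketch below simulates many hypothetical trials in which the study drug and placebo have exactly the same true effect, and counts how often a test still returns P<0.05. It is a minimal sketch that assumes normally distributed outcomes, a z-test with a known standard deviation and made-up trial sizes; the function names are illustrative only.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def two_sided_p_value(z: float) -> float:
    """Two-sided P value for a z statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate_null_trial(rng: random.Random, n_per_group: int = 50, sd: float = 10.0) -> float:
    """Simulate one hypothetical trial where drug and placebo have the SAME true effect.

    Assumes normally distributed outcomes and a known SD, then returns the
    P value of a z-test on the difference in group means."""
    drug = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    placebo = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    return two_sided_p_value((mean(drug) - mean(placebo)) / se)


rng = random.Random(2015)  # fixed seed so the run is repeatable
n_trials = 4000
p_values = [simulate_null_trial(rng) for _ in range(n_trials)]
false_positive_rate = sum(p < 0.05 for p in p_values) / n_trials
print(f"Trials with no real difference but P<0.05: {false_positive_rate:.3f}")
```

The printed share should come out close to 0.05, i.e. roughly 1 trial in 20 reaches "significance" even though no real difference exists in any of them.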

When evaluating trial data, it’s important not to rely solely on P values but also to consider whether the results are clinically important. For instance, a trial might show that an antihypertensive drug improved blood pressure readings in 2 people per year; even if this were statistically significant, would it be clinically important?
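One reason a trivial effect can still be statistically significant is that P values shrink as trials get larger. The sketch below assumes a hypothetical 1 mmHg extra fall in systolic blood pressure with a standard deviation of 15 mmHg and computes the P value from a simple z-test (known SD assumed) at two trial sizes. The figures are invented for illustration, not taken from any study, and the function name is illustrative.

```python
from statistics import NormalDist


def p_value_for_mean_difference(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided z-test P value for a difference in means between two equal-sized
    groups, assuming a known common SD (a simplification for illustration)."""
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: drug lowers systolic BP by only 1 mmHg more than placebo, SD 15 mmHg
results = {}
for n in (100, 20_000):
    results[n] = p_value_for_mean_difference(diff=1.0, sd=15.0, n_per_group=n)
    print(f"n per group = {n:>6}: P = {results[n]:.2g}")
```

With 100 patients per group the 1 mmHg difference is far from significant; with 20,000 per group it is highly significant, yet the effect on each patient is exactly the same and arguably of little clinical importance.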

We also need to consider that P values are based only on data from a sample of people, and the results obtained from that sample may differ from those you would get with a different sample.
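The effect of sampling can be seen by drawing several samples from the same population. The sketch below assumes a hypothetical population whose true average blood pressure reduction is 8 mmHg (SD 12 mmHg) and draws five samples of 30 patients each; all of the numbers are invented for illustration.

```python
import random
from statistics import mean

TRUE_MEAN_REDUCTION = 8.0  # hypothetical true reduction in the whole population (mmHg)
SD = 12.0                  # hypothetical between-patient standard deviation (mmHg)
SAMPLE_SIZE = 30

rng = random.Random(42)  # fixed seed so the run is repeatable
sample_means = []
for sample_number in range(1, 6):
    # Each sample is a fresh group of patients drawn from the same population
    sample = [rng.gauss(TRUE_MEAN_REDUCTION, SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(mean(sample))
    print(f"Sample {sample_number}: mean reduction = {sample_means[-1]:.1f} mmHg")
```

Each sample gives a somewhat different estimate of the same true value, which is why a result from a single sample needs some indication of how uncertain it is.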

Because of these limitations, we should also look at other statistical measures.

Image courtesy of Tsyplakov, Wikimedia Commons

Confidence intervals give us a measure of the precision of a result. They describe how confident we can be that the result obtained from our relatively small sample of research participants would still hold true if we were able to study the whole population. They are expressed as a range of possible values within which we expect the true result to lie: the narrower the range, the more precise the estimate. By convention, 95% confidence intervals are normally used in drug trials, but you may also encounter 90% or 99% intervals.

A 95% confidence interval is commonly described as meaning that you can be 95% sure that the true result lies within the quoted range or, put another way, that there is a 1 in 20 (5%) chance that the true value lies outside it. Strictly speaking, it means that if the study were repeated many times and an interval calculated in the same way each time, about 95% of those intervals would contain the true value.
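One way to see what a 95% confidence interval means is to simulate the same study many times: across the repeats, about 95% of the intervals should contain the true value. The sketch below assumes normally distributed data with a known standard deviation (so the interval is simply mean ± z × standard error) and uses invented population values. It counts how often each simulated study's 95% interval contains the true mean, then compares the widths of 90%, 95% and 99% intervals. The helper name `normal_ci` is illustrative.

```python
import random
from statistics import NormalDist, mean


def normal_ci(sample_mean, se, level=0.95):
    """Confidence interval for a mean: sample_mean ± z * se (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645, 1.960, 2.576 for 90/95/99%
    return sample_mean - z * se, sample_mean + z * se


TRUE_MEAN, SD, N = 8.0, 12.0, 30  # hypothetical population values and sample size
se = SD / N ** 0.5                # standard error, treating the SD as known

rng = random.Random(1)  # fixed seed so the run is repeatable
repeats = 2000
hits = 0
for _ in range(repeats):
    sample_mean = mean(rng.gauss(TRUE_MEAN, SD) for _ in range(N))
    low, high = normal_ci(sample_mean, se)
    hits += low <= TRUE_MEAN <= high  # True counts as 1
coverage = hits / repeats
print(f"95% intervals containing the true mean: {coverage:.3f}")

widths = {}
for level in (0.90, 0.95, 0.99):
    low, high = normal_ci(TRUE_MEAN, se, level)
    widths[level] = high - low
    print(f"{level:.0%} interval width: {widths[level]:.2f} mmHg")
```

The share of intervals containing the true mean should land close to 0.95, and the 99% interval is the widest because it has to capture the true value more often.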

Confidence intervals also show whether the difference between interventions is statistically significant. When results are expressed as ratios (e.g. relative risk, hazard ratio, odds ratio), a value of 1.0 means no difference between the groups, so if the 95% confidence interval does not contain 1.0, the result is statistically significant at the conventional P<0.05 level.
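For a relative risk, the confidence interval is usually calculated on the log scale and then converted back. The sketch below uses a common large-sample (log) method for a 2×2 table and checks whether the 95% interval includes 1.0. The stroke counts and the trial labels are hypothetical, invented for illustration, and are not the results of the studies referred to in this article; the function names are also illustrative.

```python
import math
from statistics import NormalDist


def relative_risk_ci(events_treated, n_treated, events_control, n_control, level=0.95):
    """Relative risk with a confidence interval using the log method.

    SE(ln RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2), a large-sample approximation."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    se_log_rr = math.sqrt(1 / events_treated - 1 / n_treated
                          + 1 / events_control - 1 / n_control)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high


def ratio_is_significant(low, high):
    """A ratio is statistically significant if its CI does not include 1.0 (no difference)."""
    return not (low <= 1.0 <= high)


# Hypothetical stroke counts per 1,000 patients; the control group has 50 strokes in both
for label, strokes_treated in (("Trial X", 30), ("Trial Y", 40)):
    rr, low, high = relative_risk_ci(strokes_treated, 1000, 50, 1000)
    verdict = "significant" if ratio_is_significant(low, high) else "not significant"
    print(f"{label}: RR {rr:.2f} (95% CI {low:.2f} to {high:.2f}) -> {verdict}")
```

Trial X's interval lies entirely below 1.0, so the reduction in risk is statistically significant; Trial Y's interval spans 1.0, so it is not, even though its relative risk is also below 1.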

For example, consider the following results of 2 studies comparing Drug A and Drug B in reducing the risk of stroke.

If a result is expressed as a difference (e.g. a difference in body weight), a value of zero means no difference between the groups, so if the 95% confidence interval does not contain zero, the result is statistically significant.
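The same check for a difference can be sketched as follows. This assumes a normal-approximation (z-based) interval for the difference between two group means; for small samples a t-based interval would usually be preferred. The weight-loss figures and trial labels are hypothetical, invented for illustration, and the function names are illustrative.

```python
from statistics import NormalDist


def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Difference in means (A minus B) with a normal-approximation confidence interval."""
    diff = mean_a - mean_b
    se = (sd_a ** 2 / n_a + sd_b ** 2 / n_b) ** 0.5  # standard error of the difference
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


def difference_is_significant(low, high):
    """A difference is statistically significant if its CI does not include 0 (no difference)."""
    return not (low <= 0.0 <= high)


# Hypothetical weight loss (kg) as (mean, SD, group size): new drug vs established drug
trials = {
    "Trial X": ((4.0, 3.0, 100), (3.0, 3.0, 100)),
    "Trial Y": ((3.5, 3.0, 100), (3.0, 3.0, 100)),
}
for label, (new_drug, old_drug) in trials.items():
    diff, low, high = mean_difference_ci(*new_drug, *old_drug)
    verdict = "significant" if difference_is_significant(low, high) else "not significant"
    print(f"{label}: difference {diff:.1f} kg (95% CI {low:.2f} to {high:.2f}) -> {verdict}")
```

In Trial X the whole interval lies above zero, so the extra weight loss is statistically significant; in Trial Y the interval crosses zero, so it is not.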

For example, consider the following results of 2 studies investigating a new medicine (Drug A) versus an established medicine (Drug B).

Information on how to calculate confidence intervals can be found in numbers 3 and 8 of the bulletin ‘Statistics in Divided Doses’.