Critical evaluation: Validity and bias

Last updated: Monday, July 16, 2018

To evaluate validity, we need to consider whether the research was done properly. All possible measures should be taken to reduce the risk of bias in a clinical trial. The methods section of a paper should explain exactly what steps have been taken, and the results should be fully and unambiguously reported. The CONSORT (CONsolidated Standards of Reporting Trials) 2010 checklist is a useful tool for assessing the validity of a trial.

For simplicity, the following points describe a trial of placebo (control) versus drug treatment, but they also apply to trials that compare drug treatments (e.g. drug A versus drug B).


The number of participants (sample size) needs to be planned carefully. There are usually restrictions on numbers because of ethical, cost and time considerations. However, the trial should be large enough to be adequately ‘powered’. The power of a trial is the likelihood that it will detect a difference between two groups when one genuinely exists.

The ideal power for a trial is at least 80%, so that if the trial were repeated 100 times, a statistically significant treatment effect would be seen in 80 of them. The power of a study increases with sample size. Ideally, authors should describe their power calculation. This should specify the primary endpoint (the main result being measured at the end of the trial) and take into account the expected outcome. If the difference between treatment and placebo is expected to be small, a larger sample size will be needed than for a study looking for a bigger difference.

The calculation will also need to include an allowance for participants dropping out. For example: “To detect a reduction in hospital stay of 3 days with a power of 80%, we calculated that a sample size of 75 patients per group was needed, given an anticipated dropout rate of 10%”.
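The arithmetic behind such a calculation can be sketched with the standard normal-approximation formula for comparing two means. The standard deviation of 6.2 days used below is purely an illustrative assumption, not a figure from any trial quoted here:

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sd, power=0.80, alpha=0.05, dropout=0.10):
    """Sample size per group to detect a difference in means of `delta`
    (normal approximation), inflated for an anticipated dropout rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    n = 2 * ((z_alpha + z_beta) ** 2) * (sd / delta) ** 2
    return math.ceil(n / (1 - dropout))            # allow for dropouts

# To detect a 3-day reduction in hospital stay, assuming an SD of 6.2 days:
print(sample_size_per_group(delta=3, sd=6.2))  # 75 patients per group
```

Note how the required size grows as the expected difference shrinks or as the desired power rises.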

Trials that are underpowered or have a small sample size don’t have to be discounted, as they can still be useful, but larger trials or a meta-analysis may be needed to be more confident in the results. We also need to watch out for trials that are ‘overpowered’. If the sample size is too big, small effects of little clinical importance can be reported as statistically significant (see p-values).

Selection of volunteers from a patient population should be random. This stops researchers from choosing their preferred patient population and so favourably affecting the outcome of the trial. For example, a researcher may approach every third patient who comes to a clinic to ask them to participate, rather than choosing those they think will do well.

Allocation of volunteers to placebo and treatment groups should be concealed from the researchers. This is a different concept to ‘blinding’. It is recommended for all trials, including unblinded (open-label) trials. It prevents selection bias by ensuring the researchers do not influence which patients get the study treatment. For example, the early studies of diphtheria vaccine showed that more patients in the vaccine group died compared to placebo. This was because the sickest patients were chosen to receive the vaccine and the healthier patients were given a placebo. The best way of ensuring allocation concealment is to use a centralised service, where randomisation is carried out independently at a site away from the trial location (e.g. hospital pharmacy).
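One simple form of centralised randomisation is a pre-generated block allocation list, held away from the recruiting site. A minimal sketch (the block size and arm labels are illustrative assumptions):

```python
import random

def block_allocation_list(n_blocks, block_size=4,
                          arms=("treatment", "placebo"), seed=None):
    """Build an allocation sequence in balanced blocks. In practice the
    list is generated and held by an independent site (e.g. a hospital
    pharmacy), so recruiters cannot foresee the next assignment."""
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)  # order within each block is random
        sequence.extend(block)
    return sequence

# 5 blocks of 4 gives 20 allocations, exactly 10 per arm
allocations = block_allocation_list(5, seed=42)
```

Blocking keeps the arms balanced throughout recruitment; fixing the seed here is only so the example is reproducible.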

Ideally as many people as possible involved in the trial should be 'blind' (or masked) to whether volunteers are receiving placebo or treatment. The opposite is ‘open-label’, when everyone knows what the volunteer is receiving. ‘Double-blind’ usually means the investigators and the volunteers do not know which arm of the study each volunteer is in, and ‘triple-blind' means the committee monitoring the data also do not know.

However, blinding is not always possible – for example, with drugs that cause distinct side effects (e.g. peppermint oil capsules cause rectal burning) or if the treatment has a complicated dosage regimen (e.g. warfarin dosed according to INR results). One way around this is to use a ‘PROBE’ design (prospective, randomised, open-label, blinded endpoint evaluation), where the people evaluating the endpoints do not know which group the volunteers have been assigned to.

Courtesy of Simon Wills

The baseline characteristics of the groups under study should be as similar as possible. This helps to ensure that any effect seen in the treatment group is due to the treatment and not to pre-existing differences between the groups. The demographics of the groups should be described in the paper. If the baseline characteristics of the groups are very similar this can also be used as an indicator that allocation to groups was truly random.

Apart from the treatment or placebo, patients should be treated identically during the trial; they should receive the same number of blood tests, X-rays, and clinic appointments.

Participant flow should be clearly reported, showing whether and why volunteers did not receive their allocated treatment, were lost to follow-up, or were excluded after the trial had started. If this leads to imbalances between the groups, it is known as attrition bias. It is important to know which, and how many, trial participants were included in the final analysis. If only those available for follow-up are included, it is known as ‘on-treatment’ or ‘per-protocol’ analysis. ‘Intention-to-treat’ analysis includes all participants who underwent randomisation, analysed in their originally allocated groups, no matter what happened during the trial. This is generally favoured because it reduces bias and better reflects real life, where people change their minds, or change or stop treatments.
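The difference between the two analyses can be made concrete with toy data. The participant records below are entirely invented for illustration:

```python
def response_rate(participants, arm, per_protocol=False):
    """Proportion of participants in `arm` with a good outcome (outcome=1).
    Intention-to-treat (the default) counts everyone randomised to the arm;
    per_protocol=True counts only those who completed the trial."""
    group = [p for p in participants
             if p["arm"] == arm and (p["completed"] or not per_protocol)]
    return sum(p["outcome"] for p in group) / len(group)

# Invented example: one drug-arm participant dropped out (counted as not improved)
trial = [
    {"arm": "drug",    "completed": True,  "outcome": 1},
    {"arm": "drug",    "completed": True,  "outcome": 1},
    {"arm": "drug",    "completed": True,  "outcome": 0},
    {"arm": "drug",    "completed": False, "outcome": 0},
    {"arm": "placebo", "completed": True,  "outcome": 0},
    {"arm": "placebo", "completed": True,  "outcome": 1},
]

itt = response_rate(trial, "drug")                    # 2/4 = 0.50
pp = response_rate(trial, "drug", per_protocol=True)  # 2/3 ≈ 0.67
```

The per-protocol figure looks more favourable simply because the dropout is ignored, which is exactly the bias that intention-to-treat analysis guards against.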