Critical evaluation: Validity and bias

Last updated: Sunday, January 18, 2026

To evaluate validity we need to consider whether the research was done properly. All possible measures should be taken to reduce the risk of bias in a clinical trial. The methods section of a paper should explain exactly what steps were taken, and the results should be fully and unambiguously reported.

There are several useful tools available to help you assess the validity of a trial.

For simplicity, the following points describe a trial of placebo (control) versus drug treatment, but they also apply to trials that compare drug treatments (e.g. drug A versus drug B).

The number of participants (sample size) needs to be planned carefully. It's a balancing act: ethical, cost and time constraints limit how many participants can be enrolled, but there must be enough for the study to be adequately 'powered'. The power of a trial is the likelihood that it will detect a difference between 2 groups when one genuinely exists.

The ideal power for a trial is at least 80%. This means that if the trial were repeated 100 times, a statistically significant treatment effect would be seen in about 80 of them. The power of a study increases with sample size.
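
The 'repeated 100 times' interpretation of power can be illustrated with a short simulation. This is a sketch only, not part of the original text: the effect size, standard deviation and group size are invented for illustration, and a simple two-sample z-test with known standard deviation is used for clarity.

```python
import random
import statistics

def simulate_power(n_per_group, true_diff, sd, n_trials=2000, alpha=0.05, seed=1):
    """Estimate power: the fraction of simulated trials in which a
    two-sample z-test reaches statistical significance."""
    random.seed(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    significant = 0
    for _ in range(n_trials):
        placebo = [random.gauss(0, sd) for _ in range(n_per_group)]
        treated = [random.gauss(true_diff, sd) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(placebo)
        se = sd * (2 / n_per_group) ** 0.5  # known-sd z-test, for simplicity
        if abs(diff / se) > z_crit:
            significant += 1
    return significant / n_trials

# With a genuinely effective treatment and ~80% power, roughly
# 80 of every 100 repeated trials come out statistically significant.
print(simulate_power(n_per_group=44, true_diff=3, sd=5))
```

Running the same simulation with `true_diff=0` shows the flip side: when no real effect exists, only around 5% of trials (the significance level) come out 'significant' by chance.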

A power calculation is essential for determining the right number of participants, and ideally authors should describe theirs. This calculation should specify the primary endpoint (the main result being measured) and the expected size of the treatment effect. If the difference between treatment and placebo is expected to be small, a larger sample size will be needed than for a study looking for a bigger difference.

The calculation will also need to include an allowance for participants who might drop out. For example: “To detect a reduction in hospital stay of 3 days with a power of 80%, we calculated a sample size of 75 patients per group was needed, given an anticipated dropout rate of 10%”.
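A calculation of this kind can be sketched with the standard normal-approximation formula for comparing two means. This is a simplified illustration only: the standard deviation and dropout rate below are assumed values, not figures taken from the quoted example.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80, dropout=0.0):
    """Per-group sample size for detecting a mean difference `delta`
    with outcome standard deviation `sigma`, inflated for dropout."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n / (1 - dropout))            # allowance for dropouts

# e.g. detect a 3-day reduction in hospital stay, assuming an SD of
# 5 days (an invented value) and a 10% dropout rate
print(sample_size_per_group(delta=3, sigma=5, dropout=0.1))
```

Note how the formula captures the points above: halving `delta` (a smaller expected difference) roughly quadruples the required sample size, and the dropout allowance inflates the final number.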

Trials that are underpowered or have a small sample size can still be useful, but their results might not be definitive. Larger trials or a meta-analysis may be needed to be more confident in the results. We also need to watch out for trials that are ‘overpowered’. If the sample size is too big, even very small effects of little clinical importance can be reported as statistically significant (see p-values). This can give a misleading impression of a treatment's effectiveness. 

Selection of volunteers from a patient population should be random. This stops the researcher from choosing their preferred patient population, which could make the results look better than they really are. For example, a researcher may approach every third patient who comes to a clinic to ask them to participate, rather than choosing the ones they think will do well.

Allocation of volunteers to placebo and treatment groups should be concealed from the researchers. This is a different concept to ‘blinding’. It is recommended for all trials, including unblinded (open-label) trials. It prevents selection bias by ensuring the researchers do not influence which patients get the study treatment. For example, early studies of diphtheria vaccine showed a higher death rate in patients in the vaccine group than in the placebo group. This was because the sickest patients were chosen to receive the vaccine and the healthier patients were given a placebo. The best way of ensuring allocation concealment is to use a centralised service, where randomisation is carried out independently at a site away from the trial location (e.g. a hospital pharmacy).
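
In practice, a centralised service prepares the allocation sequence in advance, out of the trial team's sight. A minimal sketch of how such a list might be generated is below; block randomisation in blocks of four is an assumed design for illustration, not one described in the text.

```python
import random

def block_randomisation_list(n_blocks, block_size=4,
                             arms=("drug", "placebo"), seed=None):
    """Build an allocation sequence of shuffled, equal-sized blocks.
    Each block contains the arms in equal numbers, so group sizes stay
    balanced however many participants are finally enrolled."""
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)  # order within each block is random
        sequence.extend(block)
    return sequence

# The central service holds this list; the trial site only learns each
# allocation after the participant has been irreversibly enrolled.
allocations = block_randomisation_list(n_blocks=3, seed=42)
print(allocations)
```

Because the list exists only at the central site, a researcher enrolling patients cannot predict or influence the next assignment, which is the essence of allocation concealment.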

Ideally as many people as possible involved in the trial should be 'blinded' (or masked) so they don't know who is getting which treatment or placebo. The opposite is ‘open-label’, when everyone knows what the volunteer is receiving. ‘Double-blinded’ usually means the investigators and the volunteers do not know which arm of the study each volunteer is in, and ‘triple-blinded' means the committee monitoring the data also do not know.

However, blinding is not always possible – for example, with drugs that cause distinct side effects which make it easier for the participant or researcher to work out whether they're on the drug (e.g. peppermint oil capsules cause rectal burning), or if the treatment has a complicated dosage regimen (e.g. warfarin dosed according to INR results). One way around this is to use a ‘PROBE’ design: prospective, randomised, open-label, blinded endpoint evaluation, where the people evaluating the endpoints do not know which group the volunteers have been assigned to. This ensures the data are not influenced by the expectations or knowledge of the researchers or the participants.

The baseline characteristics of the groups under study should be as similar as possible. This helps to ensure that any effect seen in the treatment group is due to the treatment and not to pre-existing differences between the groups. When the baseline characteristics of the groups are very similar, it is also a good sign that allocation to groups was truly random. The demographics of the groups should be clearly described in the study paper.

Apart from the treatment or placebo, patients should be treated identically during the trial; they should receive the same number of blood tests, X-rays, and clinic appointments.

Participant flow should be clearly reported, showing exactly what happened to every participant. A good report should explain if and why volunteers did not receive the treatment allocated, or if they were lost to follow-up, dropped out or were excluded after the trial began. If this leads to imbalances between the groups, it is known as attrition bias.

It's also important to know which and how many trial participants were included in the final analysis. There are two main ways to analyse the data: ‘on-treatment’ or ‘per protocol’ analysis, where only those available for follow-up are included, or ‘intention-to-treat’ (ITT) analysis, where all participants who underwent randomisation are included in the groups they were originally assigned to, no matter what happened during the trial. ITT analysis is generally favoured because it reduces bias and is more like real life, where people change their minds, or change or stop treatments. This gives a more realistic picture of the treatment’s effectiveness.
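
The difference between the two analyses can be sketched with made-up data. The participant records below are entirely hypothetical, and (unlike a real trial, where dropouts' outcomes are often missing) an outcome is assumed to be known for everyone, purely to keep the illustration simple.

```python
from statistics import fmean

# Hypothetical trial records: group assigned at randomisation, whether
# the participant completed the trial, and their outcome score.
participants = [
    {"group": "drug",    "completed": True,  "outcome": 8},
    {"group": "drug",    "completed": True,  "outcome": 7},
    {"group": "drug",    "completed": False, "outcome": 2},  # dropped out, did poorly
    {"group": "placebo", "completed": True,  "outcome": 4},
    {"group": "placebo", "completed": True,  "outcome": 5},
    {"group": "placebo", "completed": True,  "outcome": 3},
]

def mean_outcome(group, intention_to_treat=True):
    """ITT keeps everyone randomised to the group; per-protocol
    keeps only those who completed the trial."""
    scores = [p["outcome"] for p in participants
              if p["group"] == group
              and (intention_to_treat or p["completed"])]
    return fmean(scores)

# Per-protocol drops the unsuccessful dropout, flattering the drug:
print(mean_outcome("drug", intention_to_treat=True))   # ITT mean
print(mean_outcome("drug", intention_to_treat=False))  # per-protocol mean
```

In this toy example the per-protocol mean for the drug group is higher than the ITT mean, because the participant who did poorly and dropped out is excluded – a small demonstration of why ITT is considered less prone to bias.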
