Critical evaluation: Validity and bias
There are several useful points that can help you assess the validity of a trial.
For simplicity, the following points describe a trial of placebo (control) versus drug treatment, but they also apply to trials that compare drug treatments (e.g. drug A versus drug B).
The ideal power for a trial is at least 80%. This means that, assuming the treatment effect is real, if the trial were repeated 100 times a statistically significant treatment effect would be seen in around 80 of them. The power of a study increases with sample size.
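To make this concrete, here is a minimal Monte Carlo sketch of what 80% power means in practice. All the numbers (a true 3-day reduction in hospital stay, a standard deviation of 6.5 days, 75 patients per group) are illustrative assumptions chosen to echo the worked example below, not values from any real trial.

# A minimal sketch of what 80% power means, checked by simulation.
# All numbers are illustrative assumptions, not real trial data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

true_effect = 3.0    # assumed true reduction in hospital stay (days)
sd = 6.5             # assumed standard deviation of hospital stay
n_per_group = 75     # assumed number of patients per group
n_repeats = 1000     # number of simulated repeats of the trial

significant = 0
for _ in range(n_repeats):
    placebo = rng.normal(loc=10.0, scale=sd, size=n_per_group)
    drug = rng.normal(loc=10.0 - true_effect, scale=sd, size=n_per_group)
    _, p = stats.ttest_ind(placebo, drug)
    if p < 0.05:
        significant += 1

# With roughly 80% power, about 800 of the 1000 simulated trials
# should report a statistically significant effect.
print(f"Significant in {significant}/{n_repeats} simulated trials")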
A power calculation is essential for determining the right number of participants, and ideally authors should describe theirs. The calculation should specify the primary endpoint (the main result being measured) and the expected size of the treatment effect: if the difference between treatment and placebo is expected to be small, a larger sample size will be needed than for a study looking for a bigger difference. The calculation will also need to include an allowance for participants who might drop out. For example: “To detect a reduction in hospital stay of 3 days with a power of 80%, we calculated a sample size of 75 patients per group was needed, given an anticipated dropout rate of 10%”.
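As a sketch of how such a calculation might be done, the example below uses the statsmodels library. The standard deviation of 6.2 days is an assumed value chosen so the output roughly reproduces the quoted example; a real calculation would use an SD estimated from pilot data or published studies.

# A sketch of a sample-size calculation in the spirit of the quoted
# example. The SD of 6.2 days is an assumption, not a real figure.
from math import ceil
from statsmodels.stats.power import TTestIndPower

reduction = 3.0                 # primary endpoint: reduction in stay (days)
sd = 6.2                        # assumed SD of hospital stay
effect_size = reduction / sd    # standardised (Cohen's d) effect size

n_evaluable = TTestIndPower().solve_power(
    effect_size=effect_size, power=0.80, alpha=0.05,
    alternative='two-sided'
)

dropout = 0.10                  # anticipated dropout rate
n_recruit = ceil(n_evaluable / (1 - dropout))

print(f"{ceil(n_evaluable)} evaluable patients per group; "
      f"recruit {n_recruit} per group to allow for {dropout:.0%} dropout")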
Trials that are underpowered or have a small sample size can still be useful, but their results might not be definitive. Larger trials or a meta-analysis may be needed to be more confident in the results. We also need to watch out for trials that are ‘overpowered’. If the sample size is too big, even very small effects of little clinical importance can be reported as statistically significant (see p-values). This can give a misleading impression of a treatment's effectiveness.
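The ‘overpowered’ problem can also be demonstrated with a short simulation. In the hypothetical sketch below, a deliberately huge trial detects an assumed true difference of only about 0.1 days in hospital stay, a reduction of no clinical importance, as highly statistically significant.

# A sketch of the overpowered problem: with an enormous sample, a
# clinically trivial difference still reaches statistical significance.
# All values here are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n = 200_000  # deliberately enormous sample per group

placebo = rng.normal(loc=10.0, scale=6.5, size=n)
drug = rng.normal(loc=9.9, scale=6.5, size=n)  # trivially small true effect

_, p = stats.ttest_ind(placebo, drug)
diff = placebo.mean() - drug.mean()
# Typically prints a p-value far below 0.05, despite a ~0.1-day
# difference that no clinician would consider important.
print(f"Mean difference: {diff:.2f} days, p = {p:.2g}")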
Selection of volunteers from a patient population should be random. This stops the researcher from choosing their preferred patients, which could make the results look better than they really are. For example, a researcher may approach every third patient who comes to a clinic, rather than choosing the ones they think will do well.
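As a simple sketch, selection can be handed to a computer rather than left to the researcher's judgement. The patient list and sample size below are hypothetical.

# A minimal sketch of random selection from a hypothetical list of
# eligible clinic patients. random.sample picks without replacement
# and without any input from the researcher's preferences.
import random

eligible = [f"patient_{i:03d}" for i in range(1, 301)]  # hypothetical IDs

random.seed(7)  # in a real trial the selection process would be documented
invited = random.sample(eligible, k=30)
print(invited[:5])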
Ideally as many people as possible involved in the trial should be ‘blinded’ (or masked) so they do not know who is getting which treatment or placebo. The opposite is ‘open-label’, when everyone knows what the volunteer is receiving. ‘Double-blinded’ usually means that neither the investigators nor the volunteers know which arm of the study each volunteer is in, and ‘triple-blinded’ means the committee monitoring the data also does not know.
However, blinding is not always possible: for example, with drugs that cause distinctive side effects that make it easier for the participant or researcher to work out whether they are on the drug (e.g. peppermint oil capsules cause rectal burning), or with treatments that have a complicated dosage regimen (e.g. warfarin dosed according to INR results). One way around this is to use a ‘PROBE’ design (prospective, randomised, open-label, blinded endpoint evaluation), in which the people evaluating the endpoints do not know which group the volunteers were assigned to. This ensures the endpoint data are not influenced by the expectations or knowledge of the researchers or the participants.
Apart from the treatment or placebo, patients should be treated identically during the trial; they should receive the same number of blood tests, X-rays, and clinic appointments.
Participant flow should be clearly reported, showing exactly what happened to every participant. A good report should explain if and why volunteers did not receive the treatment they were allocated, were lost to follow-up, dropped out or were excluded after the trial began. If this leads to imbalances between the groups, it is known as attrition bias. It is also important to know which, and how many, trial participants were included in the final analysis. There are two main ways to analyse the data: ‘on-treatment’ or ‘per-protocol’ analysis, where only participants who completed the trial according to the protocol are included, and ‘intention-to-treat’ (ITT) analysis, where all participants who underwent randomisation are analysed in the groups they were originally assigned to, no matter what happened during the trial. ITT analysis is generally favoured because it reduces bias and is more like real life, where people change their minds, or change or stop treatments; it therefore gives a more realistic picture of a treatment’s effectiveness.
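The difference between the two approaches is easy to see on a toy dataset. In the hypothetical data below, the drug-arm dropout counts as ‘not improved’ under ITT but disappears entirely from the per-protocol analysis, flattering the drug.

# A toy illustration of ITT versus per-protocol analysis. The column
# names and values are hypothetical, chosen purely to show how the
# two denominators differ.
import pandas as pd

trial = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5, 6],
    "assigned":  ["drug", "drug", "drug", "placebo", "placebo", "placebo"],
    "completed": [True, True, False, True, True, True],   # followed protocol?
    "improved":  [True, True, False, True, False, False],
})

# ITT: everyone analysed in the group they were randomised to.
itt = trial.groupby("assigned")["improved"].mean()

# Per-protocol: only participants who completed the trial as planned.
per_protocol = trial[trial["completed"]].groupby("assigned")["improved"].mean()

print("ITT response rates:\n", itt)                     # drug 0.67, placebo 0.33
print("Per-protocol response rates:\n", per_protocol)   # drug 1.00, placebo 0.33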






