The methods and terminology of cancer trials: A review

UNDERSTANDING STATISTICS

Statisticians are an integral and essential part of designing a Phase III trial. They must prospectively determine how many patients need to be compared to demonstrate a clinically significant difference. This determination is driven by the power of the trial, which is described as the percent chance the trial has of demonstrating a specified difference if one truly exists. A 90% chance of demonstrating a 20% difference is a common goal for a trial's power.

The P value is another term commonly discussed in Phase III trials and refers to the statistical measure of whether the trial results demonstrate a genuinely different outcome among the compared groups. A P value less than .05 (P < .05) is often the goal and means that there is less than a 5% probability that a difference as large as the one observed would have occurred by chance alone.
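As a concrete but purely hypothetical illustration of how a P value might be computed, the sketch below compares response rates in two invented treatment arms with a chi-square test; the counts, and the use of Python, are assumptions for illustration only and are not drawn from any specific trial.

```python
# A minimal sketch (hypothetical counts): testing whether response rates
# differ between two invented treatment arms using a chi-square test.
from scipy.stats import chi2_contingency

# Hypothetical counts: [responders, non-responders] in each arm
drug_a = [70, 30]   # 70% response rate
drug_b = [50, 50]   # 50% response rate

chi2, p_value, dof, expected = chi2_contingency([drug_a, drug_b])
print(f"P value = {p_value:.4f}")
# A P value below .05 would conventionally be read as a statistically
# significant difference between the two arms.
```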

For example, suppose that researchers want to study whether chemotherapy drug A is more effective than chemotherapy drug B. Before the trial is started, statisticians must determine how many patients the study must include to detect, with reasonable (90%) certainty, a 20% difference in outcome between the treatments if one exists. This would be the power of the study. After the study is completed, statistical analysis of the data will determine whether the difference seen between chemotherapy drugs A and B had less than a 5% chance of occurring by chance (P < .05).3,4
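The sample size behind such a power target can be estimated before the trial begins. The sketch below assumes hypothetical response rates of 50% and 70% (a 20% absolute difference) and uses a standard two-proportion power calculation; the specific rates and the statsmodels functions are illustrative assumptions, not methods taken from the article.

```python
# A minimal sketch (hypothetical numbers): estimating how many patients per arm
# are needed for 90% power to detect a 20% absolute difference in response rate
# (e.g., 50% with drug B vs 70% with drug A) at a two-sided alpha of .05.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.70, 0.50)   # standardized effect for 70% vs 50%
n_per_arm = NormalIndPower().solve_power(effect_size=effect,
                                         power=0.90,
                                         alpha=0.05,
                                         alternative='two-sided')
print(f"Approximately {n_per_arm:.0f} patients per arm")
```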

Another term is the confidence interval, or CI. The CI is the range of results within which the true effect of a treatment is likely to lie. The desire is to have a narrow CI and to have ranges that do not overlap between the studied groups or arms. If the CIs do overlap, the two groups may not truly differ, despite the final results demonstrating a difference.

To return to our example study comparing chemotherapy drugs A and B, the 95% CI is the range of values that, with 95% confidence, contains the true outcome for each treatment (or the true difference between them). If only a few patients are studied, the estimate is imprecise and the 95% CI will span a wide range; in a large trial, however, the estimate is more stable, and the CI will be narrower. If the CIs of the two treatments overlap, the data remain consistent with the treatments producing the same outcome. A trial that includes only a small number of patients may show an apparently significant difference, but the CIs may be so broad that they overlap, and thus the treatments may not truly differ at all.
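A hypothetical sketch of this behavior is shown below: the same response rates produce wide, overlapping 95% CIs in a small trial and narrow, separated CIs in a large one. The counts and the statsmodels function are illustrative assumptions only.

```python
# A minimal sketch (hypothetical counts): 95% CIs for the response rate in each
# arm, illustrating how a small trial yields wide, overlapping intervals while
# a large trial with the same rates yields narrow, separated ones.
from statsmodels.stats.proportion import proportion_confint

for n in (20, 500):                       # patients per arm
    a_low, a_high = proportion_confint(int(0.70 * n), n, alpha=0.05)
    b_low, b_high = proportion_confint(int(0.50 * n), n, alpha=0.05)
    overlap = a_low <= b_high             # drug A's lower bound vs drug B's upper bound
    print(f"n={n}: drug A CI ({a_low:.2f}, {a_high:.2f}), "
          f"drug B CI ({b_low:.2f}, {b_high:.2f}), overlap={overlap}")
```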

Clinical trials can have what are called alpha (type I) and beta (type II) errors. An alpha error is present when a trial shows a difference that really does not exist; this is referred to as a false positive. A beta error is the opposite and is present when a trial shows no difference between the treatments when one actually exists; this is referred to as a false negative. These errors occur when, by chance, the observed outcomes defy the probabilities built into the sample size calculated for the comparison of the two treatments or outcomes.4 The only way to minimize the chances of both types of errors is to increase the sample size, which may or may not be feasible.
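One way to see these error rates in action is by simulation. The sketch below, using invented response rates and a deliberately small trial, repeatedly simulates two-arm comparisons to approximate how often a false positive (alpha error) and a false negative (beta error) occur; all numbers are hypothetical.

```python
# A minimal sketch (simulation with hypothetical rates): estimating the alpha
# (false positive) and beta (false negative) error rates for a trial with 50
# patients per arm, by repeatedly simulating trials and running a chi-square test.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

def simulated_p(rate_a, rate_b, n=50):
    a = rng.binomial(n, rate_a)           # responders on drug A
    b = rng.binomial(n, rate_b)           # responders on drug B
    table = [[a, n - a], [b, n - b]]
    return chi2_contingency(table)[1]

# Alpha: the treatments are truly identical, yet the test declares a difference.
alpha_rate = np.mean([simulated_p(0.5, 0.5) < 0.05 for _ in range(2000)])
# Beta: a true 20% difference exists, yet the test fails to detect it.
beta_rate = np.mean([simulated_p(0.7, 0.5) >= 0.05 for _ in range(2000)])
print(f"Estimated alpha (false positive) rate: {alpha_rate:.2f}")
print(f"Estimated beta (false negative) rate:  {beta_rate:.2f}")
```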

In clinical trials, relative risk (RR) is the risk of an event (or of developing a disease) in one group relative to another, defined by exposure or treatment. It is the ratio of the likelihood of the event occurring in the experimental (noncontrol) group to the likelihood in the control group. If the event is less likely in the experimental group, the RR is less than 1; if the event is more likely, the RR is greater than 1. The outcome might be a bad thing (such as a toxicity reaction) or a good thing (a more effective treatment).
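A brief hypothetical calculation illustrates the definition: with invented toxicity counts in each arm, the RR is simply the ratio of the two observed risks.

```python
# A minimal sketch (hypothetical counts): relative risk of a toxicity reaction
# in the experimental arm relative to the control arm.
events_experimental, n_experimental = 15, 100   # 15% toxicity with new drug
events_control, n_control = 30, 100             # 30% toxicity with control

risk_experimental = events_experimental / n_experimental
risk_control = events_control / n_control
relative_risk = risk_experimental / risk_control
print(f"RR = {relative_risk:.2f}")   # 0.50: toxicity half as likely with the new drug
```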

Another frequently used term in statistics, similar to the RR, is the hazard ratio (HR). In cancer trials, the HR typically describes survival differences between two or more groups being compared over time. The HR is useful when the risk is not constant with respect to time, because it incorporates information collected at different time points throughout follow-up.

As an example, if the HR is 0.5, the risk of dying at any given time in one group is half the risk of dying in the other group. The HR may refer to overall survival or to other forms of survival, such as disease-free or progression-free survival. An HR of 1 corresponds to equivalent treatments; an HR of 2 implies that at any point in time, proportionately twice as many patients in the active group are having an event compared with the comparator or control group; and an HR of 0.5 means that at any point in time, half as many patients in the active group are having an event compared with the control group.
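As an illustrative sketch only (not a method described in the article), the simulation below assumes constant hazards, in which case the HR is approximately the ratio of event rates per unit of follow-up time in the two arms; in real trials, HRs are usually estimated with methods such as Cox proportional hazards regression, which also accommodate censored follow-up.

```python
# A minimal sketch (simulated data, assuming constant hazards): under a constant
# hazard, the HR can be approximated by the ratio of event rates per unit of
# follow-up time. Hazards of 0.10 vs 0.05 per month imply a true HR of 0.5.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
control_times = rng.exponential(scale=1 / 0.10, size=n)   # control arm
active_times = rng.exponential(scale=1 / 0.05, size=n)    # active arm

# With complete follow-up every patient has an event, so each arm's rate is
# its number of events divided by its total person-time.
rate_control = n / control_times.sum()
rate_active = n / active_times.sum()
print(f"Estimated HR ~ {rate_active / rate_control:.2f}")   # close to 0.5
```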