The Flawed "Smart" Designs of Cancer Drug Dosing

What if the sophisticated algorithms we use to find safe dosages for new cancer drugs are no more reliable than the "simplistic" methods they were designed to replace? For three decades, the oncology world has moved away from traditional "3+3" dose-finding designs toward complex, computer-heavy models known as Bayesian Phase I (BP1) designs.

The Promise of Advanced Models

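These systems re-estimate the dose-toxicity curve after each cohort of patients, using all the data collected so far, and assign the next cohort to their current best estimate of the Maximum Tolerated Dose (MTD). The MTD is the highest dose patients can take without an unacceptable risk of serious side effects, usually formalized as a prespecified target rate of dose-limiting toxicity. These designs are prized for their perceived efficiency and mathematical elegance.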
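To make the "re-estimate, then assign" loop concrete, here is a minimal Python sketch of one widely used Bayesian Phase I design, the one-parameter power-model Continual Reassessment Method (CRM). It assumes a hypothetical five-level dose "skeleton" (prior guesses of toxicity risk), a 20% target rate, and a normal prior with the commonly used variance of 1.34 on the model parameter. These values are illustrative, not taken from any trial in the study, and the posterior is computed by brute force on a grid rather than with any particular software package.

```python
import math

# Hypothetical inputs, chosen for illustration only.
SKELETON = [0.05, 0.10, 0.20, 0.30, 0.45]   # prior guesses of toxicity risk at levels 0..4
TARGET = 0.20                                # assumed target toxicity rate that defines the MTD
PRIOR_VAR = 1.34                             # commonly used prior variance for parameter a
GRID = [-6 + 0.01 * i for i in range(1201)]  # grid over a from -6 to 6

def crm_recommend(doses, toxic):
    """Power-model CRM: P(toxicity at level k) = SKELETON[k] ** exp(a).

    doses: dose levels given so far; toxic: matching 0/1 toxicity outcomes.
    Returns the level whose posterior-mean toxicity risk is closest to TARGET.
    """
    log_post = []
    for a in GRID:
        ea = math.exp(a)
        lp = -a * a / (2 * PRIOR_VAR)                 # normal prior on a, mean 0
        for d, y in zip(doses, toxic):
            log_p = ea * math.log(SKELETON[d])        # log P(toxicity), kept in log space
            lp += log_p if y else math.log1p(-math.exp(log_p))
        log_post.append(lp)

    top = max(log_post)                               # subtract the max before exponentiating
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)

    # Posterior-mean risk at every level reflects ALL outcomes observed so far.
    post_mean = [
        sum(w * s ** math.exp(a) for w, a in zip(weights, GRID)) / total
        for s in SKELETON
    ]
    return min(range(len(SKELETON)), key=lambda k: abs(post_mean[k] - TARGET))

print(crm_recommend([4] * 10, [0] * 10))  # -> 4
print(crm_recommend([0] * 5, [1] * 5))    # -> 0
```

Ten toxicity-free patients at the top dose pull the recommendation up to that dose, while five toxicities in a row at the lowest dose pin it to the bottom. Every new outcome reshapes the whole estimated curve, which is the source of both the design's efficiency and its sensitivity to a handful of early results.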

A Troubling Instability

However, a rigorous analysis of these "smart" designs reveals a troubling instability. When tested in the small patient groups typical of early trials (often 10 to 40 subjects), these high-tech models frequently succumb to "order sensitivity."

The Problem in Plain Terms

The final dose recommendation for a potentially life-saving drug might depend less on the drug’s actual toxicity and more on the random order in which patients happen to walk through the clinic door.
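One way to see order sensitivity is to hold the patients fixed and shuffle only their arrival order. The hypothetical Python sketch below gives each simulated patient a fixed personal tolerance, so whether that patient would have a toxicity at a given dose never changes between runs; only the sequence differs. It uses a simple grid-based CRM with an assumed skeleton, assumed "true" toxicity rates, and a no-skipping escalation rule, all chosen for illustration rather than taken from the study.

```python
import math
import random
from collections import Counter

# Hypothetical design inputs and "true" toxicity curve, for illustration only.
SKELETON = [0.05, 0.10, 0.20, 0.30, 0.45]
TRUE_P = [0.04, 0.12, 0.22, 0.40, 0.55]
TARGET, PRIOR_VAR = 0.20, 1.34
GRID = [-6 + 0.04 * i for i in range(301)]

def crm_recommend(doses, toxic):
    """Grid-based power-model CRM; returns the level with posterior-mean risk closest to TARGET."""
    log_post = []
    for a in GRID:
        ea = math.exp(a)
        lp = -a * a / (2 * PRIOR_VAR)
        for d, y in zip(doses, toxic):
            log_p = ea * math.log(SKELETON[d])
            lp += log_p if y else math.log1p(-math.exp(log_p))
        log_post.append(lp)
    top = max(log_post)
    w = [math.exp(lp - top) for lp in log_post]
    post = [sum(wi * s ** math.exp(a) for wi, a in zip(w, GRID)) / sum(w) for s in SKELETON]
    return min(range(len(SKELETON)), key=lambda k: abs(post[k] - TARGET))

def run_trial(tolerances):
    """One-at-a-time CRM trial; a patient with tolerance u is toxic at level d iff u < TRUE_P[d]."""
    doses, toxic, current, rec = [], [], 0, 0   # start at the lowest level
    for u in tolerances:
        doses.append(current)
        toxic.append(1 if u < TRUE_P[current] else 0)
        rec = crm_recommend(doses, toxic)
        current = min(rec, current + 1)         # never skip an untested level going up
    return rec                                   # final MTD recommendation

def order_sensitivity(n_patients=24, n_orders=20, seed=1):
    """Run the SAME patients in n_orders shuffled arrival orders; tally final recommendations."""
    rng = random.Random(seed)
    patients = [rng.random() for _ in range(n_patients)]
    counts = Counter()
    for _ in range(n_orders):
        order = patients[:]
        rng.shuffle(order)                       # identical patients, new arrival order
        counts[run_trial(order)] += 1
    return counts

print(order_sensitivity())
```

If the design were insensitive to order, every shuffle would yield the same final dose; any spread across levels in the printed tally comes purely from the sequence in which identical patients arrived.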

Why This Flaw Matters

This is critical because Phase I trials are the gatekeepers of medicine. Getting the dose wrong has severe consequences for patients and the development of new therapies.

Consequences of Dose Selection Errors

Dose Settles Too Low: A promising drug may appear ineffective and be abandoned, depriving patients of a potential cure.
Dose Settles Too High: Future patients in later trials are put at severe risk of unacceptable side effects.

Key Findings from the Analysis

Researchers analyzed four published clinical trials and ran extensive simulations to track how these models behave at the small sample sizes seen in real trials. The results were startlingly inconsistent.

Alarming Inaccuracy in a Key Scenario

In one specific simulation (the Gamma scenario), the single most frequent outcome for a leading Bayesian model was that not one patient cohort was treated at the true MTD, even when the trial started at the correct dose.

A Negligible Advantage in Accuracy

The study found that MTD-selection success rates were remarkably similar across both simple and complex designs.

Head-to-Head Comparison

For a trial with 25 patients and 7 dose levels:

The complex Bayesian Continual Reassessment Method (CRM) selected the true MTD 53.0% of the time.
The much simpler Up-and-Down (U&D) design selected it 51.3% of the time.

This 1.7-percentage-point difference is negligible, suggesting that the added complexity of these algorithms offers no meaningful advantage in accuracy.
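Translating the two success rates into counts shows how thin the margin is. The 1,000-trial figure below is a round number chosen for illustration, not the study's simulation count:

```latex
53.0\% - 51.3\% = 1.7 \text{ percentage points}
\qquad
1000 \times 0.530 = 530 \;\text{vs.}\; 1000 \times 0.513 = 513
\qquad
\frac{1.7}{51.3} \approx 3.3\%
```

That is about 17 additional correct selections per 1,000 trials of this size, a relative gain of roughly 3%.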

The Danger of Premature "Settling"

The danger lies in how quickly these models lock onto a final answer.

The Premature Lock-In

The study found that 80–90% of Bayesian runs locked onto a single dose by the 12th patient cohort. While this looks like efficiency, the researchers discovered that early settling did not correlate with better accuracy.

Instead, the models are often "fooled" by early toxicities, leading to a "winner-take-all" error where the trial gets stuck on the wrong dose before it has enough data to self-correct.
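What "locking onto" a dose means can be pinned down with a small helper. The Python sketch below assumes one simple definition, which may differ from the one the researchers used: a run has settled at the first cohort from which every later cohort received the same dose as the last one. The function names are hypothetical. The helper measures only when a run settles, not whether it settled on the correct dose; that separate question is where the study found early settling offered no accuracy benefit.

```python
def settling_cohort(assignments):
    """1-based index of the first cohort from which every cohort got the final dose.

    assignments: dose level given to each successive cohort, e.g. [1, 2, 2, 3, 3, 3].
    """
    if not assignments:
        raise ValueError("need at least one cohort")
    final = assignments[-1]
    k = len(assignments)
    # Walk backwards while the preceding cohort was also on the final dose.
    while k > 1 and assignments[k - 2] == final:
        k -= 1
    return k

def fraction_settled_by(runs, cohort):
    """Share of runs (lists of per-cohort doses) that had settled by the given cohort number."""
    return sum(settling_cohort(r) <= cohort for r in runs) / len(runs)

print(settling_cohort([1, 2, 2, 3, 3, 3]))  # -> 4 (cohort 4 starts the final run at level 3)
```

Applied to a batch of simulated runs, `fraction_settled_by(runs, 12)` gives the kind of "settled by the 12th cohort" share the study reports.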

A Sign of Unreliable Logic

The researchers also noted that some modern designs exhibited "incoherent transitions."

A Critical Design Flaw

These models sometimes made illogical moves, such as raising the dose immediately after a patient suffered a toxic reaction. In some designs, this happened in 73.5% to 86.6% of simulated runs. In contrast, traditional U&D designs had a 0% incoherence rate, guaranteed by their basic transition rules.
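The coherence guarantee of U&D designs comes straight from the transition rule. As an illustration, the Python sketch below uses one well-known U&D variant, the Durham-Flournoy biased-coin design (an assumption: the study's U&D configuration may differ), together with a hypothetical checker that counts "incoherent" escalations, meaning a dose increase immediately after a toxic outcome. Because a toxicity can only move the biased coin down one level (or hold it at the lowest level), the checker can never fire on its output. The true toxicity rates are invented for the example.

```python
import random

def biased_coin_next(current, toxic, target, n_levels, rng):
    """Durham-Flournoy biased-coin up-and-down step (valid for target <= 0.5).

    Toxicity -> move down one level (never below 0).
    No toxicity -> move up one level with probability target / (1 - target), else stay.
    """
    if toxic:
        return max(current - 1, 0)
    if rng.random() < target / (1 - target):
        return min(current + 1, n_levels - 1)
    return current

def incoherent_escalations(doses, toxic):
    """Count transitions where the dose went UP right after a toxic outcome."""
    return sum(
        1 for i in range(len(doses) - 1) if toxic[i] and doses[i + 1] > doses[i]
    )

def simulate_ud(true_p, n_patients, target, seed=0):
    """Simulate one biased-coin U&D trial, one patient at a time, starting at level 0."""
    rng = random.Random(seed)
    doses, toxic, current = [], [], 0
    for _ in range(n_patients):
        y = rng.random() < true_p[current]
        doses.append(current)
        toxic.append(y)
        current = biased_coin_next(current, y, target, len(true_p), rng)
    return doses, toxic

# Hypothetical true toxicity rates at 7 dose levels, 25 patients, 20% target.
TRUE_P = [0.02, 0.05, 0.10, 0.20, 0.30, 0.45, 0.60]
runs = [simulate_ud(TRUE_P, 25, 0.20, seed=s) for s in range(1000)]
print(sum(incoherent_escalations(d, t) > 0 for d, t in runs) / len(runs))  # -> 0.0
```

The same checker can be run on the dose paths of any design; for the U&D rule above, a zero is guaranteed by construction rather than observed by chance.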

A Call for Prioritizing Reliability

These data offer a sobering reality check, although the findings rest on numerical simulations rather than a formal mathematical proof. Even so, the researchers suggest a clear path forward.

The Researchers' Recommendation

Until these complex models can be made more stable, the medical community should prioritize the reliability of simpler, more transparent designs. For now, the "black box" of drug dosing remains more volatile than we realized.

Reference:
Assaf P. Oron and Peter D. Hoff. Small-Sample Behavior of Novel Phase I Cancer Trial Designs. arXiv:1202.4962v2 [stat.ME].