The entire paradigm of therapeutic and medical device development in the modern era is predicated on demonstrating safety and efficacy within parameters deemed appropriate by regulators like the Food and Drug Administration (FDA) and European Medicines Agency (EMA). This is proven in a series of clinical trials, known as Phase I to Phase IV studies, with each phase generally expected to demonstrate additional levels of data. This requirement for evidence generation via clinical trials contributes to the event driven nature of biopharma/medtech, creating somewhat predictable time points when sponsors can demonstrate added value in their programs with successful studies. The significance of a positive study to a share price or company valuation has had the unfortunate side effect of leading some sponsors to be less than 100% transparent in reporting the read-outs of trials, lest a hint of negative findings taint the remaining dataset. Below we lay out the archetypal road map of clinical trials and ways to be savvy to interpreting these.
General structure of clinical development programs
Phase I trials, sometimes known as first-in-man
studies, are primarily meant to show the safety and pharmacokinetic profile
(the onset, duration, and intensity of effect) of a novel therapeutic. These
studies are traditionally done in healthy volunteers, and are typically not
able to demonstrate efficacy. In areas like oncology, however, there may be
phase I studies in actual patients that can look for preliminary signs of efficacy
as well.
Phase II trials are conventionally dose-finding
studies, and will test different dose levels, ideally to find the right
therapeutic window of the drug, with the best combination of efficacy and
safety. This can be done by taking an initial group of patients and slowly
titrating up the dose of the drug over time until they experience adverse
events, or by creating specified trial cohorts at different dose levels. Phase
II results should importantly inform if it is even worthwhile to pursue Phase
III, the costliest phase of development, or otherwise scrap a program. Once the
best doses are selected from Phase II, sponsors then move on to Phase III.
The gold standard of Phase III trials is randomized,
controlled, and double-blinded studies. Randomized means that patients who are
enrolled in the study are divided, in a completely random fashion by someone
other than the treating physician, into different arms or cohorts of the study.
Controlled means that there is a comparator to the drug of interest.
Comparators could be a placebo, a specific drug or regimen, or “best
alternative care,” which may be any one of a series of different options
determined at the physician’s discretion. The latter would be more common in
studies where patients have already had multiple previous treatments. The
double blinded nature of the study means that neither patient nor physician
know what the dose of the administered drug (or placebo). This avoids
introducing bias into the study. In the case of medical devices where a
surgical procedure is required, trial protocols may call for “sham” procedures
to maintain the blinded nature of the study. With any luck, a Phase III trial
or trials will demonstrate the appropriate levels of safety and efficacy that
allow the sponsor to submit for regulatory approval. There may be multiple
Phase II and Phase III trials in a given development program, depending on the
therapeutic area. Any one negative trial could kill an entire development
program.
Sometimes, sponsors may have less than perfect data
packages that they submit for regulatory approval. When there may be lingering
questions about certain aspects of the drug or device, especially around things
like long term safety, regulators may still approve the application, contingent
on the completion of Phase IV or post-marketing studies, to make sure the worst
fears are not borne out in use among larger populations. Also of note, trials
are often conducted in “perfect” patients, who often look clinically very
different from patients treated in real world settings that may have other
co-morbidities, and be generally less healthy than trial patients. This makes
the collection of longer term evidence more pressing, as either the number of
severity of adverse events may vary in a real-world population compared to the
trial population.
Understanding the general formats of data read outs
Sponsors have many outlets along the course of
development to convey the results of trials to patients, health care providers,
and investors. These range from something as simple as a press release, through
poster and podium presentations at medical congresses, up to peer-reviewed
publications in academic journals. Moving along that spectrum, there is
increasingly more granular data, ideally shining a light on not only the good,
but also the bad and the ugly.
The terse nature of press releases leaves them
fraught with potential for ambiguity. At worst, sponsors may simply report
positive efficacy trends, without noting the statistical significance as defined
by a p-value, and with no mention of adverse events. The measure of efficacy
can also be poorly defined; some questions a savvy reader should ask are[1]:
- Is a positive signal from a primary endpoint or
secondary endpoint? The primary endpoint is the most important hypothesis the
trial sought to test.
- Was the p-value and form of statistical analysis
associated with that endpoint prespecified? A prespecified statistical analysis
is less vulnerable to selective p-value hacking once data has been unblinded.
- Were patients excluded from the efficacy
analysis? If so, why? For example, is it just a particular subgroup or
subgroups and not the entire enrolled population? Excluding patients may have
the effect of only cherry picking the best-performing patients for the data
analysis.
Moving up the chain, abstracts that are submitted to
medical congresses as either poster or podium presentations are commonly
peer-reviewed, albeit not with the same scrutiny that would come with a full
journal article. These presentations typically have more details on the methods
and results as compared to press releases. The public presentation of data by
ether principal investigators or the scientists/clinicians driving a
development program also provides at least some public venue for questions and
answers.
There is a smattering of other outlets for partial
data disclosure – these include corporate investor decks presented at investor conferences,
so called “R&D days” hosted by companies, and sponsored satellite symposia
at relevant scientific and medical conferences. These should generally be
approached with the same caveats as above given the selective disclosure.
While it depends on the journal, the final
publication of trial data is typically a much more robust and trustworthy
source than the original press releases of the same data. Not only are there
multiple authors on a given manuscript, many of whom are unlikely to be
employees of the sponsor, there are also multiple peer reviewers, whom are
experts in the same field and required as part the journal’s submission and
editorial process. In addition, there are the journal editors themselves who
are the final gatekeepers of the manuscript. In the event of particularly novel
or noteworthy findings, journals may invite another key opinion leader in the
same field to write an accompanying editorial to help put the findings, both
good and bad, into perspective as part of the same journal issue. Often newly
published articles may be open to comments from anyone interested in submitting
them, which sometimes serve the purpose of poking holes in analyses or
conclusions.
Full statistical plans and trial protocols may be
included as appendices to the main journal article, which can be rich sources
of information, but can be dense with jargon for non-experts. It is best to do
a full read of not only the full manuscript but also those appendix items, and
not simply rely on only the abstract or a third party summary, as these might
be no better than the original press release. It is often a matter of years
from the first press release to the final publication of the same data set.
Other trial designs
All of the above is based on very conventional clinical program designs, and there are some exceptions. For example, certain late stage cancers with a precision medicine biomarker, or other genetically defined rare diseases, have utilized much smaller clinical programs to gain regulatory approval, moving from a combined Phase I/II trial to a relatively small Phase III. This is due to the inability to recruit patients for large trials from a natural small pool, and in some cases, the highly targeted nature of the therapy allows for statistically significant demonstration of efficacy with a small trial size. There is also some interest in the use of basket trials to test the effect of one drug that targets a single mutation but in a variety of tumor types, or umbrella trials, that have many different arms within one trial. These designs come with different statistical considerations. Though the designs of clinical programs may vary, the platforms for presenting data are otherwise no different, and the same healthy dose of skepticism, if not more given the smaller trial sizes, should be applied to the interpretation of those results from press releases up to and including journal articles.
[1]
https://www.sciencedirect.com/science/article/pii/S2451865416301132