Interpreting clinical trial results with a healthy dose of skepticism

The entire paradigm of therapeutic and medical device development in the modern era is predicated on demonstrating safety and efficacy within parameters deemed appropriate by regulators like the Food and Drug Administration (FDA) and European Medicines Agency (EMA). This is proven in a series of clinical trials, known as Phase I to Phase IV studies, with each phase generally expected to generate an additional level of evidence. This requirement for evidence generation via clinical trials contributes to the event-driven nature of biopharma/medtech, creating somewhat predictable time points when sponsors can demonstrate added value in their programs with successful studies. The significance of a positive study to a share price or company valuation has had the unfortunate side effect of leading some sponsors to be less than fully transparent in reporting trial read-outs, lest a hint of negative findings taint the remaining dataset. Below we lay out the archetypal road map of clinical trials and ways to interpret the results with a critical eye.

General structure of clinical development programs

Phase I trials, sometimes known as first-in-human studies, are primarily meant to show the safety and pharmacokinetic profile (how the drug is absorbed, distributed, metabolized, and excreted) of a novel therapeutic. These studies are traditionally done in healthy volunteers, and are typically not able to demonstrate efficacy. In areas like oncology, however, there may be Phase I studies in actual patients that can look for preliminary signs of efficacy as well.

Phase II trials are conventionally dose-finding studies, and will test different dose levels, ideally to find the right therapeutic window of the drug, with the best combination of efficacy and safety. This can be done by taking an initial group of patients and slowly titrating up the dose over time until they experience adverse events, or by creating specified trial cohorts at different dose levels. Importantly, Phase II results should inform whether it is even worthwhile to pursue Phase III, the costliest phase of development, or to scrap the program. Once the best doses are selected from Phase II, sponsors then move on to Phase III.
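As a concrete illustration of cohort-based dose-finding, the classic "3+3" escalation rule treats three patients per dose level, expands the cohort when one dose-limiting toxicity appears, and stops escalating at two or more. The sketch below is a simplified, hypothetical version of that rule; the dose levels, toxicity counts, and function names are our own illustrations, not drawn from any particular trial:

```python
def three_plus_three(dose_levels, dlts_in_cohort):
    """Simplified 3+3 dose escalation (hypothetical sketch).

    dlts_in_cohort(dose) -> dose-limiting toxicities (DLTs) observed in a
    cohort of 3 patients treated at that dose.
    Returns the highest dose cleared, a stand-in for the maximum
    tolerated dose (MTD).
    """
    mtd = None
    for dose in dose_levels:
        dlts = dlts_in_cohort(dose)      # first cohort of 3 patients
        if dlts == 1:                    # ambiguous signal: expand cohort
            dlts += dlts_in_cohort(dose)
        if dlts <= 1:
            mtd = dose                   # dose cleared; escalate further
        else:
            break                        # too toxic; stop escalation
    return mtd

# Hypothetical program: toxicity emerges at the 40 mg dose level
observed = {10: 0, 20: 0, 40: 2}
print(three_plus_three([10, 20, 40], lambda dose: observed[dose]))  # 20
```

Real 3+3 protocols carry additional rules (for example, confirming the MTD in an expanded cohort), but the core escalate-or-stop logic is as above.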

The gold standard of Phase III trials is the randomized, controlled, double-blinded study. Randomized means that patients enrolled in the study are divided, in a completely random fashion by someone other than the treating physician, into different arms or cohorts of the study. Controlled means that there is a comparator to the drug of interest. Comparators could be a placebo, a specific drug or regimen, or “best alternative care,” which may be any one of a series of different options determined at the physician’s discretion. The latter would be more common in studies where patients have already had multiple previous treatments. The double-blinded nature of the study means that neither patient nor physician knows whether the administered treatment is the drug (or at what dose) or the placebo. This avoids introducing bias into the study. In the case of medical devices where a surgical procedure is required, trial protocols may call for “sham” procedures to maintain the blinded nature of the study. With any luck, a Phase III trial or trials will demonstrate the appropriate levels of safety and efficacy that allow the sponsor to submit for regulatory approval. There may be multiple Phase II and Phase III trials in a given development program, depending on the therapeutic area. Any one negative trial could kill an entire development program.
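To make "randomized" concrete: one common allocation scheme is permuted-block randomization, which shuffles small balanced blocks of assignments so that the arms never drift far out of balance as enrollment proceeds. A minimal sketch, with arm labels and block size as illustrative assumptions:

```python
import random

def permuted_block_assignments(n_patients, block_size=4, seed=0):
    """Generate treatment assignments in shuffled, balanced blocks."""
    assert block_size % 2 == 0, "block must split evenly between arms"
    rng = random.Random(seed)   # in practice, held by an independent party
    assignments = []
    while len(assignments) < n_patients:
        block = ["drug"] * (block_size // 2) + ["placebo"] * (block_size // 2)
        rng.shuffle(block)      # order within each block is random
        assignments.extend(block)
    return assignments[:n_patients]

arms = permuted_block_assignments(12)
print(arms.count("drug"), arms.count("placebo"))  # 6 6
```

The sequence is generated and held away from the treating physician, which is exactly the separation the paragraph above describes.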

Sometimes, sponsors may have less than perfect data packages that they submit for regulatory approval. When there are lingering questions about certain aspects of the drug or device, especially around things like long-term safety, regulators may still approve the application, contingent on the completion of Phase IV or post-marketing studies, to make sure the worst fears are not borne out in use among larger populations. Also of note, trials are often conducted in “perfect” patients, who often look clinically very different from patients treated in real-world settings, who may have other co-morbidities and be generally less healthy than trial patients. This makes the collection of longer-term evidence more pressing, as either the number or severity of adverse events may vary in a real-world population compared to the trial population.

Understanding the general formats of data read outs

Sponsors have many outlets along the course of development to convey the results of trials to patients, health care providers, and investors. These range from something as simple as a press release, through poster and podium presentations at medical congresses, up to peer-reviewed publications in academic journals. Moving along that spectrum, there is increasingly more granular data, ideally shining a light on not only the good, but also the bad and the ugly.

The terse nature of press releases leaves them fraught with potential for ambiguity. At worst, sponsors may simply report positive efficacy trends, without noting the statistical significance as defined by a p-value, and with no mention of adverse events. The measure of efficacy can also be poorly defined; some questions a savvy reader should ask are[1]:

  • Is a positive signal from a primary endpoint or secondary endpoint? The primary endpoint is the most important hypothesis the trial sought to test.
  • Were the p-value and form of statistical analysis associated with that endpoint prespecified? A prespecified statistical analysis is less vulnerable to selective p-value hacking once the data have been unblinded.
  • Were patients excluded from the efficacy analysis? If so, why? For example, was the analysis run on a particular subgroup or subgroups rather than the entire enrolled population? Excluding patients can amount to cherry-picking only the best-performing patients for the data analysis.
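For a sense of what "statistical significance as defined by a p-value" means in practice, a simple two-proportion z-test on hypothetical response rates shows how a reported efficacy difference translates into a p-value. The numbers below are invented purely for illustration:

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # pooled response rate under the null hypothesis of no difference
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical trial: 60/100 responders on drug vs 45/100 on placebo
z, p = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, p = {p:.4f}")  # p falls below the conventional 0.05 here
```

A press release touting the 60% vs 45% difference without the p-value, the test used, or whether that test was prespecified leaves the reader unable to run this check, which is exactly the gap the questions above probe.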

Moving up the chain, abstracts that are submitted to medical congresses as either poster or podium presentations are commonly peer-reviewed, albeit not with the same scrutiny that would come with a full journal article. These presentations typically have more detail on the methods and results compared to press releases. The public presentation of data by either principal investigators or the scientists/clinicians driving a development program also provides at least some public venue for questions and answers.

There is a smattering of other outlets for partial data disclosure – these include corporate investor decks presented at investor conferences, so called “R&D days” hosted by companies, and sponsored satellite symposia at relevant scientific and medical conferences. These should generally be approached with the same caveats as above given the selective disclosure.

While it depends on the journal, the final publication of trial data is typically a much more robust and trustworthy source than the original press releases of the same data. Not only are there multiple authors on a given manuscript, many of whom are unlikely to be employees of the sponsor, there are also multiple peer reviewers, who are experts in the same field and required as part of the journal’s submission and editorial process. In addition, there are the journal editors themselves, who are the final gatekeepers of the manuscript. In the event of particularly novel or noteworthy findings, journals may invite another key opinion leader in the same field to write an accompanying editorial to help put the findings, both good and bad, into perspective as part of the same journal issue. Newly published articles are often open to comments from anyone interested in submitting them, which sometimes serve the purpose of poking holes in analyses or conclusions.

Full statistical plans and trial protocols may be included as appendices to the main journal article; these can be rich sources of information, but can be dense with jargon for non-experts. It is best to read not only the full manuscript but also those appendix items, rather than relying on the abstract alone or on a third-party summary, as these might be no better than the original press release. It is often a matter of years from the first press release to the final publication of the same data set.

Other trial designs

All of the above is based on very conventional clinical program designs, and there are some exceptions. For example, certain late-stage cancers with a precision medicine biomarker, or other genetically defined rare diseases, have utilized much smaller clinical programs to gain regulatory approval, moving from a combined Phase I/II trial to a relatively small Phase III. This reflects the inability to recruit patients for large trials from a naturally small pool, and in some cases, the highly targeted nature of the therapy allows for statistically significant demonstration of efficacy with a small trial size. There is also some interest in the use of basket trials, which test one drug targeting a single mutation across a variety of tumor types, and umbrella trials, which test several drugs as different arms within one trial. These designs come with different statistical considerations. Though the designs of clinical programs may vary, the platforms for presenting data are otherwise no different, and the same healthy dose of skepticism, if not more given the smaller trial sizes, should be applied to the interpretation of those results from press releases up to and including journal articles.
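The claim that a highly targeted therapy can reach significance with a small trial follows directly from the standard sample-size approximation for comparing two proportions: the larger the effect, the fewer patients needed. A rough sketch, where the response rates are hypothetical and the z-scores correspond to two-sided alpha = 0.05 and 80% power:

```python
import math

def n_per_arm(p_treat, p_ctrl, z_alpha=1.96, z_power=0.84):
    """Approximate patients per arm to detect a two-proportion difference."""
    variance = p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl)
    return math.ceil((z_alpha + z_power) ** 2 * variance
                     / (p_treat - p_ctrl) ** 2)

print(n_per_arm(0.30, 0.20))  # a modest effect needs hundreds per arm
print(n_per_arm(0.60, 0.05))  # a dramatic effect needs only a handful
```

The flip side, as noted above, is that a small trial powered only by a dramatic assumed effect leaves much less room for error, which is why the extra skepticism is warranted.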

[1] https://www.sciencedirect.com/science/article/pii/S2451865416301132