Building a drug development database: Challenges in reliable data availability


Policy and legislative efforts to improve the biomedical innovation process must rely on a detailed and thorough analysis of drug development and industry output.


As part of our efforts to build a publicly available database on the characteristics of drug development, we present work undertaken to test methods for compiling data from public sources. These initial steps are designed to explore challenges in data extraction, completeness, and reliability. Specifically, filing dates for Investigational New Drug (IND) applications with the U.S. Food and Drug Administration (FDA) were chosen as the initial objective data element to be collected.

Materials and methods

FDA’s Drugs@FDA database and the Federal Register (FR) were used to collect IND dates for the 587 new molecular entities (NMEs) approved between 1994 and 2014. When available, the following data were captured: approval date, IND number, IND date, and source of information.

Results

At least one IND date was available for 445 (75.8%) of the 587 NMEs. The Drugs@FDA database provided IND dates for 303 (51.6%) NMEs and the Federal Register contributed 297 (50.6%) IND dates. Of the 445 NMEs for which an IND date was obtained, 274 (61.6%) had more than one date reported.
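The reported percentages can be rechecked from the counts alone. The following is a minimal sketch, not part of the original analysis: the variable and function names are hypothetical, and the overlap figure assumes that the two named sources are the only ones contributing IND dates, so that the 445 NMEs are the union of the two per-source counts.

```python
# Counts as reported in the text; all names here are illustrative.
TOTAL_NMES = 587
WITH_ANY_IND_DATE = 445
FROM_DRUGS_AT_FDA = 303
FROM_FEDERAL_REGISTER = 297
WITH_MULTIPLE_DATES = 274


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


coverage = {
    "any source": pct(WITH_ANY_IND_DATE, TOTAL_NMES),
    "Drugs@FDA": pct(FROM_DRUGS_AT_FDA, TOTAL_NMES),
    "Federal Register": pct(FROM_FEDERAL_REGISTER, TOTAL_NMES),
    # This share is relative to NMEs with at least one date, not to all NMEs.
    "more than one date": pct(WITH_MULTIPLE_DATES, WITH_ANY_IND_DATE),
}

# Assumption: the 445 NMEs are the union of the two sources, so
# inclusion-exclusion gives the number of NMEs covered by both.
in_both_sources = FROM_DRUGS_AT_FDA + FROM_FEDERAL_REGISTER - WITH_ANY_IND_DATE
without_any_date = TOTAL_NMES - WITH_ANY_IND_DATE

for label, value in coverage.items():
    print(f"{label}: {value}%")
print(f"NMEs with a date in both sources (derived): {in_both_sources}")
print(f"NMEs with no IND date found: {without_any_date}")
```

Under that assumption, the derived overlap suggests that neither source alone approaches the combined coverage, which is consistent with the need to draw on both.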


The key finding of this paper is considerable inconsistency in the data elements that are reliably available or reported, in this case IND application filing dates assembled from publicly available sources.


Our team will continue to focus on finding ways to collect relevant information to measure the impact of drug innovation.

Keywords: Biomedical innovation, Investigational New Drug, Food and Drug Administration, Federal Register, Drug development, Clinical trials