Biomedical research has always been a data-intensive endeavor; more than a decade ago, that demand gave rise to the entire field of bioinformatics. Yet the amount of information was far more manageable then. Today’s researchers not only have to stay abreast of the latest publications in their fields, but they also increasingly need to use existing data to generate or test hypotheses in silico, compare their data against reference or benchmark data, and contribute their own data to various ‘commons’ so that the health sciences can move faster and their results can be more easily reproduced.
Data, software and systems (for example, analytical pipelines) are essential components of the ecosystem of contemporary biomedical and behavioral research. Numerous databases exist, each focused, for example, on a particular community, type of data or type of research. Although these databases may be indexed and searchable, their indices usually do not interconnect, which makes it difficult to search for data across different research communities. According to the report of the Data and Informatics Working Group [see attached article], enabling more broadly focused searches for biomedical data was a key recommendation to the Director of the NIH. The work described herein was funded to give the NIH practical experience in fulfilling that recommendation.
As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, the authors worked with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which they call “DataMed”, based on metadata extracted from data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. It supports the findability and accessibility of data sets. These two characteristics, together with interoperability and reusability, make up the four FAIR principles (Findable, Accessible, Interoperable and Reusable), which are intended to facilitate knowledge discovery in today’s data-intensive scientific landscape.
Nat Genet June 2017; 49: 816–819