Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
Author(s)
Mercado, Rocío; Kearnes, Steven M; Coley, Connor W
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers.
Date issued
2023-07-24
Department
Massachusetts Institute of Technology. Department of Chemical Engineering; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Journal of Chemical Information and Modeling
Publisher
American Chemical Society
Citation
Rocío Mercado, Steven M. Kearnes, and Connor W. Coley. Journal of Chemical Information and Modeling 2023 63 (14), 4253-4265.
Version: Final published version