Dataset Design for Building Models of Chemical Reactivity
Author(s)
Raghavan, Priyanka; Haas, Brittany C; Ruos, Madeline E; Schleinitz, Jules; Doyle, Abigail G; Reisman, Sarah E; Sigman, Matthew S; Coley, Connor W
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Date issued
2023-12-27
Department
Massachusetts Institute of Technology. Department of Chemical Engineering; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
ACS Central Science
Publisher
American Chemical Society
Citation
Priyanka Raghavan, Brittany C. Haas, Madeline E. Ruos, Jules Schleinitz, Abigail G. Doyle, Sarah E. Reisman, Matthew S. Sigman, and Connor W. Coley. ACS Central Science 2023 9 (12), 2196-2204.
Version: Final published version