A physics-inspired approach to the understanding of molecular representations and models
Author(s)
Dicks, Luke; Graff, David E; Jordan, Kirk E; Coley, Connor W; Pyzer-Knapp, Edward O
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
The story of machine learning in general, and its application to molecular design in particular, has been a tale of evolving representations of data. Understanding the implications of the use of a particular representation – including the existence of so-called ‘activity cliffs’ for cheminformatics models – is the key to their successful use for molecular discovery. In this work we present a physics-inspired methodology which exploits analogies between model response surfaces and energy landscapes to richly describe the relationship between the representation and the model. From these similarities, a metric emerges which is analogous to the commonly used frustration metric from the chemical physics community. This new property shows state-of-the-art prediction of model error, whilst belonging to a novel class of roughness measure that extends beyond the known data allowing the trivial identification of activity cliffs even in the absence of related training or evaluation data.
Date issued
2024
Department
Massachusetts Institute of Technology. Department of Chemical Engineering; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Molecular Systems Design & Engineering
Publisher
Royal Society of Chemistry
Citation
Dicks, Luke, Graff, David E, Jordan, Kirk E, Coley, Connor W and Pyzer-Knapp, Edward O. 2024. "A physics-inspired approach to the understanding of molecular representations and models." Molecular Systems Design & Engineering, 9 (5).
Version: Final published version