MIT Libraries DSpace@MIT

Accelerating Autonomous Molecular Discovery through Automated Quantum Chemistry and Artificial Intelligence

Author(s)
Wu, Haoyang (Oscar)
Advisor
Green, William H.
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
The grand challenges in medicine, energy, and materials science are fundamentally molecular discovery problems. However, the vastness of chemical space renders traditional experimental exploration inefficient and insufficient. Autonomous molecular discovery promises to accelerate this process by integrating artificial intelligence (AI), computation, and automation in chemistry, but it faces a critical trilemma: balancing accuracy, speed, and scalability. This thesis documents a systematic effort to alleviate this tension by developing and integrating novel computational frameworks that combine the first-principles rigor of quantum mechanics (QM) with the predictive efficiency of machine learning (ML) and the scalable automation enabled by AI. The thesis begins by developing the first ab initio kinetic models for the liquid-phase oxidative degradation of active pharmaceutical ingredients. This work demonstrates the feasibility and predictive power of automated mechanistic modeling in complex chemical environments, and it highlights the acute need for more accurate thermochemical and kinetic data to handle real-world complexity. To address this need, we developed a framework for computing systematic thermochemical corrections and conducted an extensive benchmark of 284 model chemistries, establishing protocols to efficiently achieve chemical accuracy (~1 kcal/mol) from QM simulations. Recognizing the limitations of speed and data scarcity, we engineered physics-informed ML architectures, notably the QM-GNN, which fuses graph neural networks (GNNs) with QM descriptors. This approach significantly improves predictive performance and data efficiency, particularly for reaction regioselectivity in low-data regimes. Finally, to deploy these advances at scale, we designed QuantumPioneer, an automated, high-throughput platform for generating large-scale, high-fidelity QM thermo-kinetic datasets. This platform has produced an extensive database for oxidation reactions, enabling the development of novel ML models for predicting molecular stability and solvation energies. Collectively, this thesis provides a cohesive framework for accelerating molecular discovery, demonstrating that the strategic integration of first-principles simulation and data-driven intelligence can overcome key bottlenecks hindering autonomous chemical design and discovery.
Date issued
2026-02
URI
https://hdl.handle.net/1721.1/165527
Department
Massachusetts Institute of Technology. Department of Chemical Engineering; Massachusetts Institute of Technology. Center for Computational Science and Engineering
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses