dc.description.abstract | Materials discovery is critical for addressing societal problems, but it is a tedious process requiring substantial time and energy to accumulate knowledge. Computational techniques have accelerated understanding of material structure and properties, answering the question of "what" materials to make for a specific application. These techniques have shifted the bottleneck in materials design to the synthesis and processing of materials, posing the question of "how" to make a specified material. Zeolites are microporous, crystalline aluminosilicates that exemplify this paradigm. Their relevance for chemical and "green" applications has led to sustained interest for many decades, with substantial progress made in predicting hypothetical zeolites and databases of thousands of energetically favorable structures. However, only 255 of these structures have been synthesized and far fewer, approximately 20, are commercially viable, pointing to synthesis as the major bottleneck in zeolite discovery and design. This thesis aims to improve the understanding of synthesis-structure relationships in zeolite materials through the use of data-driven synthesis tools. It is guided by three questions: 1) How can zeolite synthesis data be automatically extracted on a large scale? 2) How can coupling of data-driven, first-principles, and experimental approaches accelerate understanding of structure and processing relationships in zeolite materials? 3) In what ways can these data and discovered relationships be used to engineer improved zeolite materials?
Data-driven synthesis planning requires large amounts of data to develop hypotheses about underlying trends and to train machine learning (ML) models. The zeolite literature provides thousands of records of synthesis routes and the resulting zeolite structures, but obtaining them requires advanced information extraction techniques. This thesis utilizes and builds upon a natural language processing (NLP) pipeline to extract and format this data on realistic timescales. It develops algorithmic improvements for this pipeline, together with additional components targeted specifically at the unique linguistic features of the zeolite literature. It also develops a researcher-computer interaction framework designed to optimize both extraction accuracy and efficiency by fixing mistakes made by the extraction algorithm. This extraction algorithm yields five highly curated datasets related to zeolite synthesis, representing, to the author’s knowledge, the largest collection of zeolite synthesis routes.
These datasets are used to study zeolite synthesis, starting with organic structure directing agent (OSDA) design. Determining which OSDA molecule templates which zeolite structure is a difficult problem. The author extracts a dataset of known OSDA-zeolite pairs from the literature to study these relationships. Using advanced featurization schemes for the OSDA, relationships between OSDAs and certain zeolite structures can be established. These relationships help answer thesis question two. A generative model is trained on the extracted data and validated through simulation to suggest potential OSDAs for a given zeolite structure, providing tools to accelerate OSDA design and addressing thesis question three.
OSDAs are important in zeolite formation, but the other hydrothermal synthesis variables also play a large role. This thesis uses failed-experiment data to model the probability of zeolite crystallization and interprets the model results through Shapley values to determine the impact of specific hydrothermal synthesis variables. Using multi-fidelity data and Bayesian inference, zeolite crystallization curves are studied to determine nucleation and crystal growth behavior. Both of these tasks address thesis question two. An additional generative model that predicts hydrothermal synthesis conditions given an OSDA-zeolite pair is developed, presenting another tool to guide zeolite development and addressing thesis question three.
Finally, the thesis suggests high-potential areas for future research and further exploration using the extracted data. It concludes with a brief commentary on the publication process and the necessity of data extraction. | |