Machine Learning and Biosecurity in the Age of Pandemics: Advancing Biological Research and Safeguarding Synthetic Biology
Author(s)
Siddiqui, Sameed Muneeb
Advisor
Welsch, Roy
Collins, Jim
Abstract
This thesis explores the dual imperatives of enhancing biosecurity and accelerating outbreak response. The research addresses two key areas. First, the thesis analyzes the implications of a national nucleic acid synthesis screening framework for outbreak response agility. A first-hand perspective is provided, identifying potential bottlenecks stemming from lagging customer verification and sequence screening approaches. Concrete solutions, such as pre-verification of first responders, priority processing channels, pre-approval of standard countermeasure sequences, and optimized computational screening, are proposed to mitigate these challenges and ensure rapid response capabilities without compromising biosecurity. Second, a machine learning architecture for biological sequence modeling, “Lyra,” is presented. Lyra is grounded in the biological principle of epistasis and leverages state space models (SSMs) combined with projected gated convolutions to efficiently capture both local and long-range sequence interactions. We present new mathematical theory connecting SSMs with the approximation of polynomial functions, which is key to predicting epistatic effects. This subquadratic architecture achieves state-of-the-art performance on diverse biological tasks, including protein fitness landscape prediction, RNA function prediction, and CRISPR guide design, while using substantially fewer parameters and computational resources than existing foundation models such as transformers. The thesis concludes by highlighting the synergistic potential of advanced machine learning and thoughtful policy to significantly improve pandemic preparedness.
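The link the abstract draws between SSMs, gating, and polynomial (epistatic) terms can be illustrated with a toy example. The sketch below is not the Lyra architecture and is not taken from the thesis. It assumes a scalar linear recurrence as a stand-in for an SSM and a simple multiplicative gate, and every function name and parameter value is hypothetical. It shows only that a linear recurrence is purely additive in its inputs, while multiplying its output by a projection of the input introduces a pairwise interaction term between distant positions.

```python
def ssm_scan(x, a, b, c):
    """Toy scalar linear recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def gated_ssm(x, a=0.5, b=1.0, c=1.0, w_gate=1.0):
    """Multiply the recurrence output elementwise by a linear gate on the input.

    The product of two linear functions of x is a degree-2 polynomial in x,
    so the output can contain pairwise terms x_i * x_j.
    """
    ys = ssm_scan(x, a, b, c)
    return [w_gate * x_t * y_t for x_t, y_t in zip(x, ys)]


def interaction(f, x1, x2):
    """Second-order finite difference at the last position.

    This is zero for any additive map, so a nonzero value signals a
    non-additive (epistasis-like) interaction between the two inputs.
    """
    zero = [0.0] * len(x1)
    both = [u + v for u, v in zip(x1, x2)]
    return f(both)[-1] - f(x1)[-1] - f(x2)[-1] + f(zero)[-1]


# Two hypothetical "mutations" at positions 0 and 3 of a length-4 sequence.
m1 = [1.0, 0.0, 0.0, 0.0]
m2 = [0.0, 0.0, 0.0, 1.0]

linear_term = interaction(lambda x: ssm_scan(x, 0.5, 1.0, 1.0), m1, m2)
gated_term = interaction(gated_ssm, m1, m2)
```

In this toy setting the interaction term of the gated model equals a raised to the distance between the two positions (here 0.5 cubed), so the recurrence carries information about the earlier position forward and the gate turns it into a pairwise term. How Lyra actually parameterizes its SSMs and projected gated convolutions is described in the thesis itself.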
Date issued
2025-05
Department
Sloan School of Management
Publisher
Massachusetts Institute of Technology