DSpace@MIT (MIT Libraries)

scPhen: Single-Cell Phenotype Predictor for Alzheimer’s Disease

Author(s)
Guo, Sophie J.
Advisor
Kellis, Manolis
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Advances in artificial intelligence (AI) and generative AI for representation learning have transformed our ability to model complex biological systems. Single-cell RNA sequencing (scRNA-seq) provides unprecedented resolution into cellular heterogeneity, offering a powerful substrate for modeling disease circuitry. However, predicting patient-level phenotypes from scRNA-seq remains challenging due to limited sample sizes, variable cell counts, and the computational burden of modeling long-context dependencies. We present scPhen, a flexible, parametric deep-learning framework for phenotype prediction from single-cell transcriptomic data, applied here to Alzheimer’s disease (AD) as a paradigm of complex, heterogeneous pathology. scPhen consists of a cell embedding module and a patient embedding module, designed to capture both fine-grained molecular patterns and higher-order cell–cell relationships. The framework supports multiple architectural backbones, including Transformers, Graph Neural Networks (GNNs), and state-space models such as Mamba, Mamba2, and BiMamba2, allowing exploration of tunable components for optimized performance. Across classification and regression tasks, state-space models, and in particular BiMamba2, demonstrated superior predictive accuracy and computational efficiency compared to Transformer-based and hybrid approaches. We further integrated attention-based multiple instance learning to enable variable cell counts per patient and to prioritize phenotype-informative cellular subsets. Interpretability analyses using Integrated Gradients and cell-level attention scores revealed gene programs and cell populations associated with AD progression, highlighting known neuroinflammatory signatures and suggesting novel molecular targets. By unifying cutting-edge sequence modeling architectures with scalable single-cell analysis, scPhen provides a generalizable, high-resolution approach to phenotype prediction. 
While demonstrated here in AD, this framework is readily extensible to other complex diseases and multi-modal cellular datasets, bridging computational innovation and biological discovery.
Date issued
2025-09
URI
https://hdl.handle.net/1721.1/164825
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
