dc.contributor.advisor: Kellis, Manolis
dc.contributor.author: Guo, Sophie J.
dc.date.accessioned: 2026-02-12T17:12:48Z
dc.date.available: 2026-02-12T17:12:48Z
dc.date.issued: 2025-09
dc.date.submitted: 2025-09-15T14:56:27.937Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164825
dc.description.abstract: Advances in artificial intelligence (AI) and generative AI for representation learning have transformed our ability to model complex biological systems. Single-cell RNA sequencing (scRNA-seq) provides unprecedented resolution into cellular heterogeneity, offering a powerful substrate for modeling disease circuitry. However, predicting patient-level phenotypes from scRNA-seq remains challenging due to limited sample sizes, variable cell counts, and the computational burden of modeling long-context dependencies. We present scPhen, a flexible, parametric deep-learning framework for phenotype prediction from single-cell transcriptomic data, applied here to Alzheimer’s disease (AD) as a paradigm of complex, heterogeneous pathology. scPhen consists of a cell embedding module and a patient embedding module, designed to capture both fine-grained molecular patterns and higher-order cell–cell relationships. The framework supports multiple architectural backbones, including Transformers, Graph Neural Networks (GNNs), and state-space models such as Mamba, Mamba2, and BiMamba2, allowing exploration of tunable components for optimized performance. Across classification and regression tasks, state-space models, and in particular BiMamba2, demonstrated superior predictive accuracy and computational efficiency compared to Transformer-based and hybrid approaches. We further integrated attention-based multiple instance learning to enable variable cell counts per patient and to prioritize phenotype-informative cellular subsets. Interpretability analyses using Integrated Gradients and cell-level attention scores revealed gene programs and cell populations associated with AD progression, highlighting known neuroinflammatory signatures and suggesting novel molecular targets. By unifying cutting-edge sequence modeling architectures with scalable single-cell analysis, scPhen provides a generalizable, high-resolution approach to phenotype prediction. While demonstrated here in AD, this framework is readily extensible to other complex diseases and multi-modal cellular datasets, bridging computational innovation and biological discovery.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: scPhen: Single-Cell Phenotype Predictor for Alzheimer’s Disease
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
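
Illustrative note on the abstract: the abstract says attention-based multiple instance learning is used to handle variable cell counts per patient and to prioritize informative cells. The sketch below shows one common formulation of attention pooling (score each cell with a small tanh layer, softmax the scores within the bag, take the weighted sum). It is not the thesis's implementation: the function name `attention_mil_pool`, the parameters `V` and `w`, the dimensions, and the random inputs are all assumptions made for illustration.

```python
import math
import random


def attention_mil_pool(cells, V, w):
    """Pool a variable-length bag of cell embeddings into one patient vector.

    cells: list of cell embeddings, each a list of d floats
    V:     hidden-by-d projection matrix as a list of rows (hypothetical parameter)
    w:     scoring vector of length hidden (hypothetical parameter)
    Returns (patient_embedding, attention_weights).
    """
    # One scalar score per cell: w . tanh(V h)
    scores = []
    for h in cells:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ti for wi, ti in zip(w, hidden)))

    # Numerically stable softmax over the cells in this bag only.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of cell embeddings gives the patient embedding.
    dim = len(cells[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, cells)) for j in range(dim)]
    return pooled, weights


def random_matrix(rows, cols, rng):
    """Helper for the demo: a rows-by-cols matrix of standard normal draws."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, hidden = 4, 3  # made-up sizes
V = random_matrix(hidden, d, rng)
w = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

# Two synthetic "patients" with different numbers of cells share the same parameters.
patient_a = random_matrix(5, d, rng)
patient_b = random_matrix(12, d, rng)

emb_a, att_a = attention_mil_pool(patient_a, V, w)
emb_b, att_b = attention_mil_pool(patient_b, V, w)
```

Because the softmax is taken within each bag, patients with 5 and 12 cells both map to an embedding of the same length, and the per-cell weights (`att_a`, `att_b`) are the kind of quantity that could be inspected as cell-level attention scores.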

