| dc.description.abstract | Advances in artificial intelligence (AI) and generative AI for representation learning have transformed our ability to model complex biological systems. Single-cell RNA sequencing (scRNA-seq) provides unprecedented resolution into cellular heterogeneity, offering a powerful substrate for modeling disease circuitry. However, predicting patient-level phenotypes from scRNA-seq remains challenging due to limited sample sizes, variable cell counts, and the computational burden of modeling long-context dependencies. We present scPhen, a flexible, parametric deep-learning framework for phenotype prediction from single-cell transcriptomic data, applied here to Alzheimer’s disease (AD) as a paradigm of complex, heterogeneous pathology. scPhen consists of a cell embedding module and a patient embedding module, designed to capture both fine-grained molecular patterns and higher-order cell–cell relationships. The framework supports multiple architectural backbones, including Transformers, Graph Neural Networks (GNNs), and state-space models such as Mamba, Mamba2, and BiMamba2, allowing exploration of tunable components for optimized performance. Across classification and regression tasks, state-space models, and in particular BiMamba2, demonstrated superior predictive accuracy and computational efficiency compared to Transformer-based and hybrid approaches. We further integrated attention-based multiple instance learning to enable variable cell counts per patient and to prioritize phenotype-informative cellular subsets. Interpretability analyses using Integrated Gradients and cell-level attention scores revealed gene programs and cell populations associated with AD progression, highlighting known neuroinflammatory signatures and suggesting novel molecular targets. By unifying cutting-edge sequence modeling architectures with scalable single-cell analysis, scPhen provides a generalizable, high-resolution approach to phenotype prediction. While demonstrated here in AD, this framework is readily extensible to other complex diseases and multi-modal cellular datasets, bridging computational innovation and biological discovery. | |