DSpace@MIT

Probabilistic Programming with Low-Level, High-Performance GPU Programmable Inference

Author(s)
Chung, Karen
Download
Thesis PDF (4.27 MB)
Advisor
Mansinghka, Vikash K.
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
GPU-compatible probabilistic programming languages (PPLs) have enabled high-performance, data-parallel programmable inference. However, these systems face fundamental trade-offs between expressiveness and performance: their GPU code generation is automated and opaque, limiting optimization opportunities and restricting program expressivity. This thesis introduces GenCUDA, a probabilistic programming system that addresses this limitation by embedding the CUDA GPU programming language directly into a C++/CUDA frontend, enabling GPU-programmable inference with fine-grained control over runtime and memory profiles.

GenCUDA extends the Gen probabilistic programming architecture with a dynamic modeling language (DML) that lets users write performance-critical sections of generative functions as CUDA kernels while retaining automatic trace management and the generative function interface (GFI). The system supports both sequential and parallel execution contexts through specialized effect handlers that seamlessly compose CPU and GPU code paths. Key technical contributions include: (1) a high-performance GPU distributions library achieving 10-100× speedups over TensorFlow Probability, (2) memory-efficient trace management via template-optimized parallel effect handlers, and (3) vectorized generative functions that enable massive parallelization of inference algorithms.

We demonstrate GenCUDA's capabilities through comprehensive benchmarks on inference algorithms applied to diverse models, including factor graphs, mixture models, and hidden Markov models. Results show significant performance improvements over JAX-based implementations: up to 3× speedup for importance sampling on a hierarchical model, 5.7× speedup for parallel Gibbs sampling on factor graphs, and memory-efficiency improvements for large-scale mixture models, supporting up to 6× as many clusters as existing frameworks allow.
The system maintains the composability and expressiveness of probabilistic programming while unlocking GPU performance optimization techniques such as kernel fusion and memory hierarchy exploitation that are inaccessible to higher-level frameworks. GenCUDA demonstrates that embedding low-level GPU programming within automated probabilistic inference workflows can achieve both performance gains and algorithmic expressivity without sacrificing the modularity of probabilistic programming paradigms.
Date issued
2025-09
URI
https://hdl.handle.net/1721.1/164823
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
