dc.contributor.advisor: Mansinghka, Vikash K.
dc.contributor.author: Chung, Karen
dc.date.accessioned: 2026-02-12T17:12:42Z
dc.date.available: 2026-02-12T17:12:42Z
dc.date.issued: 2025-09
dc.date.submitted: 2025-09-15T14:56:31.150Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164823
dc.description.abstract: GPU-compatible probabilistic programming languages (PPLs) have enabled high-performance, data-parallel programmable inference. However, these systems face fundamental trade-offs between expressiveness and performance, as their GPU code generation is automated and black-boxed, limiting optimization opportunities and imposing restrictions on program expressivity. This thesis introduces GenCUDA, a probabilistic programming system that addresses this limitation by embedding the CUDA GPU programming language directly into a C++/CUDA frontend, enabling GPU programmable inference with fine-grained control over runtime and memory profiles. GenCUDA extends the Gen probabilistic programming architecture by providing a dynamic modeling language (DML) that allows users to write performance-critical sections of generative functions as CUDA kernels while maintaining automatic trace management and the generative function interface (GFI). The system supports both sequential and parallel execution contexts through specialized effect handlers that seamlessly compose CPU and GPU code paths. Key technical contributions include: (1) a high-performance GPU distributions library achieving 10-100× speedups over TensorFlow-Probability, (2) memory-efficient trace management via template-optimized parallel effect handlers, and (3) vectorized generative functions that enable massive parallelization of inference algorithms. We demonstrate GenCUDA’s capabilities through comprehensive benchmarks on inference algorithms applied to diverse models including factor graphs, mixture models, and Hidden Markov Models. Results show significant performance improvements over JAX-based implementations: up to 3× speedup for importance sampling on a hierarchical model, 5.7× speedup for parallel Gibbs sampling on factor graphs, and memory efficiency improvements for large-scale mixture models supporting up to 6× as many clusters compared to existing frameworks’ limits.
The system maintains the composability and expressiveness of probabilistic programming while unlocking GPU performance optimization techniques such as kernel fusion and memory hierarchy exploitation that are inaccessible to higher-level frameworks. GenCUDA demonstrates that embedding low-level GPU programming within automated probabilistic inference workflows can achieve both performance gains and algorithmic expressivity without sacrificing the modularity of probabilistic programming paradigms.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Probabilistic Programming with Low-Level, High-Performance GPU Programmable Inference
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science