dc.contributor.author: Ringoot, Evelyne
dc.contributor.author: Alomairy, Rabab
dc.contributor.author: Churavy, Valentin
dc.contributor.author: Edelman, Alan
dc.date.accessioned: 2026-01-13T19:27:39Z
dc.date.available: 2026-01-13T19:27:39Z
dc.date.issued: 2025-12-20
dc.identifier.isbn: 979-8-4007-2074-1
dc.identifier.uri: https://hdl.handle.net/1721.1/164525
dc.description: ICPP ’25, San Diego, CA, USA [en_US]
dc.description.abstract: This paper presents a portable, GPU-accelerated implementation of a QR-based singular value computation algorithm in Julia. The singular value decomposition (SVD) is a fundamental numerical tool in scientific computing and machine learning, providing optimal low-rank matrix approximations. Its importance has increased even more in large-scale machine learning pipelines, including large language models (LLMs), where it enables low-rank adaptation (LoRA). The implemented algorithm is based on the classic two-stage QR reduction, which first reduces the matrix to band form and then reduces the band matrix to bidiagonal form. Our implementation leverages Julia’s multiple dispatch and metaprogramming capabilities, integrating with the GPUArrays and KernelAbstractions frameworks to provide a unified, type- and hardware-agnostic function. It supports diverse GPU architectures and data types, and is, to our knowledge, the first GPU-accelerated singular value implementation to support Apple Metal GPUs and half precision. Performance results on multiple GPU backends and data types demonstrate that portability does not require sacrificing performance: the unified function outperforms most linear algebra libraries (MAGMA, SLATE, rocSOLVER, oneMKL) for matrix sizes larger than 1024 × 1024, and achieves 80–90% of the performance of cuSOLVER for large matrices. [en_US]
dc.publisher: ACM | 54th International Conference on Parallel Processing [en_US]
dc.relation.isversionof: https://doi.org/10.1145/3754598.3754667 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision [en_US]
dc.type: Article [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2026-01-01T08:49:20Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2026-01-01T08:49:20Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
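
The two-stage reduction summarized in the abstract (dense matrix to band form, then band form to bidiagonal form, then singular values of the bidiagonal matrix) can be illustrated with a small dense NumPy sketch. This is not the authors' Julia/KernelAbstractions code. The function names (`reduce_to_band`, `householder`, `band_to_bidiagonal`), the bandwidth `b = 8`, and the 80 × 48 test matrix are hypothetical choices for illustration. Stage 1 alternates QR (left) and LQ (right) panel factorizations. Stage 2 here is a plain Householder (Golub-Kahan) bidiagonalization applied to the banded matrix; it does not exploit the band structure the way production two-stage solvers typically do (e.g. by bulge chasing). The final bidiagonal singular value step is delegated to `numpy.linalg.svd`.

```python
import numpy as np


def reduce_to_band(A, b):
    """Stage 1 (sketch): reduce A (m x n, m >= n) to upper band form with
    upper bandwidth b by alternating QR (left) and LQ (right) panel steps."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    for k in range(0, n, b):
        w = min(b, n - k)
        # QR of the column panel: zeros everything below the diagonal
        # in columns k..k+w-1.
        Q, _ = np.linalg.qr(B[k:, k:k + w], mode="complete")
        B[k:, k:] = Q.T @ B[k:, k:]
        if k + b < n:
            # LQ of the row panel (computed as QR of its transpose): leaves only
            # a lower-triangular w x w block to the right of the panel.
            Q2, _ = np.linalg.qr(B[k:k + w, k + b:].T, mode="complete")
            B[k:, k + b:] = B[k:, k + b:] @ Q2
    return B


def householder(x):
    """Return (v, beta) such that (I - beta v v^T) x = alpha e_1."""
    v = np.array(x, dtype=float)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v, 0.0
    sign = 1.0 if v[0] >= 0 else -1.0
    v[0] += sign * norm_x  # choose the sign that avoids cancellation
    return v, 2.0 / (v @ v)


def band_to_bidiagonal(B):
    """Stage 2 (sketch): dense Golub-Kahan Householder bidiagonalization.
    It does not exploit the band structure of its input."""
    B = np.array(B, dtype=float)
    m, n = B.shape
    for j in range(n):
        # Left reflector: zero column j below the diagonal.
        v, beta = householder(B[j:, j])
        B[j:, j:] -= beta * np.outer(v, v @ B[j:, j:])
        if j < n - 2:
            # Right reflector: zero row j beyond the first superdiagonal.
            v, beta = householder(B[j, j + 1:])
            B[j:, j + 1:] -= beta * np.outer(B[j:, j + 1:] @ v, v)
    return B


rng = np.random.default_rng(0)
A = rng.standard_normal((80, 48))
band = reduce_to_band(A, b=8)
bidiag = band_to_bidiagonal(band)

n = A.shape[1]
d, e = np.diag(bidiag)[:n], np.diag(bidiag, 1)
# Stage 3 stand-in: singular values of the n x n bidiagonal matrix via LAPACK.
sv = np.linalg.svd(np.diag(d) + np.diag(e, 1), compute_uv=False)
print(np.allclose(sv, np.linalg.svd(A, compute_uv=False)))
```

Because every step applies orthogonal transformations, the band and bidiagonal matrices have the same singular values as `A`; the final comparison checks exactly that.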

