
dc.contributor.author	Liu, Allen
dc.contributor.author	Moitra, Ankur
dc.date.accessioned	2026-01-20T21:42:07Z
dc.date.available	2026-01-20T21:42:07Z
dc.date.issued	2025-06-15
dc.identifier.isbn	979-8-4007-1510-5
dc.identifier.uri	https://hdl.handle.net/1721.1/164607
dc.description	STOC ’25, Prague, Czechia	en_US
dc.description.abstract	Model stealing, where a learner tries to recover an unknown model via carefully chosen queries, is a critical problem in machine learning, as it threatens the security of proprietary models and the privacy of the data they are trained on. In recent years, there has been particular interest in stealing large language models (LLMs). In this paper, we aim to build a theoretical understanding of stealing language models by studying a simple and mathematically tractable setting. We study model stealing for Hidden Markov Models (HMMs), and more generally low-rank language models. We assume that the learner works in the conditional query model, introduced by Kakade, Krishnamurthy, Mahajan and Zhang. Our main result is an efficient algorithm in the conditional query model for learning any low-rank distribution. In other words, our algorithm succeeds at stealing any language model whose output distribution is low-rank. This improves upon the previous result, which also requires the unknown distribution to have high “fidelity”, a property that holds only in restricted cases. There are two key insights behind our algorithm: First, we represent the conditional distributions at each timestep by constructing barycentric spanners among a collection of vectors of exponentially large dimension. Second, for sampling from our representation, we iteratively solve a sequence of convex optimization problems that involve projection in relative entropy to prevent compounding of errors over the length of the sequence. This is an interesting example where, at least theoretically, allowing a machine learning model to solve more complex problems at inference time can lead to drastic improvements in its performance.	en_US
dc.publisher	ACM|Proceedings of the 57th Annual ACM Symposium on Theory of Computing	en_US
dc.relation.isversionof	https://doi.org/10.1145/3717823.3718220	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	Model Stealing for Any Low-Rank Language Model	en_US
dc.type	Article	en_US
dc.identifier.citation	Allen Liu and Ankur Moitra. 2025. Model Stealing for Any Low-Rank Language Model. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing (STOC '25). Association for Computing Machinery, New York, NY, USA, 1755–1761.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Mathematics	en_US
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2025-08-01T08:41:48Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-08-01T08:41:48Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US
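
The abstract names barycentric spanners as the first key primitive behind the algorithm. As a rough illustration of that primitive only, and not of the authors' construction (which builds spanners among exponentially high-dimensional vectors of conditional probabilities accessed through conditional queries), below is a minimal Python sketch of the classic Awerbuch-Kleinberg swapping procedure for a C-approximate barycentric spanner of an explicit, low-dimensional vector set; the function name, parameters, and test data are assumptions made for the example.

# Illustrative sketch only: a C-approximate barycentric spanner for an
# explicit, low-dimensional set of vectors, following the classic
# Awerbuch-Kleinberg swapping procedure. The paper's algorithm works with
# implicitly represented, exponentially high-dimensional vectors of
# conditional probabilities, which this toy code does not model.
import numpy as np

def barycentric_spanner(vectors, C=2.0):
    """Return indices of a C-approximate barycentric spanner of `vectors`.

    Assumes the n x d array `vectors` has rank d. Every input vector can
    then be written as a linear combination of the returned d vectors with
    coefficients bounded by C in absolute value.
    """
    V = np.asarray(vectors, dtype=float)
    n, d = V.shape
    B = np.eye(d)            # current candidate spanner, one column per slot
    idx = [-1] * d           # index into `vectors` chosen for each slot

    def det_with(col, v):
        # |det| of B with column `col` replaced by vector v
        M = B.copy()
        M[:, col] = v
        return abs(np.linalg.det(M))

    # Phase 1: greedily fill each slot with the determinant-maximizing vector.
    for col in range(d):
        best = max(range(n), key=lambda i: det_with(col, V[i]))
        B[:, col], idx[col] = V[best], best

    # Phase 2: keep swapping while some replacement grows |det(B)| by more
    # than a factor of C.
    improved = True
    while improved:
        improved = False
        for col in range(d):
            base = abs(np.linalg.det(B))
            best = max(range(n), key=lambda i: det_with(col, V[i]))
            if det_with(col, V[best]) > C * base:
                B[:, col], idx[col] = V[best], best
                improved = True
    return idx

# Tiny usage example on random vectors (hypothetical data).
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
print(barycentric_spanner(pts, C=2.0))

For any C > 1 the swap phase terminates, since each swap multiplies |det(B)| by more than C while the maximum attainable determinant over the finite input set is bounded.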

