
dc.contributor.author: Goldwasser, Shafi
dc.contributor.author: Shafer, Jonathan
dc.contributor.author: Vafa, Neekon
dc.contributor.author: Vaikuntanathan, Vinod
dc.date.accessioned: 2026-01-22T15:38:12Z
dc.date.available: 2026-01-22T15:38:12Z
dc.date.issued: 2025-06-15
dc.identifier.isbn: 979-8-4007-1510-5
dc.identifier.uri: https://hdl.handle.net/1721.1/164614
dc.description: STOC ’25, Prague, Czechia [en_US]
dc.description.abstract: As society grows more reliant on machine learning, ensuring the security of machine learning systems against sophisticated attacks becomes a pressing concern. A recent result of Goldwasser, Kim, Vaikuntanathan, and Zamir (FOCS ’22) shows that an adversary can plant undetectable backdoors in machine learning models, allowing the adversary to covertly control the model’s behavior. Backdoors can be planted in such a way that the backdoored machine learning model is computationally indistinguishable from an honest model without backdoors. In this paper, we present strategies for defending against backdoors in ML models, even if they are undetectable. The key observation is that it is sometimes possible to provably mitigate or even remove backdoors without needing to detect them, using techniques inspired by the notion of random self-reducibility. This depends on properties of the ground-truth labels (chosen by nature), and not of the proposed ML model (which may be chosen by an attacker). We give formal definitions for secure backdoor mitigation, and proceed to show two types of results. First, we show a “global mitigation” technique, which removes all backdoors from a machine learning model under the assumption that the ground-truth labels are close to a Fourier-heavy function. Second, we consider distributions where the ground-truth labels are close to a linear or polynomial function in ℝⁿ. Here, we show “local mitigation” techniques, which remove backdoors with high probability for every input of interest, and are computationally cheaper than global mitigation. All of our constructions are black-box, so our techniques work without needing access to the model’s representation (i.e., its code or parameters). Along the way we prove a simple result for robust mean estimation. [en_US]
dc.publisher: ACM|Proceedings of the 57th Annual ACM Symposium on Theory of Computing [en_US]
dc.relation.isversionof: https://doi.org/10.1145/3717823.3718245 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-ShareAlike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: Oblivious Defense in ML Models: Backdoor Removal without Detection [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Shafi Goldwasser, Jonathan Shafer, Neekon Vafa, and Vinod Vaikuntanathan. 2025. Oblivious Defense in ML Models: Backdoor Removal without Detection. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing (STOC '25). Association for Computing Machinery, New York, NY, USA, 1785–1794. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2025-08-01T08:44:04Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-08-01T08:44:04Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

