Automated and Provable Privatization for Black-Box Processing

Author(s)
Xiao, Hanshen
Thesis PDF (4.460Mb)
Advisor
Devadas, Srinivas
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
This thesis initiates a study of universal leakage quantification and automated privacy-preserving solutions. To minimize assumptions on leakage generation and to symbiotically accommodate cutting-edge advances in both algorithms and their implementations, a framework is established that models leakage as the output of a black-box processing function and produces rigorous privacy analysis based entirely on end-to-end simulation. At a high level, we demonstrate the following result: given access to the underlying black-box secret generation, through mechanized evaluations of the black-box processing function, the hardness of adversarial inference can be provably quantified and controlled through properly selected perturbations. The detailed contributions can be summarized from three perspectives:

a) Privacy Definition: We propose a new semantic notion, called Probably-Approximately-Correct (PAC) Privacy. This concept describes privacy intuitively as an impossible inference task for a computationally unbounded adversary and supports the expression of a universal privacy concern that is accessible to a general audience.

b) Black-Box Leakage Quantification: We introduce randomization-optimization and noise-smoothing techniques and develop a set of information-theoretic tools based on f-divergence to characterize privacy risk through statistical mean estimation. Given sufficient sampling, one can approach this objective risk bound arbitrarily closely, which leads to a high-confidence proof. The established theory also connects algorithmic stability and generalization error, demonstrating win-win situations in machine learning that simultaneously improve PAC Privacy and learning performance.

c) Automated Privacy-Preserving Solutions: Theoretically, we characterize the tradeoff between the required privacy guarantee (privacy budget), the approximation error of the optimal perturbation strategy (utility loss), and the simulation budget (computation power) needed to automatically construct a perturbation-based privacy solution from black-box evaluations. Operationally, we establish a series of tools to efficiently optimize the noise distribution in high-dimensional or constrained support spaces, and study their online versions under adversarially adaptive composition. Concrete applications are presented, ranging from formal privacy proofs for heuristic obfuscations, to privacy-preserving statistical learning, to response privacy in deep learning with vision models and large language models (LLMs) such as ResNet and GPT-2, to hardware security such as side-channel cache-timing leakage control.
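To make the end-to-end simulation idea concrete, below is a minimal, hypothetical sketch, not the thesis's implementation: the names `mechanism`, `sample_secret`, and `mi_budget` are illustrative. It estimates the output covariance of a black-box processing function by Monte Carlo simulation and then calibrates isotropic Gaussian output noise against a mutual-information budget, using the standard Gaussian-channel bound I(X; M(X)+B) <= 0.5 * log det(I + Sigma_B^{-1} Sigma_M) in nats.

```python
import numpy as np

def calibrate_gaussian_noise(mechanism, sample_secret, n_trials=1000, mi_budget=1.0):
    """Illustrative sketch (hypothetical API): simulate a black-box mechanism
    end to end, estimate its output covariance, and return the smallest
    isotropic Gaussian noise level whose mutual-information upper bound
    stays within `mi_budget` (in nats)."""
    # 1. End-to-end simulation: run the black box on freshly sampled secrets.
    outputs = np.stack([
        np.asarray(mechanism(sample_secret()), dtype=float).ravel()
        for _ in range(n_trials)
    ])

    # 2. Empirical covariance of the black-box output across simulated secrets.
    cov = np.atleast_2d(np.cov(outputs, rowvar=False))
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)

    # 3. For Y = M(X) + N(0, sigma^2 I), I(X; Y) <= 0.5 * sum_i log(1 + lambda_i / sigma^2),
    #    where lambda_i are the eigenvalues of the output covariance.
    def mi_bound(sigma):
        return 0.5 * np.sum(np.log1p(eigvals / sigma ** 2))

    # 4. Binary search (on a log scale) for the smallest noise level within budget.
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if mi_bound(mid) > mi_budget:
            lo = mid
        else:
            hi = mid
    return hi  # standard deviation of isotropic Gaussian noise to add to the output

# Toy usage: a mechanism that releases the mean of a 16-dimensional secret.
sigma = calibrate_gaussian_noise(
    mechanism=lambda x: np.array([x.mean()]),
    sample_secret=lambda: np.random.randn(16),
    mi_budget=1.0,
)
```

The isotropic Gaussian choice here is only for brevity; the abstract describes optimizing the full noise distribution in high-dimensional or constrained support spaces and handling adversarially adaptive composition.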
Date issued
2024-09
URI
https://hdl.handle.net/1721.1/159202
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
