Automated and Provable Privatization for Black-Box Processing
Author(s)
Xiao, Hanshen
Advisor
Devadas, Srinivas
Abstract
This thesis initiates a study of universal leakage quantification and automated privacy-preserving solutions. To minimize assumptions on how leakage is generated, and to accommodate cutting-edge advances in both algorithms and their implementations, we establish a framework that models leakage as the output of a black-box processing function and produces rigorous privacy analysis based entirely on end-to-end simulation. At a high level, we demonstrate the following result: given access to the underlying black-box secret generation, the hardness of adversarial inference can be provably quantified through mechanized evaluations of the black-box processing function, and controlled through properly selected perturbations. The detailed contributions can be summarized from three perspectives.

(a) Privacy Definition: We propose a new semantic notion, Probably-Approximately-Correct (PAC) Privacy. This concept describes privacy intuitively as an impossible inference task for a computationally unbounded adversary, and it expresses a universal privacy concern in terms accessible to a general audience.

(b) Black-Box Leakage Quantification: We introduce randomization-optimization and noise-smoothing techniques and develop a set of information-theoretic tools based on f-divergence that characterize privacy risk through statistical mean estimation. Given sufficient sampling, this objective risk bound can be approached arbitrarily closely, yielding a high-confidence proof. The established theory also connects to algorithmic stability and generalization error, demonstrating win-win situations in machine learning where PAC Privacy and learning performance improve simultaneously.

(c) Automated Privacy-Preserving Solutions: Theoretically, we characterize the tradeoff among the required privacy guarantee (privacy budget), the approximation error of the optimal perturbation strategy (utility loss), and the simulation budget (computation power) needed to automatically construct a perturbation-based privacy solution from black-box evaluations. Operationally, we establish a series of tools to efficiently optimize the noise distribution in high-dimensional or constrained support spaces, and study their online versions under adversarially-adaptive composition. Concrete applications are presented, ranging from formal privacy proofs for heuristic obfuscations, to privacy-preserving statistical learning, to response privacy in deep learning with vision models such as ResNet and large language models (LLMs) such as GPT-2, to hardware security such as side-channel cache-timing leakage control.
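To make the simulate-then-perturb idea above concrete, the following is a minimal, hedged sketch and not the thesis's exact procedure: the function names (calibrate_gaussian_noise, privatized_release), the per-coordinate variance estimate, and the toy mechanism are illustrative assumptions. It only shows the overall shape of a black-box workflow, in which the processing function is repeatedly simulated on fresh secrets, the spread of its outputs is estimated empirically, and the released output is perturbed with Gaussian noise scaled to that estimate.

```python
# Illustrative sketch only (not the thesis's algorithm): simulate a black-box
# mechanism, estimate how much its output varies with the secret, and add
# Gaussian noise calibrated to that empirical estimate before release.
import numpy as np


def calibrate_gaussian_noise(mechanism, sample_secret, n_trials=500, noise_scale=1.0):
    """Monte Carlo estimate of the per-coordinate output spread of `mechanism`.

    `mechanism` and `sample_secret` are caller-supplied callables (assumed
    interfaces for this sketch); the returned array is a noise standard
    deviation per output coordinate.
    """
    outputs = np.stack([np.asarray(mechanism(sample_secret())) for _ in range(n_trials)])
    # Coordinates whose values vary more across secrets carry more information
    # about the secret, so they receive proportionally more noise.
    return noise_scale * outputs.std(axis=0)


def privatized_release(mechanism, secret, sigma, rng=None):
    """Release the mechanism output with additive Gaussian perturbation."""
    if rng is None:
        rng = np.random.default_rng()
    y = np.asarray(mechanism(secret))
    return y + rng.normal(scale=sigma, size=y.shape)


if __name__ == "__main__":
    # Toy usage: a hypothetical mechanism reporting the mean and max of a secret vector.
    mech = lambda x: np.array([x.mean(), x.max()])
    sample_secret = lambda: np.random.default_rng().uniform(0.0, 1.0, size=16)
    sigma = calibrate_gaussian_noise(mech, sample_secret, n_trials=500)
    print(privatized_release(mech, sample_secret(), sigma))
```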
Date issued
2024-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology