dc.contributor.advisor  Devadas, Srinivas
dc.contributor.author  Xiao, Hanshen
dc.date.accessioned  2025-04-23T15:05:00Z
dc.date.available  2025-04-23T15:05:00Z
dc.date.issued  2024-09
dc.date.submitted  2025-03-04T18:37:31.231Z
dc.identifier.uri  https://hdl.handle.net/1721.1/159202
dc.description.abstract  This thesis initiates a study of universal leakage quantification and automated privacy-preserving solutions. To minimize assumptions on leakage generation and to symbiotically accommodate cutting-edge advances in both algorithms and their implementations, a framework is established that models leakage as the output of a black-box processing function and produces rigorous privacy analysis based entirely on end-to-end simulation. At a high level, we demonstrate the following result: given access to the underlying black-box secret generation, the hardness of adversarial inference can be provably quantified through mechanized evaluations of the black-box processing function and controlled through properly selected perturbations. The detailed contributions can be summarized from three perspectives:

a) Privacy Definition: We propose a new semantic notion called Probably-Approximately-Correct (PAC) Privacy. This concept describes privacy intuitively as an inference task that is impossible even for a computationally-unbounded adversary, and it expresses a universal privacy concern in terms accessible to a general audience.

b) Black-Box Leakage Quantification: We introduce randomization-optimization and noise-smoothing techniques and develop a set of information-theoretic tools based on f-divergence to characterize privacy risk through statistical mean estimation. Given sufficient sampling, this objective risk bound can be approached arbitrarily closely, which yields a high-confidence proof. The established theory also connects algorithmic stability and generalization error, demonstrating win-win situations in machine learning where PAC Privacy and learning performance improve simultaneously.

c) Automated Privacy-Preserving Solutions: Theoretically, we characterize the tradeoff among the required privacy guarantee (privacy budget), the approximation error of the optimal perturbation strategy (utility loss), and the simulation budget (computation power) needed to automatically construct a perturbation-based privacy solution from black-box evaluations. Operationally, we establish a series of tools to efficiently optimize the noise distribution in high-dimensional or constrained support spaces, and we study their online versions under adversarially-adaptive composition.

Concrete applications are presented, ranging from formal privacy proofs for heuristic obfuscations, to privacy-preserving statistical learning, to response privacy in deep learning with vision models and large language models (LLMs) such as ResNet and GPT-2, to hardware security, such as cache-timing side-channel leakage control.
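To make the simulation-based calibration idea in b) and c) concrete, below is a minimal, illustrative sketch, not the thesis's implementation or API: it estimates the output covariance of an arbitrary black-box mechanism by end-to-end simulation, then binary-searches the smallest isotropic Gaussian noise level whose standard Gaussian-channel mutual-information bound fits a given budget. The names `mechanism`, `sample_secret`, `n_trials`, and `mi_budget` are hypothetical, and the isotropic bound omits both the finite-sample confidence analysis and the anisotropic noise optimization that the thesis develops rigorously.

```python
import numpy as np


def calibrate_gaussian_noise(mechanism, sample_secret, n_trials=1000,
                             mi_budget=1.0, rng=None):
    """Simulate a black-box mechanism to estimate its output covariance,
    then pick the smallest isotropic Gaussian noise whose mutual-information
    bound fits the budget (in nats).  Illustrative sketch only."""
    rng = np.random.default_rng() if rng is None else rng

    # End-to-end simulation: fresh secret in, black-box output out.
    # Assumes the mechanism returns fixed-shape numeric output.
    outputs = np.stack([np.asarray(mechanism(sample_secret(rng)), dtype=float)
                        for _ in range(n_trials)])
    outputs = outputs.reshape(n_trials, -1)

    # Empirical covariance of the processing function's output.
    cov = np.atleast_2d(np.cov(outputs, rowvar=False))
    eigvals = np.maximum(np.linalg.eigvalsh(cov), 0.0)

    # For independent noise B ~ N(0, sigma^2 I), the data-processing
    # inequality plus the Gaussian-channel capacity bound give
    #   I(X; M(X) + B) <= sum_i 0.5 * log(1 + lambda_i / sigma^2),
    # where lambda_i are the eigenvalues of Cov[M(X)].  (This ignores
    # the estimation error in cov, which the thesis handles formally.)
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        sigma = (lo * hi) ** 0.5           # geometric binary search
        mi_bound = 0.5 * np.sum(np.log1p(eigvals / sigma ** 2))
        if mi_bound > mi_budget:
            lo = sigma                     # too leaky: need more noise
        else:
            hi = sigma                     # budget met: try less noise
    return hi                              # release mechanism(x) + N(0, hi^2 I)
```

In the anisotropic setting the thesis targets, the same simulated outputs would instead drive a per-direction noise optimization (for instance, over the eigenbasis of the empirical covariance); the isotropic case above is only the simplest instance of that calibration.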
dc.publisher  Massachusetts Institute of Technology
dc.rights  In Copyright - Educational Use Permitted
dc.rights  Copyright retained by author(s)
dc.rights.uri  https://rightsstatements.org/page/InC-EDU/1.0/
dc.title  Automated and Provable Privatization for Black-Box Processing
dc.type  Thesis
dc.description.degree  Ph.D.
dc.contributor.department  Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree  Doctoral
thesis.degree.name  Doctor of Philosophy

