Automated and Provable Privatization for Black-Box Processing
Author(s)
Xiao, Hanshen
Advisor
Devadas, Srinivas
Abstract
This thesis initiates a study of universal leakage quantification and automated privacy-preserving solutions. To minimize assumptions on how leakage is generated, and to accommodate cutting-edge advances in both algorithms and their implementations, we establish a framework that models leakage as the output of a black-box processing function and produces rigorous privacy analysis based entirely on end-to-end simulation. At a high level, we demonstrate the following result: given access to the underlying black-box secret generation, the hardness of adversarial inference can be provably quantified through mechanized evaluations of the black-box processing function, and controlled through properly selected perturbations. The detailed contributions can be summarized from three perspectives.

(a) Privacy Definition: We propose a new semantic notion, Probably-Approximately-Correct (PAC) Privacy. This concept describes privacy intuitively as an impossible inference task for a computationally unbounded adversary, and it expresses a universal privacy concern in terms accessible to a general audience.

(b) Black-Box Leakage Quantification: We introduce randomization-optimization and noise-smoothing techniques and develop a set of information-theoretic tools based on f-divergence that characterize privacy risk through statistical mean estimation. Given sufficient sampling, this objective risk bound can be approached arbitrarily closely, yielding a high-confidence proof. The established theory also connects to algorithmic stability and generalization error, demonstrating win-win situations in machine learning where PAC Privacy and learning performance improve simultaneously.

(c) Automated Privacy-Preserving Solutions: Theoretically, we characterize the tradeoff among the required privacy guarantee (privacy budget), the approximation error of the optimal perturbation strategy (utility loss), and the simulation budget (computation power) needed to automatically construct a perturbation-based privacy solution from black-box evaluations. Operationally, we establish a series of tools to efficiently optimize the noise distribution in high-dimensional or constrained support spaces, and study their online versions under adversarially-adaptive composition. Concrete applications are presented, ranging from formal privacy proofs for heuristic obfuscations, to privacy-preserving statistical learning, to response privacy in deep learning with vision models such as ResNet and large language models (LLMs) such as GPT-2, to hardware security such as side-channel cache-timing leakage control.
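To make the simulate-then-perturb idea above concrete, the following is a minimal, hedged sketch and not the thesis's exact procedure: the function names (calibrate_gaussian_noise, privatized_release), the per-coordinate variance estimate, and the toy mechanism are illustrative assumptions. It only shows the overall shape of a black-box workflow, in which the processing function is repeatedly simulated on fresh secrets, the spread of its outputs is estimated empirically, and the released output is perturbed with Gaussian noise scaled to that estimate.

```python
# Illustrative sketch only (not the thesis's algorithm): simulate a black-box
# mechanism, estimate how much its output varies with the secret, and add
# Gaussian noise calibrated to that empirical estimate before release.
import numpy as np


def calibrate_gaussian_noise(mechanism, sample_secret, n_trials=500, noise_scale=1.0):
    """Monte Carlo estimate of the per-coordinate output spread of `mechanism`.

    `mechanism` and `sample_secret` are caller-supplied callables (assumed
    interfaces for this sketch); the returned array is a noise standard
    deviation per output coordinate.
    """
    outputs = np.stack([np.asarray(mechanism(sample_secret())) for _ in range(n_trials)])
    # Coordinates whose values vary more across secrets carry more information
    # about the secret, so they receive proportionally more noise.
    return noise_scale * outputs.std(axis=0)


def privatized_release(mechanism, secret, sigma, rng=None):
    """Release the mechanism output with additive Gaussian perturbation."""
    if rng is None:
        rng = np.random.default_rng()
    y = np.asarray(mechanism(secret))
    return y + rng.normal(scale=sigma, size=y.shape)


if __name__ == "__main__":
    # Toy usage: a hypothetical mechanism reporting the mean and max of a secret vector.
    mech = lambda x: np.array([x.mean(), x.max()])
    sample_secret = lambda: np.random.default_rng().uniform(0.0, 1.0, size=16)
    sigma = calibrate_gaussian_noise(mech, sample_secret, n_trials=500)
    print(privatized_release(mech, sample_secret(), sigma))
```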
Date issued
2024-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology