Statistics in high dimensions without IID samples : truncated statistics and minimax optimization

Zampetakis, Emmanouil.

dc.contributor.advisor	Constantinos Daskalakis.	en_US
dc.contributor.author	Zampetakis, Emmanouil.	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2021-01-06T20:18:19Z
dc.date.available	2021-01-06T20:18:19Z
dc.date.copyright	2020	en_US
dc.date.issued	2020	en_US
dc.identifier.uri	https://hdl.handle.net/1721.1/129316
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020	en_US
dc.description	Cataloged from student-submitted PDF of thesis.	en_US
dc.description	Includes bibliographical references (pages 367-380).	en_US
dc.description.abstract	A key assumption underlying many methods in Statistics and Machine Learning is that data points are independently and identically distributed (i.i.d.). However, this assumption ignores the following recurring challenges in data collection: (1) censoring - truncation of data, (2) strategic behavior. In this thesis we provide a mathematical and computational theory for designing solutions or proving impossibility results related to challenges (1) and (2). For the classical statistical problem of estimation from truncated samples, we provide the first statistically and computationally efficient algorithms that provably recover an estimate of the whole non-truncated population. We design algorithms both in the parametric setting, e.g. Gaussian Estimation, Linear Regression, and in the non-parametric setting. In the non-parametric setting, we provide techniques that bound the extrapolation error of multi-variate polynomial log densities. Our main tool for the non-parametric setting is a Statistical Taylor Theorem that is based on sample access from some probability distribution with smooth probability density function. We believe that this theorem can have applications beyond the topic of Truncated Statistics. In the context of learning from strategic behavior, we consider the problem of min-max optimization, which is a central problem in Game Theory and Statistics and which has recently found interesting applications in Machine Learning tasks such as Generative Adversarial Networks. Our first result is the PPAD-hardness of minmax optimization which implies the first exponential separation between finding an approximate local minimum and finding an approximate local min-max solution. Our second result is a second-order algorithm that provably asymptotically converges to an approximate local min-max solution. Due to our PPAD-hardness result an algorithm with stronger guarantee is out of reach.	en_US
dc.description.statementofresponsibility	by Emmanouil Zampetakis.	en_US
dc.format.extent	380 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Statistics in high dimensions without IID samples : truncated statistics and minimax optimization	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.identifier.oclc	1227782503	en_US
dc.description.collection	Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science	en_US
dspace.imported	2021-01-06T20:18:18Z	en_US
mit.thesis.degree	Doctoral	en_US
mit.thesis.department	EECS	en_US

Files in this item

Name:: 1227782503-MIT.pdf
Size:: 4.239Mb
Format:: PDF

View/Open

This item appears in the following Collection(s)

Doctoral Theses

Show simple item record