Algorithms and lower bounds in the streaming and sparse recovery models

Do Ba, Khanh

dc.contributor.advisor	Piotr Indyk.	en_US
dc.contributor.author	Do Ba, Khanh	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2012-12-13T18:47:27Z
dc.date.available	2012-12-13T18:47:27Z
dc.date.copyright	2012	en_US
dc.date.issued	2012	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/75629
dc.description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (p. 52-56).	en_US
dc.description.abstract	In the data stream computation model, input data is given to us sequentially (the data stream), and our goal is to compute or approximate some function or statistic on that data using a sublinear (in both the length of the stream and the size of the universe of items that can appear in the stream) amount of space; in particular, we can store neither the entire stream nor a counter for each possible item we might see. In the sparse recovery model (also known as compressed sensing), input data is a large but sparse vector x [epsilon] Rn, and our goal is to design an m x n matrix [Phi]D, where m << n, such that for any sufficiently sparse x we can efficiently recover a good approximation of x from [Phi]x. Although at first glance these two models may seem quite different, they are in fact intimately related. In the streaming model, most statistics of interest are order-invariant, meaning they care only about the frequency of each item in the stream and not their position. For these problems, the data in the stream can be viewed as an n-dimensional vector x, where xi is the number of occurrences of item i. Using this representation, one of the high-level tools that have proven most popular has been the linear sketch, where for some m x n matrix {Phi]we maintain {Phi]x (the sketch) for the partial vector x as we progress along the stream. The linearity of the mapping D allows us to efficiently do incremental updates on our sketch, and as in its use in sparse recovery, the linear sketch turns out to be surprisingly powerful. In this thesis, we try to answer some questions of interest in each model, illustrating both the power and the limitations of the linear sketch. In Chapter 2, we provide an efficient sketch for estimating the (planar) Earth-Mover Distance (EMD) between two multisets of points. The EMD between point sets A, B R2 of the same size is defined as the minimum cost of a perfect matching between them, with each edge contributing a cost equal to its (Euclidean) length. As immediate consequences, we give an improved algorithm for estimating EMD between point sets given over a stream, and an improved algorithm for the approximate nearest neighbor problem under EMD. In Chapter 3, we prove tight lower bounds for sparse recovery in the number of rows in the matrix [Phi] (i.e., the number of measurements) in order to achieve any of the three most studied recovery guarantees. Specifically, consider a matrix [Phi] and an algorithm R such that for any signal x, R can recover an approximation & from [Phi] satisfying ... where (1) p= q= 1 and C= O(1), (2) p= q= 2 and C = O(1), or (3) p =2, q = 1 and C = O(k-1/ 2 ). We show that any such [Phi] I must have at least [Omega](k log(n/k)) rows. This is known to be optimal in cases (1) and (2), and near optimal for (3). In Chapter 4, we propose a variant of sparse recovery that incorporates some additional knowledge about the signal that allows the above lower bound to be broken. In particular, we consider the scenario where, after measurements are taken, we are given a set S of size s < n (s is known beforehand) that is supposed to contain most of the "large" coefficients of x. The goal is then to recover i satisfying ... We refer to this formulation as the sparse recovery with partial support knowledge problem (SRPSK). We focus on the guarantees where p = q = 1 or 2 and C = 1 + e, for which we provide lower bounds as well as a method of converting algorithms for "standard" sparse recovery into ones for SRPSK. We also make use of one of the reductions to give an optimal algorithm for SRPSK for the guarantee where p = q = 2.	en_US
dc.description.statementofresponsibility	by Khanh Do Ba.	en_US
dc.format.extent	56 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Algorithms and lower bounds in the streaming and sparse recovery models	en_US
dc.title.alternative	Algorithms and lower bounds for stream computation and sparse recovery	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	818192834	en_US

Files in this item

Name:: 818192834-MIT.pdf
Size:: 3.548Mb
Format:: PDF
Description:: Full printable version

View/Open

This item appears in the following Collection(s)

Doctoral Theses

Show simple item record