Show simple item record

dc.contributor.advisorPiotr Indyk.en_US
dc.contributor.authorDo Ba, Khanhen_US
dc.contributor.otherMassachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2012-12-13T18:47:27Z
dc.date.available2012-12-13T18:47:27Z
dc.date.copyright2012en_US
dc.date.issued2012en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/75629
dc.descriptionThesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.en_US
dc.descriptionCataloged from PDF version of thesis.en_US
dc.descriptionIncludes bibliographical references (p. 52-56).en_US
dc.description.abstractIn the data stream computation model, input data is given to us sequentially (the data stream), and our goal is to compute or approximate some function or statistic on that data using a sublinear (in both the length of the stream and the size of the universe of items that can appear in the stream) amount of space; in particular, we can store neither the entire stream nor a counter for each possible item we might see. In the sparse recovery model (also known as compressed sensing), input data is a large but sparse vector x [epsilon] Rn, and our goal is to design an m x n matrix [Phi]D, where m << n, such that for any sufficiently sparse x we can efficiently recover a good approximation of x from [Phi]x. Although at first glance these two models may seem quite different, they are in fact intimately related. In the streaming model, most statistics of interest are order-invariant, meaning they care only about the frequency of each item in the stream and not their position. For these problems, the data in the stream can be viewed as an n-dimensional vector x, where xi is the number of occurrences of item i. Using this representation, one of the high-level tools that have proven most popular has been the linear sketch, where for some m x n matrix {Phi]we maintain {Phi]x (the sketch) for the partial vector x as we progress along the stream. The linearity of the mapping D allows us to efficiently do incremental updates on our sketch, and as in its use in sparse recovery, the linear sketch turns out to be surprisingly powerful. In this thesis, we try to answer some questions of interest in each model, illustrating both the power and the limitations of the linear sketch. In Chapter 2, we provide an efficient sketch for estimating the (planar) Earth-Mover Distance (EMD) between two multisets of points. The EMD between point sets A, B R2 of the same size is defined as the minimum cost of a perfect matching between them, with each edge contributing a cost equal to its (Euclidean) length. As immediate consequences, we give an improved algorithm for estimating EMD between point sets given over a stream, and an improved algorithm for the approximate nearest neighbor problem under EMD. In Chapter 3, we prove tight lower bounds for sparse recovery in the number of rows in the matrix [Phi] (i.e., the number of measurements) in order to achieve any of the three most studied recovery guarantees. Specifically, consider a matrix [Phi] and an algorithm R such that for any signal x, R can recover an approximation & from [Phi] satisfying ... where (1) p= q= 1 and C= O(1), (2) p= q= 2 and C = O(1), or (3) p =2, q = 1 and C = O(k-1/ 2 ). We show that any such [Phi] I must have at least [Omega](k log(n/k)) rows. This is known to be optimal in cases (1) and (2), and near optimal for (3). In Chapter 4, we propose a variant of sparse recovery that incorporates some additional knowledge about the signal that allows the above lower bound to be broken. In particular, we consider the scenario where, after measurements are taken, we are given a set S of size s < n (s is known beforehand) that is supposed to contain most of the "large" coefficients of x. The goal is then to recover i satisfying ... We refer to this formulation as the sparse recovery with partial support knowledge problem (SRPSK). We focus on the guarantees where p = q = 1 or 2 and C = 1 + e, for which we provide lower bounds as well as a method of converting algorithms for "standard" sparse recovery into ones for SRPSK. We also make use of one of the reductions to give an optimal algorithm for SRPSK for the guarantee where p = q = 2.en_US
dc.description.statementofresponsibilityby Khanh Do Ba.en_US
dc.format.extent56 p.en_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsM.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleAlgorithms and lower bounds in the streaming and sparse recovery modelsen_US
dc.title.alternativeAlgorithms and lower bounds for stream computation and sparse recoveryen_US
dc.typeThesisen_US
dc.description.degreePh.D.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc818192834en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record