Approximate k-means clustering through random projections
Author(s)
Persu, Elena-Mădălina
DownloadFull printable version (1.660Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Ankur Moitra.
Terms of use
Metadata
Show full item recordAbstract
Using random row projections, we show how to approximate a data matrix A with a much smaller sketch à that can be used to solve a general class of constrained k-rank approximation problems to within (1 + [epsilon]) error. Importantly, this class of problems includes k-means clustering. By reducing data points to just O(k) dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1+ [epsilon]) relative error results for random row projections which improve on the (2 + [epsilon]) prior known constant factor approximation associated with this sketching technique, while preserving the number of dimensions. For k-means clustering, we show how to achieve a (9 + [epsilon]) approximation by Johnson-Lindenstrauss projecting data points to just 0(log k/[epsilon]2 ) dimensions. This gives the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.
Description
Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. Cataloged from PDF version of thesis. Includes bibliographical references (pages 39-41).
Date issued
2015Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.