dc.contributor.advisor | Matei Zaharia. | en_US |
dc.contributor.author | Mahajan, Rohan | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2016-12-22T15:17:11Z | |
dc.date.available | 2016-12-22T15:17:11Z | |
dc.date.copyright | 2016 | en_US |
dc.date.issued | 2016 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/105977 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. | en_US |
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US |
dc.description | Cataloged from student-submitted PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (page 33). | en_US |
dc.description.abstract | Because most data processing systems are distributed, data must be transferred between machines. Spark, a prominent example of such a system, currently predetermines its strategy for shuffling this data, but in certain situations a different shuffle strategy would improve performance. We add functionality that tracks metrics about the data while a job runs and adapts the shuffle strategy accordingly. We show improvements in ShuffledRDD performance, in joins using Spark's RDD interface, and in joins in Spark SQL. | en_US |
dc.description.statementofresponsibility | by Rohan Mahajan. | en_US |
dc.format.extent | 33 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Adaptive scheduling in Spark | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 965643791 | en_US |