| Field | Value |
| --- | --- |
| dc.contributor.advisor | Mądry, Aleksander |
| dc.contributor.author | Chen, Benjamin |
| dc.date.accessioned | 2025-09-18T14:29:04Z |
| dc.date.available | 2025-09-18T14:29:04Z |
| dc.date.issued | 2025-05 |
| dc.date.submitted | 2025-06-23T14:01:20.449Z |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162722 |
| dc.description.abstract | A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules. |
| dc.publisher | Massachusetts Institute of Technology |
| dc.rights | In Copyright - Educational Use Permitted |
| dc.rights | Copyright retained by author(s) |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ |
| dc.title | Metagradient Descent: Differentiating Large-Scale Training |
| dc.type | Thesis |
| dc.description.degree | M.Eng. |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
| mit.thesis.degree | Master |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science |
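
The abstract describes metagradients as gradients taken through the model-training process itself. As a purely illustrative aid, the sketch below computes such a gradient on a toy problem by differentiating through an unrolled gradient-descent loop in JAX. The data, variable names, and step counts are all hypothetical, and this is not the thesis's scalable algorithm; it only demonstrates the underlying idea of a gradient "through training."

```python
# Illustrative sketch (NOT the thesis's method): compute the gradient of a
# post-training validation loss with respect to a training hyperparameter
# (here, the learning rate) by differentiating through an unrolled
# training loop with JAX.
import jax
import jax.numpy as jnp

# Toy linear-regression data; all names and values here are hypothetical.
key = jax.random.PRNGKey(0)
X_train = jax.random.normal(key, (32, 4))
w_true = jnp.array([1.0, -2.0, 0.5, 3.0])
y_train = X_train @ w_true
X_val, y_val = X_train[:8], y_train[:8]

def train_loss(w):
    # Mean-squared-error training objective.
    return jnp.mean((X_train @ w - y_train) ** 2)

def val_loss_after_training(lr, num_steps=20):
    """Run plain gradient descent, then evaluate on validation data.

    Every update is a differentiable function of `lr`, so JAX can
    backpropagate through the entire (unrolled) training run.
    """
    w = jnp.zeros(4)
    for _ in range(num_steps):
        w = w - lr * jax.grad(train_loss)(w)
    return jnp.mean((X_val @ w - y_val) ** 2)

# The metagradient: d(validation loss) / d(learning rate).
metagrad = jax.grad(val_loss_after_training)(0.05)

# One step of metagradient descent on the learning rate itself
# (the 0.1 meta-step size is an arbitrary choice for illustration).
lr = 0.05 - 0.1 * metagrad
print("metagradient:", metagrad, "updated lr:", lr)
```

Naive unrolling like this stores every intermediate model state for reverse-mode differentiation, so its memory cost grows with the number of training steps; making such gradients efficient at the scale of real model training is, per the abstract, the problem the thesis's algorithm addresses.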