| Field | Value |
| --- | --- |
| dc.contributor.advisor | Mądry, Aleksander |
| dc.contributor.author | Chen, Benjamin |
| dc.date.accessioned | 2025-09-18T14:29:04Z |
| dc.date.available | 2025-09-18T14:29:04Z |
| dc.date.issued | 2025-05 |
| dc.date.submitted | 2025-06-23T14:01:20.449Z |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162722 |
| dc.description.abstract | A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules. |
| dc.publisher | Massachusetts Institute of Technology |
| dc.rights | In Copyright - Educational Use Permitted |
| dc.rights | Copyright retained by author(s) |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ |
| dc.title | Metagradient Descent: Differentiating Large-Scale Training |
| dc.type | Thesis |
| dc.description.degree | M.Eng. |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
| mit.thesis.degree | Master |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science |
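
The abstract describes metagradients as gradients taken through the model-training process itself. As a purely illustrative aid, the sketch below computes such a gradient on a toy problem by differentiating through an unrolled gradient-descent loop in JAX. The data, variable names, and step counts are all hypothetical, and this is not the thesis's scalable algorithm; it only demonstrates the underlying idea of a gradient "through training."

```python
# Illustrative sketch (NOT the thesis's method): compute the gradient of a
# post-training validation loss with respect to a training hyperparameter
# (here, the learning rate) by differentiating through an unrolled
# training loop with JAX.
import jax
import jax.numpy as jnp

# Toy linear-regression data; all names and values here are hypothetical.
key = jax.random.PRNGKey(0)
X_train = jax.random.normal(key, (32, 4))
w_true = jnp.array([1.0, -2.0, 0.5, 3.0])
y_train = X_train @ w_true
X_val, y_val = X_train[:8], y_train[:8]

def train_loss(w):
    # Mean-squared-error training objective.
    return jnp.mean((X_train @ w - y_train) ** 2)

def val_loss_after_training(lr, num_steps=20):
    """Run plain gradient descent, then evaluate on validation data.

    Every update is a differentiable function of `lr`, so JAX can
    backpropagate through the entire (unrolled) training run.
    """
    w = jnp.zeros(4)
    for _ in range(num_steps):
        w = w - lr * jax.grad(train_loss)(w)
    return jnp.mean((X_val @ w - y_val) ** 2)

# The metagradient: d(validation loss) / d(learning rate).
metagrad = jax.grad(val_loss_after_training)(0.05)

# One step of metagradient descent on the learning rate itself
# (the 0.1 meta-step size is an arbitrary choice for illustration).
lr = 0.05 - 0.1 * metagrad
print("metagradient:", metagrad, "updated lr:", lr)
```

Naive unrolling like this stores every intermediate model state for reverse-mode differentiation, so its memory cost grows with the number of training steps; making such gradients efficient at the scale of real model training is, per the abstract, the problem the thesis's algorithm addresses.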