MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • Center for Brains, Minds & Machines
  • Publications
  • CBMM Memo Series
  • View Item
  • DSpace@MIT Home
  • Center for Brains, Minds & Machines
  • Publications
  • CBMM Memo Series
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

The Janus effects of SGD vs GD: high noise and low rank

Author(s)
Xu, Mengjia; Galanti, Tomer; Rangamani, Akshay; Rosasco, Lorenzo; Poggio, Tomaso
Thumbnail
DownloadCBMM-Memo-144.pdf (2.849Mb)
Metadata
Show full item record
Abstract
It was always obvious that SGD has higher fluctuations at convergence than GD. It has also been often reported that SGD in deep RELU networks has a low-rank bias in the weight matrices. A recent theoretical analysis linked SGD noise with the low-rank bias induced by the SGD updates associated with small minibatch sizes [1]. In this paper, we provide an empirical and theoretical analysis of the convergence of SGD vs GD, first for deep RELU networks and then for the case of linear regression, where sharper estimates can be obtained and which is of independent interest. In the linear case, we prove that the components of the matrix W corresponding to the null space of the data matrix X converges to zero for both SGD and GD, provided the regularization term is non-zero (in the case of square loss; for exponential loss the result holds independently of regularization). The convergence rate, however, is exponential for SGD, and linear for GD. Thus SGD has a much stronger bias than GD towards solutions for weight matrices W with high fluctuations and low rank, provided the initialization is from a random matrix (but not if W is initialized as a zero matrix). Thus SGD under exponential loss, or under the square loss with non-zero regularization, shows the coupled phenomenon of low rank and asymptotic noise.
Date issued
2023-12-21
URI
https://hdl.handle.net/1721.1/153227
Series/Report no.
CBMM Memo;144

Collections
  • CBMM Memo Series

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.