Now showing items 1-5 of 5
The Janus effects of SGD vs GD: high noise and low rank
(2023-12-21)
It was always obvious that SGD has higher fluctuations at convergence than GD. It has also often been reported that SGD in deep ReLU networks has a low-rank bias in the weight matrices. A recent theoretical analysis linked ...
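The "high noise" half of the contrast is easy to probe empirically. Below is a minimal sketch (not the paper's code; the toy data, the two-layer ReLU net, and the hyperparameters are illustrative assumptions) that trains the same model with full-batch GD and with mini-batch SGD and compares the fluctuation of the full-batch loss late in training.

```python
# Minimal sketch: SGD shows larger loss fluctuations at convergence than GD.
# Toy data, architecture, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)
y = torch.randn(512, 1)
loss_fn = nn.MSELoss()

def train(batch_size, steps=3000, lr=0.05):
    torch.manual_seed(1)  # identical initialization for both runs
    net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    tail = []
    for step in range(steps):
        idx = torch.randperm(len(X))[:batch_size]
        opt.zero_grad()
        loss_fn(net(X[idx]), y[idx]).backward()
        opt.step()
        if step >= steps - 500:  # record full-batch loss near convergence
            with torch.no_grad():
                tail.append(loss_fn(net(X), y).item())
    return torch.tensor(tail)

gd_tail = train(batch_size=len(X))  # full batch: plain GD
sgd_tail = train(batch_size=16)     # small mini-batches: SGD
print(f"GD  loss std at convergence: {gd_tail.std().item():.2e}")
print(f"SGD loss std at convergence: {sgd_tail.std().item():.2e}")  # larger
```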
SGD and Weight Decay Provably Induce a Low-Rank Bias in Deep Neural Networks
(Center for Brains, Minds and Machines (CBMM), 2023-02-14)
In this paper, we study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep ReLU neural networks. Our results show that training neural networks with mini-batch SGD and weight ...
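One way to see the claimed bias in practice (for this entry and for the related "SGD Noise and Implicit Low-Rank Bias" entry below) is to compare the singular-value spectrum of a hidden weight matrix trained with and without weight decay. A minimal sketch, assuming a toy binary classification task and an illustrative three-layer ReLU net, neither of which comes from the paper:

```python
# Minimal sketch: mini-batch SGD + weight decay tends to drive hidden
# weight matrices toward low effective rank. Setup is an assumption.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1024, 32)
y = (X[:, :2].sum(dim=1, keepdim=True) > 0).float()  # low-dim target

def effective_rank(W, tol=1e-2):
    s = torch.linalg.svdvals(W)  # singular values, descending
    return int((s > tol * s[0]).sum())

def train(weight_decay):
    torch.manual_seed(1)
    net = nn.Sequential(nn.Linear(32, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 1))
    opt = torch.optim.SGD(net.parameters(), lr=0.1,
                          weight_decay=weight_decay)
    loss_fn = nn.BCEWithLogitsLoss()
    for step in range(5000):
        idx = torch.randint(0, len(X), (32,))
        opt.zero_grad()
        loss_fn(net(X[idx]), y[idx]).backward()
        opt.step()
    return effective_rank(net[2].weight.detach())  # middle 128x128 layer

print("effective rank, wd=0:   ", train(0.0))
print("effective rank, wd=5e-3:", train(5e-3))  # typically much lower
```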
Feature learning in deep classifiers through Intermediate Neural Collapse
(Center for Brains, Minds and Machines (CBMM), 2023-02-27)
In this paper, we conduct an empirical study of the feature learning process in deep classifiers. Recent research has identified a training phenomenon called Neural Collapse (NC), in which the top-layer feature embeddings ...
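For readers who want to reproduce the measurement, the standard NC1 "variability collapse" statistic is simple to compute from penultimate-layer features. The helper below is a sketch under generic assumptions, not the paper's instrumentation:

```python
# Minimal sketch of the NC1 statistic: within-class feature variability,
# measured relative to between-class variability, shrinks toward zero
# as Neural Collapse sets in.
import torch

def nc1_statistic(features, labels):
    """Tr(Sigma_W) / Tr(Sigma_B) for feature vectors grouped by label."""
    global_mean = features.mean(dim=0)
    within, between = 0.0, 0.0
    for c in labels.unique():
        f = features[labels == c]
        mu = f.mean(dim=0)
        within += ((f - mu) ** 2).sum() / len(features)
        between += len(f) / len(features) * ((mu - global_mean) ** 2).sum()
    return (within / between).item()

# Quick synthetic check: tightly clustered classes give a small NC1 value.
labels = torch.arange(3).repeat_interleave(100)
means = 5.0 * torch.randn(3, 16)
feats = means[labels] + 0.01 * torch.randn(300, 16)
print(nc1_statistic(feats, labels))  # near 0: collapsed features
```

In practice one would feed in the penultimate-layer activations of the trained classifier in place of the synthetic `feats`.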
SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks
(Center for Brains, Minds and Machines (CBMM), 2022-03-28)
We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent and weight decay. We prove that the source of the SGD noise is an implicit low-rank constraint across all of the weight matrices within ...
Norm-Based Generalization Bounds for Compositionally Sparse Neural Networks
(Center for Brains, Minds and Machines (CBMM), 2023-02-14)
In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, ...
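For context only (this is a classical result, not the paper's bound): norm-based Rademacher bounds for dense depth-$L$ ReLU networks typically scale with the product of layer norms, e.g.

```latex
% Classical norm-based bound for dense ReLU networks, in the style of
% Golowich et al. (2018), shown for contrast with the sparse setting.
\[
  \mathfrak{R}_n(\mathcal{F}) \;\lesssim\;
  \frac{B \,\sqrt{L}\, \prod_{l=1}^{L} \lVert W_l \rVert_F}{\sqrt{n}},
\]
% B: bound on the input norm, n: sample size, W_l: layer-l weight matrix.
```

Since the product of Frobenius norms can grow quickly with depth, the motivation for the sparse setting is that each neuron's small fan-in offers extra structure such bounds can exploit.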