    • Feature learning in deep classifiers through Intermediate Neural Collapse 

      Rangamani, Akshay; Lindegaard, Marius; Galanti, Tomer; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2023-02-27)
      In this paper, we conduct an empirical study of the feature learning process in deep classifiers. Recent research has identified a training phenomenon called Neural Collapse (NC), in which the top-layer feature embeddings ...
    • The Janus effects of SGD vs GD: high noise and low rank 

      Xu, Mengjia; Galanti, Tomer; Rangamani, Akshay; Rosasco, Lorenzo; Poggio, Tomaso (2023-12-21)
      It was always obvious that SGD has higher fluctuations at convergence than GD. It has also been often reported that SGD in deep ReLU networks has a low-rank bias in the weight matrices. A recent theoretical analysis linked ...
    • Norm-Based Generalization Bounds for Compositionally Sparse Neural Networks 

      Galanti, Tomer; Xu, Mengjia; Galanti, Liane; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2023-02-14)
      In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, ...
    • SGD and Weight Decay Provably Induce a Low-Rank Bias in Deep Neural Networks 

      Galanti, Tomer; Siegel, Zachary; Gupte, Aparna; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2023-02-14)
      In this paper, we study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep ReLU neural networks. Our results show that training neural networks with mini-batch SGD and weight ...
    • SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks 

      Galanti, Tomer; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2022-03-28)
      We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent and weight decay. We prove that the source of the SGD noise is an implicit low-rank constraint across all of the weight matrices within ...