Now showing items 1-10 of 22
Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning
(Center for Brains, Minds and Machines (CBMM), arXiv, 2016-10-19)
We systematically explore a spectrum of normalization algorithms related to Batch Normalization (BN) and propose a generalized formulation that simultaneously solves two major limitations of BN: (1) online learning and ...
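(The snippet above is truncated and the memo's exact algorithm is not reproduced here; as a hedged illustration of the general idea only, normalization statistics can be maintained as running estimates over the stream rather than computed per minibatch, which removes BN's dependence on batch size and permits batch-size-1 online learning. All names below are illustrative.)

    import numpy as np

    class StreamingNorm:
        """Sketch: normalize each sample with exponentially-weighted
        running statistics instead of minibatch statistics, so the
        layer works online with batch size 1. Illustrative variant,
        not the exact algorithm proposed in the memo."""

        def __init__(self, dim, momentum=0.99, eps=1e-5):
            self.mean = np.zeros(dim)
            self.var = np.ones(dim)
            self.momentum = momentum
            self.eps = eps

        def __call__(self, x):
            # Update running mean/variance from the single incoming
            # sample, then normalize with them (no minibatch needed).
            m = self.momentum
            self.mean = m * self.mean + (1 - m) * x
            self.var = m * self.var + (1 - m) * (x - self.mean) ** 2
            return (x - self.mean) / np.sqrt(self.var + self.eps)

    norm = StreamingNorm(dim=8)
    for x in np.random.default_rng(0).normal(3.0, 2.0, size=(1000, 8)):
        y = norm(x)  # each sample is normalized as it streams in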
Theory IIIb: Generalization in Deep Networks
(Center for Brains, Minds and Machines (CBMM), arXiv.org, 2018-06-29)
The general features of the optimization problem for the case of overparametrized nonlinear networks have been clear for a while: SGD selects global minima over local minima with high probability. In the overparametrized ...
Classical generalization bounds are surprisingly tight for Deep Networks
(Center for Brains, Minds and Machines (CBMM), 2018-07-11)
Deep networks are usually trained and tested in a regime in which the training classification error is not a good predictor of the test error. Thus the consensus has been that generalization, defined as convergence of the ...
View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation
(Center for Brains, Minds and Machines (CBMM), arXiv, 2016-06-03)
The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving ...
Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
(Center for Brains, Minds and Machines (CBMM), arXiv, 2016-04-12)
We discuss relations between Residual Networks (ResNet), Recurrent Neural Networks (RNNs) and the primate visual cortex. We begin with the observation that a shallow RNN is exactly equivalent to a very deep ResNet with ...
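(The stated equivalence is easy to verify concretely: iterating one residual transition T times, i.e. a shallow RNN with an identity skip connection, performs exactly the same computation as a T-block ResNet whose blocks share weights. A minimal numpy sketch; the names W, residual_block, and the depth T are illustrative, not the memo's notation.)

    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 16, 50                           # state width, unroll depth
    W = rng.normal(scale=0.1, size=(d, d))  # one shared weight matrix

    def residual_block(h, W):
        """One residual step: h + relu(W h)."""
        return h + np.maximum(W @ h, 0.0)

    x = rng.normal(size=d)

    # (1) Shallow RNN: apply the same transition T times.
    h_rnn = x
    for _ in range(T):
        h_rnn = residual_block(h_rnn, W)

    # (2) Deep ResNet with weight sharing: T blocks, tied weights.
    h_resnet = x
    for Wk in [W] * T:
        h_resnet = residual_block(h_resnet, Wk)

    assert np.allclose(h_rnn, h_resnet)  # identical computations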
Musings on Deep Learning: Properties of SGD
(Center for Brains, Minds and Machines (CBMM), 2017-04-04)
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize with a mix of theory and experiments the generalization properties of Stochastic Gradient Descent in ...
Unsupervised learning of clutter-resistant visual representations from natural videos
(Center for Brains, Minds and Machines (CBMM), arXiv, 2015-04-27)
Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, and viewing angle [1, 2, 3]. Though the learning ...
Learning Real and Boolean Functions: When Is Deep Better Than Shallow
(Center for Brains, Minds and Machines (CBMM), arXiv, 2016-03-08)
We describe computational tasks - especially in vision - that correspond to compositional/hierarchical functions. While the universal approximation property holds both for hierarchical and shallow networks, we prove that ...
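(In the simplest case studied in this line of work, the compositional functions in question have a binary-tree structure in which each constituent depends on only a few variables; a minimal example, with illustrative notation:

$$f(x_1,\dots,x_8) = h_3\bigl(h_{21}\bigl(h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)\bigr),\; h_{22}\bigl(h_{13}(x_5,x_6),\, h_{14}(x_7,x_8)\bigr)\bigr)$$

A deep network mirroring the tree only ever approximates bivariate constituents, while a shallow network must treat f as a generic 8-variable function; this is the source of the deep-vs-shallow gap.)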
Theory II: Landscape of the Empirical Risk in Deep Learning
(Center for Brains, Minds and Machines (CBMM), arXiv, 2017-03-30)
Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima. However, the practical observation is that, at least for the most successful Deep ...
Theory of Deep Learning III: explaining the non-overfitting puzzle
(arXiv, 2017-12-30)
THIS MEMO IS REPLACED BY CBMM MEMO 90
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization and despite the large capacity demonstrated by zero training error on randomly ...