Show simple item record

dc.contributor.author	Poggio, Tomaso
dc.contributor.author	Kawaguchi, Kenji
dc.contributor.author	Liao, Qianli
dc.contributor.author	Miranda, Brando
dc.contributor.author	Rosasco, Lorenzo
dc.contributor.author	Boix, Xavier
dc.contributor.author	Hidary, Jack
dc.contributor.author	Mhaskar, Hrushikesh
dc.date.accessioned	2018-01-01T00:08:27Z
dc.date.available	2018-01-01T00:08:27Z
dc.date.issued	2017-12-30
dc.identifier.uri	http://hdl.handle.net/1721.1/113003
dc.description.abstract	THIS MEMO IS REPLACED BY CBMM MEMO 90. A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamical systems associated with gradient descent minimization of nonlinear networks behave, near stable zero minima of the empirical error, as a gradient system in a quadratic potential with a degenerate Hessian. The proposition is supported by theoretical and numerical results, under the assumption of stable minima of the gradient. Our proposition extends to deep networks key properties of gradient descent methods for linear networks which, as suggested in (1), can be the key to understanding generalization. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, asymptotically converging to the minimum-norm solution. This implies that there is usually an optimal early stopping that avoids overfitting of the loss (this is relevant mainly for regression). For classification, the asymptotic convergence to the minimum-norm solution implies convergence to the maximum-margin solution, which guarantees good classification error for “low noise” datasets. The implied robustness to overparametrization has suggestive implications for the robustness of deep hierarchically local networks to variations of the architecture with respect to the curse of dimensionality.	en_US
dc.description.sponsorship	This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.	en_US
dc.language.iso	en_US	en_US
dc.publisher	arXiv	en_US
dc.relation.ispartofseries	CBMM Memo Series;073
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/us/	*
dc.title	Theory of Deep Learning III: explaining the non-overfitting puzzle	en_US
dc.type	Technical Report	en_US
dc.type	Working Paper	en_US
dc.type	Other	en_US
dc.identifier.citation	arXiv:1801.00173	en_US
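
The abstract's claim that gradient descent acts as an implicit regularizer converging to the minimum-norm solution can be illustrated, for the linear case it refers to, with a minimal sketch that is not part of the memo (NumPy, with arbitrary illustrative dimensions n = 20, d = 100): gradient descent on an overparametrized least-squares problem, started at zero, reaches an interpolating solution that matches the pseudoinverse (minimum-norm) solution.

import numpy as np

# Overparametrized least squares: more parameters (d) than data points (n),
# so there are infinitely many weight vectors with zero training error.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain gradient descent on the quadratic loss, initialized at zero.
w = np.zeros(d)
step = 1e-2
for _ in range(100_000):
    w -= step * X.T @ (X @ w - y) / n

# The minimum-norm interpolating solution, obtained via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

print("training error:", np.linalg.norm(X @ w - y))                 # ~ 0
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))  # ~ 0

Because the iterates stay in the row space of X when initialized at zero, the interpolating solution gradient descent reaches is exactly the minimum-norm one; stopping earlier yields a smaller-norm (more regularized) iterate, which is the early-stopping effect described in the abstract.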

