
dc.contributor.author: Zhang, Chiyuan
dc.contributor.author: Liao, Qianli
dc.contributor.author: Rakhlin, Alexander
dc.contributor.author: Miranda, Brando
dc.contributor.author: Golowich, Noah
dc.contributor.author: Poggio, Tomaso
dc.date.accessioned: 2018-05-16T17:44:36Z
dc.date.available: 2018-05-16T17:44:36Z
dc.date.issued: 2017-12-27
dc.identifier.uri: http://hdl.handle.net/1721.1/115407
dc.description.abstract: In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability, like the classical Langevin equation, on large-volume, "flat" minima, selecting flat minimizers which are with very high probability also global minimizers.
dc.description.sponsorship: This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
dc.language.iso: en_US
dc.publisher: Center for Brains, Minds and Machines (CBMM)
dc.relation.ispartofseries: CBMM Memo Series;072
dc.title: Theory of Deep Learning IIb: Optimization Properties of SGD
dc.type: Technical Report
dc.type: Working Paper
dc.type: Other
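
The abstract's claim, that SGD behaves like a discretized Langevin equation and therefore concentrates in probability on large-volume "flat" minima, can be illustrated with a small toy experiment. The sketch below is not from the memo and makes its own assumptions: a hypothetical one-dimensional loss with a sharp and a wide minimum of equal depth, and plain gradient descent with injected Gaussian noise standing in for SGD's minibatch noise. Most runs end in the wide basin simply because it occupies far more volume under the induced stationary distribution.

import numpy as np

rng = np.random.default_rng(0)

# Toy loss (for reference only): two global minima of equal depth, a sharp
# one at x = -1 (curvature 100) and a wide, "flat" one at x = +2 (curvature 1).
def loss(x):
    return min(50.0 * (x + 1.0) ** 2, 0.5 * (x - 2.0) ** 2)

def grad(x):
    # Derivative of whichever branch is currently the smaller (active) one.
    if 50.0 * (x + 1.0) ** 2 < 0.5 * (x - 2.0) ** 2:
        return 100.0 * (x + 1.0)
    return x - 2.0

def run(x0, steps=2000, lr=0.01, temp=1.5):
    # Gradient descent plus Gaussian noise: an Euler-Maruyama discretization
    # of Langevin dynamics at temperature `temp`, used here as a crude
    # caricature of the gradient noise injected by SGD minibatching.
    x = x0
    noise_std = np.sqrt(2.0 * lr * temp)
    for _ in range(steps):
        x = x - lr * grad(x) + noise_std * rng.normal()
    return x

finals = np.array([run(rng.uniform(-4.0, 5.0)) for _ in range(500)])
# A run ends in the sharp basin if the sharp branch is the active one there.
in_sharp = 50.0 * (finals + 1.0) ** 2 < 0.5 * (finals - 2.0) ** 2
print(f"fraction of runs ending in the wide (flat) basin: {1.0 - in_sharp.mean():.2f}")

The sharp basin is just as deep, but its small width makes it both a small target for the stationary measure and hard to stay in under noise, so the reported fraction is typically well above one half. This is only a qualitative illustration of the conjecture stated in the abstract, not the memo's analysis or experimental setup.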

