Show simple item record

dc.contributor.author: Wang, Yulin
dc.contributor.author: Ni, Zanlin
dc.contributor.author: Pu, Yifan
dc.contributor.author: Zhou, Cai
dc.contributor.author: Ying, Jixuan
dc.contributor.author: Song, Shiji
dc.contributor.author: Huang, Gao
dc.date.accessioned: 2025-04-18T20:35:12Z
dc.date.available: 2025-04-18T20:35:12Z
dc.date.issued: 2024-12-11
dc.identifier.uri: https://hdl.handle.net/1721.1/159179
dc.description.abstract: End-to-end (E2E) training has become the de-facto standard for training modern deep networks, e.g., ConvNets and vision Transformers (ViTs). Typically, a global error signal is generated at the end of a model and back-propagated layer-by-layer to update the parameters. This paper shows that the reliance on back-propagating global errors may not be necessary for deep learning. More precisely, deep networks with a competitive or even better performance can be obtained by purely leveraging locally supervised learning, i.e., splitting a network into gradient-isolated modules and training them with local supervision signals. However, such an extension is non-trivial. Our experimental and theoretical analysis demonstrates that simply training local modules with an E2E objective tends to be short-sighted, collapsing task-relevant information at early layers, and hurting the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible, while progressively discarding task-irrelevant information. As InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. We evaluate InfoPro extensively with ConvNets and ViTs, based on twelve computer vision benchmarks organized into five tasks (i.e., image/video recognition, semantic/instance segmentation, and object detection). InfoPro exhibits superior efficiency over E2E training in terms of GPU memory footprints, convergence speed, and training data scale. Moreover, InfoPro enables the effective training of more parameter- and computation-efficient models (e.g., much deeper networks), which suffer from inferior performance when trained in an E2E manner. Code: https://github.com/blackfeather-wang/InfoPro-Pytorch. [en_US]
dc.publisher: Springer US [en_US]
dc.relation.isversionof: 10.1007/s11263-024-02296-0 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Springer US [en_US]
dc.title: InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Wang, Y., Ni, Z., Pu, Y. et al. InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation. Int J Comput Vis 133, 2752–2782 (2025). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: International Journal of Computer Vision [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2025-04-18T03:40:05Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature
dspace.embargo.terms: Y
dspace.date.submission: 2025-04-18T03:40:05Z
mit.journal.volume: 133 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
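
The abstract above outlines the core idea: split a network into gradient-isolated modules and train each with a local supervision signal instead of one globally back-propagated error. The snippet below is a minimal, hypothetical PyTorch sketch of that general setup (two hand-made modules, a simple cross-entropy local head, and detach() to block gradients between modules). It does not implement the authors' InfoPro loss, whose information-preservation terms and surrogate upper bound are provided in the linked repository; all module and optimizer definitions here are illustrative assumptions.

```python
# Minimal sketch of locally supervised training with gradient-isolated modules.
# Hypothetical toy modules for illustration only; the actual InfoPro loss is in
# https://github.com/blackfeather-wang/InfoPro-Pytorch
import torch
import torch.nn as nn

module1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
module2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
# Local auxiliary head supervising module1 without gradients from later layers.
local_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))

opt1 = torch.optim.SGD(list(module1.parameters()) + list(local_head.parameters()), lr=0.1)
opt2 = torch.optim.SGD(module2.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    # Module 1 is updated only by its own local loss.
    h = module1(x)
    loss1 = criterion(local_head(h), y)
    opt1.zero_grad(); loss1.backward(); opt1.step()

    # detach() blocks gradients, so module 2 is gradient-isolated from module 1.
    h = h.detach()
    loss2 = criterion(module2(h), y)
    opt2.zero_grad(); loss2.backward(); opt2.step()
    return loss1.item(), loss2.item()

# Example usage with random data (illustration only).
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(train_step(x, y))
```

In this toy setup only the activations flow forward between modules and no backward pass traverses the whole network, which is the kind of per-module locality the abstract credits for reduced GPU memory footprints during training.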

