dc.contributor.author | Wang, Yulin | |
dc.contributor.author | Ni, Zanlin | |
dc.contributor.author | Pu, Yifan | |
dc.contributor.author | Zhou, Cai | |
dc.contributor.author | Ying, Jixuan | |
dc.contributor.author | Song, Shiji | |
dc.contributor.author | Huang, Gao | |
dc.date.accessioned | 2025-04-18T20:35:12Z | |
dc.date.available | 2025-04-18T20:35:12Z | |
dc.date.issued | 2024-12-11 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/159179 | |
dc.description.abstract | End-to-end (E2E) training has become the de facto standard for training modern deep networks, e.g., ConvNets and vision Transformers (ViTs). Typically, a global error signal is generated at the end of a model and back-propagated layer-by-layer to update the parameters. This paper shows that the reliance on back-propagating global errors may not be necessary for deep learning. More precisely, deep networks with competitive or even better performance can be obtained by purely leveraging locally supervised learning, i.e., splitting a network into gradient-isolated modules and training them with local supervision signals. However, such an extension is non-trivial. Our experimental and theoretical analysis demonstrates that simply training local modules with an E2E objective tends to be short-sighted, collapsing task-relevant information at early layers and hurting the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible while progressively discarding task-irrelevant information. As the InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. We evaluate InfoPro extensively with ConvNets and ViTs on twelve computer vision benchmarks spanning five tasks (i.e., image/video recognition, semantic/instance segmentation, and object detection). InfoPro exhibits superior efficiency over E2E training in terms of GPU memory footprint, convergence speed, and training data scale. Moreover, InfoPro enables the effective training of more parameter- and computation-efficient models (e.g., much deeper networks), which suffer from inferior performance when trained end-to-end. Code: https://github.com/blackfeather-wang/InfoPro-Pytorch. | en_US
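dc.description.note | The following is a minimal, illustrative PyTorch-style sketch of the locally supervised training scheme described in the abstract: the network is split into gradient-isolated modules, each updated by its own local loss, with features detached before being passed to the next module. The LocalModule class, the linear classification head, the auxiliary decoder term, and the 0.1 weighting are hypothetical stand-ins for the paper's InfoPro surrogate objective; refer to the official repository linked in the abstract for the authors' actual implementation.

    import torch.nn as nn
    import torch.nn.functional as F

    class LocalModule(nn.Module):
        """One gradient-isolated stage: a backbone block plus auxiliary local heads."""
        def __init__(self, block, in_dim, feat_dim, num_classes):
            super().__init__()
            self.block = block                                 # a slice of the full network
            self.cls_head = nn.Linear(feat_dim, num_classes)   # local task supervision
            self.decoder = nn.Linear(feat_dim, in_dim)         # crude "preserve information" term

        def forward(self, x):
            return self.block(x)

        def local_loss(self, feats, input_summary, labels):
            pooled = feats.mean(dim=(2, 3))                    # global average pooling: (N, C, H, W) -> (N, C)
            task_loss = F.cross_entropy(self.cls_head(pooled), labels)
            # Hypothetical information-preservation term: reconstruct a channel-wise
            # summary of the module's input from its output features.
            recon_loss = F.mse_loss(self.decoder(pooled), input_summary)
            return task_loss + 0.1 * recon_loss

    def train_step(modules, optimizers, images, labels):
        """Each module is updated by its own local loss; gradients never cross module boundaries."""
        x = images
        for module, opt in zip(modules, optimizers):
            input_summary = x.mean(dim=(2, 3)).detach()        # channel-wise summary of this module's input
            feats = module(x)
            loss = module.local_loss(feats, input_summary, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            x = feats.detach()                                 # gradient isolation between modules
        return x

In the paper, the derived upper bound on the InfoPro loss takes the place of the ad-hoc reconstruction term used in this sketch, and the same gradient-isolated recipe is applied to ConvNets and ViTs across the tasks listed in the abstract. | en_US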
dc.publisher | Springer US | en_US |
dc.relation.isversionof | 10.1007/s11263-024-02296-0 | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Springer US | en_US |
dc.title | InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Wang, Y., Ni, Z., Pu, Y. et al. InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation. Int J Comput Vis 133, 2752–2782 (2025). | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.relation.journal | International Journal of Computer Vision | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2025-04-18T03:40:05Z | |
dc.language.rfc3066 | en | |
dc.rights.holder | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature | |
dspace.embargo.terms | Y | |
dspace.date.submission | 2025-04-18T03:40:05Z | |
mit.journal.volume | 133 | en_US |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |