dc.contributor.author | Wang, Yulin | |
dc.contributor.author | Ni, Zanlin | |
dc.contributor.author | Pu, Yifan | |
dc.contributor.author | Zhou, Cai | |
dc.contributor.author | Ying, Jixuan | |
dc.contributor.author | Song, Shiji | |
dc.contributor.author | Huang, Gao | |
dc.date.accessioned | 2025-04-18T20:35:12Z | |
dc.date.available | 2025-04-18T20:35:12Z | |
dc.date.issued | 2024-12-11 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/159179 | |
dc.description.abstract | End-to-end (E2E) training has become the de facto standard for training modern deep networks, e.g., ConvNets and vision Transformers (ViTs). Typically, a global error signal is generated at the end of a model and back-propagated layer-by-layer to update the parameters. This paper shows that the reliance on back-propagating global errors may not be necessary for deep learning. More precisely, deep networks with competitive or even better performance can be obtained by purely leveraging locally supervised learning, i.e., splitting a network into gradient-isolated modules and training them with local supervision signals. However, such an extension is non-trivial. Our experimental and theoretical analysis demonstrates that simply training local modules with an E2E objective tends to be short-sighted, collapsing task-relevant information at early layers and hurting the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible while progressively discarding task-irrelevant information. As the InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. We evaluate InfoPro extensively with ConvNets and ViTs on twelve computer vision benchmarks spanning five tasks (i.e., image/video recognition, semantic/instance segmentation, and object detection). InfoPro exhibits superior efficiency over E2E training in terms of GPU memory footprint, convergence speed, and training data scale. Moreover, InfoPro enables the effective training of more parameter- and computation-efficient models (e.g., much deeper networks), which suffer from inferior performance when trained end-to-end. Code: https://github.com/blackfeather-wang/InfoPro-Pytorch. | en_US
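dc.description.note | The following is a minimal, illustrative PyTorch-style sketch of the locally supervised training scheme described in the abstract: the network is split into gradient-isolated modules, each updated by its own local loss, with features detached before being passed to the next module. The LocalModule class, the linear classification head, the auxiliary decoder term, and the 0.1 weighting are hypothetical stand-ins for the paper's InfoPro surrogate objective; refer to the official repository linked in the abstract for the authors' actual implementation.

    import torch.nn as nn
    import torch.nn.functional as F

    class LocalModule(nn.Module):
        """One gradient-isolated stage: a backbone block plus auxiliary local heads."""
        def __init__(self, block, in_dim, feat_dim, num_classes):
            super().__init__()
            self.block = block                                 # a slice of the full network
            self.cls_head = nn.Linear(feat_dim, num_classes)   # local task supervision
            self.decoder = nn.Linear(feat_dim, in_dim)         # crude "preserve information" term

        def forward(self, x):
            return self.block(x)

        def local_loss(self, feats, input_summary, labels):
            pooled = feats.mean(dim=(2, 3))                    # global average pooling: (N, C, H, W) -> (N, C)
            task_loss = F.cross_entropy(self.cls_head(pooled), labels)
            # Hypothetical information-preservation term: reconstruct a channel-wise
            # summary of the module's input from its output features.
            recon_loss = F.mse_loss(self.decoder(pooled), input_summary)
            return task_loss + 0.1 * recon_loss

    def train_step(modules, optimizers, images, labels):
        """Each module is updated by its own local loss; gradients never cross module boundaries."""
        x = images
        for module, opt in zip(modules, optimizers):
            input_summary = x.mean(dim=(2, 3)).detach()        # channel-wise summary of this module's input
            feats = module(x)
            loss = module.local_loss(feats, input_summary, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            x = feats.detach()                                 # gradient isolation between modules
        return x

In the paper, the derived upper bound on the InfoPro loss takes the place of the ad-hoc reconstruction term used in this sketch, and the same gradient-isolated recipe is applied to ConvNets and ViTs across the tasks listed in the abstract. | en_US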
dc.publisher | Springer US | en_US |
dc.relation.isversionof | 10.1007/s11263-024-02296-0 | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Springer US | en_US |
dc.title | InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Wang, Y., Ni, Z., Pu, Y. et al. InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation. Int J Comput Vis 133, 2752–2782 (2025). | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.relation.journal | International Journal of Computer Vision | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2025-04-18T03:40:05Z | |
dc.language.rfc3066 | en | |
dc.rights.holder | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature | |
dspace.embargo.terms | Y | |
dspace.date.submission | 2025-04-18T03:40:05Z | |
mit.journal.volume | 133 | en_US |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |