Symmetries in Neural Network Functions and Parameters

Author(s)

Lim, Derek

Advisor

Jegelka, Stefanie

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Modern neural networks are large, complex objects that can be difficult to study and work with. In this thesis, I analyze and improve neural networks from the perspective of symmetries, focusing on two kinds: function symmetries and parameter symmetries. Function symmetries are transformations of the input that lead to predictable changes in the output; they can be enforced in neural network architectures to improve performance on data with symmetry structure. Parameter symmetries are transformations of the parameters that leave the underlying neural network function unchanged, and they influence various empirical phenomena in neural networks. In Part I, I focus on function symmetries and develop new methods and analysis techniques for equivariant neural networks, which have function symmetries built into their architectures. I apply these techniques primarily to eigenvector-valued data, resulting in the first provably expressive neural network architectures that respect the symmetries of eigenvector data. In Part II, I focus on parameter symmetries and analyze their impact on various empirical phenomena of neural networks, as well as on the open-weight ecosystem of models with publicly shared parameters. In Part III, I consider both function and parameter symmetries to construct metanetworks: models that take the parameters of other neural networks as input. Since the inputs to metanetworks are parameters, I develop metanetworks that are invariant or equivariant to the parameter symmetries of the input networks. Overall, my work shows that accounting for function and parameter symmetries is both theoretically and empirically beneficial across diverse types of data, learning tasks, neural network architectures, and other parts of the deep learning pipeline.
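To make the two kinds of symmetry concrete, here is a minimal pure-Python sketch that checks two small facts. First, a parameter symmetry: permuting the hidden neurons of a two-layer ReLU MLP (reordering the rows of the first weight matrix and bias, and the matching columns of the second weight matrix) produces different parameters but the same function. Second, a function symmetry relevant to eigenvector data: if v is an eigenvector, so is -v, and a map of the form g(v) = rho(phi(v) + phi(-v)) returns the same output for either sign. The layer sizes, random weights, and helper names (`mlp`, `permute_hidden`, `sign_invariant`) are hypothetical choices for illustration, and the symmetrized map is a generic construction, not necessarily the architecture developed in the thesis.

```python
import math
import random


def relu(z):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, t) for t in z]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(u, v):
    """Elementwise sum of two equal-length lists."""
    return [a + b for a, b in zip(u, v)]


def mlp(params, x):
    """Hypothetical two-layer MLP: f(x) = W2 relu(W1 x + b1) + b2."""
    W1, b1, W2, b2 = params
    h = relu(add(matvec(W1, x), b1))
    return add(matvec(W2, h), b2)


def permute_hidden(params, perm):
    """Reorder hidden neurons: rows of W1 and b1, and the matching columns of W2."""
    W1, b1, W2, b2 = params
    W1p = [W1[i] for i in perm]
    b1p = [b1[i] for i in perm]
    W2p = [[row[i] for i in perm] for row in W2]
    return W1p, b1p, W2p, b2


def sign_invariant(v, A, B):
    """g(v) = rho(phi(v) + phi(-v)), with illustrative phi(u) = relu(A u), rho(s) = B s."""
    neg_v = [-t for t in v]
    return matvec(B, add(relu(matvec(A, v)), relu(matvec(A, neg_v))))


rng = random.Random(0)


def rand_matrix(m, n):
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def rand_vector(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Parameter symmetry: permuting hidden units changes the weights, not the function.
d_in, d_hidden, d_out = 3, 5, 2
params = (rand_matrix(d_hidden, d_in), rand_vector(d_hidden),
          rand_matrix(d_out, d_hidden), rand_vector(d_out))
perm = [4, 2, 0, 3, 1]
permuted = permute_hidden(params, perm)
x = rand_vector(d_in)
y, y_perm = mlp(params, x), mlp(permuted, x)
print(permuted[0] != params[0])  # True: the weights themselves changed
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(y, y_perm)))  # True: the outputs did not

# Function symmetry: the output ignores the sign of the input vector.
A, B = rand_matrix(4, 3), rand_matrix(2, 4)
v = rand_vector(3)
print(sign_invariant(v, A, B) == sign_invariant([-t for t in v], A, B))  # True
```

The first check holds only up to floating-point rounding, because the permuted network accumulates the same products in a different order; the second check is exact, because both signs lead to adding the same two terms.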

Date issued

2025-09

URI

https://hdl.handle.net/1721.1/165587

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses