Methods for Latent Space Interpretation via In-the-loop Fine-Tuning

Wen, Collin

Author(s)

Wen, Collin

DownloadThesis PDF (6.410Mb)

Advisor

Lippman, Andrew B.

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

With language models increasing exponentially in scale, being able to interpret and justify model outputs is an area of increasing interest. Although enhancing the performance of these models in chat mediums has been the focus of interaction with AI, the visualization of model latent space offers a novel modality of interpreting information. Embedding models have traditionally served as a means of retrieving relevant information to a topic by converting text into a high-dimensional vector. The high-dimensional vector spaces created via embedding offer a way to encode information that captures similarities and differences in ideas, and visualizing these nuances in terms of meaningful dimensions can offer novel insights into the specific qualities that make two item similar. Leveraging fine-tuning mechanisms, dimension reduction algorithms and Sparse Autoencoders (SAEs), this work surveys state-of-the-art techniques to visualize the latent space in highly interpretable dimensions. ConceptAxes, derived from these techniques, is a framework is provided to produce axes that can capture high-level ideas that are ingrained into embedding models. ConceptAxes with highly interpretable dimensions allow for better justification for the latent space and clusters. This method of increasing embedding transparency proves valuable in various domains: (1) AI-enhanced creative exploration can be more guided and customized for a particular experience and (2) high-level insights can be made more intuitive with vast text datasets.

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/162996

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses