dc.contributor.advisor	Lippman, Andrew B.
dc.contributor.author	Wen, Collin
dc.date.accessioned	2025-10-06T17:38:45Z
dc.date.available	2025-10-06T17:38:45Z
dc.date.issued	2025-05
dc.date.submitted	2025-06-23T14:04:12.476Z
dc.identifier.uri	https://hdl.handle.net/1721.1/162996
dc.description.abstract	As language models grow exponentially in scale, the ability to interpret and justify model outputs is of increasing interest. Although interaction with AI has centered on improving model performance in chat settings, visualizing a model's latent space offers a novel modality for interpreting information. Embedding models have traditionally served to retrieve information relevant to a topic by converting text into high-dimensional vectors. The vector spaces created by embedding encode information in a way that captures similarities and differences among ideas, and visualizing these nuances along meaningful dimensions can offer novel insight into the specific qualities that make two items similar. Leveraging fine-tuning mechanisms, dimensionality reduction algorithms, and Sparse Autoencoders (SAEs), this work surveys state-of-the-art techniques for visualizing the latent space along highly interpretable dimensions. ConceptAxes, a framework derived from these techniques, produces axes that capture high-level ideas ingrained in embedding models. These highly interpretable dimensions allow for better justification of the latent space and its clusters. This method of increasing embedding transparency proves valuable in several domains: (1) AI-enhanced creative exploration can be more guided and customized for a particular experience, and (2) high-level insights into vast text datasets can be made more intuitive.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Methods for Latent Space Interpretation via In-the-loop Fine-Tuning
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science
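
To make the abstract's core idea concrete, the sketch below illustrates projecting text embeddings onto an interpretable "concept axis." It is a minimal illustration under stated assumptions, not the thesis's actual ConceptAxes implementation: the sentence-transformers backend, the "all-MiniLM-L6-v2" model, and the anchor-difference construction of the axis are all choices made here for the example.

# Minimal sketch (assumptions noted above): score texts along a concept axis
# defined by contrasting anchor texts in embedding space.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def concept_axis(positive_anchors, negative_anchors):
    """Define an axis as the normalized difference of mean anchor embeddings."""
    pos = model.encode(positive_anchors).mean(axis=0)
    neg = model.encode(negative_anchors).mean(axis=0)
    axis = pos - neg
    return axis / np.linalg.norm(axis)

def project(texts, axis):
    """Score each text by the projection of its unit embedding onto the axis."""
    vecs = model.encode(texts)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs @ axis

# Example: a hypothetical "formality" axis; higher scores mean more formal text.
formality = concept_axis(["a formal legal contract"], ["a casual text to a friend"])
print(project(["Dear Sir or Madam,", "lol see u later"], formality))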

