dc.contributor.advisor	Lippman, Andrew B.
dc.contributor.author	Wen, Collin
dc.date.accessioned	2025-10-06T17:38:45Z
dc.date.available	2025-10-06T17:38:45Z
dc.date.issued	2025-05
dc.date.submitted	2025-06-23T14:04:12.476Z
dc.identifier.uri	https://hdl.handle.net/1721.1/162996
dc.description.abstract	As language models grow exponentially in scale, the ability to interpret and justify model outputs is of increasing interest. Although interaction with AI has centered on improving model performance in chat settings, visualizing a model's latent space offers a novel modality for interpreting information. Embedding models have traditionally served to retrieve information relevant to a topic by converting text into high-dimensional vectors. The vector spaces created by embedding encode information in a way that captures similarities and differences among ideas, and visualizing these nuances along meaningful dimensions can offer novel insight into the specific qualities that make two items similar. Leveraging fine-tuning mechanisms, dimensionality reduction algorithms, and Sparse Autoencoders (SAEs), this work surveys state-of-the-art techniques for visualizing the latent space along highly interpretable dimensions. ConceptAxes, a framework derived from these techniques, produces axes that capture high-level ideas ingrained in embedding models. These highly interpretable dimensions allow for better justification of the latent space and its clusters. This method of increasing embedding transparency proves valuable in several domains: (1) AI-enhanced creative exploration can be more guided and customized for a particular experience, and (2) high-level insights into vast text datasets can be made more intuitive.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Methods for Latent Space Interpretation via In-the-loop Fine-Tuning
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science
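
To make the abstract's core idea concrete, the sketch below illustrates projecting text embeddings onto an interpretable "concept axis." It is a minimal illustration under stated assumptions, not the thesis's actual ConceptAxes implementation: the sentence-transformers backend, the "all-MiniLM-L6-v2" model, and the anchor-difference construction of the axis are all choices made here for the example.

# Minimal sketch (assumptions noted above): score texts along a concept axis
# defined by contrasting anchor texts in embedding space.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def concept_axis(positive_anchors, negative_anchors):
    """Define an axis as the normalized difference of mean anchor embeddings."""
    pos = model.encode(positive_anchors).mean(axis=0)
    neg = model.encode(negative_anchors).mean(axis=0)
    axis = pos - neg
    return axis / np.linalg.norm(axis)

def project(texts, axis):
    """Score each text by the projection of its unit embedding onto the axis."""
    vecs = model.encode(texts)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs @ axis

# Example: a hypothetical "formality" axis; higher scores mean more formal text.
formality = concept_axis(["a formal legal contract"], ["a casual text to a friend"])
print(project(["Dear Sir or Madam,", "lol see u later"], formality))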

