"My Very Subjective Human Interpretation": Domain Expert Perspectives on Navigating the Text Analysis Loop for Topic Models
Author(s)
Schofield, Alexandra; Wu, Siqi; Bayard de Volo, Theo; Kuze, Tatsuki; Gomez, Alfredo; Sultana, Sharifa; ...
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
Practitioners dealing with large text collections frequently use topic models such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) in their projects to explore trends. Despite twenty years of accrued advancement in natural language processing tools, these models are found to be slow and challenging to apply to text exploration projects. In our work, we engaged with practitioners (n=15) who use topic modeling to explore trends in large text collections to understand their project workflows and investigate which factors often slow down the processes and how they deal with such errors and interruptions in automated topic modeling. Our findings show that practitioners are required to diagnose and resolve context-specific problems with preparing data and models and need control for these steps, especially for data cleaning and parameter selection. Our major findings resonate with existing work across CSCW, computational social science, machine learning, data science, and digital humanities. They also leave us questioning whether automation is actually a useful goal for tools designed for topic models and text exploration.
Date issued
2025-01-10
Journal
Proceedings of the ACM on Human-Computer Interaction
Publisher
Association for Computing Machinery
Citation
Schofield, Alexandra, Wu, Siqi, Bayard de Volo, Theo, Kuze, Tatsuki, Gomez, Alfredo et al. 2025. ""My Very Subjective Human Interpretation": Domain Expert Perspectives on Navigating the Text Analysis Loop for Topic Models." Proceedings of the ACM on Human-Computer Interaction, 9 (GROUP).
Version: Final published version
ISSN
2573-0142