DSpace@MIT

Domain-Independent Mode Estimation for Human-Robot Collaboration

Author(s)
Gomez, Annabel Reyna
Thesis PDF (5.258 MB)
Advisor
Williams, Brian C.
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Metadata
Show full item record
Abstract
To collaborate safely and intelligently with humans, robots must infer high-level semantic states, such as intentions or interaction modes, from uncertain sensor input. While dynamic, probabilistic mode estimation is commonly used in fault diagnosis, this thesis extends the problem to activity recognition, where the goal is to estimate qualitative, symbolic human-object interaction states in real time. Robust human activity recognition is essential for collaborative and assistive robotics, particularly in dynamic or safety-critical environments. The core solution presented in this thesis is a mode estimator and its efficient implementation using the A* with bounding conflicts (A*BC) algorithm, which performs best-first enumeration over symbolic activity states while integrating recursive Bayesian filtering to maintain belief under noisy observations. Unlike low-level trajectory tracking or deep-learned classifiers, qualitative spatial filtering operates at the right level of abstraction to recognize symbolic actions. It can also generalize across domains with minimal retraining and support efficient, probabilistically grounded reasoning about uncertainty in both perception and symbolic mode transitions. The proposed system fuses RGB-D perception, object segmentation, qualitative spatial reasoning (QSR), and probabilistic inference into a real-time pipeline capable of tracking and inferring symbolic human-object interaction states. Evaluated in a human-robot rehabilitation setting, this domain-independent system successfully infers latent human and object activity states from noisy RGB-D data. It resolves ambiguity using Vision-Language Model (VLM)-guided semantic arbitration and demonstrates robustness and adaptability in unstructured environments. This work establishes qualitative spatial filtering with A*BC as a generalizable and efficient solution for semantic activity recognition, laying the foundation for future perception-driven collaborative systems.
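The recursive Bayesian filtering over symbolic modes that the abstract describes can be illustrated with a minimal discrete Bayes filter. The mode names, transition probabilities, and qualitative observation model below are illustrative assumptions chosen for the sketch, not values from the thesis, and the full system additionally uses A*BC enumeration, QSR features, and VLM arbitration that are not shown here.

```python
# Sketch of recursive Bayesian filtering over discrete, symbolic
# human-object interaction modes. All models here are hypothetical
# placeholders, not taken from the thesis.

MODES = ["idle", "reaching", "grasping"]

# Hypothetical transition model P(mode_t | mode_{t-1}).
TRANSITION = {
    "idle":     {"idle": 0.8, "reaching": 0.2, "grasping": 0.0},
    "reaching": {"idle": 0.1, "reaching": 0.6, "grasping": 0.3},
    "grasping": {"idle": 0.2, "reaching": 0.1, "grasping": 0.7},
}

# Hypothetical observation model P(obs | mode), where an observation is a
# qualitative spatial relation between hand and object (e.g. from QSR).
OBSERVATION = {
    "far":      {"idle": 0.7, "reaching": 0.2, "grasping": 0.1},
    "near":     {"idle": 0.2, "reaching": 0.6, "grasping": 0.2},
    "touching": {"idle": 0.1, "reaching": 0.2, "grasping": 0.7},
}

def bayes_filter_step(belief, obs):
    """One predict-update cycle of a discrete Bayes filter."""
    # Predict: propagate the belief through the transition model.
    predicted = {
        m: sum(belief[prev] * TRANSITION[prev][m] for prev in MODES)
        for m in MODES
    }
    # Update: weight by the observation likelihood and renormalize.
    unnorm = {m: OBSERVATION[obs][m] * predicted[m] for m in MODES}
    z = sum(unnorm.values())
    return {m: p / z for m, p in unnorm.items()}

belief = {m: 1.0 / len(MODES) for m in MODES}  # uniform prior
for obs in ["near", "touching", "touching"]:
    belief = bayes_filter_step(belief, obs)
best = max(belief, key=belief.get)  # most likely symbolic mode
```

After the three observations the belief concentrates on "grasping", showing how a sequence of noisy qualitative observations is fused into a posterior over symbolic modes; the thesis's contribution is making this kind of inference efficient at scale via best-first A*BC enumeration.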
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/163032
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
