Domain-Independent Mode Estimation for Human-Robot Collaboration
Author(s)
Gomez, Annabel Reyna
Advisor
Williams, Brian C.
Abstract
To collaborate safely and intelligently with humans, robots must infer high-level semantic states, such as intentions or interaction modes, from uncertain sensor input. While dynamic, probabilistic mode estimation is commonly used in fault diagnosis, this thesis extends the problem to activity recognition, where the goal is to estimate qualitative, symbolic human-object interaction states in real time. Robust human activity recognition is essential for collaborative and assistive robotics, particularly in dynamic or safety-critical environments. The core solution presented in this thesis is a mode estimator and its efficient implementation using the A* with bounding conflicts (A*BC) algorithm. The estimator performs best-first enumeration over symbolic activity states while integrating recursive Bayesian filtering to maintain belief under noisy observations. Unlike low-level trajectory tracking or deep-learned classifiers, qualitative spatial filtering operates at the right level of abstraction to recognize symbolic actions. It can also generalize across domains with minimal retraining and support efficient, probabilistically grounded reasoning about uncertainty in both perception and symbolic mode transitions. The proposed system fuses RGB-D perception, object segmentation, qualitative spatial reasoning (QSR), and probabilistic inference into a real-time pipeline capable of tracking and inferring symbolic human-object interaction states. Evaluated in a human-robot rehabilitation setting, this domain-independent system successfully infers latent human and object activity states from noisy RGB-D data. It resolves ambiguity using Vision-Language Model (VLM)-guided semantic arbitration and demonstrates robustness and adaptability in unstructured environments. This work establishes qualitative spatial filtering with A*BC as a generalizable and efficient solution for semantic activity recognition, laying the foundation for future perception-driven collaborative systems.
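The recursive Bayesian filtering over symbolic modes mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation and it is not A*BC: it is a plain discrete Bayes filter, with a simple ranking step standing in for best-first enumeration. The mode names, transition probabilities, and observation likelihoods below are hypothetical values chosen only for illustration.

```python
import heapq

# Hypothetical symbolic interaction modes and transition model (assumed values).
TRANSITION = {
    "idle":     {"idle": 0.8, "reaching": 0.2},
    "reaching": {"idle": 0.1, "reaching": 0.6, "holding": 0.3},
    "holding":  {"idle": 0.1, "holding": 0.9},
}


def bayes_filter_step(belief, transition, likelihood):
    """One predict-update step of a discrete Bayes filter over symbolic modes.

    belief:     dict mode -> prior probability
    transition: dict mode -> dict next_mode -> probability
    likelihood: dict mode -> P(observation | mode)
    """
    modes = list(belief)
    # Predict: push the prior belief through the mode transition model.
    predicted = {
        nxt: sum(belief[cur] * transition[cur].get(nxt, 0.0) for cur in modes)
        for nxt in modes
    }
    # Update: weight each predicted mode by the observation likelihood.
    unnormalized = {m: predicted[m] * likelihood.get(m, 0.0) for m in modes}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every mode")
    return {m: p / total for m, p in unnormalized.items()}


def top_k_modes(belief, k):
    """Return the k most probable modes, most likely first."""
    return heapq.nlargest(k, belief.items(), key=lambda item: item[1])


# Start certain that the person is idle, then observe a cue that (by assumption)
# is most likely when the hand is reaching toward an object.
prior = {"idle": 1.0, "reaching": 0.0, "holding": 0.0}
observation_likelihood = {"idle": 0.1, "reaching": 0.7, "holding": 0.2}
posterior = bayes_filter_step(prior, TRANSITION, observation_likelihood)
best_mode, best_prob = top_k_modes(posterior, 1)[0]
```

In this toy setting every mode is enumerated explicitly, which is only feasible because there are three of them. The abstract's point is that the mode space in practice is large enough to need a best-first search such as A*BC to find the most likely states without exhaustive enumeration.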
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology