Domain-Independent Mode Estimation for Human-Robot Collaboration

Gomez, Annabel Reyna



Advisor

Williams, Brian C.

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/


Abstract

To collaborate safely and intelligently with humans, robots must infer high-level semantic states, such as intentions or interaction modes, from uncertain sensor input. While dynamic, probabilistic mode estimation is commonly used in fault diagnosis, this thesis extends the problem to activity recognition, where the goal is to estimate qualitative, symbolic human-object interaction states in real time. Robust human activity recognition is essential for collaborative and assistive robotics, particularly in dynamic or safety-critical environments. The core solution presented in this thesis is a mode estimator and its efficient implementation using the A* with bounding conflicts (A*BC) algorithm. The estimator performs best-first enumeration over symbolic activity states while integrating recursive Bayesian filtering to maintain a belief under noisy observations. Unlike low-level trajectory tracking or deep-learned classifiers, qualitative spatial filtering operates at the right level of abstraction to recognize symbolic actions. It can also generalize across domains with minimal retraining and support efficient, probabilistically grounded reasoning about uncertainty in both perception and symbolic mode transitions. The proposed system fuses RGB-D perception, object segmentation, qualitative spatial reasoning (QSR), and probabilistic inference into a real-time pipeline that tracks and infers symbolic human-object interaction states. Evaluated in a human-robot rehabilitation setting, this domain-independent system successfully infers latent human and object activity states from noisy RGB-D data. It resolves ambiguity using Vision-Language Model (VLM)-guided semantic arbitration and demonstrates robustness and adaptability in unstructured environments. This work establishes qualitative spatial filtering with A*BC as a generalizable and efficient solution for semantic activity recognition, laying the foundation for future perception-driven collaborative systems.
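The recursive Bayesian filtering mentioned in the abstract can be illustrated with a minimal discrete filter over symbolic modes. This is a sketch under stated assumptions, not the thesis implementation: the mode names (`idle`, `reaching`, `holding`), the qualitative hand-object observations (`far`, `near`, `touching`), all probability tables, and the function `filter_step` are hypothetical. Each step predicts through a transition model, then reweights by how well each mode explains the latest qualitative observation.

```python
from typing import Dict

Belief = Dict[str, float]

# Hypothetical transition model P(m_t | m_{t-1}) over symbolic interaction modes.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "idle":     {"idle": 0.8, "reaching": 0.2, "holding": 0.0},
    "reaching": {"idle": 0.1, "reaching": 0.6, "holding": 0.3},
    "holding":  {"idle": 0.1, "reaching": 0.0, "holding": 0.9},
}

# Hypothetical observation model P(o_t | m_t); o_t is a qualitative
# hand-object relation assumed to come from a QSR layer.
OBSERVATIONS: Dict[str, Dict[str, float]] = {
    "idle":     {"far": 0.7, "near": 0.25, "touching": 0.05},
    "reaching": {"far": 0.2, "near": 0.6, "touching": 0.2},
    "holding":  {"far": 0.05, "near": 0.15, "touching": 0.8},
}


def filter_step(belief: Belief, obs: str) -> Belief:
    """One predict-update cycle of a discrete recursive Bayes filter."""
    # Predict: propagate the previous belief through the transition model.
    predicted = {
        m: sum(belief[prev] * TRANSITIONS[prev][m] for prev in belief)
        for m in TRANSITIONS
    }
    # Update: weight each predicted mode by the likelihood of the observation.
    unnormalized = {m: predicted[m] * OBSERVATIONS[m][obs] for m in predicted}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError(f"observation {obs!r} has zero likelihood under the belief")
    # Normalize so the posterior sums to one.
    return {m: p / total for m, p in unnormalized.items()}


belief: Belief = {"idle": 1.0, "reaching": 0.0, "holding": 0.0}
for obs in ["far", "near", "touching", "touching"]:
    belief = filter_step(belief, obs)
    print(obs, max(belief, key=belief.get))
```

With these illustrative tables, repeated `touching` observations shift the most likely mode from `idle` to `holding`, showing how noisy qualitative evidence accumulates into a symbolic state estimate rather than being classified frame by frame.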
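The best-first enumeration over symbolic activity states can be sketched as plain A* search over partial mode assignments, returning the most probable consistent joint assignments in decreasing order of probability. This is not A*BC: it omits the conflict mechanism that A*BC uses to bound and prune the search, and it only checks a consistency predicate on complete assignments. The component names (`hand`, `object`), their marginals, the constraint, and the function `k_best_assignments` are all illustrative assumptions, and components are treated as independent so that a joint cost is the sum of per-component `-log` probabilities.

```python
import heapq
import math
from itertools import count
from typing import Callable, Dict, List, Tuple

Marginals = Dict[str, Dict[str, float]]


def k_best_assignments(
    marginals: Marginals,
    consistent: Callable[[Dict[str, str]], bool],
    k: int,
) -> List[Tuple[Dict[str, str], float]]:
    """Return up to k consistent joint mode assignments, most probable first."""
    variables = list(marginals)
    # Cost of assigning mode m to variable v is -log P(v = m).
    costs = {
        v: {m: -math.log(p) for m, p in dist.items() if p > 0.0}
        for v, dist in marginals.items()
    }
    # Admissible heuristic: cheapest possible cost of the still-unassigned variables.
    remaining = [0.0] * (len(variables) + 1)
    for i in range(len(variables) - 1, -1, -1):
        remaining[i] = remaining[i + 1] + min(costs[variables[i]].values())

    tie = count()  # tie-breaker so heapq never compares assignments
    frontier = [(remaining[0], next(tie), 0.0, ())]
    results: List[Tuple[Dict[str, str], float]] = []
    while frontier and len(results) < k:
        _, _, g, partial = heapq.heappop(frontier)
        depth = len(partial)
        if depth == len(variables):
            # Complete assignments leave the queue in order of increasing cost.
            assignment = dict(partial)
            if consistent(assignment):
                results.append((assignment, math.exp(-g)))
            continue
        var = variables[depth]
        for mode, c in costs[var].items():
            g_next = g + c
            heapq.heappush(
                frontier,
                (g_next + remaining[depth + 1], next(tie), g_next, partial + ((var, mode),)),
            )
    return results


# Hypothetical per-component marginals and a hypothetical constraint:
# an object cannot be moving while the hand is idle.
marginals = {
    "hand": {"idle": 0.6, "grasping": 0.4},
    "object": {"resting": 0.7, "moving": 0.3},
}
top = k_best_assignments(
    marginals, lambda a: not (a["hand"] == "idle" and a["object"] == "moving"), 3
)
for assignment, prob in top:
    print(assignment, round(prob, 2))
```

Because the heuristic never overestimates the remaining cost, the first complete assignments popped are the most probable ones, so the search can stop after `k` consistent results instead of scoring every combination. Here the inconsistent pair (`idle`, `moving`) is skipped even though it is more probable than (`grasping`, `moving`).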

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/163032

Department

Massachusetts Institute of Technology. Department of Aeronautics and Astronautics

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses