Natural Interaction: 3D Modeling in Wearable VR Using a Gesture and Speech Interface

Author(s)
Bei, Yining
Download
Thesis PDF (28.20 MB)
Advisor
Nagakura, Takehiko
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Designers often rely on keyboard and mouse for 3D modeling, a method that can feel unintuitive or restrictive—especially in collaborative or spatially immersive settings. This thesis explores how multimodal interaction, specifically the combination of hand gestures and voice commands, can support more natural, efficient, and accessible 3D modeling in virtual reality (VR). Built on a custom Unity-based system integrating Meta Quest hand tracking and Wit.ai voice recognition, the study investigates how these two input modes—gesture and speech—can be used together to manipulate and modify 3D geometry in real time. The research proceeds in three phases: (1) a formative study analyzing how users intuitively deploy gestures, revealing common preferences, task breakdown strategies, and limitations in gesture inputs; (2) system design and implementation of both gesture-only and gesture + speech interfaces for navigation and object manipulation (e.g., translation, scaling, duplication); and (3) a comparative user study evaluating gesture-only, gesture + speech, and keyboard + mouse workflows in terms of learning curve, task efficiency, and user satisfaction. Results show that gesture + speech enables smoother transitions across modeling subtasks and allows users to offload certain parameters (e.g., numeric values, distances) to voice while using gestures for spatial control. Participants reported higher engagement and lower cognitive load compared to keyboard-based workflows, especially in tasks involving spatial scale and collaboration. This thesis demonstrates the feasibility and design potential of multimodal interaction for immersive modeling workflows and offers insights for future XR design tools that seek to blend precision with embodied interaction.
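The abstract describes offloading parametric input (numeric values, distances) to speech while gestures carry spatial control. The sketch below is a hypothetical, language-agnostic illustration of that fusion pattern in Python; it is not the thesis's Unity/Meta Quest/Wit.ai implementation, and the event types, field names, and fuse function are invented for illustration only.

    # Hypothetical sketch: fusing a gesture event (spatial channel) with a parsed
    # voice command (parametric channel), as the abstract describes. Not the
    # thesis's actual code; all names below are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Optional, Tuple


    @dataclass
    class GestureEvent:
        # Simplified stand-in for a hand-tracking event (e.g., a pinch on an object).
        target_object: str                     # object currently pinched or pointed at
        position: Tuple[float, float, float]   # world-space hand position


    @dataclass
    class VoiceCommand:
        # Simplified stand-in for a recognized utterance from an intent service.
        intent: str                            # e.g., "scale", "duplicate", "move"
        value: Optional[float] = None          # spoken numeric parameter, if any


    def fuse(gesture: GestureEvent, command: VoiceCommand) -> dict:
        # Combine the two channels: the gesture picks *which* object and *where*,
        # the voice command picks *what operation* and *how much*.
        if command.intent == "scale" and command.value is not None:
            return {"op": "scale", "object": gesture.target_object, "factor": command.value}
        if command.intent == "duplicate":
            return {"op": "duplicate", "object": gesture.target_object, "at": gesture.position}
        # Fall back to gesture-only manipulation (direct drag).
        return {"op": "move", "object": gesture.target_object, "to": gesture.position}


    if __name__ == "__main__":
        # User pinches "Cube_01" and says "scale two point five".
        g = GestureEvent(target_object="Cube_01", position=(0.4, 1.2, 0.8))
        v = VoiceCommand(intent="scale", value=2.5)
        print(fuse(g, v))  # {'op': 'scale', 'object': 'Cube_01', 'factor': 2.5}

In this reading, speech removes the need for a widget or keyboard to enter exact values, while the hand keeps continuous control of selection and placement, which is the division of labor the user study found advantageous.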
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/163569
Department
Massachusetts Institute of Technology. Department of Architecture
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
