Optimization Techniques for Trustworthy 3D Object Understanding

Shaikewitz, Lorenzo Franceschini

Author(s)

Shaikewitz, Lorenzo Franceschini

DownloadThesis PDF (15.04Mb)

Advisor

Carlone, Luca

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

Autonomous machines require reliable 3D object understanding to interpret and interact with their environment. In this thesis, we consider two tightly coupled 3D object understanding problems. Shape estimation seeks a consistent 3D model of an object given sensor data and some set of priors. Pose estimation seeks an estimate of the object’s position and orientation relative to an invariant shape frame. In general, these problems are non-convex and thus difficult to solve. We present algorithms which nonetheless solve shape and pose estimation efficiently and with assurances in terms of of optimality, uncertainty, or latency. We begin in the multi-frame tracking setting, where we propose the certifiably optimal estimator CAST⋆ for simultaneous shape estimation and object tracking. CAST⋆ uses 3D keypoint measurements extracted from an RGB-D image sequence and phrases the estimation as fixed-lag smoothing. Temporal constraints enforce rigidity and continuous motion. Despite the non-convexity of this problem, we solve it to certifiable optimality using a smallsize semidefinite relaxation. We also present a compatibility-based outlier rejection scheme to handle outliers, and evaluate the proposed approach on synthetic and real data. Next, we focus on estimating the pose of an object given its shape and a single RGB image (no depth). Assuming only bounded noise on 2D keypoint measurements (e.g., from conformal prediction), we derive an estimator for the most likely object pose which uses a semidefinite relaxation to initialize a local solver. We pair this with an efficient uncertainty estimation routine which relies on a generalization of the S-Lemma to propagate keypoint uncertainty to high-probability translation and rotation bounds. The high-probability bounds hold regardless of the accuracy of the pose estimate, and are reasonably tight when tested on the LineMOD-Occluded dataset. Lastly, we propose a sub-millisecond solution to simultaneous estimation of object shape and pose from a single RGB-D image. Our approach converts the first-order optimality conditions of the non-convex optimization problem to a nonlinear eigenproblem in the quaternion representation of orientation. We use self-consistent field iteration to efficiently arrive at a local stationary point, finding solutions more than an order of magnitude faster than Gauss-Newton or on-manifold local solvers on synthetically generated data.

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/163014

Department

Massachusetts Institute of Technology. Department of Aeronautics and Astronautics

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses