Show simple item record

dc.contributor.advisor  Jonathan P. How.  en_US
dc.contributor.author  Bethke, Brett (Brett M.)  en_US
dc.contributor.other  Massachusetts Institute of Technology. Dept. of Aeronautics and Astronautics.  en_US
dc.date.accessioned  2010-08-26T15:21:54Z
dc.date.available  2010-08-26T15:21:54Z
dc.date.copyright  2010  en_US
dc.date.issued  2010  en_US
dc.identifier.uri  http://hdl.handle.net/1721.1/57544
dc.description  Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2010.  en_US
dc.description  This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.  en_US
dc.description  Cataloged from student-submitted PDF version of thesis.  en_US
dc.description  Includes bibliographical references (p. 207-221).  en_US
dc.description.abstract  Many sequential decision-making problems related to multi-agent robotic systems can be naturally posed as Markov Decision Processes (MDPs). An important advantage of the MDP framework is the ability to utilize stochastic system models, thereby allowing the system to make sound decisions even if there is randomness in the system evolution over time. Unfortunately, the curse of dimensionality prevents most MDPs of practical size from being solved exactly. One main focus of the thesis is on the development of a new family of algorithms for computing approximate solutions to large-scale MDPs. Our algorithms are similar in spirit to Bellman residual methods, which attempt to minimize the error incurred in solving Bellman's equation at a set of sample states. However, by exploiting kernel-based regression techniques (such as support vector regression and Gaussian process regression) with nondegenerate kernel functions as the underlying cost-to-go function approximation architecture, our algorithms are able to construct cost-to-go solutions for which the Bellman residuals are explicitly forced to zero at the sample states. For this reason, we have named our approach Bellman residual elimination (BRE). In addition to developing the basic ideas behind BRE, we present multi-stage and model-free extensions to the approach. The multi-stage extension allows for automatic selection of an appropriate kernel for the MDP at hand, while the model-free extension can use simulated or real state trajectory data to learn an approximate policy when a system model is unavailable.  en_US
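
To make the BRE idea above concrete: the Bellman residual at state i is BR(i) = J(i) - min_u [ g(i,u) + alpha * sum_j P_ij(u) J(j) ], and BRE constructs a kernel-based J whose residual vanishes at every sample state. The Python sketch below illustrates this with a Gaussian (nondegenerate) kernel and exact interpolation on a toy MDP; the model, kernel width, and simple fixed-point iteration are assumptions for illustration only, not the thesis's BRE algorithm.

import numpy as np

# Toy MDP (assumed for illustration): P[u, i, j] = transition prob., g[i, u] = stage cost.
n_states, n_actions, alpha = 50, 4, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
g = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

states = np.arange(n_states, dtype=float)[:, None]   # 1-D state "features"
samples = np.arange(0, n_states, 5)                  # sample states where the residual is eliminated

def gaussian_kernel(X, Y, width=3.0):
    # Nondegenerate kernel: its Gram matrix on distinct points is positive definite,
    # so exact interpolation through the sample states is always possible.
    return np.exp(-(X - Y.T) ** 2 / (2.0 * width ** 2))

def bellman_backup(J):
    # (TJ)(i) = min_u [ g(i,u) + alpha * sum_j P_ij(u) J(j) ]
    return (g + alpha * np.einsum('uij,j->iu', P, J)).min(axis=1)

K = gaussian_kernel(states[samples], states[samples])
K_all = gaussian_kernel(states, states[samples])
J = np.zeros(n_states)
for _ in range(500):
    targets = bellman_backup(J)[samples]      # Bellman backup at the sample states
    weights = np.linalg.solve(K, targets)     # interpolate exactly -> zero residual there
    J_next = K_all @ weights                  # generalize the cost-to-go to all states
    if np.max(np.abs(J_next - J)) < 1e-9:
        break
    J = J_next

residual = J[samples] - bellman_backup(J)[samples]
print("max |Bellman residual| at sample states:", np.abs(residual).max())

At a fixed point, J interpolates its own Bellman backup at the sample states, so the residual there is zero by construction; the thesis develops this via support vector regression and Gaussian process regression rather than this naive iteration.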
dc.description.abstract  (cont.) We present theoretical analysis of all BRE algorithms proving convergence to the optimal policy in the limit of sampling the entire state space, and show computational results on several benchmark problems. Another challenge in implementing control policies based on MDPs is that there may be parameters of the system model that are poorly known and/or vary with time as the system operates. System performance can suffer if the model used to compute the policy differs from the true model. To address this challenge, we develop an adaptive architecture that allows for online MDP model learning and simultaneous re-computation of the policy. As a result, the adaptive architecture allows the system to continuously re-tune its control policy to account for better model information obtained through observations of the actual system in operation, and react to changes in the model as they occur. Planning in complex, large-scale multi-agent robotic systems is another focus of the thesis. In particular, we investigate the persistent surveillance problem, in which one or more unmanned aerial vehicles (UAVs) and/or unmanned ground vehicles (UGVs) must provide sensor coverage over a designated location on a continuous basis. This continuous coverage must be maintained even in the event that agents suffer failures over the course of the mission. The persistent surveillance problem is pertinent to a number of applications, including search and rescue, natural disaster relief operations, urban traffic monitoring, etc.  en_US
dc.description.abstract  (cont.) Using both simulations and actual flight experiments conducted in the MIT RAVEN indoor flight facility, we demonstrate the successful application of the BRE algorithms and the adaptive MDP architecture in achieving high mission performance despite the random occurrence of failures. Furthermore, we demonstrate performance benefits of our approach over a deterministic planning approach that does not account for these failures.  en_US
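
The adaptive architecture in the abstract interleaves online model learning with policy re-computation. A minimal Python sketch of such a loop follows; the environment interface, count-based transition estimates, and value-iteration solver are illustrative assumptions standing in for the thesis's MDP learner and BRE-based solver.

import numpy as np

def greedy_policy(P_hat, g, alpha, n_iter=500):
    # Value iteration on the current model estimate (stand-in for the BRE solver).
    J = np.zeros(g.shape[0])
    for _ in range(n_iter):
        Q = g + alpha * np.einsum('uij,j->iu', P_hat, J)
        J = Q.min(axis=1)
    return Q.argmin(axis=1)

def adaptive_loop(env, n_states, n_actions, g, alpha, horizon=10000, replan_every=100):
    # Dirichlet-style prior counts keep the estimated model well-defined before data arrive.
    counts = np.ones((n_actions, n_states, n_states))
    policy = greedy_policy(counts / counts.sum(axis=2, keepdims=True), g, alpha)
    state = env.reset()                                  # hypothetical environment API
    for t in range(horizon):
        action = policy[state]
        next_state = env.step(state, action)             # observe the actual system
        counts[action, state, next_state] += 1           # online model learning
        state = next_state
        if (t + 1) % replan_every == 0:                  # simultaneous policy re-computation
            P_hat = counts / counts.sum(axis=2, keepdims=True)
            policy = greedy_policy(P_hat, g, alpha)
    return policy

Because the transition estimates are refreshed from observed data and the policy is periodically re-solved, a controller of this form re-tunes itself as failures or other model changes occur, which is the behavior the abstract describes for the persistent surveillance experiments.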
dc.description.statementofresponsibility  by Brett M. Bethke.  en_US
dc.format.extent  221 p.  en_US
dc.language.iso  eng  en_US
dc.publisher  Massachusetts Institute of Technology  en_US
dc.rights  M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.  en_US
dc.rights.uri  http://dspace.mit.edu/handle/1721.1/7582  en_US
dc.subject  Aeronautics and Astronautics.  en_US
dc.title  Kernel-based approximate dynamic programming using Bellman residual elimination  en_US
dc.type  Thesis  en_US
dc.description.degree  Ph.D.  en_US
dc.contributor.department  Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
dc.identifier.oclc  639230464  en_US

