Q-learning with nearest neighbors

Shah, Devavrat; Xie, Qiaomin

dc.contributor.author	Shah, Devavrat
dc.contributor.author	Xie, Qiaomin
dc.date.accessioned	2021-11-09T16:08:56Z
dc.date.available	2021-11-09T16:08:56Z
dc.date.issued	2018
dc.identifier.uri	https://hdl.handle.net/1721.1/137946
dc.description.abstract	© 2018 Curran Associates Inc.All rights reserved. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and the discounted factor γ ∈ (0, 1), given an arbitrary sample path with “covering time” L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function using Õ e (L/(ε 3 (1 - γ) 7 )) samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as Õ e (1/ε d ), so the sample complexity scales as Õ e (1/ε d+3 ). Indeed, we establish a lower bound that argues that the dependence of Ω e (1/ε d+2 ) is necessary.	en_US
dc.language.iso	en
dc.relation.isversionof	https://papers.nips.cc/paper/7574-q-learning-with-nearest-neighbors	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	Neural Information Processing Systems (NIPS)	en_US
dc.title	Q-learning with nearest neighbors	en_US
dc.type	Article	en_US
dc.identifier.citation	Shah, Devavrat and Xie, Qiaomin. 2018. "Q-learning with nearest neighbors."
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department	Statistics and Data Science Center (Massachusetts Institute of Technology)
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2019-07-10T16:26:47Z
dspace.date.submission	2019-07-10T16:26:48Z
mit.license	PUBLISHER_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name:: 7574-q-learning-with-nearest-n ...
Size:: 514.1Kb
Format:: PDF
Description:: Published version

View/Open

This item appears in the following Collection(s)

MIT Open Access Articles

Show simple item record