Show simple item record

dc.contributor.authorShah, Devavrat
dc.contributor.authorXie, Qiaomin
dc.date.accessioned2021-11-09T16:08:56Z
dc.date.available2021-11-09T16:08:56Z
dc.date.issued2018
dc.identifier.urihttps://hdl.handle.net/1721.1/137946
dc.description.abstract© 2018 Curran Associates Inc.All rights reserved. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and the discounted factor γ ∈ (0, 1), given an arbitrary sample path with “covering time” L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function using Õ e (L/(ε 3 (1 - γ) 7 )) samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as Õ e (1/ε d ), so the sample complexity scales as Õ e (1/ε d+3 ). Indeed, we establish a lower bound that argues that the dependence of Ω e (1/ε d+2 ) is necessary.en_US
dc.language.isoen
dc.relation.isversionofhttps://papers.nips.cc/paper/7574-q-learning-with-nearest-neighborsen_US
dc.rightsArticle is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.en_US
dc.sourceNeural Information Processing Systems (NIPS)en_US
dc.titleQ-learning with nearest neighborsen_US
dc.typeArticleen_US
dc.identifier.citationShah, Devavrat and Xie, Qiaomin. 2018. "Q-learning with nearest neighbors."
dc.contributor.departmentMassachusetts Institute of Technology. Laboratory for Information and Decision Systems
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.departmentStatistics and Data Science Center (Massachusetts Institute of Technology)
dc.eprint.versionFinal published versionen_US
dc.type.urihttp://purl.org/eprint/type/ConferencePaperen_US
eprint.statushttp://purl.org/eprint/status/NonPeerRevieweden_US
dc.date.updated2019-07-10T16:26:47Z
dspace.date.submission2019-07-10T16:26:48Z
mit.licensePUBLISHER_POLICY
mit.metadata.statusAuthority Work and Publication Information Neededen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record