dc.contributor.author | Shah, Devavrat | |
dc.contributor.author | Xie, Qiaomin | |
dc.date.accessioned | 2021-11-09T16:08:56Z | |
dc.date.available | 2021-11-09T16:08:56Z | |
dc.date.issued | 2018 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/137946 | |
dc.description.abstract | © 2018 Curran Associates Inc.All rights reserved. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and the discounted factor γ ∈ (0, 1), given an arbitrary sample path with “covering time” L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function using Õ e (L/(ε 3 (1 - γ) 7 )) samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as Õ e (1/ε d ), so the sample complexity scales as Õ e (1/ε d+3 ). Indeed, we establish a lower bound that argues that the dependence of Ω e (1/ε d+2 ) is necessary. | en_US |
dc.language.iso | en | |
dc.relation.isversionof | https://papers.nips.cc/paper/7574-q-learning-with-nearest-neighbors | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Neural Information Processing Systems (NIPS) | en_US |
dc.title | Q-learning with nearest neighbors | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Shah, Devavrat and Xie, Qiaomin. 2018. "Q-learning with nearest neighbors." | |
dc.contributor.department | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.contributor.department | Statistics and Data Science Center (Massachusetts Institute of Technology) | |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2019-07-10T16:26:47Z | |
dspace.date.submission | 2019-07-10T16:26:48Z | |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |