Machine Learning and Data Science PhD Student Forum Series (Session 58): Estimation and Inference in Distributional Reinforcement Learning
Speaker: Liangyu Zhang (PKU)
Time: 2023-09-28 16:00-17:00
Venue: Tencent Meeting 723 1564 5542
Abstract:
Classical reinforcement learning relies on the 'reward hypothesis': the performance of a learning agent is assessed through its expected return. In many applications, however, the expected return alone is not enough, because other factors such as uncertainty and risk can be crucial. To address this challenge, distributional reinforcement learning moves beyond expected returns and instead aims to learn the complete distribution of the return.
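To make the contrast concrete (using standard notation that does not appear in the abstract: discount factor $\gamma \in (0,1)$ and rewards $R_t$ along a trajectory generated by following $\pi$ from a given initial state), classical policy evaluation targets the expected return, whereas distributional policy evaluation targets the law of the same random return:
$$ V^\pi = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t R_t\Big], \qquad \eta^\pi = \operatorname{Law}\Big(\sum_{t=0}^{\infty} \gamma^t R_t\Big). $$
Functionals of $\eta^\pi$ such as the variance, quantiles, or conditional value-at-risk then retain exactly the uncertainty and risk information that the expectation discards.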
In this talk, we discuss estimation and inference in distributional reinforcement learning. Our investigation focuses on distributional policy evaluation, i.e., estimating the distribution of the return (denoted $\eta^\pi$) attained by a given policy $\pi$. We show that a polynomial number of samples suffices to guarantee near-optimal estimation when the estimator $\hat\eta^\pi$ is constructed by the certainty-equivalence method. We also examine the asymptotics of $\sqrt{n}(\hat\eta^\pi-\eta^\pi)$ and show that it converges weakly to a Gaussian random element. Based on this, we propose a unified inference procedure for a wide class of statistical functionals of $\eta^\pi$.
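As a rough illustration of the certainty-equivalence idea mentioned above, the sketch below first fits an empirical model of a tabular MDP from data gathered under $\pi$, and then computes the return distribution in that fitted model by distributional dynamic programming on a fixed categorical support. Everything in this sketch (the function names, the categorical representation, the simplification of using only the empirical mean reward per state, and the synthetic usage example) is our own illustrative assumption rather than the speaker's construction.

import numpy as np

# Illustrative sketch (not the speaker's implementation): certainty-equivalence
# distributional policy evaluation on a tabular MDP. Step (i): fit an empirical
# model from transitions collected under the policy pi. Step (ii): run
# distributional dynamic programming in that estimated model, representing each
# return distribution as a categorical distribution on a fixed support.

def estimate_model(transitions, n_states):
    # Empirical transition matrix and mean one-step reward under pi.
    # `transitions` is a list of (s, r, s_next) tuples, actions marginalized out.
    # For simplicity, rewards are modeled as deterministic per state (their
    # empirical mean); a full construction would also use the empirical reward law.
    counts = np.zeros((n_states, n_states))
    reward_sum = np.zeros(n_states)
    visits = np.zeros(n_states)
    for s, r, s_next in transitions:
        counts[s, s_next] += 1.0
        reward_sum[s] += r
        visits[s] += 1.0
    P_hat = counts / np.maximum(visits[:, None], 1.0)
    r_hat = reward_sum / np.maximum(visits, 1.0)
    return P_hat, r_hat

def project_to_support(atoms, probs, support):
    # Project a discrete distribution (atoms, probs) onto the fixed categorical
    # support by splitting each atom's mass between its two neighboring grid points.
    v_min, v_max = support[0], support[-1]
    delta = support[1] - support[0]
    out = np.zeros(len(support))
    for z, p in zip(np.clip(atoms, v_min, v_max), probs):
        b = (z - v_min) / delta
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:
            out[lo] += p
        else:
            out[lo] += p * (hi - b)
            out[hi] += p * (b - lo)
    return out

def certainty_equivalence_return_dist(P_hat, r_hat, gamma, support, n_iters=200):
    # Iterate the distributional Bellman operator in the *estimated* MDP:
    # eta(s) <- law of r_hat(s) + gamma * G(s'), with s' ~ P_hat(. | s).
    n_states, n_atoms = len(r_hat), len(support)
    eta = np.full((n_states, n_atoms), 1.0 / n_atoms)  # arbitrary initialization
    for _ in range(n_iters):
        new_eta = np.zeros_like(eta)
        for s in range(n_states):
            atoms, probs = [], []
            for s_next in range(n_states):
                if P_hat[s, s_next] > 0.0:
                    atoms.append(r_hat[s] + gamma * support)      # shifted/scaled atoms
                    probs.append(P_hat[s, s_next] * eta[s_next])  # mixture weights
            if atoms:
                new_eta[s] = project_to_support(
                    np.concatenate(atoms), np.concatenate(probs), support)
            else:
                new_eta[s] = eta[s]  # unvisited state: keep previous estimate
        eta = new_eta
    return eta  # eta[s] approximates hat{eta}^pi for initial state s

# Hypothetical usage on a synthetic 3-state chain with rewards in {0, 1}:
rng = np.random.default_rng(0)
transitions = [(int(s), float(s == 2), int(rng.integers(0, 3)))
               for s in rng.integers(0, 3, size=5000)]
P_hat, r_hat = estimate_model(transitions, n_states=3)
support = np.linspace(0.0, 10.0, 51)          # returns lie in [0, 1/(1-gamma)]
eta_hat = certainty_equivalence_return_dist(P_hat, r_hat, gamma=0.9, support=support)
mean_return_s0 = float(support @ eta_hat[0])  # plug-in estimate of one functional of eta^pi

The last line illustrates a plug-in estimate of one statistical functional of $\hat\eta^\pi$ (the mean return); quantiles or risk measures could be read off the same categorical representation in the same plug-in fashion.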
About the forum: This online forum is organized by Professor Zhihua Zhang's machine learning lab and is held biweekly (except during public holidays). Each session invites a PhD student to give a relatively systematic and in-depth introduction to a frontier topic, in areas including but not limited to machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.