Machine Learning and Data Science PhD Student Forum Series (Session 38): A Statistical Perspective on Off-policy Evaluation in Reinforcement Learning
Speaker: Chuhan Xie (PKU)
Time: 2022-10-24, 16:00-17:00
Venue: Tencent Meeting, ID 723 1564 5542
Abstract:
Off-policy evaluation (OPE) is one of the most important tasks in offline reinforcement learning. In contrast to online reinforcement learning, where the agent interacts with the environment directly and receives rewards immediately, OPE assumes only a given dataset of trajectories collected in advance by an unknown behavior policy. OPE admits a purely statistical formulation; however, most existing works focus mainly on point estimation and lack statistical interpretations and theoretical guarantees, which may impede the application of OPE in fields requiring high precision.
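For concreteness, a minimal sketch of this formulation (the notation here is ours and may differ from the talk's): given trajectories generated by an unknown behavior policy \mu, OPE aims to estimate the value of a fixed target policy \pi,

v^\pi = \mathbb{E}_\pi\Big[\textstyle\sum_{t \ge 0} \gamma^t r_t\Big],

where the expectation is taken over trajectories induced by \pi, even though the observed data were collected under \mu.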
In this talk, we try to give a statistical understanding of recent OPE algorithms. In particular, we will review three popular methods for OPE: the direct method (DM), importance sampling (IS), and the doubly robust (DR) estimator, and discuss their close relationship with the estimation of the average treatment effect (ATE) in the causal inference literature. These estimators require two nuisance components to be estimated, and we will present how to estimate them with theoretical guarantees. We will finally give a selective introduction to recent progress on OPE problems, including remedies for different data assumptions as well as new combinations with traditional statistical approaches.
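As a hedged illustration of how these pieces fit together (standard notation from the OPE literature, not necessarily the exact form presented in the talk), the step-wise doubly robust estimator for a length-H trajectory combines an estimated action-value function \hat{Q} (the DM component) with estimated importance ratios \hat{\rho}_{0:t} = \prod_{k=0}^{t} \pi(a_k \mid s_k) / \hat{\mu}(a_k \mid s_k) (the IS component):

\hat{v}^\pi_{\mathrm{DR}} = \hat{V}(s_0) + \sum_{t=0}^{H-1} \gamma^t \hat{\rho}_{0:t} \big( r_t + \gamma \hat{V}(s_{t+1}) - \hat{Q}(s_t, a_t) \big), \qquad \hat{V}(s) = \sum_a \pi(a \mid s)\, \hat{Q}(s, a).

The two nuisance components are \hat{Q} and \hat{\mu}; the estimator remains consistent if either one is estimated consistently, which mirrors the doubly robust ATE estimators in causal inference.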