
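"Digital+" and Zhijiang Statistics Forum (Lecture 18): Announcement of an Online Lecture at Our School by Professor Yuqiang Li of East China Normal University, October 27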
(Published: 2022-10-24)

Title: Minimax Weight Learning for Absorbing MDPs

Speaker: Yuqiang Li (李育强)

Time: Thursday, October 27, 2022, 14:00-15:00

Venue: Room 644, Comprehensive Building (综合楼)

Tencent Meeting ID: 184-369-637

Abstract: Policy evaluation problems in reinforcement learning are often modeled as finite-horizon or infinite-horizon MDPs, but this is often unrealistic in practice. In this paper, we study off-policy policy estimation for absorbing MDPs. Building on the Minimax Weight Learning (MWL) algorithm, we propose an algorithm, called MWLA, that directly estimates the importance ratio of the state-action measure when the behavior policy is unknown, under the assumption that the data are collected as i.i.d. episodes. We investigate a Mean Square Error (MSE) bound for the MWLA method. In the episodic taxi environment, we show that the MWLA method achieves a lower MSE as the number of episodes and the truncation length increase, significantly improving the accuracy of policy evaluation.

This talk is based on joint work with Fengying Li and Xianyi Wu.


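Speaker bio: Yuqiang Li is a professor and doctoral supervisor at East China Normal University and director of the editorial office of the journal 《应用概率统计》 (Chinese Journal of Applied Probability and Statistics). His main research interests include the theory of stochastic processes and its applications, and reinforcement learning. He has led more than ten research projects, funded by sources including the National Natural Science Foundation of China, the Natural Science Foundation of Shanghai, and the Shanghai Municipal Education Commission's Key Research and Innovation Program. He has published more than 30 papers in journals including Stochastic Processes and Their Applications, Bernoulli, Science China-Mathematics, and Journal of Applied Probability, and his work has been cited by dozens of researchers in China and abroad, including Professor Gorostiza, a member of the Mexican Academy of Sciences.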

