Machine Learning and Data Science PhD Student Forum Series (Session 76): Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems
Speaker: 詹景昕 (304am永利集团)
Time: 2024-09-19 16:00-17:00
Venue: Tencent Meeting 627-5441-1672
Abstract:
Best-of-Both-Worlds (BOBW) bandit algorithms, which have regret guarantees in both the stochastic and the adversarial setting, have been studied for many years. Tsallis-INF and other Follow-the-Regularized-Leader (FTRL) policies are among the most promising frameworks for BOBW policies.
However, a limitation of FTRL policies is that the arm-selection probabilities must be computed explicitly. The Follow-the-Perturbed-Leader (FTPL) policy has been studied as a promising candidate for circumventing this limitation. In this talk, we will introduce an FTPL algorithm with Fréchet perturbation, which also achieves the BOBW bound, based on recent work by Lee, Honda, Ito and Oh (COLT 2024).
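About the forum: This online forum is organized by the machine learning lab of Professor Zhihua Zhang (张志华) and is held every two weeks, except on public holidays. Each session invites a PhD student to give a systematic, in-depth introduction to a frontier topic. Topics include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.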