
Bandits
(coming soon)
Readings
Learning to Interact (John Langford)
Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
Learning the distribution with largest mean: two bandit frameworks
Analyse de stratégies bayésiennes et fréquentistes pour l’allocation séquentielle de ressources (thesis)
Modules