Reinforcement Learning and Stochastic Approximation

Stochastic approximation algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. Reinforcement learning algorithms such as TD-learning and Q-learning are among its most famous applications.
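For concreteness, a minimal sketch of the standard Robbins-Monro setting (the notation here is illustrative and not taken from the talk): given a random variable $W$ with possibly unknown distribution, the goal is to find a root of the averaged function,
\[
  \bar f(\theta) := \mathbb{E}\bigl[ f(\theta, W) \bigr], \qquad \text{find } \theta^* \text{ such that } \bar f(\theta^*) = 0,
\]
using only observed samples $\{W_n\}$ via the recursion
\[
  \theta_{n+1} = \theta_n + \alpha_{n+1}\, f(\theta_n, W_{n+1}),
\]
where $\{\alpha_n\}$ is a nonnegative step-size sequence, e.g. $\alpha_n = 1/n$.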

This talk provides an overview of stochastic approximation, with a focus on optimizing the rate of convergence. Based on this general theory, the well-known slow convergence of Q-learning is explained: the asymptotic variance of the algorithm is typically infinite. New algorithms with provably fast (even optimal) convergence rates have been developed in recent years: stochastic Newton-Raphson, Zap SNR, and acceleration techniques inspired by Polyak and Nesterov. These will be discussed as time permits and depending on the interests of the audience.
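As one illustration of the matrix-gain idea behind stochastic Newton-Raphson and its Zap variant (a sketch in the notation above, under the assumption that $f$ is differentiable in $\theta$; the precise algorithms are as defined in the talk): an estimate $\widehat A_n$ of the linearization $\partial_\theta \bar f(\theta)$ is maintained alongside the parameter update,
\[
  \widehat A_{n+1} = \widehat A_n + \gamma_{n+1} \bigl( \partial_\theta f(\theta_n, W_{n+1}) - \widehat A_n \bigr),
\]
\[
  \theta_{n+1} = \theta_n - \alpha_{n+1}\, \widehat A_{n+1}^{-1} f(\theta_n, W_{n+1}).
\]
Choosing $\gamma_n \equiv \alpha_n$ corresponds to stochastic Newton-Raphson, while running the gain estimate on a faster timescale, $\alpha_n/\gamma_n \to 0$, corresponds to the Zap approach; the matrix gain is what restores a finite (and optimized) asymptotic variance.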