The goal of reinforcement learning lies at the core of the CSS mission: the computation of policies that are approximately optimal, subject to information constraints. From the beginning, control foundations have lurked behind the RL curtain: Watkins' Q-function looks suspiciously like the Hamiltonian in Pontryagin's minimum principle, and since Van Roy's thesis it has been known that our beloved adjoint operators are the key to understanding what is going on with TD-learning. This talk will briefly survey the goals and foundations of RL, and present new work showing how a combination of control techniques can dramatically accelerate convergence. The talk will include a wish-list of open problems in both deterministic and stochastic control settings.
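The analogy between the Q-function and the Hamiltonian can be sketched in standard notation (the symbols below are generic illustrations, not taken from the abstract). For a discrete-time system with dynamics $x_{k+1} = f(x_k, u_k)$ and per-stage cost $c$, Watkins' Q-function satisfies the fixed-point equation

```latex
% Discrete-time Q-function (Bellman equation form):
Q^*(x, u) = c(x, u) + J^*\bigl(f(x, u)\bigr),
\qquad
J^*(x) = \min_u Q^*(x, u).

% Continuous-time Hamiltonian in Pontryagin's minimum principle:
H(x, u, p) = c(x, u) + p^\top f(x, u),
\qquad
u^*(t) \in \arg\min_u H\bigl(x(t), u, p(t)\bigr),
```

where the co-state $p(t)$ coincides with the value-function gradient $\nabla J^*(x(t))$ along an optimal trajectory. In both formulations, the optimal input minimizes instantaneous cost plus (a linearization of) the cost-to-go, which is the structural resemblance the abstract alludes to.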