Concurrent Learning-Based Adaptive Dynamic Programming for Autonomous Agents

Analytical solutions to the infinite-horizon optimal control problem for continuous-time nonlinear systems are generally not possible because they require solving a nonlinear partial differential equation, the Hamilton-Jacobi-Bellman (HJB) equation. A further challenge is that the optimal controller requires exact knowledge of the system dynamics. Motivated by these issues, researchers have recently used reinforcement learning methods involving an actor and a critic to yield forward-in-time approximate optimal control designs. Methods that also compensate for uncertain dynamics exploit some form of persistence of excitation assumption to achieve parameter identification. However, in the adaptive dynamic programming context, persistence of excitation is impossible to verify a priori, so researchers generally add an ad hoc probing signal to the controller, which degrades the transient performance of the system. This presentation describes a forward-in-time dynamic programming approach that exploits concurrent learning, in which the adaptive update laws are driven by both current and recorded state information, to yield approximate optimal control solutions without the need for ad hoc probing. A unique desired goal sampling method is also introduced to address the classical exploration versus exploitation conundrum. Applications are presented for autonomous systems including robot manipulators, underwater vehicles, and fin-controlled cruise missiles. Solutions are also developed for networks of systems, where the problem is cast as a differential game in which a Nash equilibrium is sought.
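
To make the first two challenges concrete, consider a standard control-affine formulation (the dynamics f, g and cost terms Q, R below are illustrative assumptions, not details from the presentation). For

\[
\dot{x}(t) = f(x(t)) + g(x(t))\,u(t), \qquad
J(x_0, u) = \int_0^{\infty} \big( Q(x(t)) + u(t)^{\top} R\, u(t) \big)\, dt,
\]

the optimal value function V^* satisfies the HJB equation

\[
0 = \min_{u} \Big[ Q(x) + u^{\top} R\, u + \nabla V^{*}(x)^{\top} \big( f(x) + g(x)\,u \big) \Big],
\qquad
u^{*}(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x).
\]

Substituting u^* back into the HJB yields a nonlinear partial differential equation in V^* with no closed-form solution in general, and the minimizing controller itself depends explicitly on the dynamics g(x); these are precisely the two obstacles noted above.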
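The concurrent learning idea can be sketched as a critic weight update. Assuming a linear-in-parameters approximation V(x) ≈ W^T φ(x) (a common choice in this literature), the weight derivative is driven by the Bellman error evaluated at both the current state and a stack of recorded states, so no probing signal is needed. This is a minimal sketch under those assumptions; the function names, normalization, and gain below are hypothetical placeholders, not the presenter's implementation:

```python
import numpy as np

def bellman_error(W, x, u, f, g, phi_grad, Q, R):
    """Bellman residual delta = W^T dphi/dx (f + g u) + Q(x) + u^T R u,
    with the value function approximated as V(x) ~ W^T phi(x)."""
    xdot = f(x) + g(x) @ u
    return W @ phi_grad(x) @ xdot + Q(x) + u @ R @ u

def cl_critic_update(W, x, u, history, f, g, phi_grad, Q, R, gamma=1.0):
    """Concurrent-learning update: the weight derivative is driven by the
    Bellman error at the current state *and* at recorded (x_j, u_j) samples,
    so the regressor stays exciting without an added probing signal."""
    def regressor(xj, uj):
        # d(delta)/dW = dphi/dx @ (f + g u): gradient of the Bellman error
        return phi_grad(xj) @ (f(xj) + g(xj) @ uj)

    # normalized gradient term from the current measurement
    w = regressor(x, u)
    dW = -gamma * w * bellman_error(W, x, u, f, g, phi_grad, Q, R) / (1 + w @ w)

    # additional terms from the recorded history stack
    for xj, uj in history:
        wj = regressor(xj, uj)
        dW += -gamma * wj * bellman_error(W, xj, uj, f, g, phi_grad, Q, R) / (1 + wj @ wj)
    return dW
```

The key design point, following the concurrent learning literature, is that the recorded regressors need only satisfy a rank (richness) condition, which can be checked online as data are stored, in place of persistence of excitation, which cannot be verified a priori.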
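For the networked, game-theoretic setting, the usual structure (again a standard formulation, not the presentation's specific construction) is a set of N coupled Hamilton-Jacobi equations, one per agent:

\[
0 = Q_i(x) + \sum_{j=1}^{N} u_j^{*\top} R_{ij}\, u_j^{*}
+ \nabla V_i^{*}(x)^{\top} \Big( f(x) + \sum_{j=1}^{N} g_j(x)\, u_j^{*} \Big),
\qquad
u_i^{*}(x) = -\tfrac{1}{2} R_{ii}^{-1} g_i(x)^{\top} \nabla V_i^{*}(x),
\]

for i = 1, ..., N. A feedback Nash equilibrium is a set of policies u_1^*, ..., u_N^* at which no agent can lower its own cost by unilaterally deviating, and the actor-critic machinery above is applied to approximate each V_i^* simultaneously.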