Q-learning with policy function approximation for a benchmark ball and beam control problem
Published in: ICIC International
Year: 2017
Volume: 13
Issue: 5
Pages: 1467-1476
Abstract
Reinforcement learning (RL), rooted in dynamic programming, solves optimization problems through autonomous agents. These agents interact with the environment to learn the optimal actions that lead them to the goal. Q-learning is a model-free reinforcement learning algorithm that learns a Q-function from delayed rewards. RL algorithms are commonly applied to environments with discrete state and action spaces. This discretization degrades the performance of the RL agent in control system applications, where the state and action spaces are continuous. This paper addresses the problem of handling continuous state-action spaces in Q-learning by utilizing an artificial neural network (ANN) as an interpolator. A simple feedforward neural network was trained on the discontinuous policy function extracted from the final Q-function. The proposed controller learning scheme was tested on a benchmark real-time ball-and-beam setup. The observed results indicate that the controller with the approximated policy function produces lower-magnitude oscillations and reduces the steady-state error. © 2017 ICIC International.
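The following is a minimal sketch of the learning scheme the abstract describes: tabular Q-learning on a discretized ball-and-beam-like task, followed by fitting a feedforward network to the greedy policy extracted from the learned Q-table so that the controller can map continuous states to continuous actions. The plant dynamics, grid sizes, reward, and hyperparameters here are illustrative assumptions, not values from the paper.

```python
# Sketch: tabular Q-learning, then an ANN fit to the extracted greedy policy.
# All dynamics and hyperparameters below are assumed for illustration.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Discretize ball position x (m) and velocity v (m/s); discrete beam angles (rad).
POS = np.linspace(-0.5, 0.5, 21)
VEL = np.linspace(-1.0, 1.0, 21)
ACT = np.linspace(-0.2, 0.2, 9)
DT, G = 0.02, 9.81

def step(x, v, theta):
    """Simplified ball-beam dynamics: ball accelerates at -(5/7) g sin(theta)."""
    a = -(5.0 / 7.0) * G * np.sin(theta)
    v = np.clip(v + a * DT, VEL[0], VEL[-1])
    x = np.clip(x + v * DT, POS[0], POS[-1])
    return x, v

def idx(x, v):
    """Nearest grid indices for a continuous state."""
    return np.abs(POS - x).argmin(), np.abs(VEL - v).argmin()

Q = np.zeros((len(POS), len(VEL), len(ACT)))
alpha, gamma, eps = 0.1, 0.98, 0.2

for episode in range(2000):
    x, v = rng.uniform(-0.4, 0.4), 0.0
    for t in range(200):
        i, j = idx(x, v)
        # Epsilon-greedy action selection over the discrete action set.
        a = rng.integers(len(ACT)) if rng.random() < eps else Q[i, j].argmax()
        x2, v2 = step(x, v, ACT[a])
        r = -(x2 ** 2 + 0.1 * v2 ** 2)  # reward: keep the ball centered
        i2, j2 = idx(x2, v2)
        # Standard Q-learning update from the delayed reward.
        Q[i, j, a] += alpha * (r + gamma * Q[i2, j2].max() - Q[i, j, a])
        x, v = x2, v2

# Extract the (discontinuous) greedy policy pi(s) = argmax_a Q(s, a) ...
states = np.array([(p, w) for p in POS for w in VEL])
pi = np.array([ACT[Q[idx(p, w)].argmax()] for p, w in states])

# ... and train a small feedforward net as a smooth interpolator over states.
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0)
net.fit(states, pi)

# The trained net maps any continuous (position, velocity) state to a
# continuous beam-angle command, smoothing the piecewise-constant policy.
print(net.predict([[0.12, -0.3]]))
```

Because the network interpolates between the grid points of the greedy policy, its output varies smoothly with the state, which is consistent with the reported reduction in oscillation magnitude and steady-state error relative to the purely discrete controller.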
About the journal
Journal: International Journal of Innovative Computing, Information and Control
Publisher: ICIC International
ISSN: 1349-4198