NAFAgent

Introduction

rl.agents.dqn.NAFAgent(V_model, L_model, mu_model, random_process=None, covariance_mode='full')

Normalized Advantage Function (NAF) agents is a way of extending DQN to a continuous action space, and is simpler than DDPG agents.

The Q-function is here decomposed into an advantage term A and state value term V. The agent thus makes use of three models: the V_model learns the state value term, while the advantage term A is constructed based on the L_model and the mu_model such that the mu_model is always the action that maximizes the Q function. (exact mathematical formulation in the paper)

Since the mu_model chooses the action deterministically, we can add a random_process to balance exploration and exploitation. Similar to DQN, we use target networks for stability and keep a replay buffer.

References

Continuous Deep Q-Learning with Model-based Acceleration, Gu et al., 2016