Available Agents
Name | Implementation | Observation Space | Action Space
---|---|---|---
DQN | `rl.agents.DQNAgent` | discrete or continuous | discrete
DDPG | `rl.agents.DDPGAgent` | discrete or continuous | continuous
NAF | `rl.agents.NAFAgent` | discrete or continuous | continuous
CEM | `rl.agents.CEMAgent` | discrete or continuous | discrete
SARSA | `rl.agents.SARSAAgent` | discrete or continuous | discrete
Common API
All agents share a common API, which makes it easy to switch between different agents. Keep in mind, however, that some agents make assumptions about the action space, i.e. they assume either discrete or continuous actions.
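Because of the shared API, swapping one agent for another usually only means changing the constructor call. The following is a minimal, dependency-free sketch of that idea: `ToyDQN` and `ToySARSA` are illustrative stand-ins, not the real `rl.agents` classes, and the method bodies are stubs rather than actual training code.

```python
# Illustrative sketch of the common agent API; ToyDQN/ToySARSA are
# hypothetical stand-ins, NOT the real keras-rl agent classes.
class ToyAgentBase:
    def compile(self, optimizer, metrics=None):
        # A real agent would build its underlying Keras models here.
        self.compiled = True

    def fit(self, env, nb_steps):
        # A real agent would interact with env; the toy just records steps.
        return {"nb_steps": nb_steps}

    def test(self, env, nb_episodes=1):
        return {"nb_episodes": nb_episodes}

class ToyDQN(ToyAgentBase):
    pass

class ToySARSA(ToyAgentBase):
    pass

def train(agent_cls, env=None):
    # The surrounding code is identical no matter which agent class is used.
    agent = agent_cls()
    agent.compile(optimizer="adam")
    return agent.fit(env, nb_steps=100)

history = train(ToyDQN)  # swap in ToySARSA and nothing else changes
```

The point of the sketch is the `train` function: it never mentions a concrete agent class, which is exactly what the common API buys you.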
fit
```python
fit(self, env, nb_steps, action_repetition=1, callbacks=None, verbose=1, visualize=False, nb_max_start_steps=0, start_step_policy=None, log_interval=10000, nb_max_episode_steps=None)
```
Trains the agent on the given environment.
Arguments
- env (`Env` instance): Environment that the agent interacts with. See Env for details.
- nb_steps (integer): Number of training steps to be performed.
- action_repetition (integer): Number of times the agent repeats the same action without observing the environment again. Setting this to a value > 1 can be useful if a single action only has a very small effect on the environment.
- callbacks (list of `keras.callbacks.Callback` or `rl.callbacks.Callback` instances): List of callbacks to apply during training. See callbacks for details.
- verbose (integer): 0 for no logging, 1 for interval logging (compare `log_interval`), 2 for episode logging.
- visualize (boolean): If `True`, the environment is visualized during training. However, this is likely going to slow down training significantly and is thus intended to be a debugging instrument.
- nb_max_start_steps (integer): Maximum number of steps that the agent performs at the beginning of each episode using `start_step_policy`. Notice that this is an upper limit since the exact number of steps to be performed is sampled uniformly from [0, `nb_max_start_steps`] at the beginning of each episode.
- start_step_policy (`lambda observation: action`): The policy to follow if `nb_max_start_steps` > 0. If set to `None`, a random action is performed.
- log_interval (integer): If `verbose` = 1, the number of steps that are considered to be an interval.
- nb_max_episode_steps (integer): Number of steps per episode that the agent performs before automatically resetting the environment. Set to `None` if each episode should run (potentially indefinitely) until the environment signals a terminal state.
Returns
A `keras.callbacks.History` instance that recorded the entire training process.
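The interaction between `action_repetition`, `nb_max_start_steps`, and `nb_max_episode_steps` can be sketched as a plain training loop. This is a hypothetical, heavily simplified reconstruction of the control flow, not the real keras-rl implementation (which also trains the model, invokes callbacks, and handles logging); `NonTerminalEnv` is an illustrative stand-in for an `Env`.

```python
import random

# Illustrative stand-in environment that never reaches a terminal state.
class NonTerminalEnv:
    def reset(self):
        return 0
    def step(self, action):
        return 0, False  # (observation, done)

# Hypothetical sketch of fit()'s control flow; NOT the real keras-rl code.
def fit_sketch(env, nb_steps, action_repetition=1,
               nb_max_start_steps=0, start_step_policy=None,
               nb_max_episode_steps=None):
    steps_done = 0
    while steps_done < nb_steps:
        observation = env.reset()
        episode_steps, done = 0, False
        # Warm-up: a number of steps sampled from [0, nb_max_start_steps].
        for _ in range(random.randint(0, nb_max_start_steps)):
            action = (start_step_policy(observation)
                      if start_step_policy is not None else 0)  # 0 ~ "random"
            observation, done = env.step(action)
        while not done and steps_done < nb_steps:
            action = 0  # the agent's forward pass would choose this
            # action_repetition: apply the same action several times.
            for _ in range(action_repetition):
                observation, done = env.step(action)
                if done:
                    break
            steps_done += 1
            episode_steps += 1
            if (nb_max_episode_steps is not None
                    and episode_steps >= nb_max_episode_steps):
                done = True  # force an environment reset next iteration
    return steps_done

total = fit_sketch(NonTerminalEnv(), nb_steps=10, nb_max_episode_steps=3)
```

Note that one "step" here is one agent decision: with `action_repetition=4` the environment advances four times per counted step, which is why repetition does not change `nb_steps` accounting.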
test
```python
test(self, env, nb_episodes=1, action_repetition=1, callbacks=None, visualize=True, nb_max_episode_steps=None, nb_max_start_steps=0, start_step_policy=None, verbose=1)
```
Tests the agent on the given environment.
compile
```python
compile(self, optimizer, metrics=[])
```
Compiles an agent and the underlying models to be used for training and testing.
Arguments
- optimizer (`keras.optimizers.Optimizer` instance): The optimizer to be used during training.
- metrics (list of functions `lambda y_true, y_pred: metric`): The metrics to run during training.
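A metric is just a function of `(y_true, y_pred)`. The sketch below shows one in that shape; it is written with plain Python lists so it runs without Keras, whereas real metrics passed to `compile` operate on tensors, and the `agent.compile` call in the comment assumes an already constructed agent.

```python
# Hypothetical metric in the shape compile() expects: a plain function
# of (y_true, y_pred). Real keras-rl metrics receive tensors; this
# dependency-free sketch uses lists of floats instead.
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Usage would look roughly like (assumes an existing agent and optimizer):
#   agent.compile(optimizer, metrics=[mean_absolute_error])

error = mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])  # 0.5
```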
get_config
```python
get_config(self)
```
Configuration of the agent for serialization.
reset_states
```python
reset_states(self)
```
Resets all internally kept states after an episode is completed.
load_weights
```python
load_weights(self, filepath)
```
Loads the weights of an agent from an HDF5 file.
Arguments
- filepath (str): The path to the HDF5 file.
save_weights
```python
save_weights(self, filepath, overwrite=False)
```
Saves the weights of an agent as an HDF5 file.
Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If `False` and `filepath` already exists, raises an error.
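The `overwrite` guard can be sketched as follows. This is a hypothetical reconstruction of the described behavior only (`save_weights_sketch` is an illustrative name; the real method serializes the model weights to HDF5 via Keras rather than writing a placeholder file):

```python
import os

# Hypothetical sketch of the overwrite guard save_weights describes;
# NOT the real keras-rl implementation, which writes HDF5 via Keras.
def save_weights_sketch(filepath, overwrite=False):
    if not overwrite and os.path.exists(filepath):
        raise IOError("File %r already exists; pass overwrite=True."
                      % filepath)
    with open(filepath, "wb") as f:
        f.write(b"weights")  # placeholder for the HDF5 payload
```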