Overview - Keras-RL Documentation

Available Agents

Name	Implementation	Observation Space	Action Space
DQN	`rl.agents.DQNAgent`	discrete or continuous	discrete
DDPG	`rl.agents.DDPGAgent`	discrete or continuous	continuous
NAF	`rl.agents.NAFAgent`	discrete or continuous	continuous
CEM	`rl.agents.CEMAgent`	discrete or continuous	discrete
SARSA	`rl.agents.SARSAAgent`	discrete or continuous	discrete

Common API

All agents share a common API, which makes it easy to switch between them. Keep in mind, however, that each agent makes assumptions about the action space: some require discrete actions and others require continuous ones (see the table above).
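Because every agent exposes the same methods, training code can be written once and reused across agents. The following is a minimal sketch rather than code from Keras-RL itself: `train_agent` is a hypothetical helper, and `agent`, `env`, and `optimizer` are assumed to be objects you have already constructed (for instance one of the agents from the table, an environment, and a Keras optimizer).

```python
def train_agent(agent, env, optimizer, nb_steps=50000):
    """Compile and train any agent that follows the common API.

    `train_agent` and its default `nb_steps` are hypothetical; only
    `compile` and `fit` come from the documented API.
    """
    # Compile the agent and its underlying models first.
    agent.compile(optimizer, metrics=[])
    # Train for a fixed number of steps with interval logging and no rendering.
    history = agent.fit(env, nb_steps=nb_steps, verbose=1, visualize=False)
    return history
```

Swapping one agent for another then only changes how `agent` is built, provided the environment's action space matches what that agent expects.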

fit

fit(self, env, nb_steps, action_repetition=1, callbacks=None, verbose=1, visualize=False, nb_max_start_steps=0, start_step_policy=None, log_interval=10000, nb_max_episode_steps=None)

Trains the agent on the given environment.

Arguments

env (Env instance): Environment that the agent interacts with. See Env for details.
nb_steps (integer): Number of training steps to be performed.
action_repetition (integer): Number of times the agent repeats the same action without observing the environment again. Setting this to a value > 1 can be useful if a single action only has a very small effect on the environment.
callbacks (list of keras.callbacks.Callback or rl.callbacks.Callback instances): List of callbacks to apply during training. See callbacks for details.
verbose (integer): 0 for no logging, 1 for interval logging (see log_interval), 2 for episode logging.
visualize (boolean): If True, the environment is visualized during training. However, this is likely going to slow down training significantly and is thus intended to be a debugging instrument.
nb_max_start_steps (integer): Maximum number of steps that the agent performs at the beginning of each episode using start_step_policy. This is an upper limit: the exact number of steps is sampled uniformly from [0, nb_max_start_steps] at the beginning of each episode.
start_step_policy (lambda observation: action): The policy to follow if nb_max_start_steps > 0. If set to None, a random action is performed.
log_interval (integer): If verbose = 1, the number of steps that make up one logging interval.
nb_max_episode_steps (integer): Number of steps per episode that the agent performs before automatically resetting the environment. Set to None if each episode should run (potentially indefinitely) until the environment signals a terminal state.
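The effect of `action_repetition` and `nb_max_start_steps` can be illustrated with a small sketch. This is not the library's implementation; it only mirrors the behaviour described in the argument list. The helper names are hypothetical, and `env.step` is assumed to return a Gym-style `(observation, reward, done, info)` tuple.

```python
import random


def sample_nb_start_steps(nb_max_start_steps, rng=random):
    """Draw the number of start steps for one episode.

    Assumes the inclusive range [0, nb_max_start_steps] described above.
    """
    if nb_max_start_steps == 0:
        return 0
    return rng.randint(0, nb_max_start_steps)  # both endpoints included


def step_with_repetition(env, action, action_repetition=1):
    """Apply `action` up to `action_repetition` times and sum the rewards.

    Assumes `env.step` returns (observation, reward, done, info).
    """
    total_reward = 0.0
    observation, done, info = None, False, {}
    for _ in range(action_repetition):
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break  # stop repeating once the episode has ended
    return observation, total_reward, done, info
```

With `action_repetition=4`, for example, the agent would choose an action once and only see the observation that results after the fourth environment step.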

Returns

A keras.callbacks.History instance that recorded the entire training process.
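A Keras `History` object stores what it recorded in its `history` attribute, a dictionary mapping names to lists of values. The sketch below summarizes one such list. The helper is hypothetical, and the key name `"episode_reward"` is an assumption; check `history.history.keys()` to see what was actually recorded in your run.

```python
def mean_episode_reward(history, key="episode_reward"):
    """Average the per-episode values stored in a History-like object.

    The default key name is an assumption, not something the text above
    guarantees.
    """
    rewards = history.history.get(key, [])
    if not rewards:
        return None  # nothing was recorded under this key
    return sum(rewards) / len(rewards)
```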

test

test(self, env, nb_episodes=1, action_repetition=1, callbacks=None, visualize=True, nb_max_episode_steps=None, nb_max_start_steps=0, start_step_policy=None, verbose=1)

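Tests the agent on the given environment for nb_episodes episodes. The arguments it shares with fit (action_repetition, callbacks, visualize, nb_max_episode_steps, nb_max_start_steps, start_step_policy, verbose) have the same meaning as described there. Note that visualize defaults to True here.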
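An evaluation run might look like the following sketch. `evaluate_agent` and its default values are hypothetical; only the keyword arguments passed to `test` are taken from the signature above, and `agent` is assumed to be compiled and trained already.

```python
def evaluate_agent(agent, env, nb_episodes=5, nb_max_episode_steps=1000):
    """Run an evaluation without rendering, using the documented `test` signature.

    The helper name and the default values are hypothetical.
    """
    return agent.test(
        env,
        nb_episodes=nb_episodes,
        visualize=False,  # the signature's default is True
        nb_max_episode_steps=nb_max_episode_steps,  # cap episode length
        verbose=1,
    )
```

Passing `visualize=False` explicitly is useful on machines without a display, since `test` renders the environment by default.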

compile

compile(self, optimizer, metrics=[])

Compiles an agent and the underlying models to be used for training and testing.

Arguments

optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.

get_config

get_config(self)

Configuration of the agent for serialization.
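One possible use of the configuration is to store it alongside an experiment's results. The sketch below assumes that `get_config()` returns a dictionary; `dump_agent_config` is a hypothetical helper, and any value that cannot be encoded as JSON is written as its string representation.

```python
import json


def dump_agent_config(agent, path):
    """Write the agent's configuration to a JSON file and return it.

    Assumes `get_config()` returns a dict. Values that are not
    JSON-serializable are stored via `str()`.
    """
    config = agent.get_config()
    with open(path, "w") as f:
        json.dump(config, f, indent=2, default=str)
    return config
```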

reset_states

reset_states(self)

Resets all internally kept states after an episode is completed.

load_weights

load_weights(self, filepath)

Loads the weights of an agent from an HDF5 file.

Arguments

filepath (str): The path to the HDF5 file.

save_weights

save_weights(self, filepath, overwrite=False)

Saves the weights of an agent as an HDF5 file.

Arguments

filepath (str): The path to where the weights should be saved.
overwrite (boolean): If False and filepath already exists, raises an error.
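Saving and restoring weights can be wrapped in small helpers, as in the sketch below. Both helper names are hypothetical. The explicit existence checks are an addition of this sketch, so that the outcome does not depend on how a particular agent handles `overwrite` internally.

```python
import os


def save_weights_checked(agent, filepath, overwrite=False):
    """Save weights, refusing to replace an existing file unless asked.

    The existence check is part of this sketch, not of the agent API.
    """
    if os.path.exists(filepath) and not overwrite:
        raise FileExistsError(f"{filepath} already exists")
    agent.save_weights(filepath, overwrite=overwrite)


def restore_weights(agent, filepath):
    """Load previously saved weights, failing early if the file is missing."""
    if not os.path.exists(filepath):
        raise FileNotFoundError(filepath)
    agent.load_weights(filepath)
```

A typical pattern would be to call `save_weights_checked` once training finishes and `restore_weights` on a freshly constructed and compiled agent before testing.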