minihack.reward_manager module

class minihack.reward_manager.AbstractRewardManager[source]

Bases: abc.ABC

This is the abstract base class for the RewardManager that is used for defining custom reward functions.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

abstract check_episode_end_call(env, previous_observation, action, observation) → bool[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool

abstract collect_reward() → float[source]

Return reward calculated and accumulated in check_episode_end_call, and then reset it.

Returns

The reward.

Return type

float

abstract reset() → None[source]

Reset all events, to be called when a new episode occurs.
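As a rough illustration of the contract above, the following sketch implements the three abstract methods against a local stand-in base class rather than the real minihack.reward_manager.AbstractRewardManager, so that it runs without MiniHack installed. The names AbstractRewardManagerSketch and StepBudgetManager, the budget parameter, and the per-step reward of -0.1 are all hypothetical.

```python
from abc import ABC, abstractmethod


class AbstractRewardManagerSketch(ABC):
    """Local stand-in mirroring the three documented abstract methods."""

    @abstractmethod
    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        ...

    @abstractmethod
    def collect_reward(self) -> float:
        ...

    @abstractmethod
    def reset(self) -> None:
        ...


class StepBudgetManager(AbstractRewardManagerSketch):
    """Hypothetical manager: -0.1 reward per step, episode ends after `budget` steps."""

    def __init__(self, budget: int = 3):
        self.budget = budget
        self._steps = 0
        self._reward = 0.0

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        self._steps += 1
        self._reward += -0.1  # accumulate here; the reward is handed out in collect_reward
        return self._steps >= self.budget

    def collect_reward(self) -> float:
        reward, self._reward = self._reward, 0.0  # return the total, then zero it
        return reward

    def reset(self) -> None:
        self._steps = 0
        self._reward = 0.0


manager = StepBudgetManager(budget=2)
first_done = manager.check_episode_end_call(None, (), 0, ())
first_reward = manager.collect_reward()
second_done = manager.check_episode_end_call(None, (), 0, ())
```

The env and observation arguments are passed as placeholders here because this toy manager ignores them; a real manager would typically inspect them.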

class minihack.reward_manager.CoordEvent(*args, coordinates: Tuple[int, int])[source]

Bases: minihack.reward_manager.Event

An event which occurs when reaching certain coordinates.

__init__(*args, coordinates: Tuple[int, int])[source]

Initialise the Event.

Parameters
  • coordinates (tuple) – The coordinates to reach for the event.

  • reward (float) – The reward for the event occurring.

  • repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).

  • terminal_required (bool) – Whether this event is required for the episode to terminate.

  • terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state such that this event has occurred.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

The reward.

Return type

float

class minihack.reward_manager.Event(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]

Bases: abc.ABC

An event which can occur in a MiniHack episode.

This is the base class of all other events.

__init__(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]

Initialise the Event.

Parameters
  • reward (float) – The reward for the event occurring.

  • repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).

  • terminal_required (bool) – Whether this event is required for the episode to terminate.

  • terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

abstract check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state such that this event has occurred.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

The reward.

Return type

float

reset()[source]

Reset the event, if there is any state necessary.
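A custom event could be written along the following lines. This is only a sketch: EventSketch is a local stand-in that takes the documented constructor arguments, the achieved flag is an assumed piece of bookkeeping, and ActionTakenEvent with its target_action parameter is hypothetical.

```python
from abc import ABC, abstractmethod


class EventSketch(ABC):
    """Local stand-in taking the documented constructor arguments."""

    def __init__(self, reward: float, repeatable: bool,
                 terminal_required: bool, terminal_sufficient: bool):
        self.reward = reward
        self.repeatable = repeatable
        self.terminal_required = terminal_required
        self.terminal_sufficient = terminal_sufficient
        self.achieved = False  # assumed bookkeeping flag, not taken from the docs

    @abstractmethod
    def check(self, env, previous_observation, action, observation) -> float:
        ...

    def reset(self) -> None:
        self.achieved = False


class ActionTakenEvent(EventSketch):
    """Hypothetical event: fires when a given action id is taken."""

    def __init__(self, *args, target_action: int):
        super().__init__(*args)
        self.target_action = target_action

    def check(self, env, previous_observation, action, observation) -> float:
        if action != self.target_action:
            return 0.0
        if self.achieved and not self.repeatable:
            return 0.0  # a non-repeatable event pays out only once
        self.achieved = True
        return self.reward


event = ActionTakenEvent(1.0, False, True, False, target_action=7)
rewards = [event.check(None, (), a, ()) for a in (3, 7, 7)]
```

The keyword-only subclass argument follows the same pattern as the (*args, coordinates=...) style signatures documented for the built-in events.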

class minihack.reward_manager.EventType(value)[source]

Bases: enum.IntEnum

An enumeration.

COORD = 2
LOC = 3
LOC_ACTION = 1
MESSAGE = 0

class minihack.reward_manager.GroupedRewardManager[source]

Bases: minihack.reward_manager.AbstractRewardManager

Operates as a collection of reward managers.

The rewards from each reward manager are summed, and termination can be specified by terminal_sufficient and terminal_required on each reward manager.

Given this can be nested arbitrarily deeply (as each reward manager could itself be a GroupedRewardManager), this enables complex specification of groups of rewards.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

add_reward_manager(reward_manager: minihack.reward_manager.AbstractRewardManager, terminal_required: bool, terminal_sufficient: bool) → None[source]

Add a new reward manager, with terminal_sufficient and terminal_required acting as for individual events.

Parameters
  • reward_manager (AbstractRewardManager) – The reward manager to be added.

  • terminal_required (bool) – Whether this reward manager terminating is required for the episode to terminate.

  • terminal_sufficient (bool) – Whether this reward manager terminating is sufficient for the episode to terminate.

check_episode_end_call(env, previous_observation, action, observation)bool[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool

collect_reward()[source]

Return reward calculated and accumulated in check_episode_end_call, and then reset it.

Returns

The reward.

Return type

float

reset()[source]

Reset all events, to be called when a new episode occurs.
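The way a group might combine its children can be sketched as follows. The termination rule shown (any finished child marked terminal_sufficient ends the episode, otherwise every child marked terminal_required must have finished) is one plausible reading of the flag descriptions, not a statement of the library's exact logic. The helper names group_is_done and group_reward are hypothetical.

```python
from typing import List, Tuple

# (done, terminal_required, terminal_sufficient) for each child manager
ChildStatus = Tuple[bool, bool, bool]


def group_is_done(children: List[ChildStatus]) -> bool:
    """Assumed rule: a finished sufficient child ends the episode on its own;
    otherwise all required children must have finished."""
    all_required_done = True
    for done, required, sufficient in children:
        if done and sufficient:
            return True
        if required and not done:
            all_required_done = False
    # Note: with no required children this is vacuously True.
    return all_required_done


def group_reward(child_rewards: List[float]) -> float:
    """Rewards from the child managers are summed."""
    return sum(child_rewards)


one_required_pending = group_is_done([(True, True, False), (False, True, False)])
sufficient_child_done = group_is_done([(True, False, True), (False, True, False)])
total = group_reward([1.0, 0.5])
```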

class minihack.reward_manager.LocActionEvent(*args, loc: str, action: nle.nethack.Command)[source]

Bases: minihack.reward_manager.Event

An event which checks whether an action is performed at a specified location.

__init__(*args, loc: str, action: nle.nethack.Command)[source]

Initialise the Event.

Parameters
  • loc (str) – The name of the location to reach.

  • action (int) – The action to perform.

  • reward (float) – The reward for the event occurring.

  • repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).

  • terminal_required (bool) – Whether this event is required for the episode to terminate.

  • terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state such that this event has occurred.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

The reward.

Return type

float

reset()[source]

Reset the event, if there is any state necessary.

class minihack.reward_manager.LocEvent(*args, loc: str)[source]

Bases: minihack.reward_manager.Event

An event which checks whether a specified location is reached.

__init__(*args, loc: str)[source]

Initialise the Event.

Parameters
  • loc (str) – The name of the location to reach.

  • reward (float) – The reward for the event occurring.

  • repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).

  • terminal_required (bool) – Whether this event is required for the episode to terminate.

  • terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state such that this event has occurred.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

The reward.

Return type

float

class minihack.reward_manager.MessageEvent(*args, messages: List[str])[source]

Bases: minihack.reward_manager.Event

An event which occurs when any of the messages appear.

__init__(*args, messages: List[str])[source]

Initialise the Event.

Parameters
  • messages (list) – The messages to be seen to trigger the event.

  • reward (float) – The reward for the event occurring.

  • repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).

  • terminal_required (bool) – Whether this event is required for the episode to terminate.

  • terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state such that this event has occurred.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

The reward.

Return type

float
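The "any of the messages" test can be illustrated with a small standalone sketch. It assumes the in-game message arrives as a NUL-padded byte buffer and that matching is a plain substring test; both are assumptions made for illustration, and decode_message and any_message_seen are hypothetical helpers rather than MiniHack functions.

```python
from typing import Iterable


def decode_message(raw: bytes) -> str:
    """Decode a NUL-padded message buffer into text (assumed layout)."""
    return raw.split(b"\x00", 1)[0].decode("utf-8", errors="ignore")


def any_message_seen(raw: bytes, messages: Iterable[str]) -> bool:
    """Return True if any target message occurs in the decoded text."""
    text = decode_message(raw)
    return any(msg in text for msg in messages)


# A made-up message padded to a fixed-size buffer.
buffer = b"This apple is delicious!".ljust(256, b"\x00")
seen = any_message_seen(buffer, ["delicious", "yummy"])
```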

class minihack.reward_manager.RewardManager[source]

Bases: minihack.reward_manager.AbstractRewardManager

This class is used for managing rewards, events and termination for MiniHack tasks.

Some notes on the ordering of calls in the MiniHack/NetHack base class:

  • step(action) is called on the environment

  • Within step, first a copy of the last observation is made, and then the underlying NetHack game is stepped

  • Then _is_episode_end(observation) is called to check whether the episode has ended (this is overridden if we’ve gone over our max_steps, or if the underlying NetHack game says we’re done, i.e. we died)

  • Then _reward_fn(last_observation, observation) is called to calculate the reward at this time-step

  • If end_status tells us the game is done, we quit the game

  • Then step returns the observation, calculated reward, done, and some statistics.

All this means that we need to check whether an observation is terminal in _is_episode_end before the reward is calculated.

The call of _is_episode_end in MiniHack will call check_episode_end_call in this class, which checks for termination and accumulates any reward, which is returned and zeroed in collect_reward.
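The call ordering described above can be sketched with a toy environment. ToyEnv and RecordingManager are hypothetical stand-ins, not MiniHack classes, and the step body is a simplification: it only shows that the termination check (which accumulates reward) runs before the reward is collected.

```python
class RecordingManager:
    """Hypothetical manager that records the order in which it is called."""

    def __init__(self):
        self.calls = []
        self._reward = 0.0

    def check_episode_end_call(self, env, previous_observation, action, observation):
        self.calls.append("check_episode_end_call")
        self._reward += 1.0           # reward is accumulated during the check
        return observation[0] >= 2    # toy rule: the episode ends on the second step

    def collect_reward(self):
        self.calls.append("collect_reward")
        reward, self._reward = self._reward, 0.0
        return reward


class ToyEnv:
    """Hypothetical stand-in for the environment; only the call order matters."""

    def __init__(self, reward_manager, max_steps=10):
        self.reward_manager = reward_manager
        self.max_steps = max_steps
        self.last_observation = ()
        self.steps = 0

    def step(self, action):
        previous_observation = self.last_observation  # copy of the last observation
        self.steps += 1
        observation = (self.steps,)                   # stand-in for stepping the game
        # 1. Termination check; the manager accumulates reward as a side effect.
        done = self.reward_manager.check_episode_end_call(
            self, previous_observation, action, observation
        )
        done = done or self.steps >= self.max_steps
        # 2. Reward is collected (and zeroed) only after the termination check.
        reward = self.reward_manager.collect_reward()
        self.last_observation = observation
        return observation, reward, done, {}


manager = RecordingManager()
env = ToyEnv(manager)
first = env.step(0)
second = env.step(0)
```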

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

add_amulet_event(reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when an amulet is worn.

Parameters
  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_coordinate_event(coordinates: Tuple[int, int], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered on reaching the specified coordinates.

Parameters
  • coordinates (Tuple[int, int]) – The coordinates to be reached (tuple of ints).

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_custom_reward_fn(reward_fn: Callable[[MiniHack, Any, int, Any], float]) → None[source]

Add a custom reward function which is called after every step to calculate reward.

The function should be a callable which takes the environment, previous observation, action and current observation and returns a float reward.

Parameters

reward_fn (Callable[[MiniHack, Any, int, Any], float]) – A reward function which takes an environment, previous observation, action, next observation and returns a reward.
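A function with the required shape might look like the sketch below. The name step_penalty and the value -0.01 are hypothetical; the env argument is typed as Any so the snippet runs without MiniHack installed.

```python
from typing import Any


def step_penalty(env: Any, previous_observation: Any, action: int, observation: Any) -> float:
    """Hypothetical shaping term: a small constant cost for every step taken."""
    return -0.01


# With MiniHack installed, and reward_manager being a RewardManager instance,
# the function would be registered through the method documented above:
#   reward_manager.add_custom_reward_fn(step_penalty)
penalty = step_penalty(None, (), 0, ())
```

A constant penalty is used here because it does not depend on the observation layout; a function that inspects the observations would need to know which observation keys the environment was created with.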

add_eat_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add an event which is triggered when name is eaten.

Parameters
  • name (str) – The name of the object being eaten.

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_event(event: minihack.reward_manager.Event)[source]

Add an event to be managed by the reward manager.

Parameters

event (Event) – The event to be added.

add_kill_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when a specified monster is killed.

Parameters
  • name (str) – The name of the monster to be killed.

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_location_event(location: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered on reaching a specified location.

Parameters
  • location (str) – The name of the location to be reached.

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_message_event(msgs: List[str], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when any of the given messages are seen.

Parameters
  • msgs (List[str]) – The messages, any of which triggers the event when seen.

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_positional_event(place_name: str, action_name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered on taking a given action at a given place.

Parameters
  • place_name (str) – The name of the place to trigger the event.

  • action_name (str) – The name of the action to trigger the event.

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_wear_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when a specific armor is worn.

Parameters
  • name (str) – The name of the armor to be worn.

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_wield_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when a specific weapon is wielded.

Parameters
  • name (str) – The name of the weapon to be wielded.

  • reward (float) – The reward for this event. Defaults to 1.

  • repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.

  • terminal_required (bool) – Whether this event is required for termination. Defaults to True.

  • terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

check_episode_end_call(env, previous_observation, action, observation) → bool[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool

collect_reward() → float[source]

Return reward calculated and accumulated in check_episode_end_call, and then reset it.

Returns

The reward.

Return type

float

reset()[source]

Reset all events, to be called when a new episode occurs.

class minihack.reward_manager.SequentialRewardManager[source]

Bases: minihack.reward_manager.RewardManager

A reward manager that ignores terminal_required and terminal_sufficient, and instead requires that every event be completed in the order in which it was added to the reward manager.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

check_episode_end_call(env, previous_observation, action, observation)[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters
  • env (MiniHack) – The MiniHack environment in question.

  • previous_observation (tuple) – The previous state observation.

  • action (int) – The action taken.

  • observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool
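The in-order behaviour described for SequentialRewardManager can be sketched with a standalone toy class. SequentialSketch, add_stage, and the action-only predicates are hypothetical simplifications: only the earliest unfinished stage is checked on each step, and the episode ends once every stage has fired.

```python
from typing import Callable, List, Tuple


class SequentialSketch:
    """Hypothetical stand-in: stages must be completed in insertion order."""

    def __init__(self):
        self.stages: List[Tuple[Callable[[int], bool], float]] = []
        self.index = 0        # position of the next stage to complete
        self._reward = 0.0

    def add_stage(self, predicate: Callable[[int], bool], reward: float = 1.0) -> None:
        self.stages.append((predicate, reward))

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        if self.index < len(self.stages):
            predicate, reward = self.stages[self.index]
            if predicate(action):     # later stages are ignored until this one fires
                self._reward += reward
                self.index += 1
        return self.index == len(self.stages)

    def collect_reward(self) -> float:
        reward, self._reward = self._reward, 0.0
        return reward


manager = SequentialSketch()
manager.add_stage(lambda action: action == 1)
manager.add_stage(lambda action: action == 2)

dones, rewards = [], []
for action in (2, 1, 2):  # the first 2 arrives too early and is not rewarded
    dones.append(manager.check_episode_end_call(None, (), action, ()))
    rewards.append(manager.collect_reward())
```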