minihack.reward_manager module

class minihack.reward_manager.AbstractRewardManager[source]

Bases: abc.ABC

This is the abstract base class for the RewardManager that is used for defining custom reward functions.

__init__()[source]: Initialize self. See help(type(self)) for accurate signature.

abstract check_episode_end_call(env, previous_observation, action, observation) → bool[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool

abstract collect_reward() → float[source]

Return reward calculated and accumulated in check_episode_end_call, and then reset it.

Returns: The reward.
Return type: float

abstract reset() → None[source]: Reset all events. Called when a new episode starts.
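The accumulate-then-collect contract of these three methods can be sketched as follows. This is a minimal, self-contained illustration rather than MiniHack code: ManagerInterface is a local stand-in for AbstractRewardManager, and StepPenaltyManager, its max_steps argument and the -0.1 penalty are hypothetical.

```python
from abc import ABC, abstractmethod


class ManagerInterface(ABC):
    """Local stand-in mirroring the three abstract methods documented above."""

    @abstractmethod
    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool: ...

    @abstractmethod
    def collect_reward(self) -> float: ...

    @abstractmethod
    def reset(self) -> None: ...


class StepPenaltyManager(ManagerInterface):
    """Hypothetical manager: -0.1 reward per step, episode ends after max_steps."""

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self._steps = 0
        self._reward = 0.0

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        # Accumulate the reward for this transition, then report termination.
        self._steps += 1
        self._reward += -0.1
        return self._steps >= self.max_steps

    def collect_reward(self) -> float:
        # Hand back the accumulated reward and zero it in one step.
        reward, self._reward = self._reward, 0.0
        return reward

    def reset(self) -> None:
        self._steps = 0
        self._reward = 0.0


manager = StepPenaltyManager(max_steps=2)
done_first = manager.check_episode_end_call(None, (), 0, ())
reward_first = manager.collect_reward()
done_second = manager.check_episode_end_call(None, (), 0, ())
```

The environment arguments are placeholders (None and empty tuples) because this sketch never inspects them; a real manager would read the observations.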

class minihack.reward_manager.CoordEvent(*args, coordinates: Tuple[int, int])[source]

Bases: minihack.reward_manager.Event

An event which occurs when reaching certain coordinates.

__init__(*args, coordinates: Tuple[int, int])[source]

Initialise the Event.

Parameters

coordinates (tuple) – The coordinates to reach for the event.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state in which this event has occurred.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

The reward.

Return type

float

class minihack.reward_manager.Event(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]

Bases: abc.ABC

An event which can occur in a MiniHack episode.

This is the base class of all other events.

__init__(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]

Initialise the Event.

Parameters

reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

abstract check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state in which this event has occurred.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

The reward.

Return type

float

reset()[source]: Reset the event, if there is any state necessary.

class minihack.reward_manager.EventType(value)[source]

Bases: enum.IntEnum

An enumeration.

COORD = 2

LOC = 3

LOC_ACTION = 1

MESSAGE = 0

class minihack.reward_manager.GroupedRewardManager[source]

Bases: minihack.reward_manager.AbstractRewardManager

Operates as a collection of reward managers.

The rewards from each reward manager are summed, and termination can be specified by terminal_sufficient and terminal_required on each reward manager.

Given this can be nested arbitrarily deeply (as each reward manager could itself be a GroupedRewardManager), this enables complex specification of groups of rewards.
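The termination rule for a group can be sketched as below. The rule itself is an assumption made for illustration (the episode ends when any terminal_sufficient member has finished, or when every terminal_required member has finished); Status and group_is_done are hypothetical names, not part of MiniHack.

```python
from typing import List, NamedTuple


class Status(NamedTuple):
    """Hypothetical record of one member manager's state after a step."""
    done: bool
    terminal_required: bool
    terminal_sufficient: bool


def group_is_done(statuses: List[Status]) -> bool:
    # Any finished member marked sufficient ends the episode on its own.
    if any(s.done and s.terminal_sufficient for s in statuses):
        return True
    # Otherwise every member marked required must have finished.
    # Note: all() over an empty selection is True, so under this assumed rule
    # a group with no required members counts as done.
    return all(s.done for s in statuses if s.terminal_required)


waiting = group_is_done([Status(True, True, False), Status(False, True, False)])
shortcut = group_is_done([Status(False, True, False), Status(True, False, True)])
optional_pending = group_is_done([Status(True, True, False), Status(False, False, False)])
```

Because a nested group would itself report a single done flag, the same function applies unchanged at every level of nesting.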

__init__()[source]: Initialize self. See help(type(self)) for accurate signature.

add_reward_manager(reward_manager: minihack.reward_manager.AbstractRewardManager, terminal_required: bool, terminal_sufficient: bool) → None[source]

Add a new reward manager, with terminal_sufficient and terminal_required acting as for individual events.

Parameters

reward_manager (AbstractRewardManager) – The reward manager to be added.
terminal_required (bool) – Whether this reward manager terminating is required for the episode to terminate.
terminal_sufficient (bool) – Whether this reward manager terminating is sufficient for the episode to terminate.

check_episode_end_call(env, previous_observation, action, observation) → bool[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool

collect_reward()[source]

Return reward calculated and accumulated in check_episode_end_call, and then reset it.

Returns: The reward.
Return type: float

reset()[source]: Reset all events, to be called when a new episode occurs.

class minihack.reward_manager.LocActionEvent(*args, loc: str, action: nle.nethack.Command)[source]

Bases: minihack.reward_manager.Event

An event which checks whether an action is performed at a specified location.

__init__(*args, loc: str, action: nle.nethack.Command)[source]

Initialise the Event.

Parameters

loc (str) – The name of the location to reach.
action (nle.nethack.Command) – The action to perform.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state in which this event has occurred.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

The reward.

Return type

float

reset()[source]: Reset the event, if there is any state necessary.

class minihack.reward_manager.LocEvent(*args, loc: str)[source]

Bases: minihack.reward_manager.Event

An event which checks whether a specified location is reached.

__init__(*args, loc: str)[source]

Initialise the Event.

Parameters

loc (str) – The name of the location to reach.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state in which this event has occurred.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

The reward.

Return type

float

class minihack.reward_manager.MessageEvent(*args, messages: List[str])[source]

Bases: minihack.reward_manager.Event

An event which occurs when any of the messages appear.

__init__(*args, messages: List[str])[source]

Initialise the Event.

Parameters

messages (list) – The messages to be seen to trigger the event.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
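A message check of this kind might look like the sketch below. It assumes, for illustration only, that the game message arrives as NUL-padded bytes and that a listed message counts as seen when it appears as a substring; message_matches is a hypothetical helper, not a MiniHack function.

```python
from typing import List


def message_matches(raw_message: bytes, messages: List[str]) -> bool:
    """Return True if any of `messages` occurs in the decoded game message."""
    # Keep only the bytes before the first NUL (the assumed padding), then decode.
    text = raw_message.split(b"\x00", 1)[0].decode("utf-8", errors="replace")
    # The event fires if at least one of the listed messages is present.
    return any(message in text for message in messages)


raw = b"You kill the newt!" + b"\x00" * 10
hit = message_matches(raw, ["You kill", "You destroy"])
miss = message_matches(raw, ["This door is locked"])
```

Listing several phrasings of the same outcome, as in the first call, is the reason the event accepts a list rather than a single string.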

check(env, previous_observation, action, observation) → float[source]

Check whether the environment is in a state in which this event has occurred.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

The reward.

Return type

float

class minihack.reward_manager.RewardManager[source]

Bases: minihack.reward_manager.AbstractRewardManager

This class is used for managing rewards, events and termination for MiniHack tasks.

Some notes on the ordering of calls in the MiniHack/NetHack base class:

1. step(action) is called on the environment.
2. Within step, a copy of the last observation is made first, and then the underlying NetHack game is stepped.
3. Then _is_episode_end(observation) is called to check whether the episode has ended. This result is overridden if we’ve gone over our max_steps, or if the underlying NetHack game says we’re done (i.e. we died).
4. Then _reward_fn(last_observation, observation) is called to calculate the reward at this time-step.
5. If end_status tells us the game is done, we quit the game.
6. Then step returns the observation, the calculated reward, done, and some statistics.

All this means that we need to check whether an observation is terminal in _is_episode_end before the reward is calculated.

The call of _is_episode_end in MiniHack will call check_episode_end_call in this class, which checks for termination and accumulates any reward, which is returned and zeroed in collect_reward.
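The ordering described above can be sketched with a toy environment. Everything here is a hypothetical stand-in rather than MiniHack code: ToyEnv uses an integer position as its observation, and GoalManager pays 1.0 when that position equals a goal.

```python
class GoalManager:
    """Hypothetical manager: reward 1.0 when the observation equals `goal`."""

    def __init__(self, goal: int):
        self.goal = goal
        self._reward = 0.0

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        reached = observation == self.goal
        if reached:
            self._reward += 1.0  # reward is accumulated during the end check
        return reached

    def collect_reward(self) -> float:
        reward, self._reward = self._reward, 0.0
        return reward


class ToyEnv:
    """Hypothetical environment whose observation is just an integer position."""

    def __init__(self, reward_manager, max_steps: int = 10):
        self.reward_manager = reward_manager
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def step(self, action: int):
        last_observation = self.position  # copy of the last observation
        self.position += action           # step the underlying "game"
        self.steps += 1
        observation = self.position
        # Termination is checked first; the manager accumulates reward here.
        done = self.reward_manager.check_episode_end_call(
            self, last_observation, action, observation
        )
        done = done or self.steps >= self.max_steps  # step limit overrides
        # Only afterwards is the accumulated reward collected and zeroed.
        reward = self.reward_manager.collect_reward()
        return observation, reward, done, {}


env = ToyEnv(GoalManager(goal=2))
first = env.step(1)
second = env.step(1)
```

If the reward were collected before the end check, the step that reaches the goal would return 0.0 and the reward would leak into the next step, which is why the end check has to do the accumulating.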

__init__()[source]: Initialize self. See help(type(self)) for accurate signature.

add_amulet_event(reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when an amulet is worn.

Parameters

reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_coordinate_event(coordinates: Tuple[int, int], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add an event which is triggered on reaching the specified coordinates.

Parameters

coordinates (Tuple[int, int]) – The coordinates to be reached (tuple of ints).
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_custom_reward_fn(reward_fn: Callable[[MiniHack, Any, int, Any], float]) → None[source]

Add a custom reward function which is called after every step to calculate reward.

The function should be a callable which takes the environment, previous observation, action and current observation and returns a float reward.

Parameters: reward_fn (Callable[[MiniHack, Any, int, Any], float]) – A reward function which takes an environment, previous observation, action, next observation and returns a reward.
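A function with the expected shape might look like the sketch below. The step_penalty name and the -0.01 value are hypothetical; the only thing taken from the documentation above is the four-argument signature returning a float.

```python
from typing import Any


def step_penalty(env: Any, previous_observation: Any, action: int, observation: Any) -> float:
    """Hypothetical reward function: a small constant cost for every step taken."""
    return -0.01


# With MiniHack installed, the function would be registered roughly as:
#     reward_manager.add_custom_reward_fn(step_penalty)
# Here it is simply called directly with placeholder arguments.
reward = step_penalty(None, (), 0, ())
```

A constant penalty ignores all four arguments; a more informative function would compare previous_observation with observation to reward progress.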

add_eat_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add an event which is triggered when name is eaten.

Parameters

name (str) – The name of the object being eaten.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_event(event: minihack.reward_manager.Event)[source]

Add an event to be managed by the reward manager.

Parameters: event (Event) – The event to be added.

add_kill_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when a specified monster is killed.

Parameters

name (str) – The name of the monster to be killed.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_location_event(location: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered on reaching a specified location.

Parameters

location (str) – The name of the location to be reached.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_message_event(msgs: List[str], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add an event which is triggered when any of the given messages is seen.

Parameters

msgs (List[str]) – The messages, any one of which triggers the event when seen.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_positional_event(place_name: str, action_name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered on taking a given action at a given place.

Parameters

place_name (str) – The name of the place to trigger the event.
action_name (str) – The name of the action to trigger the event.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_wear_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when a specific armor is worn.

Parameters

name (str) – The name of the armor to be worn.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

add_wield_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]

Add event which is triggered when a specific weapon is wielded.

Parameters

name (str) – The name of the weapon to be wielded.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.

check_episode_end_call(env, previous_observation, action, observation) → bool[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool

collect_reward() → float[source]

Return reward calculated and accumulated in check_episode_end_call, and then reset it.

Returns: The reward.
Return type: float

reset()[source]: Reset all events, to be called when a new episode occurs.

class minihack.reward_manager.SequentialRewardManager[source]

Bases: minihack.reward_manager.RewardManager

A reward manager that ignores terminal_required and terminal_sufficient, and instead requires every event to be completed in the order in which it was added to the reward manager.
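The in-order requirement can be illustrated with the sketch below. It is a simplified model under stated assumptions, not the library's implementation: events are represented by plain strings, and sequential_progress is a hypothetical helper that counts how many events have been completed in the required order.

```python
from typing import List


def sequential_progress(required_order: List[str], happened: List[str]) -> int:
    """Return how many events of `required_order` were completed, in order."""
    next_index = 0
    for name in happened:
        # Only the next pending event counts; anything else is ignored.
        if next_index < len(required_order) and name == required_order[next_index]:
            next_index += 1
    return next_index


order = ["pick_up_key", "open_door", "reach_stairs"]
out_of_order = sequential_progress(order, ["open_door", "pick_up_key", "reach_stairs"])
in_order = sequential_progress(order, ["pick_up_key", "open_door", "reach_stairs"])
episode_done = in_order == len(order)
```

In the out-of-order run only the first event is credited, because the later ones occurred before their predecessors were complete.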

__init__()[source]: Initialize self. See help(type(self)) for accurate signature.

check_episode_end_call(env, previous_observation, action, observation)[source]

Check if the task has ended, and accumulate any reward from the transition in self._reward.

Parameters

env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.

Returns

Whether the episode has ended.

Return type

bool