minihack.reward_manager module
- class minihack.reward_manager.AbstractRewardManager[source]
Bases:
abc.ABC
This is the abstract base class for the RewardManager that is used for defining custom reward functions.
- abstract check_episode_end_call(env, previous_observation, action, observation) → bool[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
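A minimal sketch of how a subclass might satisfy this interface. To stay runnable without MiniHack installed, it uses a stand-in base class (AbstractRewardManagerSketch) that only mirrors the documented method; StepBudgetManager, budget and step_reward are hypothetical names introduced for illustration.

```python
from abc import ABC, abstractmethod


class AbstractRewardManagerSketch(ABC):
    """Stand-in for the documented interface; not MiniHack's own class."""

    def __init__(self):
        self._reward = 0.0

    @abstractmethod
    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        """Return True if the episode has ended; add any reward to self._reward."""


class StepBudgetManager(AbstractRewardManagerSketch):
    """Hypothetical manager: charges a fixed reward per step, ends after `budget` steps."""

    def __init__(self, budget: int, step_reward: float = -0.1):
        super().__init__()
        self.budget = budget
        self.step_reward = step_reward
        self.steps = 0

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        self.steps += 1
        self._reward += self.step_reward  # accumulate reward for this transition
        return self.steps >= self.budget
```

The point of the sketch is the division of labour: the return value carries termination, while the reward is accumulated on the instance rather than returned.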
- class minihack.reward_manager.CoordEvent(*args, coordinates: Tuple[int, int])[source]
Bases:
minihack.reward_manager.Event
An event which occurs when reaching certain coordinates.
- __init__(*args, coordinates: Tuple[int, int])[source]
Initialise the Event.
- Parameters
coordinates (tuple) – The coordinates to reach for the event.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
- class minihack.reward_manager.Event(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]
Bases:
abc.ABC
An event which can occur in a MiniHack episode.
This is the base class of all other events.
- __init__(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]
Initialise the Event.
- Parameters
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- abstract check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
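As an illustration of the constructor and check contract documented for Event, the following sketch defines a stand-in base class and one subclass. EventSketch, ActionEventSketch, target_action and the achieved flag are assumptions made for this example; the real class may track state differently.

```python
from abc import ABC, abstractmethod


class EventSketch(ABC):
    """Stand-in mirroring the documented constructor; not MiniHack's own class."""

    def __init__(self, reward: float, repeatable: bool,
                 terminal_required: bool, terminal_sufficient: bool):
        self.reward = reward
        self.repeatable = repeatable
        self.terminal_required = terminal_required
        self.terminal_sufficient = terminal_sufficient
        self.achieved = False  # assumed bookkeeping flag

    @abstractmethod
    def check(self, env, previous_observation, action, observation) -> float:
        """Return the reward earned on this transition (0.0 if none)."""


class ActionEventSketch(EventSketch):
    """Hypothetical event that fires when a given action id is taken."""

    def __init__(self, *args, target_action: int):
        super().__init__(*args)
        self.target_action = target_action

    def check(self, env, previous_observation, action, observation) -> float:
        if action != self.target_action:
            return 0.0
        if self.achieved and not self.repeatable:
            return 0.0  # a non-repeatable event pays out only once
        self.achieved = True
        return self.reward
```

The subclass forwards the four positional arguments with *args and takes its own setting as a keyword-only argument, matching the shape of the subclass signatures listed in this module.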
- class minihack.reward_manager.EventType(value)[source]
Bases:
enum.IntEnum
An enumeration.
- COORD = 2
- LOC = 3
- LOC_ACTION = 1
- MESSAGE = 0
- class minihack.reward_manager.GroupedRewardManager[source]
Bases:
minihack.reward_manager.AbstractRewardManager
Operates as a collection of reward managers.
The rewards from each reward manager are summed, and termination can be specified by terminal_sufficient and terminal_required on each reward manager.
Because groups can be nested arbitrarily deeply (each reward manager could itself be a GroupedRewardManager), this enables complex specification of groups of rewards.
- add_reward_manager(reward_manager: minihack.reward_manager.AbstractRewardManager, terminal_required: bool, terminal_sufficient: bool) → None[source]
Add a new reward manager, with terminal_sufficient and terminal_required acting as for individual events.
- Parameters
reward_manager (AbstractRewardManager) – The reward manager to be added.
terminal_required (bool) – Whether this reward manager terminating is required for the episode to terminate.
terminal_sufficient (bool) – Whether this reward manager terminating is sufficient for the episode to terminate.
- check_episode_end_call(env, previous_observation, action, observation) → bool[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
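One plausible reading of how terminal_required and terminal_sufficient combine across the members of a group is sketched below. This is an assumption about the semantics and not a copy of the library's logic; group_terminated, group_reward and the Member tuple are hypothetical names, and edge cases (such as a group with no required members) may differ in MiniHack.

```python
from typing import List, Tuple

# Each entry: (terminated, terminal_required, terminal_sufficient)
Member = Tuple[bool, bool, bool]


def group_terminated(members: List[Member]) -> bool:
    """Any terminated sufficient member ends the episode; otherwise every
    required member must have terminated (assumed rule)."""
    all_required_done = True
    for terminated, required, sufficient in members:
        if terminated and sufficient:
            return True
        if required and not terminated:
            all_required_done = False
    return all_required_done


def group_reward(rewards: List[float]) -> float:
    """Rewards from the member managers are summed."""
    return sum(rewards)
```

Because a member can itself be a group, the same rule could be applied recursively, with each nested group reporting a single terminated flag to its parent.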
- class minihack.reward_manager.LocActionEvent(*args, loc: str, action: nle.nethack.Command)[source]
Bases:
minihack.reward_manager.Event
An event which checks whether an action is performed at a specified location.
- __init__(*args, loc: str, action: nle.nethack.Command)[source]
Initialise the Event.
- Parameters
loc (str) – The name of the location to reach.
action (nle.nethack.Command) – The action to perform.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
- class minihack.reward_manager.LocEvent(*args, loc: str)[source]
Bases:
minihack.reward_manager.Event
An event which checks whether a specified location is reached.
- __init__(*args, loc: str)[source]
Initialise the Event.
- Parameters
loc (str) – The name of the location to reach.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
- class minihack.reward_manager.MessageEvent(*args, messages: List[str])[source]
Bases:
minihack.reward_manager.Event
An event which occurs when any of the messages appears.
- __init__(*args, messages: List[str])[source]
Initialise the Event.
- Parameters
messages (list) – The messages, any of which triggers the event when seen.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
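A rough sketch of the kind of test a message event could perform. It assumes the in-game message arrives as a fixed-length, null-padded sequence of byte values (as in NLE's message observation) and that matching is by substring; decode_message and message_event_fires are hypothetical helpers, not part of this module.

```python
from typing import Iterable, List


def decode_message(raw: Iterable[int]) -> str:
    """Decode a null-padded sequence of byte values into text (assumed encoding)."""
    data = bytes(raw)
    end = data.find(b"\x00")
    if end != -1:
        data = data[:end]  # drop the padding after the first null byte
    return data.decode("utf-8", errors="replace")


def message_event_fires(raw: Iterable[int], messages: List[str]) -> bool:
    """True if any target message appears as a substring of the decoded text."""
    text = decode_message(raw)
    return any(msg in text for msg in messages)
```

Under these assumptions, a message such as "This apple is delicious!" would trigger an event registered with the list ["delicious", "You kill"], since one entry matching is enough.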
- class minihack.reward_manager.RewardManager[source]
Bases:
minihack.reward_manager.AbstractRewardManager
This class is used for managing rewards, events and termination for MiniHack tasks.
Some notes on the ordering of calls in the MiniHack/NetHack base class:
- step(action) is called on the environment.
- Within step, first a copy of the last observation is made, and then the underlying NetHack game is stepped.
- Then _is_episode_end(observation) is called to check whether the episode has ended (this is overridden if we’ve gone over max_steps, or if the underlying NetHack game says we’re done, i.e. we died).
- Then _reward_fn(last_observation, observation) is called to calculate the reward at this time-step.
- If end_status tells us the game is done, we quit the game.
- Then step returns the observation, calculated reward, done, and some statistics.
All this means that we need to check whether an observation is terminal in _is_episode_end before we calculate the reward. The call of _is_episode_end in MiniHack will call check_episode_end_call in this class, which checks for termination and accumulates any reward, which is returned and zeroed in collect_reward.
- add_amulet_event(reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when an amulet is worn.
- Parameters
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_coordinate_event(coordinates: Tuple[int, int], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when the specified coordinates are reached.
- Parameters
coordinates (Tuple[int, int]) – The coordinates to be reached (tuple of ints).
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_custom_reward_fn(reward_fn: Callable[[MiniHack, Any, int, Any], float]) → None[source]
Add a custom reward function which is called after every step to calculate reward.
The function should be a callable which takes the environment, previous observation, action and current observation and returns a float reward.
- Parameters
reward_fn (Callable[[MiniHack, Any, int, Any], float]) – A reward function which takes an environment, previous observation, action, next observation and returns a reward.
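A small sketch of a function with the documented shape (environment, previous observation, action, current observation, returning a float). make_step_penalty and its penalty parameter are hypothetical; the example does not import MiniHack, so the registration call appears only in a comment.

```python
from typing import Any, Callable

RewardFn = Callable[[Any, Any, int, Any], float]


def make_step_penalty(penalty: float = 0.01) -> RewardFn:
    """Build a hypothetical reward function that charges a fixed cost per step."""

    def reward_fn(env: Any, previous_observation: Any, action: int, observation: Any) -> float:
        # The arguments are unused here, but the signature matches what the
        # documentation says the reward manager will pass in.
        return -penalty

    return reward_fn


step_penalty = make_step_penalty(0.01)
# With MiniHack installed, this would be registered on a RewardManager instance:
#     reward_manager.add_custom_reward_fn(step_penalty)
```

A factory is used so that the penalty can be tuned without changing the four-argument signature that the reward manager expects.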
- add_eat_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add an event which is triggered when the named object is eaten.
- Parameters
name (str) – The name of the object being eaten.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_event(event: minihack.reward_manager.Event)[source]
Add an event to be managed by the reward manager.
- Parameters
event (Event) – The event to be added.
- add_kill_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when a specified monster is killed.
- Parameters
name (str) – The name of the monster to be killed.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_location_event(location: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered on reaching a specified location.
- Parameters
location (str) – The name of the location to be reached.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_message_event(msgs: List[str], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when any of the given messages are seen.
- Parameters
msgs (List[str]) – The messages, any of which triggers the event when seen.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_positional_event(place_name: str, action_name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered on taking a given action at a given place.
- Parameters
place_name (str) – The name of the place to trigger the event.
action_name (str) – The name of the action to trigger the event.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_wear_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when a specific armor is worn.
- Parameters
name (str) – The name of the armor to be worn.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_wield_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when a specific weapon is wielded.
- Parameters
name (str) – The name of the weapon to be wielded.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- check_episode_end_call(env, previous_observation, action, observation) → bool[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
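The call ordering described in the RewardManager class notes can be sketched with a toy environment. ToyEnv and ToyManager are stand-ins written for this example (integer observations, additive actions); only the order of the calls is meant to reflect the notes, not MiniHack's actual step implementation.

```python
class ToyManager:
    """Stand-in manager: ends once the observation reaches `goal`."""

    def __init__(self, goal: int):
        self.goal = goal
        self._reward = 0.0

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        if observation == self.goal:
            self._reward += 1.0
            return True
        return False

    def collect_reward(self) -> float:
        # Return the accumulated reward and zero it, as the notes describe.
        reward, self._reward = self._reward, 0.0
        return reward


class ToyEnv:
    """Toy environment whose step() follows the documented call order."""

    def __init__(self, manager: ToyManager, max_steps: int = 10):
        self.manager = manager
        self.max_steps = max_steps
        self.last_observation = 0
        self.steps = 0

    def step(self, action: int):
        previous = self.last_observation             # 1. copy the last observation
        observation = previous + action              # 2. step the (toy) game
        self.steps += 1
        done = self.manager.check_episode_end_call(  # 3. termination check, which
            self, previous, action, observation)     #    also accumulates reward
        done = done or self.steps >= self.max_steps  #    max_steps overrides
        reward = self.manager.collect_reward()       # 4. reward is collected after
        self.last_observation = observation
        return observation, reward, done, {}
```

Because the reward is accumulated during the termination check and only read afterwards, the final transition of an episode still yields its reward.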
- class minihack.reward_manager.SequentialRewardManager[source]
Bases:
minihack.reward_manager.RewardManager
A reward manager that ignores terminal_required and terminal_sufficient, and just requires every event to be completed in the order it was added to the reward manager.
- check_episode_end_call(env, previous_observation, action, observation)[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
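The in-order behaviour described for SequentialRewardManager might look roughly like the following toy. SequentialSketch, add_stage and the predicate-based stages are hypothetical simplifications; the real class works with Event objects, and details such as how out-of-order events are treated are assumptions here.

```python
from typing import Callable, List, Tuple

# (predicate on the current observation, reward) pairs, in required order.
Stage = Tuple[Callable[[object], bool], float]


class SequentialSketch:
    """Toy manager: stages must be completed in the order they were added."""

    def __init__(self):
        self.stages: List[Stage] = []
        self.current = 0
        self._reward = 0.0

    def add_stage(self, predicate: Callable[[object], bool], reward: float = 1.0) -> None:
        self.stages.append((predicate, reward))

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        if self.current < len(self.stages):
            predicate, reward = self.stages[self.current]
            if predicate(observation):  # only the next pending stage is checked
                self._reward += reward
                self.current += 1
        # The episode ends once every stage has been completed.
        return self.current >= len(self.stages)
```

In this sketch a later stage that happens early is simply ignored, so reaching the "door" before picking up the "key" earns nothing and does not advance the sequence.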