minihack.reward_manager module
- class minihack.reward_manager.AbstractRewardManager[source]
Bases:
abc.ABC
This is the abstract base class for the RewardManager that is used for defining custom reward functions.
- abstract check_episode_end_call(env, previous_observation, action, observation) → bool[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
- class minihack.reward_manager.CoordEvent(*args, coordinates: Tuple[int, int])[source]
Bases:
minihack.reward_manager.Event
An event which occurs when reaching certain coordinates.
- __init__(*args, coordinates: Tuple[int, int])[source]
Initialise the Event.
- Parameters
coordinates (tuple) – The coordinates to reach for the event.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
- class minihack.reward_manager.Event(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]
Bases:
abc.ABC
An event which can occur in a MiniHack episode.
This is the base class of all other events.
- __init__(reward: float, repeatable: bool, terminal_required: bool, terminal_sufficient: bool)[source]
Initialise the Event.
- Parameters
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- abstract check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
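A minimal sketch of how a custom event could follow this interface. To keep it runnable without MiniHack installed, SketchEvent below is a simplified stand-in for Event, not the library class; the achieved flag, the _fire helper, and ActionEvent are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class SketchEvent(ABC):
    """Simplified stand-in for minihack.reward_manager.Event (illustrative only)."""

    def __init__(self, reward, repeatable, terminal_required, terminal_sufficient):
        self.reward = reward
        self.repeatable = repeatable
        self.terminal_required = terminal_required
        self.terminal_sufficient = terminal_sufficient
        self.achieved = False  # hypothetical bookkeeping flag

    @abstractmethod
    def check(self, env, previous_observation, action, observation) -> float:
        """Return the reward earned on this transition (0 if the event did not occur)."""

    def _fire(self) -> float:
        # Hypothetical helper: pay out once, or every time if repeatable.
        if self.achieved and not self.repeatable:
            return 0.0
        self.achieved = True
        return self.reward


class ActionEvent(SketchEvent):
    """Hypothetical event that occurs whenever a given action is taken."""

    def __init__(self, *args, target_action: int):
        super().__init__(*args)
        self.target_action = target_action

    def check(self, env, previous_observation, action, observation) -> float:
        return self._fire() if action == self.target_action else 0.0


event = ActionEvent(1.0, False, True, False, target_action=3)
first = event.check(None, (), 3, ())   # event occurs, reward is paid
second = event.check(None, (), 3, ())  # not repeatable, so no second payout
miss = ActionEvent(1.0, True, True, False, target_action=3).check(None, (), 2, ())
```

With the real library, the equivalent subclass would derive from minihack.reward_manager.Event and be registered through RewardManager.add_event.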
- class minihack.reward_manager.EventType(value)[source]
Bases:
enum.IntEnum
An enumeration.
- COORD = 2
- LOC = 3
- LOC_ACTION = 1
- MESSAGE = 0
- class minihack.reward_manager.GroupedRewardManager[source]
Bases:
minihack.reward_manager.AbstractRewardManager
Operates as a collection of reward managers.
The rewards from each reward manager are summed, and termination can be specified by terminal_sufficient and terminal_required on each reward manager.
Given this can be nested arbitrarily deeply (as each reward manager could itself be a GroupedRewardManager), this enables complex specification of groups of rewards.
- add_reward_manager(reward_manager: minihack.reward_manager.AbstractRewardManager, terminal_required: bool, terminal_sufficient: bool) → None[source]
Add a new reward manager, with terminal_sufficient and terminal_required acting as for individual events.
- Parameters
reward_manager (AbstractRewardManager) – The reward manager to be added.
terminal_required (bool) – Whether this reward manager terminating is required for the episode to terminate.
terminal_sufficient (bool) – Whether this reward manager terminating is sufficient for the episode to terminate.
- check_episode_end_call(env, previous_observation, action, observation) → bool[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
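One way to read the terminal_required and terminal_sufficient flags is that the episode ends as soon as any sufficient item is achieved, or once every required item is achieved. The sketch below encodes that reading in plain Python; Item and is_terminal are hypothetical names, and the actual implementation may treat edge cases (such as having no required items) differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """Hypothetical record for an event or a nested reward manager."""
    achieved: bool
    terminal_required: bool
    terminal_sufficient: bool


def is_terminal(items: List[Item]) -> bool:
    # Any achieved, sufficient item ends the episode on its own.
    if any(i.achieved and i.terminal_sufficient for i in items):
        return True
    required = [i for i in items if i.terminal_required]
    # Assumption of this sketch: with no required items, only a
    # sufficient item can terminate the episode.
    return bool(required) and all(i.achieved for i in required)


both_required = [Item(True, True, False), Item(False, True, False)]
one_sufficient = [Item(False, True, False), Item(True, False, True)]
all_done = [Item(True, True, False), Item(True, True, False)]
```

Because a GroupedRewardManager is itself a reward manager, the same rule can be applied recursively: a nested group counts as achieved when its own termination check passes.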
- class minihack.reward_manager.LocActionEvent(*args, loc: str, action: nle.nethack.Command)[source]
Bases:
minihack.reward_manager.Event
An event which checks whether an action is performed at a specified location.
- __init__(*args, loc: str, action: nle.nethack.Command)[source]
Initialise the Event.
- Parameters
loc (str) – The name of the location to reach.
action (int) – The action to perform.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
- class minihack.reward_manager.LocEvent(*args, loc: str)[source]
Bases:
minihack.reward_manager.Event
An event which checks whether a specified location is reached.
- __init__(*args, loc: str)[source]
Initialise the Event.
- Parameters
loc (str) – The name of the location to reach.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
- class minihack.reward_manager.MessageEvent(*args, messages: List[str])[source]
Bases:
minihack.reward_manager.Event
An event which occurs when any of the messages appear.
- __init__(*args, messages: List[str])[source]
Initialise the Event.
- Parameters
messages (list) – The messages to be seen to trigger the event.
reward (float) – The reward for the event occurring.
repeatable (bool) – Whether the event can occur repeatedly (i.e. whether the reward can be collected more than once).
terminal_required (bool) – Whether this event is required for the episode to terminate.
terminal_sufficient (bool) – Whether this event causes the episode to terminate on its own.
- check(env, previous_observation, action, observation) → float[source]
Check whether the environment is in a state such that this event has occurred.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
The reward.
- Return type
float
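A minimal sketch of the kind of matching a message event implies, assuming the game message arrives as a NUL-padded sequence of byte values and that a match means any target string appears as a substring of the decoded text. decode_message and any_message_seen are hypothetical helpers, not part of MiniHack.

```python
from typing import Iterable, List


def decode_message(raw: Iterable[int]) -> str:
    """Turn a NUL-padded sequence of byte values into text."""
    data = bytes(raw)
    return data.split(b"\x00", 1)[0].decode("ascii", errors="ignore")


def any_message_seen(raw: Iterable[int], messages: List[str]) -> bool:
    """Return True if any target message occurs in the decoded text."""
    text = decode_message(raw)
    return any(msg in text for msg in messages)


raw = list(b"You kill the jackal!") + [0] * 8
seen = any_message_seen(raw, ["You kill", "You destroy"])
not_seen = any_message_seen(raw, ["This door is locked"])
```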
- class minihack.reward_manager.RewardManager[source]
Bases:
minihack.reward_manager.AbstractRewardManager
This class is used for managing rewards, events and termination for MiniHack tasks.
Some notes on the ordering of calls in the MiniHack/NetHack base class:
step(action) is called on the environment.
Within step, first a copy of the last observation is made, and then the underlying NetHack game is stepped.
Then _is_episode_end(observation) is called to check whether the episode has ended (this is overridden if we’ve gone over our max_steps, or if the underlying NetHack game says we’re done, i.e. we died).
Then _reward_fn(last_observation, observation) is called to calculate the reward at this time-step.
If end_status tells us the game is done, we quit the game.
Then step returns the observation, calculated reward, done, and some statistics.
All this means that we need to check whether an observation is terminal in _is_episode_end before we calculate the reward.
The call of _is_episode_end in MiniHack will call check_episode_end_call in this class, which checks for termination and accumulates any reward, which is returned and zeroed in collect_reward.
- add_amulet_event(reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when an amulet is worn.
- Parameters
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_coordinate_event(coordinates: Tuple[int, int], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when reaching the specified coordinates.
- Parameters
coordinates (Tuple[int, int]) – The coordinates to be reached (tuple of ints).
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_custom_reward_fn(reward_fn: Callable[[MiniHack, Any, int, Any], float]) → None[source]
Add a custom reward function which is called after every step to calculate reward.
The function should be a callable which takes the environment, previous observation, action and current observation and returns a float reward.
- Parameters
reward_fn (Callable[[MiniHack, Any, int, Any], float]) – A reward function which takes an environment, previous observation, action, next observation and returns a reward.
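As an illustration, the function below matches the documented signature and charges a small fixed cost per step. The name step_penalty and the value of STEP_COST are arbitrary choices for this sketch, and Any stands in for the MiniHack environment type so that the example runs without MiniHack installed.

```python
from typing import Any

STEP_COST = -0.01  # arbitrary value chosen for this sketch


def step_penalty(env: Any, previous_observation: Any, action: int, observation: Any) -> float:
    """Charge a fixed cost on every step to encourage shorter episodes."""
    return STEP_COST


# With MiniHack installed, this would be registered on a RewardManager instance:
#     reward_manager.add_custom_reward_fn(step_penalty)
total = sum(step_penalty(None, (), a, ()) for a in range(5))
```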
- add_eat_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add an event which is triggered when name is eaten.
- Parameters
name (str) – The name of the object being eaten.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_event(event: minihack.reward_manager.Event)[source]
Add an event to be managed by the reward manager.
- Parameters
event (Event) – The event to be added.
- add_kill_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when a specified monster is killed.
- Parameters
name (str) – The name of the monster to be killed.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_location_event(location: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered on reaching a specified location.
- Parameters
location (str) – The name of the location to be reached.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_message_event(msgs: List[str], reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when any of the given messages are seen.
- Parameters
msgs (List[str]) – The messages which trigger the event when any of them is seen.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_positional_event(place_name: str, action_name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered on taking a given action at a given place.
- Parameters
place_name (str) – The name of the place to trigger the event.
action_name (str) – The name of the action to trigger the event.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_wear_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when a specific armor is worn.
- Parameters
name (str) – The name of the armor to be worn.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- add_wield_event(name: str, reward=1, repeatable=False, terminal_required=True, terminal_sufficient=False)[source]
Add event which is triggered when a specific weapon is wielded.
- Parameters
name (str) – The name of the weapon to be wielded.
reward (float) – The reward for this event. Defaults to 1.
repeatable (bool) – Whether this event can be triggered multiple times. Defaults to False.
terminal_required (bool) – Whether this event is required for termination. Defaults to True.
terminal_sufficient (bool) – Whether this event is sufficient for termination. Defaults to False.
- check_episode_end_call(env, previous_observation, action, observation) → bool[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
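The accumulate-then-collect pattern described for RewardManager can be sketched in isolation. ToyManager is a hypothetical stand-in that mirrors only the bookkeeping: reward accumulates in _reward during check_episode_end_call and is returned and zeroed by collect_reward. The check callables and the termination rule used here (the episode ends when any check pays out) are assumptions of this sketch.

```python
from typing import Any, Callable, List

CheckFn = Callable[[Any, Any, int, Any], float]


class ToyManager:
    """Hypothetical stand-in showing only the reward bookkeeping."""

    def __init__(self, checks: List[CheckFn]):
        self.checks = checks
        self._reward = 0.0

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        fired = False
        for check in self.checks:
            reward = check(env, previous_observation, action, observation)
            self._reward += reward  # accumulate during the termination check
            fired = fired or reward != 0
        return fired  # assumption: any payout ends the episode

    def collect_reward(self) -> float:
        reward, self._reward = self._reward, 0.0  # return and zero
        return reward


manager = ToyManager([lambda env, prev, action, obs: 1.0 if action == 7 else 0.0])
done_early = manager.check_episode_end_call(None, (), 0, ())
done_late = manager.check_episode_end_call(None, (), 7, ())
collected = manager.collect_reward()
collected_again = manager.collect_reward()
```

Splitting the work this way lets the termination check run first while the reward it discovers is still available when the reward is computed later in the same step.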
- class minihack.reward_manager.SequentialRewardManager[source]
Bases:
minihack.reward_manager.RewardManager
A reward manager that ignores terminal_required and terminal_sufficient, and just requires every event to be completed in the order it was added to the reward manager.
- check_episode_end_call(env, previous_observation, action, observation)[source]
Check if the task has ended, and accumulate any reward from the transition in self._reward.
- Parameters
env (MiniHack) – The MiniHack environment in question.
previous_observation (tuple) – The previous state observation.
action (int) – The action taken.
observation (tuple) – The current observation.
- Returns
True if the episode has ended, False otherwise.
- Return type
bool
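The in-order requirement can be sketched with a cursor over the event list: only the event at the cursor is checked, and the episode ends once the cursor passes the last event. ToySequentialManager and on_action are hypothetical names; checks are plain callables returning a reward, and a non-zero reward is assumed to mean the event occurred.

```python
class ToySequentialManager:
    """Hypothetical stand-in that completes events strictly in order."""

    def __init__(self, checks):
        self.checks = checks
        self.current = 0  # index of the next event to complete
        self._reward = 0.0

    def check_episode_end_call(self, env, previous_observation, action, observation) -> bool:
        if self.current < len(self.checks):
            reward = self.checks[self.current](env, previous_observation, action, observation)
            if reward != 0:
                self._reward += reward
                self.current += 1  # move on to the next event in the sequence
        return self.current >= len(self.checks)


def on_action(target, reward=1.0):
    """Build a check that pays out when the given action is taken."""
    return lambda env, prev, action, obs: reward if action == target else 0.0


seq = ToySequentialManager([on_action(1), on_action(2)])
# Action 2 arrives first but is ignored until the first event (action 1) is done.
results = [seq.check_episode_end_call(None, (), a, ()) for a in (2, 1, 2)]
```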