# Reward Function

## Default Configuration

Reward functions in MiniHack can be easily configured. The default reward function of custom MiniHack environments is a sparse reward of +1 for reaching the staircase down (which terminates the episode) and 0 otherwise, with episodes also terminating after a configurable number of timesteps. In addition, the agent receives a penalty of -0.01 whenever the in-game timer does not advance during a step (e.g., when the agent walks into a wall).

These defaults can be easily adjusted using the following environment flags:

| Parameter | Default Value | Description |
| --- | --- | --- |
| `reward_win` | 1 | The reward received upon successfully completing an episode. |
| `reward_lose` | 0 | The reward received upon death or aborting. |
| `penalty_mode` | `"constant"` | The mode for calculating the time step penalty. Can be `constant`, `exp`, `square`, `linear`, or `always`. |
| `penalty_step` | -0.01 | Constant applied to the number of frozen steps. |
| `penalty_time` | 0 | Constant applied to the number of non-frozen steps. |
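The penalty modes determine how the per-step penalty grows with the number of consecutive frozen (non-time-advancing) steps. As a rough, self-contained sketch of this behaviour (the actual implementation lives in the NLE/MiniHack source and may differ in detail):

```python
def time_penalty(mode, frozen_steps, penalty_step=-0.01):
    """Illustrative sketch of how the penalty modes could scale with
    the number of consecutive frozen steps; not the library's exact code."""
    if mode == "constant":
        # Flat penalty whenever the timer is frozen.
        return penalty_step if frozen_steps > 0 else 0.0
    if mode == "exp":
        # Penalty doubles with each consecutive frozen step.
        return (2 ** frozen_steps) * penalty_step
    if mode == "square":
        return (frozen_steps ** 2) * penalty_step
    if mode == "linear":
        return frozen_steps * penalty_step
    if mode == "always":
        # Applied on every step, frozen or not.
        return penalty_step
    raise ValueError(f"Unknown penalty mode: {mode}")
```

For example, with the default `penalty_step` of -0.01, three consecutive frozen steps would cost -0.01 under `constant` but -0.03 under `linear`.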

## Reward Manager

We also provide an interface for designing custom reward functions. By using the `RewardManager`, users can control which events give the agent reward, whether those events can be repeated, and which combinations of events are sufficient or required to terminate the episode.
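The sufficient/required distinction can be sketched as follows. This is a self-contained illustration of one plausible termination rule, not MiniHack's actual code: a "sufficient" event ends the episode as soon as it occurs, while every "required" event must have occurred before the episode ends.

```python
def episode_done(events):
    """events: list of (achieved, terminal_sufficient, terminal_required)
    flags, one tuple per registered event. Illustrative sketch only."""
    # Any achieved sufficient event ends the episode on its own.
    if any(ach and suf for ach, suf, _ in events):
        return True
    # Otherwise, all required events must have been achieved.
    required = [ach for ach, _, req in events if req]
    return bool(required) and all(required)
```
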

In order to use the `RewardManager`, users need to create an instance of the class and pass it to a MiniHack environment. In the example below, the agent receives +1 reward for eating an apple or +2 reward for wielding a dagger (both of which also terminate the episode). In addition, the agent receives -1 reward for standing on a sink, but the episode isn't terminated in this case.

```python
from minihack import RewardManager

reward_manager = RewardManager()
reward_manager.add_eat_event("apple", reward=1)
reward_manager.add_wield_event("dagger", reward=2)
reward_manager.add_location_event("sink", reward=-1, terminal_required=False)
```

While the basic reward manager supports many events by default, users may want to extend this interface to define their own events. This can be done easily by inheriting from the `Event` class and implementing the `check` and `reset` methods. Beyond that, custom reward functions can be added to the reward manager through the `add_custom_reward_fn` method. These functions take the environment instance, the previous observation, the action taken, and the current observation, and should return a float.
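As a hedged sketch of such a custom function (the dict-style observation with a `message` entry of bytes is an assumption borrowed from NLE-style observations, not something this page specifies):

```python
def drink_message_reward(env, previous_observation, action, observation):
    """Illustrative custom reward function with the
    (env, prev_obs, action, obs) -> float signature described above:
    +0.5 whenever the in-game message mentions drinking."""
    message = bytes(observation.get("message", b""))
    return 0.5 if b"drink" in message else 0.0

# It could then be registered with (sketch):
# reward_manager.add_custom_reward_fn(drink_message_reward)
```
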