# TorchBeast

To get started with MiniHack environments, we provide baseline agents using the [TorchBeast](https://github.com/facebookresearch/torchbeast) framework.
TorchBeast provides a [PyTorch](https://pytorch.org/) implementation of [IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/abs/1802.01561).

TorchBeast comes in two variants: MonoBeast and PolyBeast. PolyBeast is the more powerful version of the framework and allows training agents across multiple machines. For further details, see the [TorchBeast paper](https://arxiv.org/abs/1910.03552).

For MiniHack, we use the PolyBeast implementation of TorchBeast and additionally provide an implementation of the following exploration methods:
- [RND: Exploration by Random Network Distillation](https://arxiv.org/abs/1810.12894)
- [RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments](https://arxiv.org/abs/2002.12292)

## Installation

To install and train a PolyBeast agent in MiniHack, first install PolyBeast by following the [TorchBeast installation instructions](https://github.com/facebookresearch/torchbeast#installing-polybeast), then run the following commands from the root of the MiniHack repository:

```bash
pip install ".[polybeast]"
# Test IMPALA run
python3 -m minihack.agent.polybeast.polyhydra env=MiniHack-Room-5x5-v0 total_steps=100000
```

## Running Experiments

We use the [Hydra](https://github.com/facebookresearch/hydra) framework to configure our experiments. All environment and training parameters can be set with command-line arguments or edited directly in `config.yaml`; see the `config.yaml` file in `minihack.agent.polybeast` for more information. Logging with [wandb](https://wandb.ai/site) is disabled by default, so set the appropriate parameters if you want to use it.


```bash
# Single IMPALA run
python3 -m minihack.agent.polybeast.polyhydra model=baseline env=MiniHack-Room-5x5-v0 total_steps=1000000

# Single RND run
python3 -m minihack.agent.polybeast.polyhydra model=rnd env=MiniHack-Room-5x5-v0 total_steps=1000000

# Single RIDE run
python3 -m minihack.agent.polybeast.polyhydra model=ride state_counter=coordinates env=MiniHack-Room-5x5-v0 total_steps=1000000

# To perform a sweep on the cluster: add the --multirun flag and comma-separate the values to sweep over
python3 -m minihack.agent.polybeast.polyhydra --multirun model=baseline,rnd env=MiniHack-Room-Random-15x15-v0,MiniHack-Room-Monster-15x15-v0 total_steps=10000000
```

## Replicating the Results of the Paper

To replicate the paper's PolyBeast results, run a sweep of 5 runs with the IMPALA, RND, or RIDE agents on the desired environments, for example:

```bash
python3 -m minihack.agent.polybeast.polyhydra --multirun model=baseline name=1,2,3,4,5 env=MiniHack-Room-Random-15x15-v0,MiniHack-Room-Monster-15x15-v0 total_steps=10000000
```
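The command above covers only the IMPALA baseline. As a rough sketch, the same sweep can be written out for all three agent types. The helper `print_sweeps` is a hypothetical name introduced here, and passing `state_counter=coordinates` for RIDE is an assumption carried over from the single-run example rather than a confirmed paper setting. The snippet only prints the commands, so nothing is launched:

```bash
# Environments taken from the IMPALA sweep example above.
ENVS="MiniHack-Room-Random-15x15-v0,MiniHack-Room-Monster-15x15-v0"

# Hypothetical helper: print one 5-run sweep command per agent type (dry run).
print_sweeps() {
    for model in baseline rnd ride; do
        extra=""
        # Assumption: RIDE keeps state_counter=coordinates, as in the single-run example.
        if [ "$model" = "ride" ]; then
            extra=" state_counter=coordinates"
        fi
        echo "python3 -m minihack.agent.polybeast.polyhydra --multirun model=${model}${extra} name=1,2,3,4,5 env=${ENVS} total_steps=10000000"
    done
}

print_sweeps
```

Once the printed commands look right, each line can be copied into the terminal and run as is.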

For navigation tasks, the default parameters are already set. For skill acquisition tasks, additionally set `learning_rate=0.00005 msg.model=lt_cnn`.
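As a sketch, the skill acquisition overrides can be appended to the sweep command shown earlier in this section. The environment name `MiniHack-Eat-v0` is only an illustrative choice, and keeping `total_steps=10000000` from the navigation example is an assumption, not a setting confirmed here. The snippet prints the assembled command instead of launching it:

```bash
# Illustrative task only; substitute the skill acquisition environment you need.
ENV_NAME="MiniHack-Eat-v0"

# Overrides named in the text above for skill acquisition tasks.
SKILL_OVERRIDES="learning_rate=0.00005 msg.model=lt_cnn"

# Assemble a 5-run IMPALA sweep with the skill acquisition overrides.
# Assumption: total_steps is kept from the navigation sweep example.
CMD="python3 -m minihack.agent.polybeast.polyhydra --multirun model=baseline name=1,2,3,4,5 env=${ENV_NAME} ${SKILL_OVERRIDES} total_steps=10000000"

# Print the command for inspection (dry run).
echo "$CMD"
```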

The learning curves for all of our PolyBeast experiments are available in our [Weights & Biases repository](https://wandb.ai/minihack).

## Evaluate and Watch

The following script evaluates the performance of a model pre-trained with PolyBeast:

```bash
# Watch the learned behaviour step-by-step in the terminal
python3 -m minihack.agent.polybeast.evaluate --env MiniHack-Room-5x5-v0 -c /path/to/checkpoint/directory --watch

# Evaluate the pre-trained model for 1 episode and save the replay as a GIF file
python3 -m minihack.agent.polybeast.evaluate --env MiniHack-Room-5x5-v0 -c /path/to/checkpoint/directory -n 1 --no-watch --save_gif --gif_path replay.gif

# Print all options of the evaluation script
python3 -m minihack.agent.polybeast.evaluate --help
```