- class minihack.wiki.NetHackWiki(raw_wiki_file_name: str, processed_wiki_file_name: str, save_processed_json: bool = True, ignore_inpage_anchors: bool = True, preprocess_input: bool = True, exceptions: Optional[tuple] = None)[source]
A class representing NetHack Wiki data: pages and the links between them.
raw_wiki_file_name (str) – The path to the raw file of NetHack wiki. The raw file can be downloaded using the get_nhwiki_data.sh script located in minihack/scripts.
processed_wiki_file_name (str) – The path to the processed file of the NetHack wiki. The processing is performed in the __init__ function of this class.
save_processed_json (bool) – Whether to save the processed json file of the wiki. Only considered when a raw wiki file is passed. Defaults to True.
ignore_inpage_anchors (bool) – Whether to ignore in-page anchors. Defaults to True.
preprocess_input (bool) – Whether to preprocess the wiki data. Defaults to True.
exceptions (Tuple[str] or None) – Names of entities in screen descriptions that are ignored. If None, there are no exceptions. Defaults to None.
- __init__(raw_wiki_file_name: str, processed_wiki_file_name: str, save_processed_json: bool = True, ignore_inpage_anchors: bool = True, preprocess_input: bool = True, exceptions: Optional[tuple] = None) → None[source]
Initialize self. See help(type(self)) for accurate signature.
- get_page_data(page: str) → dict[source]
Get the data of a page.
page (str) – The page name.
The page data as a dict.
- Return type
dict
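A minimal usage sketch of the page lookup. Since building a real NetHackWiki requires a downloaded wiki dump, this illustration mocks the processed page store with a plain dict; the FakeWiki class and its contents are hypothetical stand-ins, not part of minihack:

```python
# Hypothetical sketch: the real NetHackWiki parses a wiki dump in __init__;
# here a plain dict stands in for the processed page store.

class FakeWiki:
    """Toy stand-in for minihack.wiki.NetHackWiki (illustration only)."""

    def __init__(self, pages: dict):
        self.wiki = pages  # maps page name -> page data dict

    def get_page_data(self, page: str) -> dict:
        # Return the data of a page, mirroring get_page_data's signature.
        return self.wiki[page]

wiki = FakeWiki({"newt": {"text": "A newt is a small amphibian.", "links": ["amphibian"]}})
data = wiki.get_page_data("newt")
print(data["links"])  # -> ['amphibian']
```

With the real class, the wiki would instead be constructed from the raw dump fetched via get_nhwiki_data.sh and queried the same way.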
- class minihack.wiki.TextProcessor[source]
Base class for modeling relations between an object and a subject.
- minihack.wiki.clean_page_text(text: List[str]) → str[source]
Clean Markdown text so it is easier to pass into an NLP model.
This is currently very basic, and more advanced parsing could be employed if necessary.
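The basic cleaning step can be sketched as follows. This is an illustrative re-implementation, not the library's actual code: it assumes the cleanup amounts to joining the input lines, resolving wiki-style link markup, and collapsing whitespace.

```python
import re

def clean_page_text_sketch(lines):
    """Illustrative cleanup (assumed behavior, not minihack's code):
    join lines, replace [[target|label]] wiki links with their label,
    strip leftover markup characters, and collapse whitespace."""
    text = " ".join(lines)
    # [[foo|bar]] -> bar, [[foo]] -> foo
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", text)
    text = re.sub(r"[*'`#]+", "", text)       # strip leftover markup
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(clean_page_text_sketch(["A [[newt|small newt]] is  a", "'''lizard'''."]))
# -> A small newt is a lizard.
```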
- minihack.wiki.load_json(file_name: str) → list[source]
Load a file containing a json object per line into a list of dicts.
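A sketch of the loading behavior, assuming the standard one-JSON-object-per-line (JSON Lines) format; the real helper may differ in error handling:

```python
import json
import os
import tempfile

def load_json_sketch(file_name):
    """Load a file with one JSON object per line into a list of dicts."""
    with open(file_name) as f:
        return [json.loads(line) for line in f if line.strip()]

# Example: write two JSON objects to a temporary file and load them back.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write('{"page": "newt"}\n{"page": "lizard"}\n')
    path = f.name

records = load_json_sketch(path)
os.remove(path)
print(records)  # -> [{'page': 'newt'}, {'page': 'lizard'}]
```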