minihack.wiki module

class minihack.wiki.NetHackWiki(raw_wiki_file_name: str, processed_wiki_file_name: str, save_processed_json: bool = True, ignore_inpage_anchors: bool = True, preprocess_input: bool = True, exceptions: Optional[tuple] = None)[source]

Bases: object

A class representing NetHack Wiki data: pages and the links between them.

Parameters
  • raw_wiki_file_name (str) – The path to the raw file of the NetHack wiki. The raw file can be downloaded using the get_nhwiki_data.sh script located in minihack/scripts.

  • processed_wiki_file_name (str) – The path to the processed file of the NetHack wiki. The processing is performed in the __init__ method of this class.

  • save_processed_json (bool) – Whether to save the processed JSON file of the wiki. Only considered when a raw wiki file is passed. Defaults to True.

  • ignore_inpage_anchors (bool) – Whether to ignore in-page anchors. Defaults to True.

  • preprocess_input (bool) – Whether to preprocess the wiki data. Defaults to True.

  • exceptions (Tuple[str] or None) – Names of entities in screen descriptions that are ignored. If None, there are no exceptions. Defaults to None.

__init__(raw_wiki_file_name: str, processed_wiki_file_name: str, save_processed_json: bool = True, ignore_inpage_anchors: bool = True, preprocess_input: bool = True, exceptions: Optional[tuple] = None) → None[source]

Initialize self. See help(type(self)) for accurate signature.
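A minimal construction sketch, assuming the raw dump has already been fetched with minihack/scripts/get_nhwiki_data.sh; the file paths below are placeholders, not files shipped with the package.

    from minihack.wiki import NetHackWiki

    wiki = NetHackWiki(
        raw_wiki_file_name="nethackwiki.json",                   # placeholder path to the raw dump
        processed_wiki_file_name="nethackwiki_processed.json",   # processed pages are written here
        save_processed_json=True,     # cache the processed wiki for later runs
        ignore_inpage_anchors=True,   # drop in-page anchors from links
        preprocess_input=True,        # run the default text preprocessing
        exceptions=None,              # no entity names are excluded
    )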

get_page_data(page: str) → dict[source]

Get the data of a page.

Parameters

page (str) – The page name.

Returns

The page data as a dict.

Return type

dict
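A hedged usage example, reusing the wiki object constructed above; "kitten" is an illustrative page name, and the keys of the returned dict depend on how the wiki dump was processed.

    # Inspect the structure of a single page entry (illustrative page name).
    data = wiki.get_page_data("kitten")
    print(sorted(data.keys()))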

get_page_text(page: str) → str[source]

Get the text of a page.

Parameters

page (str) – The page name.

Returns

The text of the page.

Return type

str
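Another illustrative call on the same wiki object; the page name is an example only.

    # Print the beginning of a page's text.
    text = wiki.get_page_text("Amulet of Yendor")
    print(text[:300])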

class minihack.wiki.TextProcessor[source]

Bases: object

Base class for modeling relations between an object and a subject.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

preprocess(input_str: str) → str[source]

process(input_str: str) → str[source]
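The base class only fixes the preprocess/process interface. The sketch below uses a hypothetical subclass, LowercaseProcessor, purely to illustrate the expected signatures; it assumes the base constructor needs no additional setup.

    from minihack.wiki import TextProcessor

    class LowercaseProcessor(TextProcessor):
        # Hypothetical subclass used only to illustrate the interface.

        def preprocess(self, input_str: str) -> str:
            # Strip surrounding whitespace before the main processing step.
            return input_str.strip()

        def process(self, input_str: str) -> str:
            # Lowercase the (already preprocessed) string.
            return self.preprocess(input_str).lower()

    print(LowercaseProcessor().process("  Amulet of Yendor  "))  # amulet of yendor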
minihack.wiki.clean_page_text(text: List[str]) → str[source]

Clean Markdown text so that it can be more easily passed to an NLP model.

This is currently very basic, and more advanced parsing could be employed if necessary.
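A small hedged example: the input lines are made-up Markdown in the style of a wiki page, and the exact cleaning rules are left to the implementation.

    from minihack.wiki import clean_page_text

    lines = [
        "== Kitten ==",
        "A '''kitten''' is a small [[domestic cat]].",
    ]
    print(clean_page_text(lines))   # cleaned, model-friendly text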

minihack.wiki.load_json(file_name: str) → list[source]

Load a file containing one JSON object per line into a list of dicts.
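A hedged example, assuming the file is in JSON Lines format (one object per line) as produced by the download script; the path is a placeholder.

    from minihack.wiki import load_json

    pages = load_json("nethackwiki.json")   # placeholder path
    print(len(pages), type(pages[0]))       # number of pages, dict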

minihack.wiki.process_json(wiki_json: List[dict], ignore_inpage_anchors) → dict[source]

Process a list of JSON wiki pages into a single dict containing all pages.
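A hedged sketch combining load_json and process_json; the path is a placeholder, and passing ignore_inpage_anchors by keyword assumes an ordinary Python signature.

    from minihack.wiki import load_json, process_json

    pages = load_json("nethackwiki.json")                        # placeholder path
    wiki_dict = process_json(pages, ignore_inpage_anchors=True)
    print(len(wiki_dict))                                        # one entry per wiki page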