minihack.wiki module

class minihack.wiki.NetHackWiki(raw_wiki_file_name: str, processed_wiki_file_name: str, save_processed_json: bool = True, ignore_inpage_anchors: bool = True, preprocess_input: bool = True, exceptions: Optional[tuple] = None)[source]

Bases: object

A class representing NetHack Wiki data: pages and the links between them.

Parameters
  • raw_wiki_file_name (str) – The path to the raw file of the NetHack wiki. The raw file can be downloaded using the get_nhwiki_data.sh script located in minihack/scripts.

  • processed_wiki_file_name (str) – The path to the processed file of the NetHack wiki. The processing is performed in the __init__ method of this class.

  • save_processed_json (bool) – Whether to save the processed JSON file of the wiki. Only considered when a raw wiki file is passed. Defaults to True.

  • ignore_inpage_anchors (bool) – Whether to ignore in-page anchors. Defaults to True.

  • preprocess_input (bool) – Whether to preprocess the wiki data. Defaults to True.

  • exceptions (Tuple[str] or None) – Names of entities in screen descriptions that are ignored. If None, there are no exceptions. Defaults to None.

__init__(raw_wiki_file_name: str, processed_wiki_file_name: str, save_processed_json: bool = True, ignore_inpage_anchors: bool = True, preprocess_input: bool = True, exceptions: Optional[tuple] = None) → None[source]

Initialize self. See help(type(self)) for accurate signature.
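A minimal construction sketch, assuming the raw dump has already been fetched with minihack/scripts/get_nhwiki_data.sh; the file paths below are placeholders, not files shipped with the package.

    from minihack.wiki import NetHackWiki

    wiki = NetHackWiki(
        raw_wiki_file_name="nethackwiki.json",                   # placeholder path to the raw dump
        processed_wiki_file_name="nethackwiki_processed.json",   # processed pages are written here
        save_processed_json=True,     # cache the processed wiki for later runs
        ignore_inpage_anchors=True,   # drop in-page anchors from links
        preprocess_input=True,        # run the default text preprocessing
        exceptions=None,              # no entity names are excluded
    )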

get_page_data(page: str) → dict[source]

Get the data of a page.

Parameters

page (str) – The page name.

Returns

The page data as a dict.

Return type

dict
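A hedged usage example, reusing the wiki object constructed above; "kitten" is an illustrative page name, and the keys of the returned dict depend on how the wiki dump was processed.

    # Inspect the structure of a single page entry (illustrative page name).
    data = wiki.get_page_data("kitten")
    print(sorted(data.keys()))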

get_page_text(page: str) → str[source]

Get the text of a page.

Parameters

page (str) – The page name.

Returns

The text of the page.

Return type

str
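Another illustrative call on the same wiki object; the page name is an example only.

    # Print the beginning of a page's text.
    text = wiki.get_page_text("Amulet of Yendor")
    print(text[:300])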

class minihack.wiki.TextProcessor[source]

Bases: object

Base class for modeling relations between an object and a subject.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

preprocess(input_str: str) → str[source]

process(input_str: str) → str[source]
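The base class only fixes the preprocess/process interface. The sketch below uses a hypothetical subclass, LowercaseProcessor, purely to illustrate the expected signatures; it assumes the base constructor needs no additional setup.

    from minihack.wiki import TextProcessor

    class LowercaseProcessor(TextProcessor):
        # Hypothetical subclass used only to illustrate the interface.

        def preprocess(self, input_str: str) -> str:
            # Strip surrounding whitespace before the main processing step.
            return input_str.strip()

        def process(self, input_str: str) -> str:
            # Lowercase the (already preprocessed) string.
            return self.preprocess(input_str).lower()

    print(LowercaseProcessor().process("  Amulet of Yendor  "))  # amulet of yendor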
minihack.wiki.clean_page_text(text: List[str]) → str[source]

Clean Markdown text so that it can be more easily passed to an NLP model.

This is currently very basic, and more advanced parsing could be employed if necessary.
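A small hedged example: the input lines are made-up Markdown in the style of a wiki page, and the exact cleaning rules are left to the implementation.

    from minihack.wiki import clean_page_text

    lines = [
        "== Kitten ==",
        "A '''kitten''' is a small [[domestic cat]].",
    ]
    print(clean_page_text(lines))   # cleaned, model-friendly text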

minihack.wiki.load_json(file_name: str) → list[source]

Load a file containing one JSON object per line into a list of dicts.
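A hedged example, assuming the file is in JSON Lines format (one object per line) as produced by the download script; the path is a placeholder.

    from minihack.wiki import load_json

    pages = load_json("nethackwiki.json")   # placeholder path
    print(len(pages), type(pages[0]))       # number of pages, dict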

minihack.wiki.process_json(wiki_json: List[dict], ignore_inpage_anchors) → dict[source]

Process a list of JSON wiki pages into a single dict containing all pages.
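A hedged sketch combining load_json and process_json; the path is a placeholder, and passing ignore_inpage_anchors by keyword assumes an ordinary Python signature.

    from minihack.wiki import load_json, process_json

    pages = load_json("nethackwiki.json")                        # placeholder path
    wiki_dict = process_json(pages, ignore_inpage_anchors=True)
    print(len(wiki_dict))                                        # one entry per wiki page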