core

Utility class for working with TEI files.


convert_xml_encoding

 convert_xml_encoding (input_file, output_file)


XMLBreaker

 XMLBreaker (split_element, break_after=1000, split_attr=None,
             split_value=None, out=None, *args, **kwargs)

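SAX content handler. The description shown for it is the one inherited from the standard SAX ContentHandler interface:

Interface for receiving logical document content events. This is the main callback interface in SAX and the one most important to applications. The order of events in this interface mirrors the order of the information in the document.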
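To illustrate how SAX events follow document order, here is a minimal sketch using only the standard library. `EventRecorder` and `record_events` are hypothetical names introduced for this example; they are not part of this module and do not reproduce XMLBreaker's splitting logic.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Hypothetical handler that records SAX events in arrival order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Whitespace-only chunks are skipped to keep the log readable.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


def record_events(xml_text: str) -> list:
    """Parse `xml_text` and return the recorded events as a list of tuples."""
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

A handler that splits a document, as XMLBreaker's parameters suggest, would presumably count `startElement` or `endElement` calls for `split_element` and switch output after `break_after` of them; that reading is an assumption based on the signature alone.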



CycleFile

 CycleFile (filename)




TeiUtils

 TeiUtils ()

Utility class for working with TEI files.



TeiUtils.download

 TeiUtils.download (url:str, path:str)

Download a file from a specified URL to a local path.

Args:
url (str): The URL from which to download the file.
path (str): The local file path where the downloaded file is saved.
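As a rough illustration of what a download helper with this signature can look like, the sketch below streams a URL to disk with the standard library. It is not the module's implementation: the use of `urllib.request` and the creation of missing parent directories are assumptions made for the example.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, path: str) -> None:
    """Stream the resource at `url` into the local file `path`."""
    target = Path(path)
    # Assumption: missing parent directories are created on demand.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        # Copy in chunks so large files are never held in memory at once.
        shutil.copyfileobj(response, out)
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for large TEI corpora.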



TeiUtils.get_tag_freq

 TeiUtils.get_tag_freq (path:str)

Read an XML file from a specified path and count the frequency of each tag.

The frequencies are stored in the tag_counts attribute. A sorted DataFrame of tags and counts is stored in the df and df_tag attributes.

Args:
path (str): The file path of the XML file to parse.
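The counting step can be sketched with the standard library alone. `count_tags` is a hypothetical stand-alone function, not the method itself; stripping the namespace prefix is an assumption, and the DataFrame step is left out so the example has no third-party dependencies.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source) -> Counter:
    """Count element tags in an XML file (path or file object), streaming."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Assumption: drop a namespace prefix such as "{http://www.tei-c.org/ns/1.0}".
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
        elem.clear()  # release content that has already been counted
    return counts
```

`counts.most_common()` then yields (tag, count) pairs sorted by descending frequency, which is the kind of data a sorted table of tags and counts would be built from.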



TeiUtils.get_javascript

 TeiUtils.get_javascript ()

Generate JavaScript code to check checkboxes in the TEI tag list.
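One way such a generator could work is sketched below. Everything specific here is an assumption: the function name `checkbox_script`, the idea that the tag names are passed in explicitly, and the premise that each checkbox on the target page carries the tag name in its `value` attribute. The actual method takes no arguments and may build its script differently.

```python
import json


def checkbox_script(tags) -> str:
    """Return JavaScript that ticks every checkbox whose value is in `tags`."""
    # json.dumps produces a valid JavaScript array literal with proper quoting.
    wanted = json.dumps(sorted(set(tags)))
    return (
        f"const wanted = new Set({wanted});\n"
        "document.querySelectorAll('input[type=\"checkbox\"]').forEach((box) => {\n"
        "  if (wanted.has(box.value)) { box.checked = true; }\n"
        "});"
    )
```

The returned string is meant to be pasted into a browser console on the page that lists the tags.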



TeiUtils.split_xml_file

 TeiUtils.split_xml_file (input_file, output_basename, split_element,
                          break_after=1, split_attr=None,
                          split_value=None)

Splits an XML file into multiple files based on the provided element and attributes.

Args:
input_file (str): The path to the input XML file.
output_basename (str): The base name for the output files.
split_element (str): The name of the element to split on.
break_after (int): The number of occurrences of the element after which to split.
split_attr (str): Optional. The attribute name to further refine the split condition.
split_value (str): Optional. The attribute value to further refine the split condition.
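The arguments above can be illustrated with a small in-memory sketch. It uses ElementTree rather than the SAX-based XMLBreaker, so it is a different technique from whatever the method does internally. The function name `split_xml`, the `_N.xml` output naming, the wrapping of each chunk in a copy of the root element, and the handling of `split_attr` without `split_value` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def _matches(elem, split_attr, split_value):
    """Apply the optional attribute filter to one candidate element."""
    if split_attr is None:
        return True
    if split_value is None:
        return split_attr in elem.attrib
    return elem.get(split_attr) == split_value


def split_xml(input_file, output_basename, split_element,
              break_after=1, split_attr=None, split_value=None):
    """Write groups of matching elements to numbered files; return the paths."""
    root = ET.parse(input_file).getroot()
    matches = [elem for elem in root.iter(split_element)
               if _matches(elem, split_attr, split_value)]
    paths = []
    for start in range(0, len(matches), break_after):
        # Assumption: each chunk is wrapped in a copy of the original root tag.
        wrapper = ET.Element(root.tag, root.attrib)
        wrapper.extend(matches[start:start + break_after])
        # Assumption: output files are named <basename>_1.xml, <basename>_2.xml, ...
        path = f"{output_basename}_{len(paths) + 1}.xml"
        ET.ElementTree(wrapper).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

This version loads the whole document, so it suits small files only, and nested occurrences of `split_element` would appear both inside their parent and as separate matches. For namespaced TEI, `split_element` would need the qualified form, for example `{http://www.tei-c.org/ns/1.0}div`.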