core
convert_xml_encoding
convert_xml_encoding (input_file, output_file)
XMLBreaker
XMLBreaker (split_element, break_after=1000, split_attr=None, split_value=None, out=None, *args, **kwargs)
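SAX content handler (a subclass of xml.sax.handler.ContentHandler). SAX delivers the parsed document to the handler as a sequence of callback events, such as element starts, element ends, and character data, in the same order as that information appears in the document.
The split_element, break_after, split_attr, and split_value parameters share their names with those of TeiUtils.split_xml_file.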
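To make the event ordering concrete, here is a minimal sketch of a ContentHandler subclass that records callbacks as they fire. EventRecorder is a hypothetical name used only for illustration; it is not part of this library and is not how XMLBreaker is implemented.

```python
import xml.sax


class EventRecorder(xml.sax.handler.ContentHandler):
    """Hypothetical handler that logs SAX callbacks in the order they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is a SAX Attributes object; copy it into a plain dict
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may split text into several calls; merge adjacent pieces
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


recorder = EventRecorder()
xml.sax.parseString(b'<body><div n="1"><p>Hi</p></div></body>', recorder)
# recorder.events lists start/text/end events in document order
```

Because every start event precedes the events of its children and every end event follows them, a handler can keep a stack of open elements, which is what makes streaming splitters possible.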
CycleFile
CycleFile (filename)
TeiUtils
TeiUtils ()
Utility class for working with TEI (Text Encoding Initiative) XML files.
TeiUtils.download
TeiUtils.download (url:str, path:str)
Download a file from a specified URL to a local path.
Args:
url (str): The URL from which to download the file.
path (str): The local file path where the downloaded file is saved.
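A download with this signature can be sketched using only the standard library. This is an assumption-based stand-in, not the library's own code: the helper name download, the use of urllib.request, and the creation of missing parent directories are all choices made for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, path: str) -> Path:
    """Hypothetical stand-in for TeiUtils.download (standard library only)."""
    target = Path(path)
    # Assumption: create missing parent directories instead of failing
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response body to disk in chunks rather than reading it all at once
    with urllib.request.urlopen(url) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```

urlopen also accepts file:// URLs, which makes the helper easy to try locally without network access.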
TeiUtils.get_tag_freq
TeiUtils.get_tag_freq (path:str)
Read an XML file from a specified path and count the frequency of each tag.
The frequencies are stored in the attribute tag_counts. A sorted DataFrame of tags and counts is stored in df and df_tag.
Args:
path (str): The file path of the XML file to parse.
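The counting step can be sketched with xml.etree.ElementTree and collections.Counter. The helper tag_frequencies is hypothetical, and stripping the namespace prefix is an assumption about how tags should be reported; TEI documents normally declare the http://www.tei-c.org/ns/1.0 namespace, so ElementTree would otherwise report tags as "{http://www.tei-c.org/ns/1.0}p".

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_frequencies(path: str) -> list[tuple[str, int]]:
    """Hypothetical helper: count every element tag in an XML file."""
    counts = Counter()
    # iterparse yields each element once, as its end tag is parsed
    for _event, elem in ET.iterparse(path):
        # Drop the "{namespace}" prefix that ElementTree adds to qualified tags
        tag = elem.tag.split("}", 1)[-1]
        counts[tag] += 1
    # Most frequent first; ties broken alphabetically by tag name
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

The sorted pairs can be turned into a table with pandas.DataFrame(pairs, columns=["tag", "count"]), which is roughly the shape the df attribute is described as holding.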
TeiUtils.get_javascript
TeiUtils.get_javascript ()
Generate JavaScript code to check checkboxes in the TEI tag list.
TeiUtils.split_xml_file
TeiUtils.split_xml_file (input_file, output_basename, split_element, break_after=1, split_attr=None, split_value=None)
Split an XML file into multiple files based on the given element and, optionally, an attribute name and value.
Args:
input_file (str): The path to the input XML file.
output_basename (str): The base name for the output files.
split_element (str): The name of the element to split on.
break_after (int): The number of occurrences of the element after which to split.
split_attr (str): Optional. The attribute name to further refine the split condition.
split_value (str): Optional. The attribute value to further refine the split condition.
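The following sketch shows one way a streaming SAX split driven by these parameters could work. It is illustrative only: ChunkSplitter and its results() method are hypothetical names, the rule that an element counts only when its split_attr value equals split_value is an assumption, chunks are collected in memory instead of being written to files named after output_basename, and no XML declaration is emitted.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class ChunkSplitter(xml.sax.handler.ContentHandler):
    """Hypothetical SAX-based splitter; not this library's XMLBreaker."""

    def __init__(self, split_element, break_after=1, split_attr=None, split_value=None):
        super().__init__()
        self.split_element = split_element
        self.break_after = break_after
        self.split_attr = split_attr
        self.split_value = split_value
        self.chunks = [[]]          # one list of text fragments per output chunk
        self.stack = []             # currently open elements as (name, attrs) pairs
        self.count = 0              # qualifying elements closed since the last split
        self.pending_split = False  # cut before the next element starts

    @staticmethod
    def _start_tag(name, attrs):
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        return f"<{name}{attr_text}>"

    def _matches(self, name, attrs):
        if name != self.split_element:
            return False
        # Without an attribute filter, every split_element counts
        return self.split_attr is None or attrs.get(self.split_attr) == self.split_value

    def _cut(self):
        # Close the still-open ancestors so the finished chunk is well-formed
        for name, _ in reversed(self.stack):
            self.chunks[-1].append(f"</{name}>")
        # Start a new chunk by re-opening the same ancestors, attributes included
        self.chunks.append([self._start_tag(n, a) for n, a in self.stack])
        self.pending_split = False

    def startElement(self, name, attrs):
        if self.pending_split:
            self._cut()
        attrs = dict(attrs.items())
        self.chunks[-1].append(self._start_tag(name, attrs))
        self.stack.append((name, attrs))

    def endElement(self, name):
        _, attrs = self.stack.pop()
        self.chunks[-1].append(f"</{name}>")
        if self._matches(name, attrs):
            self.count += 1
            if self.count == self.break_after:
                self.count = 0
                # Defer the cut so no empty trailing chunk is produced
                self.pending_split = True

    def characters(self, content):
        self.chunks[-1].append(escape(content))

    def results(self):
        return ["".join(parts) for parts in self.chunks]


splitter = ChunkSplitter("div", split_attr="n", split_value="a")
xml.sax.parseString(
    b'<TEI><body><div n="a">x</div><div n="b">y</div><div n="a">z</div></body></TEI>',
    splitter,
)
```

Deferring the cut until the next element starts means that if the last qualifying element is followed only by closing tags, those tags stay in the current chunk rather than producing an extra chunk containing nothing but re-opened ancestors.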