xpfcorpus Documentation

A Python package for grapheme-to-phoneme transcription based on the XPF Corpus.

XPF Resources:

Installation

pip install xpfcorpus

Quick Example

from xpfcorpus import Transcriber

# Basic usage
es = Transcriber("es")
result = es.transcribe("ejemplo")
print(result)  # ['e', 'x', 'e', 'm', 'p', 'l', 'o']
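
To transcribe several words with one Transcriber, the call above can be wrapped in a small helper. This is a minimal sketch, not part of xpfcorpus: `transcribe_words`, `HasTranscribe`, and `LetterSplitter` are hypothetical names, and the helper assumes only the `transcribe(word)` method shown in the example, returning a list of symbols. The stand-in class exists solely so the sketch runs without the package installed.

```python
from typing import Dict, Iterable, List, Protocol


class HasTranscribe(Protocol):
    """Anything exposing transcribe(word), as in the example above."""

    def transcribe(self, word: str) -> List[str]: ...


def transcribe_words(
    transcriber: HasTranscribe, words: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each distinct word to its transcription, in first-seen order."""
    results: Dict[str, List[str]] = {}
    for word in words:
        if word not in results:  # transcribe repeated words only once
            results[word] = transcriber.transcribe(word)
    return results


class LetterSplitter:
    """Hypothetical stand-in (NOT xpfcorpus): splits a word into letters."""

    def transcribe(self, word: str) -> List[str]:
        return list(word)


if __name__ == "__main__":
    print(transcribe_words(LetterSplitter(), ["casa", "sol", "casa"]))
```

With the package installed, the same helper should accept a real instance, for example `transcribe_words(Transcriber("es"), ["ejemplo"])`.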

Features

  • 201 languages with 203 language/script combinations

  • Pure Python with no required dependencies

  • Support for BCP-47 language codes (e.g., “es-ES”, “yi-Latn”)

  • Multiple data sources: bundled JSON, external YAML, or legacy formats

  • Command-line interface for batch processing

  • 100% verification: all 203 language/script combinations pass the verification tests
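
The BCP-47 codes mentioned in the list above combine a language subtag with optional script and region subtags. The sketch below illustrates that structure for the two example codes. It is not how xpfcorpus resolves codes internally: `split_language_tag` is a hypothetical helper, and it handles only the simple language[-Script][-REGION] shape.

```python
from typing import Dict, Optional


def split_language_tag(tag: str) -> Dict[str, Optional[str]]:
    """Split a simple BCP-47 tag into language, script, and region subtags.

    Variants, extensions, and private-use subtags are ignored.
    """
    parts = tag.replace("_", "-").split("-")
    result: Dict[str, Optional[str]] = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
    }
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        )
        if is_script and result["script"] is None and result["region"] is None:
            result["script"] = part.title()  # scripts are title-cased: Latn
        elif is_region and result["region"] is None:
            result["region"] = part.upper()  # regions are upper-cased: ES
    return result


print(split_language_tag("es-ES"))
# {'language': 'es', 'script': None, 'region': 'ES'}
print(split_language_tag("yi-Latn"))
# {'language': 'yi', 'script': 'Latn', 'region': None}
```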
