Command-Line Interface

The xpfcorpus command provides a command-line interface for transcribing words, listing the available languages, exporting language rules, and verifying rules against test data.

Commands

transcribe

Transcribe words from graphemes to phonemes.

xpfcorpus transcribe LANGUAGE [WORDS...] [OPTIONS]

Arguments:

LANGUAGE - Language code (e.g., “es”, “tt”). A script or region suffix may be appended in BCP-47 style (e.g., “es-ES”, “yi-Latn”). Script names are also accepted as suffixes (e.g., “tt-cyrillic”).
WORDS - Words to transcribe (optional if using -f)
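The sketch below shows how a LANGUAGE argument breaks into a base code and an optional suffix. It splits the code at the first hyphen. split_lang_code is a hypothetical helper, not part of xpfcorpus. It assumes at most one suffix and does not tell scripts from regions. The real CLI may also validate or normalize the suffix.

```shell
# Hypothetical helper (not part of xpfcorpus): split a code such as
# "yi-Latn" or "tt-cyrillic" into its base language and optional suffix.
split_lang_code() {
    code=$1
    base=${code%%-*}          # text before the first hyphen
    if [ "$base" = "$code" ]; then
        suffix=""             # no hyphen: no script/region suffix
    else
        suffix=${code#*-}     # text after the first hyphen
    fi
    printf '%s %s\n' "$base" "$suffix"
}

split_lang_code es          # prints "es " (empty suffix)
split_lang_code yi-Latn     # prints "yi Latn"
split_lang_code tt-cyrillic # prints "tt cyrillic"
```

A region suffix such as “es-ES” splits the same way, into “es” and “ES”.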

Options:

-s, --script SCRIPT - Script to use (e.g., “latin”, “cyrillic”)
-f, --file FILE - Read words from FILE (use “-” for stdin). Only the first word of each line is used.
--yaml FILE - Load rules from an external YAML file
--rules FILE - Load rules from a legacy .rules file
--verify-file FILE - Load verification data from a legacy .verify file
--no-verify - Skip verifying the rules when they are loaded
--json - Output as JSON

Examples:

# Transcribe command-line arguments
xpfcorpus transcribe es ejemplo hola mundo

# From a file
xpfcorpus transcribe es -f words.txt

# From stdin
printf 'mundo\nbueno\n' | xpfcorpus transcribe es
cat words.txt | xpfcorpus transcribe es -f -

# Combine sources
xpfcorpus transcribe es ejemplo -f more_words.txt

# With BCP-47 codes
xpfcorpus transcribe es-ES ejemplo
xpfcorpus transcribe yi-Latn shalom
xpfcorpus transcribe tt-cyrillic привет

# JSON output
xpfcorpus transcribe es ejemplo --json

# Explicit script
xpfcorpus transcribe tt -s cyrillic привет

list

List all available languages.

xpfcorpus list [OPTIONS]

Options:

--json - Output as JSON

Examples:

# Human-readable list
xpfcorpus list

# JSON format
xpfcorpus list --json

Output Format:

Available languages: 201

  es: latin (default: latin)
  tt: latin, cyrillic
  yi: hebrew (default: hebrew)
  ...

export

Export a language’s rules as YAML.

xpfcorpus export LANGUAGE [OPTIONS]

Arguments:

LANGUAGE - Language code to export

Options:

-o, --output FILE - Output file (default: stdout)

Examples:

# To stdout
xpfcorpus export es

# To file
xpfcorpus export es -o spanish.yaml

verify

Verify language rules against test data.

xpfcorpus verify [LANGUAGE] [OPTIONS]

Arguments:

LANGUAGE - Language code to verify (required unless using --all)

Options:

-s, --script SCRIPT - Script to verify (for multi-script languages)
--all - Verify all languages (all scripts for multi-script languages)
-v, --verbose - Show error details
-q, --quiet - Show only the summary
--json - Output as JSON

Examples:

# Verify single language
xpfcorpus verify es

# With details
xpfcorpus verify es -v

# Specific script
xpfcorpus verify tt -s cyrillic

# All languages
xpfcorpus verify --all

# Quiet mode (just pass/fail count)
xpfcorpus verify --all -q

# JSON output
xpfcorpus verify --all --json

Output Format:

aak: PASS
ab: PASS
acf: PASS
...
tt-cyrillic: PASS
tt-latin: PASS
...

Results: 203/203 passed
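In scripted runs it can help to pull out only the entries that did not pass. Each script of a multi-script language gets its own entry (e.g., “tt-cyrillic” and “tt-latin”). The sketch below treats any status other than PASS as a failure. It assumes entry lines start with “code: STATUS” as in the sample above. The failure wording is not shown on this page, so adjust the match if it differs. verify_failures is a hypothetical helper, not an xpfcorpus command.

```shell
# Hypothetical helper: list entries from `xpfcorpus verify` output whose
# status is not PASS. Assumes lines beginning "code: STATUS".
verify_failures() {
    awk '
        /^[^ ]+: [A-Z]/ {
            if ($2 != "PASS") {
                sub(/:$/, "", $1)   # "es:" -> "es"
                print $1
            }
        }
    '
}

# Usage: xpfcorpus verify --all | verify_failures
```

The “Results:” summary line is skipped because its value starts with a digit rather than an uppercase status word.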

Input File Format

When you use the -f option with the transcribe command, the input file is read line by line and only the first word of each line is used. Words are split on whitespace or commas, so anything after the first word on a line is ignored.

# words.txt
ejemplo
hola, mundo
buenos días

# Comments are ignored

Lines starting with # are treated as comments and ignored.
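To preview which words a file will contribute before transcribing it, the rule above can be approximated with awk. first_words is a hypothetical helper, not part of xpfcorpus. It assumes that blank lines contribute nothing and that only spaces, tabs and commas count as separators. The real parser may behave differently in edge cases.

```shell
# Hypothetical approximation of the documented -f rule: skip lines that
# start with "#", then print the first word, splitting on spaces, tabs
# or commas. Reads the named files, or stdin if none are given.
first_words() {
    awk '
        /^#/ { next }                        # comment line
        {
            sub(/^[ \t,]+/, "")              # drop leading separators
            n = split($0, parts, "[ \t,]+")
            if (n > 0 && parts[1] != "") print parts[1]
        }
    ' "$@"
}

# Usage: first_words words.txt
```

For the sample words.txt above, this prints ejemplo, hola and buenos, one per line.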

Exit Codes

0 - Success
1 - Error (language not found, verification failed, etc.)
127 - Command not found. The shell returns this when xpfcorpus is not on PATH (for example, because the package is not installed).
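These codes let a CI job or shell script stop when verification fails. The sketch below maps the documented codes to messages. describe_exit is an illustrative helper, not part of xpfcorpus. Any other status is reported as unexpected, because this page documents no other codes.

```shell
# Illustrative helper: describe an xpfcorpus exit status using the
# codes documented above.
describe_exit() {
    case $1 in
        0)   echo "success" ;;
        1)   echo "error (e.g. language not found or verification failed)" ;;
        127) echo "command not found (is xpfcorpus installed and on PATH?)" ;;
        *)   echo "unexpected exit status $1" ;;
    esac
}

# Usage in a CI step:
#   xpfcorpus verify --all -q
#   status=$?
#   describe_exit "$status"
#   [ "$status" -eq 0 ] || exit "$status"
```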