t9
Import from the t9 submodules directly or use the main package imports.
t9.corpus.extractor
Comment extraction and cleaning from Reddit JSON data.
CommentExtractor Objects
class CommentExtractor()
Extracts and cleans Reddit comments from JSON files.
__init__
def __init__()
Initialize comment extractor.
extract_comments_from_file
def extract_comments_from_file(json_file: Path) -> Iterator[str]
Extract comment text from a single JSON file.
Arguments:
json_file
- Path to JSON file containing Reddit data
Yields:
Raw comment text strings
extract_comments_from_files
def extract_comments_from_files(json_files: List[Path]) -> Iterator[str]
Extract comment text from multiple JSON files.
Arguments:
json_files
- List of JSON file paths
Yields:
Raw comment text strings
clean_comment
def clean_comment(comment: str) -> str
Clean a single comment by removing quotes, decoding entities, and cleaning markdown.
Arguments:
comment
- Raw comment text
Returns:
Cleaned comment text
extract_and_clean_comments
def extract_and_clean_comments(json_files: List[Path]) -> List[str]
Extract and clean all comments from JSON files.
Arguments:
json_files
- List of JSON file paths
Returns:
List of cleaned comment strings
save_comments
def save_comments(comments: List[str], output_file: Path) -> None
Save cleaned comments to a text file.
Arguments:
comments
- List of cleaned comment strings
output_file
- Path to output file
process_json_files
def process_json_files(json_files: List[Path], output_file: Path) -> Path
Complete processing pipeline: extract, clean, and save comments.
Arguments:
json_files
- List of input JSON files
output_file
- Path to output text file
Returns:
Path to saved output file
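A minimal usage sketch of the extractor, using only the class and methods documented above; the input directory and output file names are placeholders:
from pathlib import Path
from t9.corpus.extractor import CommentExtractor

json_files = sorted(Path("scraped").glob("*.json"))   # placeholder input directory
extractor = CommentExtractor()

# One-call pipeline: extract, clean, and save.
corpus_file = extractor.process_json_files(json_files, Path("corpus.txt"))

# Or step by step, keeping the cleaned comments in memory first.
comments = extractor.extract_and_clean_comments(json_files)
extractor.save_comments(comments, Path("corpus.txt"))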
t9.corpus
Corpus generation tools for T9 wordlist creation.
t9.corpus.cli
CLI commands for corpus generation.
cmd_scrape
def cmd_scrape(args) -> int
Scrape Reddit user data.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
cmd_extract
def cmd_extract(args) -> int
Extract and clean comments from JSON files.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
cmd_process
def cmd_process(args) -> int
Process corpus text to create frequency wordlist.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
cmd_generate
def cmd_generate(args) -> int
Complete corpus generation pipeline.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
add_corpus_commands
def add_corpus_commands(subparsers) -> None
Add corpus subcommands to argument parser.
Arguments:
subparsers
- ArgumentParser subparsers object
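A sketch of wiring the corpus subcommands into a parser. The add_corpus_commands call is documented above; how each cmd_* handler is attached to the parsed args (here via a func attribute) is an assumption:
import argparse
from t9.corpus.cli import add_corpus_commands

parser = argparse.ArgumentParser(prog="t9-corpus")
subparsers = parser.add_subparsers(dest="command")
add_corpus_commands(subparsers)          # registers scrape/extract/process/generate

args = parser.parse_args()
if hasattr(args, "func"):                # assumed convention: set_defaults(func=cmd_...)
    raise SystemExit(args.func(args))    # each handler returns an exit code
parser.print_help()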
t9.corpus.scraper
Reddit corpus data scraper using reddit-export package.
RedditScraper Objects
class RedditScraper()
Scraper for downloading Reddit user comment data.
__init__
def __init__(output_dir: Path)
Initialize scraper with output directory.
Arguments:
output_dir
- Directory to save scraped JSON files
scrape_user
def scrape_user(username: str) -> Optional[Path]
Scrape comments for a single Reddit user.
Arguments:
username
- Reddit username to scrape
Returns:
Path to saved JSON file, or None if failed
scrape_users
def scrape_users(usernames: List[str]) -> List[Path]
Scrape comments for multiple Reddit users.
Arguments:
usernames
- List of Reddit usernames to scrape
Returns:
List of paths to saved JSON files
get_scraped_files
def get_scraped_files() -> List[Path]
Get list of all scraped JSON files in output directory.
Returns:
List of JSON file paths
interactive_scrape
def interactive_scrape() -> List[Path]
Interactive scraping - prompt for usernames.
Returns:
List of paths to saved JSON files
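A usage sketch for the scraper, using only the constructor and methods documented above; the usernames and output directory are placeholders:
from pathlib import Path
from t9.corpus.scraper import RedditScraper

scraper = RedditScraper(Path("scraped"))            # directory for the JSON dumps

single = scraper.scrape_user("example_user")        # Path to saved JSON, or None on failure
batch = scraper.scrape_users(["user_a", "user_b"])  # paths of the JSON files that were saved

all_files = scraper.get_scraped_files()             # everything in the output directory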
t9.corpus.processor
Text processing and frequency-based wordlist generation.
CorpusProcessor Objects
class CorpusProcessor()
Processes text corpus to create frequency-ordered wordlists.
__init__
def __init__()
Initialize corpus processor.
split_sentences
def split_sentences(text: str) -> str
Split text on sentence endings and remove first word from each line.
This avoids confusing proper nouns with words that are capitalized only because they start a sentence.
Arguments:
text
- Input text
Returns:
Text with sentences split and first words removed
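A brief illustration of the intent with an assumed input string; the exact output formatting is not guaranteed by the docstring, only the idea that sentence-initial words are dropped before counting:
from t9.corpus.processor import CorpusProcessor

processor = CorpusProcessor()
text = "Rose is a flower. Rose went home."   # "Rose": proper noun or just sentence start?
stripped = processor.split_sentences(text)   # roughly "is a flower" / "went home"
# The ambiguous sentence-initial capitals never reach the frequency count.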
tokenize_text
def tokenize_text(text: str) -> List[str]
Tokenize text into individual words.
Arguments:
text
- Input text
Returns:
List of tokenized words
count_word_frequencies
def count_word_frequencies(words: List[str]) -> Counter
Count word frequencies (case-insensitive).
Arguments:
words
- List of words
Returns:
Counter with word frequencies
load_dictionary_words
def load_dictionary_words(wordlist_file: Optional[Path] = None) -> Set[str]
Load dictionary words (case-insensitive).
Arguments:
wordlist_file
- Optional path to wordlist file, uses system dict if None
Returns:
Set of dictionary words in lowercase
create_frequency_wordlist
def create_frequency_wordlist(corpus_frequencies: Counter,
dictionary_words: Set[str],
output_file: Path) -> int
Create frequency-ordered wordlist.
Arguments:
corpus_frequencies
- Word frequency counter from corpus
dictionary_words
- Set of valid dictionary words (lowercase)
output_file
- Output file path
Returns:
Total number of words written
process_corpus_file
def process_corpus_file(corpus_file: Path,
wordlist_file: Optional[Path] = None,
output_file: Optional[Path] = None) -> Path
Complete corpus processing pipeline.
Arguments:
corpus_file
- Input corpus text file
wordlist_file
- Optional dictionary wordlist file
output_file
- Optional output file path
Returns:
Path to generated frequency wordlist
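A sketch of the full processing pipeline using the methods above. The one-call form relies on the documented default (system dictionary when wordlist_file is None); the default output path when output_file is None is not specified here, so the step-by-step form passes it explicitly. Paths are placeholders:
from pathlib import Path
from t9.corpus.processor import CorpusProcessor

processor = CorpusProcessor()

# One-call pipeline.
wordlist_path = processor.process_corpus_file(Path("corpus.txt"))

# Step by step.
text = Path("corpus.txt").read_text(encoding="utf-8")
words = processor.tokenize_text(processor.split_sentences(text))
freqs = processor.count_word_frequencies(words)
dictionary = processor.load_dictionary_words()   # falls back to the system dictionary
total = processor.create_frequency_wordlist(freqs, dictionary, Path("frequency_wordlist.txt"))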
t9.utils
Utility functions for PY9 T9 text input system.
getkey
def getkey(word)
Convert a word to T9 keypress sequence.
Example: “hello” -> “43556”
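A one-line check of the documented example (standard phone keypad: abc=2, def=3, ghi=4, jkl=5, mno=6, pqrs=7, tuv=8, wxyz=9):
from t9.utils import getkey

assert getkey("hello") == "43556"   # h=4, e=3, l=5, l=5, o=6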
read_wordlist
def read_wordlist(filename)
Read lines from a wordlist file, handling both plain and gzipped files.
Yields stripped non-empty lines from the file.
Arguments:
filename
- Path to the wordlist file (.txt or .txt.gz)
Yields:
str
- Each non-empty line from the file, stripped of whitespace
get_wordlists_dir
def get_wordlists_dir()
Get the path to the wordlists directory.
get_cache_dir
def get_cache_dir()
Get the path to the cache directory for dictionaries.
get_locale
def get_locale()
Get user’s locale as (language, region) tuple.
Returns:
tuple
- (language, region) or (None, None) if not found
find_wordlist
def find_wordlist(language, region=None)
Find wordlist file for given language and region.
Arguments:
language
- Language code (e.g., “en”)
region
- Region code (e.g., “GB”) or None
Returns:
Path to wordlist file if found, None otherwise
get_system_wordlist
def get_system_wordlist()
Get system dictionary file, resolving symlinks.
Returns:
Path to system wordlist if found, None otherwise
find_or_generate_dict
def find_or_generate_dict(language=None, region=None)
Find or generate dictionary file using fallback chain.
Arguments:
language
- Language code or None to auto-detect
region
- Region code or None to auto-detect
Returns:
Path to dictionary file if found/generated, None if failed
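A sketch of the lookup chain using the helpers above; the exact fallback order is whatever find_or_generate_dict implements, so the manual steps are only illustrative:
from t9.utils import get_locale, find_wordlist, find_or_generate_dict

language, region = get_locale()                      # e.g. ("en", "GB"), or (None, None)
wordlist = find_wordlist(language or "en", region)   # None if no matching wordlist

# Highest-level helper: auto-detects locale and walks the fallback chain.
dict_file = find_or_generate_dict()                  # path to a dictionary, or None on failure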
draw_keypad
def draw_keypad()
Draw a T9 keypad layout for reference.
t9.maket9
T9.py - a predictive text dictionary in the style of Nokia’s T9
File format:
Header:
- String[7] = "PY9DICT:"
- Unsigned Long = number of words
- Unsigned Long = root node's start position
Node block:
- Unsigned Long[4] =
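A hedged sketch of writing the header described above with the struct module. write_header is a hypothetical helper, not part of t9.maket9; byte order and the standard 4-byte size for the unsigned longs are assumptions, and the node-block layout is truncated in the docstring so it is omitted here:
import struct

def write_header(f, word_count: int, root_pos: int) -> None:
    # f is assumed to be a file opened in binary write mode.
    f.write(b"PY9DICT:")                               # magic string
    f.write(struct.pack("<LL", word_count, root_pos))  # two unsigned longs, assumed little-endian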
t9.demo
T9 demo application.
clear_screen
def clear_screen()
Clear the terminal screen.
get_input
def get_input()
Get a single character input, handling different platforms.
handle_input
def handle_input(char, input_obj)
Handle input character and return True if should exit.
draw_screen
def draw_screen(input_obj)
Draw the complete T9 interface screen.
run_demo
def run_demo(dict_file=None, language=None, region=None)
Run the T9 demo application.
t9.key
Dictionary node class for PY9 T9 text input system.
SaveState Objects
class SaveState(IntEnum)
Node save state for dictionary file operations.
UNCHANGED
Node doesn’t need any file operation
UPDATE
Existing node needs reference update
NEW
New node needs full save to file
T9Key Objects
class T9Key()
Dictionary node for file-based keypress dictionary storage.
save
def save(f)
Save the node and all child nodes to file f. Used when creating dictionary file.
savenode
def savenode(f)
Save just this node to the file. Used to add or overwrite a node.
loadnode
def loadnode(f)
Load a node from an open file object.
t9.mode
Input modes for PY9 T9 text input system.
InputMode Objects
class InputMode(IntEnum)
Available input modes.
NAVIGATE
Navigate/predictive mode
EDIT_WORD
Edit complete word mode
EDIT_CHAR
Edit individual characters mode
TEXT_LOWER
Lowercase text entry mode
TEXT_UPPER
Uppercase text entry mode
NUMERIC
Numeric entry mode
get_label
def get_label(mode)
Get the display label for a mode.
get_help
def get_help(mode)
Get the key help text for a mode.
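A small sketch assuming get_label and get_help are module-level helpers in t9.mode, as listed above:
from t9.mode import InputMode, get_label, get_help

mode = InputMode.TEXT_LOWER
print(get_label(mode))   # short label for the status line
print(get_help(mode))    # key help text for this mode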
t9.cli
Command-line interface for PY9 T9 text input system.
run_demo
def run_demo(dict_file=None, language=None, region=None)
Run the T9 demo application.
generate_dict
def generate_dict(wordlist, output, language="Unknown", comment="")
Generate a T9 dictionary from a wordlist file.
main
def main()
Main CLI entry point.
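A sketch of driving the CLI helpers from Python rather than the command line; file names and metadata strings are placeholders:
from t9.cli import generate_dict, run_demo

generate_dict("frequency_wordlist.txt", "en_GB.dict",
              language="English", comment="built from Reddit corpus")

run_demo(dict_file="en_GB.dict")   # all three arguments are optional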
t9.dict
Dictionary class for PY9 T9 text input system.
T9Dict Objects
class T9Dict()
T9 dictionary for word lookups and modifications.
__init__
def __init__(dict_file)
Create a T9 dictionary instance and load the file header info.
dict_file: path to dictionary file
File format:
- word count (4 bytes)
- root node pos (4 bytes)
- language string (variable)
- comment string (variable)
getwords
def getwords(digits)
Get possible words for a T9 digit sequence.
Returns:
- If len(result[0]) == len(digits): exact match found
- If len(result[0]) > len(digits): lookahead used
- If len(result[0]) < len(digits): lookbehind used
addword
def addword(word)
Add a word to the dictionary. Raises KeyError if word already exists.
delword
def delword(word)
Delete a word from the dictionary. Not implemented yet.
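A usage sketch for T9Dict; the dictionary path is a placeholder, and passing the digit sequence as a string is an assumption about getwords:
from t9.dict import T9Dict

d = T9Dict("en_GB.dict")

words = d.getwords("43556")      # candidate words for the digit sequence
# len(words[0]) == len("43556") -> exact match; longer -> lookahead; shorter -> lookbehind

try:
    d.addword("hellp")           # hypothetical new word
except KeyError:
    pass                         # word already present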
t9.input
Input parser class for PY9 T9 text input system.
T9Input Objects
class T9Input()
T9 input parser that handles keypresses and text manipulation.
Send keypresses with sendkeys(), retrieve display text with gettext(), get raw text with text().
__init__
def __init__(dict_file,
defaulttxt="",
defaultmode=0,
keydelay=0.5,
numeric=False)
Create a new input parser.
dict_file: dictionary file name
defaulttxt: text to start with
defaultmode: mode to start in (NAVIGATE=Predictive, TEXT_LOWER, TEXT_UPPER, NUMERIC)
keydelay: key timeout in TXT mode
numeric: NOT IMPLEMENTED YET
gettext
def gettext()
Get current text including cursor for display. For raw text use .text()
text
def text()
Get text buffer without cursor.
posword
def posword()
Get word with position marker.
setword
def setword()
Change current word to first valid match.
nextword
def nextword()
In edit mode 1, move to next word if possible.
nextchar
def nextchar()
In edit mode 2, select next letter in the group.
addkeypress
def addkeypress(key)
Add a keypress to current input.
sendkeys
def sendkeys(keys)
Send action keys (0-9, U/D/L/R/S).
UDLRS = UP, DOWN, LEFT, RIGHT, SELECT
Navigation mode:
- 0-9: start new word
- ULR: navigate cursor
- D: backspace
- S: switch to text mode
Edit mode (1+2):
- 0-9: append keystroke
Complete word (mode 1):
- LR: accept word
- U: change word
- D: backspace
- S: character edit mode
Incomplete word (mode 2):
- L: back letter
- R: confirm letter
- U: change letter
- D: delete char
- S: change case
Text modes (3+4):
- 0-9: character input
- ULR: navigate
- D: backspace
- S: cycle modes
Number mode (5):
- 0-9: numbers
- ULR: navigate
- D: backspace
- S: back to navigation
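A sketch of typing one word through the parser, following the key behaviour documented above; the dictionary path is a placeholder and the exact accept/confirm sequence may differ:
from t9.input import T9Input

inp = T9Input("en_GB.dict")

inp.sendkeys("43556")    # digit presses for "hello" start a word from navigate mode
inp.sendkeys("R")        # in complete-word edit mode, L/R accept the predicted word

print(inp.gettext())     # display text, cursor included
print(inp.text())        # raw text buffer, no cursor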
t9.constants
Constants for PY9 T9 text input system.
Key Objects
class Key(Enum)
T9 input keys - string values match the actual key characters used.
SELECT
Mode switch/Select
UP
Up/Previous word
DOWN
Down/Delete/Backspace
LEFT
Left/Back
RIGHT
Right/Forward
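Since the enum values are stated to match the actual key characters, they can in principle be passed straight to T9Input.sendkeys; treating the values that way is an assumption, and the dictionary path is a placeholder:
from t9.constants import Key
from t9.input import T9Input

inp = T9Input("en_GB.dict")
inp.sendkeys(Key.SELECT.value)   # same as sending the raw key character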