t9

Import from those modules directly or use the main package imports.

t9.corpus.extractor

Comment extraction and cleaning from Reddit JSON data.

CommentExtractor Objects

class CommentExtractor()

Extracts and cleans Reddit comments from JSON files.

__init__

def __init__()

Initialize comment extractor.

extract_comments_from_file

def extract_comments_from_file(json_file: Path) -> Iterator[str]

Extract comment text from a single JSON file.

Arguments:

Yields:

Raw comment text strings

extract_comments_from_files

def extract_comments_from_files(json_files: List[Path]) -> Iterator[str]

Extract comment text from multiple JSON files.

Arguments:

Yields:

Raw comment text strings

clean_comment

def clean_comment(comment: str) -> str

Clean a single comment by removing quotes, decoding entities, and cleaning markdown.

Arguments:

Returns:

Cleaned comment text
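The three steps named above (quote removal, entity decoding, markdown cleanup) could look roughly like the following. This is a minimal sketch, not the package's implementation: the function name `clean_comment_sketch`, the regular expressions, and the order of the steps are all assumptions made for illustration.

```python
import html
import re


def clean_comment_sketch(comment: str) -> str:
    """Illustrative cleaner; every pattern here is an assumption."""
    # Decode HTML entities first so that "&gt;" becomes a visible ">" marker.
    text = html.unescape(comment)
    # Drop quoted lines, which Reddit markdown prefixes with ">".
    kept = [line for line in text.splitlines() if not line.lstrip().startswith(">")]
    text = "\n".join(kept)
    # Reduce markdown links [label](url) to their label.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Strip emphasis, strikethrough, and inline-code markers.
    text = re.sub(r"[*`~]+", "", text)
    # Collapse whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


raw = "&gt; quoted reply\nI **really** like [this](http://example.com) &amp; that"
cleaned = clean_comment_sketch(raw)
print(cleaned)  # I really like this & that
```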

extract_and_clean_comments

def extract_and_clean_comments(json_files: List[Path]) -> List[str]

Extract and clean all comments from JSON files.

Arguments:

Returns:

List of cleaned comment strings

save_comments

def save_comments(comments: List[str], output_file: Path) -> None

Save cleaned comments to a text file.

Arguments:

process_json_files

def process_json_files(json_files: List[Path], output_file: Path) -> Path

Complete processing pipeline: extract, clean, and save comments.

Arguments:

Returns:

Path to saved output file

t9.corpus

Corpus generation tools for T9 wordlist creation.

t9.corpus.cli

CLI commands for corpus generation.

cmd_scrape

def cmd_scrape(args) -> int

Scrape Reddit user data.

Arguments:

Returns:

Exit code

cmd_extract

def cmd_extract(args) -> int

Extract and clean comments from JSON files.

Arguments:

Returns:

Exit code

cmd_process

def cmd_process(args) -> int

Process corpus text to create frequency wordlist.

Arguments:

Returns:

Exit code

cmd_generate

def cmd_generate(args) -> int

Complete corpus generation pipeline.

Arguments:

Returns:

Exit code

add_corpus_commands

def add_corpus_commands(subparsers) -> None

Add corpus subcommands to argument parser.

Arguments:
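As an illustration of how subcommands are typically attached to an `argparse` subparsers object, here is a small sketch. The subcommand names are guessed from the `cmd_*` handler names above, and `_placeholder` stands in for the real handlers; the actual names and options used by the package are not documented here.

```python
import argparse


def _placeholder(args) -> int:
    """Hypothetical stand-in for a handler such as cmd_scrape; returns an exit code."""
    return 0


def add_corpus_commands_sketch(subparsers) -> None:
    """Register one subcommand per handler (names are assumptions)."""
    commands = [
        ("scrape", "Scrape Reddit user data"),
        ("extract", "Extract and clean comments from JSON files"),
        ("process", "Process corpus text to create frequency wordlist"),
        ("generate", "Complete corpus generation pipeline"),
    ]
    for name, summary in commands:
        sub = subparsers.add_parser(name, help=summary)
        # Each subcommand remembers which handler to call.
        sub.set_defaults(func=_placeholder)


parser = argparse.ArgumentParser(prog="corpus-sketch")
add_corpus_commands_sketch(parser.add_subparsers(dest="command", required=True))
args = parser.parse_args(["extract"])
exit_code = args.func(args)
```

Storing the handler with `set_defaults(func=...)` lets the caller dispatch with a single `args.func(args)` call and pass the integer result to `sys.exit`.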

t9.corpus.scraper

Reddit corpus data scraper using the reddit-export package.

RedditScraper Objects

class RedditScraper()

Scraper for downloading Reddit user comment data.

__init__

def __init__(output_dir: Path)

Initialize scraper with output directory.

Arguments:

scrape_user

def scrape_user(username: str) -> Optional[Path]

Scrape comments for a single Reddit user.

Arguments:

Returns:

Path to saved JSON file, or None if failed

scrape_users

def scrape_users(usernames: List[str]) -> List[Path]

Scrape comments for multiple Reddit users.

Arguments:

Returns:

List of paths to saved JSON files

get_scraped_files

def get_scraped_files() -> List[Path]

Get list of all scraped JSON files in output directory.

Returns:

List of JSON file paths

interactive_scrape

def interactive_scrape() -> List[Path]

Interactive scraping - prompt for usernames.

Returns:

List of paths to saved JSON files

t9.corpus.processor

Text processing and frequency-based wordlist generation.

CorpusProcessor Objects

class CorpusProcessor()

Processes text corpus to create frequency-ordered wordlists.

__init__

def __init__()

Initialize corpus processor.

split_sentences

def split_sentences(text: str) -> str

Split text on sentence endings and remove first word from each line.

This helps avoid proper noun vs sentence-start capitalization issues.

Arguments:

Returns:

Text with sentences split and first words removed
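A sketch of the idea, assuming sentences end in ".", "!" or "?" followed by whitespace. The function name `split_sentences_sketch` and the splitting rule are illustrative assumptions; the package may treat abbreviations and other edge cases differently.

```python
import re


def split_sentences_sketch(text: str) -> str:
    """Put each sentence on its own line and drop its first word."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lines = []
    for sentence in sentences:
        # The first word is capitalized whether or not it is a proper noun,
        # so it is discarded; one-word sentences vanish entirely.
        parts = sentence.split(maxsplit=1)
        if len(parts) == 2:
            lines.append(parts[1])
    return "\n".join(lines)


result = split_sentences_sketch("The cat saw Alice. Then Alice left!")
print(result)
```

In the sample, "The" and "Then" are removed, while "Alice" keeps its capital in both sentences, which is the signal a later stage can use to treat it as a proper noun.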

tokenize_text

def tokenize_text(text: str) -> List[str]

Tokenize text into individual words.

Arguments:

Returns:

List of tokenized words

count_word_frequencies

def count_word_frequencies(words: List[str]) -> Counter

Count word frequencies (case-insensitive).

Arguments:

Returns:

Counter with word frequencies
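Case-insensitive counting can be sketched with `collections.Counter` by lowercasing each word before it is counted. The function name below is hypothetical, and whether the package normalizes words in exactly this way is an assumption.

```python
from collections import Counter
from typing import List


def count_word_frequencies_sketch(words: List[str]) -> Counter:
    """Count words after folding them to lowercase."""
    return Counter(word.lower() for word in words)


counts = count_word_frequencies_sketch(["The", "the", "Cat"])
print(counts.most_common())
```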

load_dictionary_words

def load_dictionary_words(wordlist_file: Optional[Path] = None) -> Set[str]

Load dictionary words (case-insensitive).

Arguments:

Returns:

Set of dictionary words in lowercase

create_frequency_wordlist

def create_frequency_wordlist(corpus_frequencies: Counter,
                              dictionary_words: Set[str],
                              output_file: Path) -> int

Create frequency-ordered wordlist.

Arguments:

Returns:

Total number of words written
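One plausible reading of "frequency-ordered" is sketched below: corpus words that also appear in the dictionary come first, most frequent first, followed by the remaining dictionary words. That ordering, the alphabetical tail, the one-word-per-line output, and the function name are all assumptions, not documented behaviour.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Set


def create_frequency_wordlist_sketch(corpus_frequencies: Counter,
                                     dictionary_words: Set[str],
                                     output_file: Path) -> int:
    """Write dictionary words ordered by corpus frequency; return the count."""
    # Keep only corpus words the dictionary knows, most frequent first.
    ranked = [w for w, _ in corpus_frequencies.most_common() if w in dictionary_words]
    # Assumed: dictionary words absent from the corpus follow alphabetically.
    ranked.extend(sorted(dictionary_words - set(ranked)))
    output_file.write_text("\n".join(ranked) + "\n", encoding="utf-8")
    return len(ranked)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "freq.txt"
    freqs = Counter({"the": 5, "cat": 2, "zzz": 9})
    written = create_frequency_wordlist_sketch(freqs, {"the", "cat", "dog"}, out)
    lines = out.read_text(encoding="utf-8").splitlines()
```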

process_corpus_file

def process_corpus_file(corpus_file: Path,
                        wordlist_file: Optional[Path] = None,
                        output_file: Optional[Path] = None) -> Path

Complete corpus processing pipeline.

Arguments:

Returns:

Path to generated frequency wordlist

t9.utils

Utility functions for PY9 T9 text input system.

getkey

def getkey(word)

Convert a word to T9 keypress sequence.

Example: “hello” -> “43556”
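The mapping can be sketched with the standard phone keypad layout. The name `getkey_sketch` is hypothetical, and how the real function treats characters outside a-z is not stated here; this version simply raises `KeyError` for them.

```python
# Standard letter groups on a phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# Invert the table so each letter maps to its digit.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def getkey_sketch(word: str) -> str:
    """Return the digit sequence that types `word` on a T9 keypad."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


print(getkey_sketch("hello"))  # 43556
print(getkey_sketch("good"), getkey_sketch("home"))  # 4663 4663
```

The second line shows why a dictionary is needed: different words can share one digit sequence.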

read_wordlist

def read_wordlist(filename)

Read lines from a wordlist file, handling both plain and gzipped files.

Yields stripped non-empty lines from the file.

Arguments:

Yields:
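A generator that handles both plain and gzipped files might look like the sketch below. Detecting gzip by its two-byte magic number and reading as UTF-8 are assumptions; the package may instead check the file extension or use a different encoding.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Iterator


def read_wordlist_sketch(filename) -> Iterator[str]:
    """Yield stripped, non-empty lines from a plain or gzipped text file."""
    path = Path(filename)
    # Assumption: gzip files are recognized by their magic number 1f 8b.
    with path.open("rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield line


with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "words.txt"
    plain.write_text("apple\n\n  pear \n", encoding="utf-8")
    packed = Path(tmp) / "words.txt.gz"
    with gzip.open(packed, "wt", encoding="utf-8") as fh:
        fh.write("apple\n\n  pear \n")
    plain_words = list(read_wordlist_sketch(plain))
    gzip_words = list(read_wordlist_sketch(packed))
```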

get_wordlists_dir

def get_wordlists_dir()

Get the path to the wordlists directory.

get_cache_dir

def get_cache_dir()

Get the path to the cache directory for dictionaries.

get_locale

def get_locale()

Get user’s locale as (language, region) tuple.

Returns:
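A (language, region) pair can be derived from a POSIX-style locale name such as `en_GB.UTF-8`. The sketch below assumes the locale comes from the usual environment variables; the helper names and the choice to return `(None, None)` for the "C" locale are illustrative, not taken from the package.

```python
import os
from typing import Optional, Tuple


def parse_locale_sketch(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a name like 'en_GB.UTF-8' into ('en', 'GB')."""
    # Drop the encoding and modifier: 'sr_RS.UTF-8@latin' -> 'sr_RS'.
    base = value.split(".", 1)[0].split("@", 1)[0]
    if not base or base in ("C", "POSIX"):
        return (None, None)
    language, _, region = base.partition("_")
    return (language.lower(), region.upper() or None)


def get_locale_sketch() -> Tuple[Optional[str], Optional[str]]:
    """Check the common POSIX variables in priority order (assumed)."""
    for name in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = os.environ.get(name)
        if value:
            return parse_locale_sketch(value)
    return (None, None)


print(parse_locale_sketch("en_GB.UTF-8"))
```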

find_wordlist

def find_wordlist(language, region=None)

Find wordlist file for given language and region.

Arguments:

Returns:

Path to wordlist file if found, None otherwise

get_system_wordlist

def get_system_wordlist()

Get system dictionary file, resolving symlinks.

Returns:

Path to system wordlist if found, None otherwise
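On many Unix systems the dictionary file is a symlink (for example to a language-specific list), which is why resolving matters. The sketch below assumes two conventional candidate locations; the paths actually probed by the package are not listed here, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional

# Assumption: conventional Unix locations for the system word list.
DEFAULT_CANDIDATES = (Path("/usr/share/dict/words"), Path("/usr/dict/words"))


def get_system_wordlist_sketch(
    candidates: Iterable[Path] = DEFAULT_CANDIDATES,
) -> Optional[Path]:
    """Return the first candidate that resolves to an existing file."""
    for candidate in candidates:
        # resolve() follows symlinks and returns the real target path.
        resolved = candidate.resolve()
        if resolved.is_file():
            return resolved
    return None


missing = get_system_wordlist_sketch([Path("/no/such/dir/words")])
with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "words"
    real.write_text("apple\n", encoding="utf-8")
    found = get_system_wordlist_sketch([Path("/no/such/dir/words"), real])
    found_matches = found == real.resolve()
```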

find_or_generate_dict

def find_or_generate_dict(language=None, region=None)

Find or generate dictionary file using fallback chain.

Arguments:

Returns:

Path to dictionary file if found/generated, None if failed

draw_keypad

def draw_keypad()

Draw a T9 keypad layout for reference.

t9.maket9

T9.py - a predictive text dictionary in the style of Nokia’s T9

File Format…

Header:
String[7] = “PY9DICT:”
Unsigned Long = Number of words
Unsigned Long = root node’s start position

Node block: Unsigned Long[4] =
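The header fields can be packed and unpacked with the `struct` module, as in the sketch below. Several details are assumptions: the byte order (network order is used here), the 4-byte width of each unsigned long, and the helper names. The magic string as printed is eight bytes long, so the sketch uses an 8-byte field. The node block layout is not shown because its description is incomplete.

```python
import struct

MAGIC = b"PY9DICT:"
# Assumed layout: 8-byte magic, then two 4-byte unsigned longs, big-endian.
HEADER = struct.Struct("!8sLL")


def pack_header(word_count: int, root_pos: int) -> bytes:
    """Build the header bytes for a dictionary file."""
    return HEADER.pack(MAGIC, word_count, root_pos)


def unpack_header(data: bytes):
    """Return (word_count, root_pos), rejecting files with the wrong magic."""
    magic, word_count, root_pos = HEADER.unpack(data[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("not a PY9 dictionary file")
    return word_count, root_pos


raw = pack_header(1000, HEADER.size)
print(len(raw), unpack_header(raw))
```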

t9.demo

T9 demo application.

clear_screen

def clear_screen()

Clear the terminal screen.

get_input

def get_input()

Get a single character input, handling different platforms.

handle_input

def handle_input(char, input_obj)

Handle an input character and return True if the demo should exit.

draw_screen

def draw_screen(input_obj)

Draw the complete T9 interface screen.

run_demo

def run_demo(dict_file=None, language=None, region=None)

Run the T9 demo application.

t9.key

Dictionary node class for PY9 T9 text input system.

SaveState Objects

class SaveState(IntEnum)

Node save state for dictionary file operations.

UNCHANGED

Node doesn’t need any file operation

UPDATE

Existing node needs reference update

NEW

New node needs full save to file

T9Key Objects

class T9Key()

Dictionary node for file-based keypress dictionary storage.

save

def save(f)

Save the node and all child nodes to file f. Used when creating dictionary file.

savenode

def savenode(f)

Save just this node to the file. Used to add or overwrite a node.

loadnode

def loadnode(f)

Load a node from an open file object.

t9.mode

Input modes for PY9 T9 text input system.

InputMode Objects

class InputMode(IntEnum)

Available input modes.

NAVIGATE

Navigate/predictive mode

EDIT_WORD

Edit complete word mode

EDIT_CHAR

Edit individual characters mode

TEXT_LOWER

Lowercase text entry mode

TEXT_UPPER

Uppercase text entry mode

NUMERIC

Numeric entry mode
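The members can be pictured as an `IntEnum` like the one below. The numeric values are inferred, not confirmed: the `sendkeys` description refers to edit modes 1 and 2, text modes 3 and 4, and number mode 5, and `T9Input` defaults to `defaultmode=0`, which suggests this numbering. The class name is hypothetical.

```python
from enum import IntEnum


class InputModeSketch(IntEnum):
    """Inferred numbering of the input modes (an assumption)."""
    NAVIGATE = 0    # navigate/predictive mode
    EDIT_WORD = 1   # edit complete word
    EDIT_CHAR = 2   # edit individual characters
    TEXT_LOWER = 3  # lowercase text entry
    TEXT_UPPER = 4  # uppercase text entry
    NUMERIC = 5     # numeric entry


# IntEnum members compare equal to plain integers, so a default of 0
# and a default of NAVIGATE are interchangeable.
print(InputModeSketch(0).name, int(InputModeSketch.NUMERIC))
```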

get_label

def get_label(mode)

Get the display label for a mode.

get_help

def get_help(mode)

Get the key help text for a mode.

t9.cli

Command-line interface for PY9 T9 text input system.

run_demo

def run_demo(dict_file=None, language=None, region=None)

Run the T9 demo application.

generate_dict

def generate_dict(wordlist, output, language="Unknown", comment="")

Generate a T9 dictionary from a wordlist file.

main

def main()

Main CLI entry point.

t9.dict

Dictionary class for PY9 T9 text input system.

T9Dict Objects

class T9Dict()

T9 dictionary for word lookups and modifications.

__init__

def __init__(dict_file)

Create a T9 dictionary class and load file header info.

dict_file: path to dictionary file

File format:

getwords

def getwords(digits)

Get possible words for a T9 digit sequence.

Returns:

addword

def addword(word)

Add a word to the dictionary. Raises KeyError if word already exists.
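To illustrate the lookup and insertion behaviour without a dictionary file, here is an in-memory stand-in. It is not how `T9Dict` stores data (that class reads a file of nodes); the class name, the list return type, and the ordering of results are assumptions. Only the `KeyError` on duplicates mirrors the documented behaviour.

```python
from collections import defaultdict

KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {c: d for d, letters in KEYPAD.items() for c in letters}


class InMemoryT9Sketch:
    """Hypothetical in-memory analogue of a T9 dictionary."""

    def __init__(self, words=()):
        self._index = defaultdict(list)  # digit sequence -> words
        for word in words:
            self.addword(word)

    def getwords(self, digits):
        """Return the words typed by `digits`, in insertion order."""
        return list(self._index.get(digits, []))

    def addword(self, word):
        """Add a word; raise KeyError if it is already present."""
        key = "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())
        if word in self._index[key]:
            raise KeyError(word)
        self._index[key].append(word)


d = InMemoryT9Sketch(["good", "home", "gone"])
matches = d.getwords("4663")
try:
    d.addword("home")
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
```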

delword

def delword(word)

Delete a word from the dictionary. Not implemented yet.

t9.input

Input parser class for PY9 T9 text input system.

T9Input Objects

class T9Input()

T9 input parser that handles keypresses and text manipulation.

Send keypresses with sendkeys(), retrieve display text with gettext(), get raw text with text().

__init__

def __init__(dict_file,
             defaulttxt="",
             defaultmode=0,
             keydelay=0.5,
             numeric=False)

Create a new input parser.

dict_file: dictionary file name
defaulttxt: text to start with
defaultmode: mode to start in (NAVIGATE=Predictive, TEXT_LOWER, TEXT_UPPER, NUMERIC)
keydelay: key timeout in TXT mode
numeric: NOT IMPLEMENTED YET

gettext

def gettext()

Get current text, including the cursor, for display. For raw text, use text().

text

def text()

Get text buffer without cursor.

posword

def posword()

Get word with position marker.

setword

def setword()

Change current word to first valid match.

nextword

def nextword()

In edit mode 1, move to next word if possible.

nextchar

def nextchar()

In edit mode 2, select next letter in the group.

addkeypress

def addkeypress(key)

Add a keypress to current input.

sendkeys

def sendkeys(keys)

Send action keys (0-9, U/D/L/R/S).

UDLRS = UP, DOWN, LEFT, RIGHT, SELECT

Navigation mode:
0-9: start new word
ULR: navigate cursor
D: backspace
S: switch to text mode

Edit mode (1+2):
0-9: append keystroke

Complete word (mode 1):
LR: accept word
U: change word
D: backspace
S: character edit mode

Incomplete word (mode 2):
L: back letter
R: confirm letter
U: change letter
D: delete char
S: change case

Text modes (3+4):
0-9: character input
ULR: navigate
D: backspace
S: cycle modes

Number mode (5):
0-9: numbers
ULR: navigate
D: backspace
S: back to navigation

t9.constants

Constants for PY9 T9 text input system.

Key Objects

class Key(Enum)

T9 input keys - string values match the actual key characters used.

SELECT

Mode switch/Select

UP

Up/Previous word

DOWN

Down/Delete/Backspace

LEFT

Left/Back

RIGHT

Right/Forward
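Since the string values are said to match the key characters, and `sendkeys` accepts U/D/L/R/S, the enum plausibly resembles the sketch below. The exact values and the class name are assumptions; the real enum may also define members not shown here.

```python
from enum import Enum


class KeySketch(Enum):
    """Assumed action keys, valued by the character sent for each."""
    SELECT = "S"  # mode switch/select
    UP = "U"      # up/previous word
    DOWN = "D"    # down/delete/backspace
    LEFT = "L"    # left/back
    RIGHT = "R"   # right/forward


# Looking a member up by value turns a raw key character into a named key.
pressed = KeySketch("D")
print(pressed.name)
```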