t9
Import from the t9 submodules directly or use the main package imports.
t9.corpus.extractor
Comment extraction and cleaning from Reddit JSON data.
CommentExtractor Objects
class CommentExtractor()
Extracts and cleans Reddit comments from JSON files.
__init__
def __init__()
Initialize comment extractor.
extract_comments_from_file
def extract_comments_from_file(json_file: Path) -> Iterator[str]
Extract comment text from a single JSON file.
Arguments:
json_file
- Path to JSON file containing Reddit data
Yields:
Raw comment text strings
extract_comments_from_files
def extract_comments_from_files(json_files: List[Path]) -> Iterator[str]
Extract comment text from multiple JSON files.
Arguments:
json_files
- List of JSON file paths
Yields:
Raw comment text strings
clean_comment
def clean_comment(comment: str) -> str
Clean a single comment by removing quotes, decoding entities, and cleaning markdown.
Arguments:
comment
- Raw comment text
Returns:
Cleaned comment text
extract_and_clean_comments
def extract_and_clean_comments(json_files: List[Path]) -> List[str]
Extract and clean all comments from JSON files.
Arguments:
json_files
- List of JSON file paths
Returns:
List of cleaned comment strings
save_comments
def save_comments(comments: List[str], output_file: Path) -> None
Save cleaned comments to a text file.
Arguments:
comments
- List of cleaned comment strings
output_file
- Path to output file
process_json_files
def process_json_files(json_files: List[Path], output_file: Path) -> Path
Complete processing pipeline: extract, clean, and save comments.
Arguments:
json_files
- List of input JSON files
output_file
- Path to output text file
Returns:
Path to saved output file
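A minimal usage sketch of the extractor, using only the class and methods documented above; the input directory and output file names are placeholders:
from pathlib import Path
from t9.corpus.extractor import CommentExtractor

json_files = sorted(Path("scraped").glob("*.json"))   # placeholder input directory
extractor = CommentExtractor()

# One-call pipeline: extract, clean, and save.
corpus_file = extractor.process_json_files(json_files, Path("corpus.txt"))

# Or step by step, keeping the cleaned comments in memory first.
comments = extractor.extract_and_clean_comments(json_files)
extractor.save_comments(comments, Path("corpus.txt"))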
t9.corpus
Corpus generation tools for T9 wordlist creation.
t9.corpus.cli
CLI commands for corpus generation.
cmd_scrape
def cmd_scrape(args) -> int
Scrape Reddit user data.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
cmd_extract
def cmd_extract(args) -> int
Extract and clean comments from JSON files.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
cmd_process
def cmd_process(args) -> int
Process corpus text to create frequency wordlist.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
cmd_generate
def cmd_generate(args) -> int
Complete corpus generation pipeline.
Arguments:
args
- Parsed command line arguments
Returns:
Exit code
add_corpus_commands
def add_corpus_commands(subparsers) -> None
Add corpus subcommands to argument parser.
Arguments:
subparsers
- ArgumentParser subparsers object
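A sketch of wiring the corpus subcommands into a parser. The add_corpus_commands call is documented above; how each cmd_* handler is attached to the parsed args (here via a func attribute) is an assumption:
import argparse
from t9.corpus.cli import add_corpus_commands

parser = argparse.ArgumentParser(prog="t9-corpus")
subparsers = parser.add_subparsers(dest="command")
add_corpus_commands(subparsers)          # registers scrape/extract/process/generate

args = parser.parse_args()
if hasattr(args, "func"):                # assumed convention: set_defaults(func=cmd_...)
    raise SystemExit(args.func(args))    # each handler returns an exit code
parser.print_help()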
t9.corpus.scraper
Reddit corpus data scraper using reddit-export package.
RedditScraper Objects
class RedditScraper()
Scraper for downloading Reddit user comment data.
__init__
def __init__(output_dir: Path)
Initialize scraper with output directory.
Arguments:
output_dir
- Directory to save scraped JSON files
scrape_user
def scrape_user(username: str) -> Optional[Path]
Scrape comments for a single Reddit user.
Arguments:
username
- Reddit username to scrape
Returns:
Path to saved JSON file, or None if failed
scrape_users
def scrape_users(usernames: List[str]) -> List[Path]
Scrape comments for multiple Reddit users.
Arguments:
usernames
- List of Reddit usernames to scrape
Returns:
List of paths to saved JSON files
get_scraped_files
def get_scraped_files() -> List[Path]
Get list of all scraped JSON files in output directory.
Returns:
List of JSON file paths
interactive_scrape
def interactive_scrape() -> List[Path]
Interactive scraping - prompt for usernames.
Returns:
List of paths to saved JSON files
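A usage sketch for the scraper, using only the constructor and methods documented above; the usernames and output directory are placeholders:
from pathlib import Path
from t9.corpus.scraper import RedditScraper

scraper = RedditScraper(Path("scraped"))            # directory for the JSON dumps

single = scraper.scrape_user("example_user")        # Path to saved JSON, or None on failure
batch = scraper.scrape_users(["user_a", "user_b"])  # paths of the JSON files that were saved

all_files = scraper.get_scraped_files()             # everything in the output directory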
t9.corpus.processor
Text processing and frequency-based wordlist generation.
CorpusProcessor Objects
class CorpusProcessor()
Processes text corpus to create frequency-ordered wordlists.
__init__
def __init__()
Initialize corpus processor.
split_sentences
def split_sentences(text: str) -> str
Split text on sentence endings and remove first word from each line.
This avoids confusing proper nouns with words that are capitalized only because they start a sentence.
Arguments:
text
- Input text
Returns:
Text with sentences split and first words removed
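A brief illustration of the intent with an assumed input string; the exact output formatting is not guaranteed by the docstring, only the idea that sentence-initial words are dropped before counting:
from t9.corpus.processor import CorpusProcessor

processor = CorpusProcessor()
text = "Rose is a flower. Rose went home."   # "Rose": proper noun or just sentence start?
stripped = processor.split_sentences(text)   # roughly "is a flower" / "went home"
# The ambiguous sentence-initial capitals never reach the frequency count.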
tokenize_text
def tokenize_text(text: str) -> List[str]
Tokenize text into individual words.
Arguments:
text
- Input text
Returns:
List of tokenized words
count_word_frequencies
def count_word_frequencies(words: List[str]) -> Counter
Count word frequencies (case-insensitive).
Arguments:
words
- List of words
Returns:
Counter with word frequencies
load_dictionary_words
def load_dictionary_words(wordlist_file: Optional[Path] = None) -> Set[str]
Load dictionary words (case-insensitive).
Arguments:
wordlist_file
- Optional path to wordlist file, uses system dict if None
Returns:
Set of dictionary words in lowercase
create_frequency_wordlist
def create_frequency_wordlist(corpus_frequencies: Counter,
dictionary_words: Set[str],
output_file: Path) -> int
Create frequency-ordered wordlist.
Arguments:
corpus_frequencies
- Word frequency counter from corpus
dictionary_words
- Set of valid dictionary words (lowercase)
output_file
- Output file path
Returns:
Total number of words written
process_corpus_file
def process_corpus_file(corpus_file: Path,
wordlist_file: Optional[Path] = None,
output_file: Optional[Path] = None) -> Path
Complete corpus processing pipeline.
Arguments:
corpus_file
- Input corpus text file
wordlist_file
- Optional dictionary wordlist file
output_file
- Optional output file path
Returns:
Path to generated frequency wordlist
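A sketch of the full processing pipeline using the methods above. The one-call form relies on the documented default (system dictionary when wordlist_file is None); the default output path when output_file is None is not specified here, so the step-by-step form passes it explicitly. Paths are placeholders:
from pathlib import Path
from t9.corpus.processor import CorpusProcessor

processor = CorpusProcessor()

# One-call pipeline.
wordlist_path = processor.process_corpus_file(Path("corpus.txt"))

# Step by step.
text = Path("corpus.txt").read_text(encoding="utf-8")
words = processor.tokenize_text(processor.split_sentences(text))
freqs = processor.count_word_frequencies(words)
dictionary = processor.load_dictionary_words()   # falls back to the system dictionary
total = processor.create_frequency_wordlist(freqs, dictionary, Path("frequency_wordlist.txt"))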
t9.utils
Utility functions for PY9 T9 text input system.
getkey
def getkey(word)
Convert a word to T9 keypress sequence.
Example: “hello” -> “43556”
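A one-line check of the documented example (standard phone keypad: abc=2, def=3, ghi=4, jkl=5, mno=6, pqrs=7, tuv=8, wxyz=9):
from t9.utils import getkey

assert getkey("hello") == "43556"   # h=4, e=3, l=5, l=5, o=6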
read_wordlist
def read_wordlist(filename)
Read lines from a wordlist file, handling both plain and gzipped files.
Yields stripped non-empty lines from the file.
Arguments:
filename
- Path to the wordlist file (.txt or .txt.gz)
Yields:
str
- Each non-empty line from the file, stripped of whitespace
get_wordlists_dir
def get_wordlists_dir()
Get the path to the wordlists directory.
get_cache_dir
def get_cache_dir()
Get the path to the cache directory for dictionaries.
get_locale
def get_locale()
Get user’s locale as (language, region) tuple.
Returns:
tuple
- (language, region) or (None, None) if not found
find_wordlist
def find_wordlist(language, region=None)
Find wordlist file for given language and region.
Arguments:
language
- Language code (e.g., “en”)
region
- Region code (e.g., “GB”) or None
Returns:
Path to wordlist file if found, None otherwise
get_system_wordlist
def get_system_wordlist()
Get system dictionary file, resolving symlinks.
Returns:
Path to system wordlist if found, None otherwise
find_or_generate_dict
def find_or_generate_dict(language=None, region=None)
Find or generate dictionary file using fallback chain.
Arguments:
language
- Language code or None to auto-detect
region
- Region code or None to auto-detect
Returns:
Path to dictionary file if found/generated, None if failed
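A sketch of the lookup chain using the helpers above; the exact fallback order is whatever find_or_generate_dict implements, so the manual steps are only illustrative:
from t9.utils import get_locale, find_wordlist, find_or_generate_dict

language, region = get_locale()                      # e.g. ("en", "GB"), or (None, None)
wordlist = find_wordlist(language or "en", region)   # None if no matching wordlist

# Highest-level helper: auto-detects locale and walks the fallback chain.
dict_file = find_or_generate_dict()                  # path to a dictionary, or None on failure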
draw_keypad
def draw_keypad()
Draw a T9 keypad layout for reference.
t9.maket9
T9.py - a predictive text dictionary in the style of Nokia’s T9
File format:
Header:
- String[7] = "PY9DICT:"
- Unsigned Long = number of words
- Unsigned Long = root node's start position
Node block:
- Unsigned Long[4] =
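A hedged sketch of writing the header described above with the struct module. write_header is a hypothetical helper, not part of t9.maket9; byte order and the standard 4-byte size for the unsigned longs are assumptions, and the node-block layout is truncated in the docstring so it is omitted here:
import struct

def write_header(f, word_count: int, root_pos: int) -> None:
    # f is assumed to be a file opened in binary write mode.
    f.write(b"PY9DICT:")                               # magic string
    f.write(struct.pack("<LL", word_count, root_pos))  # two unsigned longs, assumed little-endian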
t9.demo
T9 demo application.
clear_screen
def clear_screen()
Clear the terminal screen.
get_input
def get_input()
Get a single character input, handling different platforms.
handle_input
def handle_input(char, input_obj)
Handle input character and return True if should exit.
draw_screen
def draw_screen(input_obj)
Draw the complete T9 interface screen.
run_demo
def run_demo(dict_file=None, language=None, region=None)
Run the T9 demo application.
t9.key
Dictionary node class for PY9 T9 text input system.
SaveState Objects
class SaveState(IntEnum)
Node save state for dictionary file operations.
UNCHANGED
Node doesn’t need any file operation
UPDATE
Existing node needs reference update
NEW
New node needs full save to file
T9Key Objects
class T9Key()
Dictionary node for file-based keypress dictionary storage.
save
def save(f)
Save the node and all child nodes to file f. Used when creating dictionary file.
savenode
def savenode(f)
Save just this node to the file. Used to add or overwrite a node.
loadnode
def loadnode(f)
Load a node from an open file object.
t9.mode
Input modes for PY9 T9 text input system.
InputMode Objects
class InputMode(IntEnum)
Available input modes.
NAVIGATE
Navigate/predictive mode
EDIT_WORD
Edit complete word mode
EDIT_CHAR
Edit individual characters mode
TEXT_LOWER
Lowercase text entry mode
TEXT_UPPER
Uppercase text entry mode
NUMERIC
Numeric entry mode
get_label
def get_label(mode)
Get the display label for a mode.
get_help
def get_help(mode)
Get the key help text for a mode.
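A small sketch assuming get_label and get_help are module-level helpers in t9.mode, as listed above:
from t9.mode import InputMode, get_label, get_help

mode = InputMode.TEXT_LOWER
print(get_label(mode))   # short label for the status line
print(get_help(mode))    # key help text for this mode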
t9.cli
Command-line interface for PY9 T9 text input system.
run_demo
def run_demo(dict_file=None, language=None, region=None)
Run the T9 demo application.
generate_dict
def generate_dict(wordlist, output, language="Unknown", comment="")
Generate a T9 dictionary from a wordlist file.
main
def main()
Main CLI entry point.
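A sketch of driving the CLI helpers from Python rather than the command line; file names and metadata strings are placeholders:
from t9.cli import generate_dict, run_demo

generate_dict("frequency_wordlist.txt", "en_GB.dict",
              language="English", comment="built from Reddit corpus")

run_demo(dict_file="en_GB.dict")   # all three arguments are optional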
t9.dict
Dictionary class for PY9 T9 text input system.
T9Dict Objects
class T9Dict()
T9 dictionary for word lookups and modifications.
__init__
def __init__(dict_file)
Create a T9 dictionary instance and load the file header info.
dict_file: path to dictionary file
File format:
- word count (4 bytes)
- root node pos (4 bytes)
- language string (variable)
- comment string (variable)
getwords
def getwords(digits)
Get possible words for a T9 digit sequence.
Returns:
- If len(result[0]) == len(digits): exact match found
- If len(result[0]) > len(digits): lookahead used
- If len(result[0]) < len(digits): lookbehind used
addword
def addword(word)
Add a word to the dictionary. Raises KeyError if word already exists.
delword
def delword(word)
Delete a word from the dictionary. Not implemented yet.
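A usage sketch for T9Dict; the dictionary path is a placeholder, and passing the digit sequence as a string is an assumption about getwords:
from t9.dict import T9Dict

d = T9Dict("en_GB.dict")

words = d.getwords("43556")      # candidate words for the digit sequence
# len(words[0]) == len("43556") -> exact match; longer -> lookahead; shorter -> lookbehind

try:
    d.addword("hellp")           # hypothetical new word
except KeyError:
    pass                         # word already present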
t9.input
Input parser class for PY9 T9 text input system.
T9Input Objects
class T9Input()
T9 input parser that handles keypresses and text manipulation.
Send keypresses with sendkeys(), retrieve display text with gettext(), get raw text with text().
__init__
def __init__(dict_file,
defaulttxt="",
defaultmode=0,
keydelay=0.5,
numeric=False)
Create a new input parser.
dict_file: dictionary file name
defaulttxt: text to start with
defaultmode: mode to start in (NAVIGATE=Predictive, TEXT_LOWER, TEXT_UPPER, NUMERIC)
keydelay: key timeout in TXT mode
numeric: NOT IMPLEMENTED YET
gettext
def gettext()
Get current text including cursor for display. For raw text use .text()
text
def text()
Get text buffer without cursor.
posword
def posword()
Get word with position marker.
setword
def setword()
Change current word to first valid match.
nextword
def nextword()
In edit mode 1, move to next word if possible.
nextchar
def nextchar()
In edit mode 2, select next letter in the group.
addkeypress
def addkeypress(key)
Add a keypress to current input.
sendkeys
def sendkeys(keys)
Send action keys (0-9, U/D/L/R/S).
UDLRS = UP, DOWN, LEFT, RIGHT, SELECT
Navigation mode:
- 0-9: start new word
- ULR: navigate cursor
- D: backspace
- S: switch to text mode
Edit mode (1+2):
- 0-9: append keystroke
Complete word (mode 1):
- LR: accept word
- U: change word
- D: backspace
- S: character edit mode
Incomplete word (mode 2):
- L: back letter
- R: confirm letter
- U: change letter
- D: delete char
- S: change case
Text modes (3+4):
- 0-9: character input
- ULR: navigate
- D: backspace
- S: cycle modes
Number mode (5):
- 0-9: numbers
- ULR: navigate
- D: backspace
- S: back to navigation
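A sketch of typing one word through the parser, following the key behaviour documented above; the dictionary path is a placeholder and the exact accept/confirm sequence may differ:
from t9.input import T9Input

inp = T9Input("en_GB.dict")

inp.sendkeys("43556")    # digit presses for "hello" start a word from navigate mode
inp.sendkeys("R")        # in complete-word edit mode, L/R accept the predicted word

print(inp.gettext())     # display text, cursor included
print(inp.text())        # raw text buffer, no cursor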
t9.constants
Constants for PY9 T9 text input system.
Key Objects
class Key(Enum)
T9 input keys - string values match the actual key characters used.
SELECT
Mode switch/Select
UP
Up/Previous word
DOWN
Down/Delete/Backspace
LEFT
Left/Back
RIGHT
Right/Forward
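Since the enum values are stated to match the actual key characters, they can in principle be passed straight to T9Input.sendkeys; treating the values that way is an assumption, and the dictionary path is a placeholder:
from t9.constants import Key
from t9.input import T9Input

inp = T9Input("en_GB.dict")
inp.sendkeys(Key.SELECT.value)   # same as sending the raw key character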