hapi

2025-01-22 15:39 UTC

Xyne

Metadata

Description:	Python library, command-line tools and server for annotating Mandarin Chinese with phonetics (pinyin, zhuyin, etc.) and colors by tone.
Latest Version:	2021
Source Code:	src/
Architecture:	any
Dependencies:	python-lxml python3 python3-colorsysplus wget
Build Dependencies:	python-setuptools
Optional Dependencies:	aria2: required for downloading audio files
Arch Repositories:	[xyne-any] [xyne-i686] [xyne-x86_64]
AUR Page:	hapi
Arch Forum Thread:	229810
Tags:	chinese python server

README

TL;DR

Do this to get started:

hapi-download_data -uy
hapi-srv
# open http://localhost:8000/ and post some Chinese input

About

I have noticed that some sites use different colors to represent the tones of Mandarin when writing hanzi and pinyin. I like the idea as an aid for memorizing tones, so I started writing a script to generate colorized output. It grew into the current Python library and scripts.

Library

hapi.common: Common constants and functions used by the other modules.
hapi.phonetics: Classes for working with different phonetic systems via a common interface.
hapi.unihan: Functions and classes for working with Unihan data.
hapi.db: Database class for building and querying the database.
hapi.html: lxml-based XHTML generator classes and functions.
hapi.server: Simple HTTP server based on Python’s http.server module. It processes form data to generate pages of colorized hanzi with phonetic annotations and dictionary links, along with a table of the characters and their Unihan data and optional audio.

Scripts

hapi-download_data: Download the Unihan data required to build the database and audio files for generating HTML audio elements.
hapi: The original command-line script. It provides modes for printing different types of tables and text output to the command line or saving HTML pages. The output can optionally be colored using different color systems. Colorized output can be formatted for the console, BBCode or HTML. See the command-line help message for details.
hapi-srv: A thin wrapper to run hapi.server.
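
As a rough illustration of what coloring by tone for the console, BBCode or HTML can look like, here is a minimal sketch. It is not taken from the hapi source: the function name colorize, the TONE_COLORS map and the color values are all hypothetical, and hapi's own color systems may differ.

```python
# Hypothetical tone-to-color map (tones 1-4 plus 5 for the neutral tone).
# These values are illustrative only; they are not hapi's color systems.
TONE_COLORS = {
    1: "#ff0000",
    2: "#ff8800",
    3: "#00aa00",
    4: "#0000ff",
    5: "#888888",
}


def colorize(text, tone, fmt="html"):
    """Wrap text in markup that colors it according to its tone."""
    color = TONE_COLORS[tone]
    if fmt == "html":
        return f'<span style="color:{color}">{text}</span>'
    if fmt == "bbcode":
        return f"[color={color}]{text}[/color]"
    if fmt == "console":
        # 24-bit ANSI foreground color, followed by a reset sequence.
        r, g, b = (int(color[i:i + 2], 16) for i in (1, 3, 5))
        return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for fmt in ("html", "bbcode", "console"):
        print(colorize("人", 2, fmt))
```

The same character and tone yield three different wrappers, which is why the output format is a separate choice from the color system.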

Data Directories

The database is created in $XDG_DATA_HOME/hapi. Third-party data will be downloaded to $XDG_DATA_HOME/hapi/dat or $HAPI_DATA_DIR if it is set. Queries for the data directory use xdg.BaseDirectory and will look in system directories after checking user directories (including $HAPI_DATA_DIR), so you can move the data to a system directory after it has been downloaded.

XDG_DATA_HOME defaults to ~/.local/share.
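
The lookup order described above can be sketched with the standard library alone. This is an assumption-laden illustration rather than hapi's code, which uses xdg.BaseDirectory: the function name candidate_data_dirs is hypothetical, checking $HAPI_DATA_DIR before $XDG_DATA_HOME is a guess, and so is the hapi/dat layout under the system directories. The /usr/local/share:/usr/share fallback for $XDG_DATA_DIRS comes from the XDG Base Directory specification.

```python
import os


def candidate_data_dirs(env=None):
    """Return data directories to try, user locations first, then system ones.

    Hypothetical helper: the exact order and system layout are assumptions.
    """
    env = os.environ if env is None else env
    home = env.get("HOME") or os.path.expanduser("~")
    # XDG_DATA_HOME defaults to ~/.local/share when unset or empty.
    data_home = env.get("XDG_DATA_HOME") or os.path.join(home, ".local", "share")
    # XDG_DATA_DIRS default taken from the XDG Base Directory specification.
    system_dirs = (env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share").split(":")

    candidates = []
    if env.get("HAPI_DATA_DIR"):
        candidates.append(env["HAPI_DATA_DIR"])
    candidates.append(os.path.join(data_home, "hapi", "dat"))
    candidates.extend(os.path.join(d, "hapi", "dat") for d in system_dirs if d)
    return candidates


if __name__ == "__main__":
    for path in candidate_data_dirs():
        print(path)
```

Passing a plain dict as env makes it easy to check how a given combination of variables would be resolved without touching the real environment.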

Limitations

The script has no concept of compound hanzi and their meanings. Everything is based on the single-character Unihan file. For the same reason it ignores sandhi, i.e. the way tones change in juxtaposition.

Roadmap

I have a few ideas but no fixed plan:

Add radicals to tables?
Add a mode to look up stroke order for characters (I need to figure out how to include that info from the database). Check cjklib’s implementation (thanks to Spyhawk for the suggestion).
Maybe use lxml to parse input and then replace text and tail attributes with colorized text as a way to preserve input formatting.

Installation

Install the module with setuptools and then put the scripts on the path. Run the download script to get the required third-party data.

Dependencies

The following Python libraries are required (Arch Linux package names, as listed under Dependencies in the metadata above): python-lxml, python3-colorsysplus.

Required Data Files

The Unihan files are required. The audio files are optional. Use hapi-download_data -uy to get everything. The download script depends on wget and, for the audio files, aria2c.

Unicode Unihan

The script parses the Unihan readings file to build maps of hanzi, pinyin and their definitions.
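
For readers unfamiliar with the Unihan data, the readings file is plain text with one tab-separated record per line (code point, field name, value) and comment lines starting with #. The sketch below shows one way such a file could be turned into a per-character map. It is not hapi's parser: the function name parse_unihan_readings and the choice of the kMandarin and kDefinition fields are assumptions, and the sample lines are illustrative.

```python
from collections import defaultdict


def parse_unihan_readings(lines, fields=("kMandarin", "kDefinition")):
    """Map each character to a dict of the selected Unihan fields.

    Hypothetical helper for illustration; hapi's own parser may differ.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        if field in fields:
            # "U+4EBA" -> the character with code point 0x4EBA.
            char = chr(int(codepoint[2:], 16))
            table[char][field] = value
    return dict(table)


if __name__ == "__main__":
    sample = [
        "# illustrative sample lines",
        "U+4EBA\tkDefinition\tman; people; mankind; someone else",
        "U+4EBA\tkMandarin\trén",
    ]
    print(parse_unihan_readings(sample))
```

With a real file, the function can be given the open file object directly, since it only iterates over lines.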

Pinyin MP3

For HTML audio support, MP3 files for each pinyin and tone combination must be stored in dat/mp3. The files should be named by their numbered pinyin, e.g. the sound file for “rén” should be saved as ren2.mp3. The files should be accessible via the subdirectory audio in the data directory, either directly or via symlink (which can be used to easily switch between different collections of audio files).
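
The naming rule above (“rén” becomes ren2.mp3) can be sketched with Unicode decomposition. This is an illustration, not hapi's code: the function names are hypothetical, numbering the neutral tone as 5 is an assumption, and so is keeping ü as-is in the file name (some audio collections may spell it differently).

```python
import unicodedata

# Combining diacritics used for pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {
    "\u0304": 1,  # macron
    "\u0301": 2,  # acute accent
    "\u030c": 3,  # caron
    "\u0300": 4,  # grave accent
}


def numbered_pinyin(syllable):
    """Convert tone-marked pinyin such as 'rén' to numbered pinyin ('ren2')."""
    tone = 5  # assumption: an unmarked syllable is numbered as tone 5
    letters = []
    # NFD splits each accented vowel into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that ü (u + combining diaeresis) stays a single character.
    base = unicodedata.normalize("NFC", "".join(letters))
    return f"{base}{tone}"


def audio_filename(syllable):
    """Return the expected MP3 file name for a tone-marked syllable."""
    return numbered_pinyin(syllable) + ".mp3"


if __name__ == "__main__":
    print(audio_filename("rén"))
```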

Contact: echo xyne.archlinux.org | sed 's/\./@/'