hapi

2023-05-10 21:51 UTC
  • Xyne

Metadata

Description: Python library, command-line tools and server for annotating Mandarin Chinese with phonetics (pinyin, zhuyin, etc.) and colors by tone.
Latest Version: 2021
Source Code: src/
Architecture:
  • any
Dependencies:
Optional Dependencies:
  • aria2: required for downloading audio files
Arch Repositories:
  • [xyne-any]
  • [xyne-i686]
  • [xyne-x86_64]
AUR Page: hapi
Arch Forum Thread: 229810
Tags:

hapi Help Message

$ hapi -h

usage: hapi [-h] [-c {dummit,pleco,sinosplice,hanping,mdbg}]
            [--table {h2p,p2h}] [-f FILE] [--html <path>]
            [-p {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...]]
            [--reset] [--target {ansi,bbcode,html}] [--tw]
            [args ...]

Map hanzi to pinyin and vice versa.

positional arguments:
  args                  The hanzi or phonetic words to map, e.g. "人", "rén",
                        "ren2" or "ㄖㄣˊ", depending on mode.

options:
  -h, --help            show this help message and exit
  -c {dummit,pleco,sinosplice,hanping,mdbg}, --color {dummit,pleco,sinosplice,hanping,mdbg}
                        Color tones by given scheme.
  --table {h2p,p2h}     h2p: map hanzi to pinyin; p2h: map pinyin to hanzi;
                        color: colorize strings of hanzi
  -f FILE, --file FILE  In "color" mode, read hanzi from a file. If "-", read
                        from STDIN.
  --html <path>         Generate full-page HTML output and save to the given
                        path. If the path is "-", print to STDOUT.
  -p {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...], --phonetics {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...]
                        Display phonetics alongsize Hanzi in text output.
  --reset               Rebuild the database.
  --target {ansi,bbcode,html}
                        Select target when printing to STDOUT. Default: ansi
  --tw, --taiwanese     Use preferred zh-Hant (TW) pinyin readings.

hapi-srv Help Message

$ hapi-srv -h

usage: server.py [-h] [-a ADDRESS] [-p PORT]

Run a local HaPi server to input text via your browser.

options:
  -h, --help            show this help message and exit
  -a ADDRESS, --address ADDRESS
                        The server address. Pass en empty string to listen on
                        all interfaces. Default: localhost
  -p PORT, --port PORT  The server port. Default: 8000

README

TL;DR

Do this to get started:

hapi-download_data -uy
hapi-srv
# open http://localhost:8000/ and post some Chinese input

About

I have noticed that some sites use different colors to represent the tones of Mandarin when writing hanzi and pinyin. I like the idea as an aid for memorizing tones, so I started writing a script to generate colorized output. It ended up growing into the current Python library and scripts.

Library

hapi.common
Common constants and functions used by the other modules.
hapi.phonetics
Classes for working with different phonetic systems via a common interface.
hapi.unihan
Functions and classes for working with Unihan data.
hapi.db
Database class for building and querying the database.
hapi.html
lxml-based XHTML generator classes and functions.
hapi.server
Simple HTTP server based on Python’s http.server module. It processes form data to generate pages of colorized hanzi with phonetic annotations and dictionary links, along with a table of the characters and their Unihan data and optional audio.
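
As a rough illustration of the approach described for hapi.server, the following sketch uses Python's http.server module to accept form data and return a page built from it. It is not the actual hapi.server code: the form field name "text", the render_page helper, the Handler class and the serve function are all hypothetical, and the real server adds colorization, phonetics, dictionary links and the Unihan table.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical input form; the real form and its field names may differ.
FORM = (
    '<form method="post">'
    '<textarea name="text"></textarea>'
    '<input type="submit"/>'
    "</form>"
)


def render_page(text):
    """Return a full page containing the form and the escaped input text."""
    return f"<!DOCTYPE html><html><body>{FORM}<p>{escape(text)}</p></body></html>"


class Handler(BaseHTTPRequestHandler):
    def _send(self, body):
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Serve the empty form.
        self._send(render_page(""))

    def do_POST(self):
        # Read the URL-encoded form body and extract the submitted text.
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("utf-8"))
        self._send(render_page(fields.get("text", [""])[0]))


def serve(address="localhost", port=8000):
    # An empty address string binds to all interfaces.
    HTTPServer((address, port), Handler).serve_forever()
```

Calling serve() would start the sketch on localhost:8000, matching the defaults shown in the hapi-srv help message.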

Scripts

hapi-download_data
Download the Unihan data required to build the database and audio files for generating HTML audio elements.
hapi
The original command-line script. It provides modes for printing different types of tables and text output to the command line or saving HTML pages. The output can optionally be colored by tone using different color schemes. Colorized output can be formatted for the console (ANSI), BBCode or HTML. See the command-line help message for details.
hapi-srv
A thin wrapper to run hapi.server.
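
To make the three output targets of the hapi script (ansi, bbcode, html) more concrete, here is a minimal sketch of wrapping a piece of text in a tone color for each target. The PALETTE values and the colorize function are hypothetical and do not correspond to any of the real schemes (dummit, pleco, sinosplice, hanping, mdbg). The ANSI branch assumes a terminal that supports 24-bit color escape sequences.

```python
from html import escape

# Hypothetical palette: tone number -> RGB hex. Not one of hapi's schemes.
PALETTE = {1: "ff0000", 2: "ffaa00", 3: "00aa00", 4: "0000ff", 5: "808080"}


def colorize(text, tone, target="ansi"):
    """Wrap text in the color for the given tone, formatted for the target."""
    rgb = PALETTE[tone]
    if target == "ansi":
        # 24-bit foreground color, followed by a reset.
        r, g, b = (int(rgb[i:i + 2], 16) for i in (0, 2, 4))
        return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"
    if target == "bbcode":
        return f"[color=#{rgb}]{text}[/color]"
    if target == "html":
        return f'<span style="color:#{rgb}">{escape(text)}</span>'
    raise ValueError(f"unknown target: {target}")


print(colorize("人", 2, "bbcode"))
```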

Data Directories

The database is created in $XDG_DATA_HOME/hapi. Third-party data is downloaded to $HAPI_DATA_DIR if it is set, otherwise to $XDG_DATA_HOME/hapi/dat. Queries for the data directory use xdg.BaseDirectory and check user directories (including $HAPI_DATA_DIR) before system directories, so the data can be moved to a system directory after it has been downloaded.

XDG_DATA_HOME defaults to ~/.local/share.
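
The lookup order described above can be sketched with the standard library alone. This is an approximation, not hapi's code, which uses xdg.BaseDirectory. The function name data_dir_candidates is hypothetical, and two details are assumptions: that $HAPI_DATA_DIR is checked before $XDG_DATA_HOME, and that system directories use the same hapi/dat layout. The fallback for $XDG_DATA_DIRS follows the XDG Base Directory Specification.

```python
import os
from pathlib import Path


def data_dir_candidates(env=None):
    """Return candidate data directories, user locations first."""
    env = os.environ if env is None else env
    candidates = []

    # Assumption: an explicit override is checked first.
    override = env.get("HAPI_DATA_DIR")
    if override:
        candidates.append(Path(override))

    # XDG_DATA_HOME defaults to ~/.local/share.
    data_home = env.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    candidates.append(Path(data_home) / "hapi" / "dat")

    # System directories come last (XDG default if the variable is unset).
    data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    for directory in data_dirs.split(":"):
        if directory:
            candidates.append(Path(directory) / "hapi" / "dat")
    return candidates


for path in data_dir_candidates():
    print(path)
```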

Limitations

The script has no concept of compound hanzi (multi-character words) and their meanings. Everything is based on the single-character Unihan file. As a result, tone sandhi, i.e. the way tones change in juxtaposition, is not handled either.
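
The effect of this limitation can be shown with a small sketch. The READINGS table is a two-entry sample, and both functions are hypothetical: annotate mimics a per-character lookup, while apply_third_tone_sandhi applies a simplified pairwise version of the third-tone rule that hapi does not implement.

```python
# Tiny sample table of single-character readings (numbered pinyin).
READINGS = {"你": "ni3", "好": "hao3"}


def annotate(text):
    """Look up each character on its own, as a per-character database would."""
    return [READINGS.get(ch, ch) for ch in text]


def apply_third_tone_sandhi(syllables):
    """Simplified rule: a third tone before another third tone becomes second."""
    out = list(syllables)
    for i in range(len(out) - 1):
        if out[i].endswith("3") and out[i + 1].endswith("3"):
            out[i] = out[i][:-1] + "2"
    return out


per_character = annotate("你好")
print(per_character)                           # dictionary tones only
print(apply_third_tone_sandhi(per_character))  # tones as usually pronounced
```

The first line of output reflects what a single-character lookup can provide. The second shows the pronunciation change that would require context-aware processing.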

Roadmap

I have a few ideas but no fixed plan:

  • Add radicals to tables?
  • Add a mode to look up stroke order for characters (I need to figure out how to include that info from the database). Check cjklib’s implementation (thanks to Spyhawk for the suggestion).
  • Maybe use lxml to parse input and then replace text and tail attributes with colorized text as a way to preserve input formatting.

Installation

Install the module with setuptools and then put the scripts on the path. Run the download script to get the required third-party data.

Dependencies

The following Python libraries are required (Arch Linux package names provided):

Required Data Files

The Unihan files are required. The audio files are optional. Use hapi-download_data -uy to get everything. The download script depends on wget and aria2c.

Unicode Unihan

The script parses the Unihan readings file to build maps of hanzi, pinyin and their definitions.
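
For orientation, the Unihan readings file is a tab-separated text file with one "codepoint, field, value" triple per line and comment lines starting with "#". The sketch below parses that format into a per-character map. It is not hapi's parser: the parse_unihan function is hypothetical, SAMPLE is a short illustrative excerpt rather than the real file, and the choice of the kMandarin and kDefinition fields is an assumption about which fields are relevant.

```python
import io
from collections import defaultdict

# Illustrative excerpt in the Unihan tab-separated format.
SAMPLE = """\
# Sample lines only; the real file is much larger.
U+4EBA\tkDefinition\tman; people; mankind; someone else
U+4EBA\tkMandarin\trén
U+597D\tkMandarin\thǎo
"""


def parse_unihan(lines, fields=("kMandarin", "kDefinition")):
    """Map each character to a dict of the selected Unihan fields."""
    data = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        codepoint, field, value = parts
        if field in fields:
            # "U+4EBA" -> the character itself
            data[chr(int(codepoint[2:], 16))][field] = value
    return dict(data)


readings = parse_unihan(io.StringIO(SAMPLE))
print(readings["人"]["kMandarin"])
```

With a downloaded copy of the file, an open file object could be passed in place of the io.StringIO sample.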

Pinyin MP3

For HTML audio support, MP3 files for each pinyin and tone combination must be stored in dat/mp3. The files should be named by their numbered pinyin, e.g. the sound file for “rén” should be saved as ren2.mp3. The files should be accessible via the subdirectory audio in the data directory, either directly or via symlink (which can be used to easily switch between different collections of audio files).
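
The naming convention above (tone-marked pinyin to numbered pinyin) can be derived with Unicode normalization. The following is a minimal sketch, not hapi's own conversion code: the function names are hypothetical, and two choices are assumptions that may not match a given audio collection, namely that unmarked syllables get the number 5 (neutral tone) and that "ü" is kept as-is rather than written as "v" or "u:".

```python
import unicodedata

# Combining marks used for the four Mandarin tones.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def numbered_pinyin(syllable, neutral=5):
    """Convert one tone-marked syllable, e.g. "rén", to numbered pinyin."""
    tone = neutral  # assumption: no tone mark means neutral tone
    kept = []
    # Decompose so that tone marks become separate combining characters.
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # Recompose so that "u" + diaeresis becomes "ü" again.
    base = unicodedata.normalize("NFC", "".join(kept))
    return f"{base}{tone}"


def audio_filename(syllable):
    """Return the MP3 file name expected for a tone-marked syllable."""
    return numbered_pinyin(syllable) + ".mp3"


print(audio_filename("rén"))
```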

Contact
echo xyne.archlinux.org | sed 's/\./@/'