2024-05-17 22:37 UTC
  • Xyne


Description: Python library, command-line tools and server for annotating Mandarin Chinese with phonetics (pinyin, zhuyin, etc.) and colors by tone.
Latest Version: 2021
Source Code: src/
  • any
Build Dependencies:
  • python-setuptools
Optional Dependencies:
  • aria2: required for downloading audio files
Arch Repositories:
  • [xyne-any]
  • [xyne-i686]
  • [xyne-x86_64]
AUR Page: hapi
Arch Forum Thread: 229810



hapi Help Message

$ hapi -h

usage: hapi [-h] [-c {dummit,pleco,sinosplice,hanping,mdbg}]
            [--table {h2p,p2h}] [-f FILE] [--html <path>]
            [-p {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...]]
            [--reset] [--target {ansi,bbcode,html}] [--tw]
            [args ...]

Map hanzi to pinyin and vice versa.

positional arguments:
  args                  The hanzi or phonetic words to map, e.g. "人", "rén",
                        "ren2" or "ㄖㄣˊ", depending on mode.

  -h, --help            show this help message and exit
  -c {dummit,pleco,sinosplice,hanping,mdbg}, --color {dummit,pleco,sinosplice,hanping,mdbg}
                        Color tones by given scheme.
  --table {h2p,p2h}     h2p: map hanzi to pinyin; p2h: map pinyin to hanzi;
                        color: colorize strings of hanzi
  -f FILE, --file FILE  In "color" mode, read hanzi from a file. If "-", read
                        from STDIN.
  --html <path>         Generate full-page HTML output and save to the given
                        path. If the path is "-", print to STDOUT.
  -p {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...], --phonetics {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...]
                        Display phonetics alongsize Hanzi in text output.
  --reset               Rebuild the database.
  --target {ansi,bbcode,html}
                        Select target when printing to STDOUT. Default: ansi
  --tw, --taiwanese     Use preferred zh-Hant (TW) pinyin readings.

hapi-srv Help Message

$ hapi-srv -h

usage: server.py [-h] [-a ADDRESS] [-p PORT]

Run a local HaPi server to input text via your browser.

  -h, --help            show this help message and exit
  -a ADDRESS, --address ADDRESS
                        The server address. Pass en empty string to listen on
                        all interfaces. Default: localhost
  -p PORT, --port PORT  The server port. Default: 8000



Do this to get started:

hapi-download_data -uy
hapi-srv
# open http://localhost:8000/ and post some Chinese input


I noticed that some sites use different colors to represent the tones of Mandarin when writing hanzi and pinyin. I like the idea as an aid for memorizing tones, so I started writing a script to generate colorized output. It eventually grew into the current Python library and scripts.


Common constants and functions used by the other modules.
Classes for working with different phonetic systems via a common interface.
Functions and classes for working with Unihan data.
Database class for building and querying the database.
lxml-based XHTML generator classes and functions.
Simple HTTP server based on Python’s http.server module. It processes form data to generate pages of colorized hanzi with phonetic annotations and dictionary links, along with a table of the characters and their Unihan data and optional audio.


hapi-download_data: Downloads the Unihan data required to build the database, and the audio files used to generate HTML audio elements.
hapi: The original command-line script. It provides modes for printing different types of tables and text output to the command line or saving HTML pages. The output can optionally be colored using one of several color schemes, and colorized output can be formatted for the console (ANSI), BBCode or HTML. See the command-line help message for details.
hapi-srv: A thin wrapper to run hapi.server.

Data Directories

The database is created in $XDG_DATA_HOME/hapi. Third-party data is downloaded to $HAPI_DATA_DIR if it is set, otherwise to $XDG_DATA_HOME/hapi/dat. Lookups of the data directory use xdg.BaseDirectory and check the user directories (including $HAPI_DATA_DIR) before the system directories, so the data can be moved to a system directory after it has been downloaded.

XDG_DATA_HOME defaults to ~/.local/share.
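
The lookup order described above can be sketched with the standard library alone. This is only an illustration, not HaPi's actual code (which uses xdg.BaseDirectory). The helper name candidate_data_dirs is hypothetical, and checking $HAPI_DATA_DIR first, using the hapi/dat subpath under the system directories, and falling back to the XDG specification's default for $XDG_DATA_DIRS are all assumptions.

```python
import os
from pathlib import Path


def candidate_data_dirs(env=None):
    """Return candidate data directories in lookup order (hypothetical helper)."""
    env = os.environ if env is None else env
    candidates = []

    # Assumption: an explicit $HAPI_DATA_DIR is checked before the XDG paths.
    override = env.get("HAPI_DATA_DIR")
    if override:
        candidates.append(Path(override))

    # User directory: $XDG_DATA_HOME defaults to ~/.local/share.
    data_home = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    candidates.append(Path(data_home) / "hapi" / "dat")

    # System directories: the XDG default applies when $XDG_DATA_DIRS is unset.
    data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    for directory in data_dirs.split(":"):
        if directory:
            # Assumption: system copies use the same hapi/dat subpath.
            candidates.append(Path(directory) / "hapi" / "dat")

    return candidates


if __name__ == "__main__":
    # Print the directories in the order they would be checked.
    for path in candidate_data_dirs():
        print(path, "(exists)" if path.is_dir() else "(missing)")
```

The first existing directory in this list would be the one used, which is why data placed in a system directory is still found when the user directories are empty.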


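The script has no concept of compound hanzi and their meanings. Everything is based on the single-character Unihan file, so each character is annotated in isolation. This also means that sandhi, i.e. the way tones change in juxtaposition, is ignored: 你好 is annotated as "nǐ hǎo" even though it is pronounced "ní hǎo".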


I have a few ideas but no fixed plan:

  • Add radicals to tables?
  • Add a mode to look up stroke order for characters (I need to figure out how to include that info from the database). Check cjklib's implementation (thanks to Spyhawk for the suggestion).
  • Maybe use lxml to parse input and then replace text and tail attributes with colorized text as a way to preserve input formatting.
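
The last idea can be sketched as follows. To keep the example self-contained it uses the standard library's xml.etree.ElementTree as a stand-in for lxml; both expose the same text and tail attributes on elements. The function names are hypothetical, and mark_hanzi is only a placeholder for real colorization.

```python
import xml.etree.ElementTree as ET


def transform_text(root, transform):
    """Apply transform to every text and tail string in the tree, in place."""
    for element in root.iter():
        if element.text:
            element.text = transform(element.text)
        if element.tail:
            element.tail = transform(element.tail)
    return root


def mark_hanzi(text):
    """Placeholder for colorization: wrap CJK ideographs in brackets."""
    return "".join(
        f"[{char}]" if "\u4e00" <= char <= "\u9fff" else char
        for char in text
    )


if __name__ == "__main__":
    root = ET.fromstring("<p>你<b>好</b> world</p>")
    transform_text(root, mark_hanzi)
    # The original markup is preserved around the transformed text.
    print(ET.tostring(root, encoding="unicode"))
    # prints: <p>[你]<b>[好]</b> world</p>
```

Real colorization would need to insert new elements (such as colored spans) rather than plain strings, which means splitting each text and tail around the inserted nodes. The sketch only shows that the input structure survives the traversal.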


Install the module with setuptools and then put the scripts on the path. Run the download script to get the required third-party data.


The following Python libraries are required (Arch Linux package names provided):

Required Data Files

The Unihan files are required. The audio files are optional. Use hapi-download_data -uy to get everything. The script depends on wget and aria2c.

Unicode Unihan

The script parses the Unihan readings file to build maps of hanzi, pinyin and their definitions.
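
As a rough illustration of that step, the sketch below parses a few lines in the tab-separated format of the Unihan readings file (code point, field name, value) and builds both maps. It is not HaPi's actual parser: the function names are hypothetical and the sample definition is abbreviated for illustration.

```python
from collections import defaultdict

# Sample lines in the Unihan readings format: code point, field, value.
SAMPLE = """\
# Comment lines start with '#'.
U+4EBA\tkMandarin\trén
U+4EBA\tkDefinition\tman; people
U+597D\tkMandarin\thǎo
"""


def parse_readings(text):
    """Map each character to a dict of its Unihan fields."""
    entries = defaultdict(dict)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        # Convert "U+4EBA" to the character itself.
        char = chr(int(codepoint[2:], 16))
        entries[char][field] = value
    return dict(entries)


def pinyin_to_hanzi(entries):
    """Build the reverse map from each pinyin reading to its characters."""
    reverse = defaultdict(list)
    for char, fields in entries.items():
        # A kMandarin value may hold several space-separated readings.
        for reading in fields.get("kMandarin", "").split():
            reverse[reading].append(char)
    return dict(reverse)


if __name__ == "__main__":
    entries = parse_readings(SAMPLE)
    print(entries["人"])
    print(pinyin_to_hanzi(entries))
```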

Pinyin MP3

For HTML audio support, MP3 files for each pinyin and tone combination must be stored in dat/mp3. The files should be named by their numbered pinyin, e.g. the sound file for “rén” should be saved as ren2.mp3. The files should be accessible via the subdirectory audio in the data directory, either directly or via symlink (which can be used to easily switch between different collections of audio files).
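
The naming rule can be sketched as a conversion from tone-marked to numbered pinyin. This is a minimal sketch rather than HaPi's own code: the function names are hypothetical, and writing the neutral tone as 5 and keeping "ü" unchanged are assumptions that may not match a given audio collection.

```python
import unicodedata

# Combining diacritics used for the four marked tones.
TONE_MARKS = {
    "\u0304": 1,  # macron: first tone
    "\u0301": 2,  # acute: second tone
    "\u030c": 3,  # caron: third tone
    "\u0300": 4,  # grave: fourth tone
}


def numbered_pinyin(syllable):
    """Convert one tone-marked syllable, e.g. "rén", to "ren2"."""
    tone = 5  # assumption: the neutral tone is written as 5
    letters = []
    # Decompose so that each tone mark becomes a separate combining character.
    for char in unicodedata.normalize("NFD", syllable.lower()):
        if char in TONE_MARKS:
            tone = TONE_MARKS[char]
        else:
            letters.append(char)
    # Recompose so that "ü" stays a single character.
    return unicodedata.normalize("NFC", "".join(letters)) + str(tone)


def audio_filename(syllable):
    """Return the expected MP3 file name for a tone-marked syllable."""
    return numbered_pinyin(syllable) + ".mp3"


if __name__ == "__main__":
    print(audio_filename("rén"))  # prints: ren2.mp3
```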

echo xyne.archlinux.org | sed 's/\./@/'