Description: | Python library, command-line tools and server for annotating Mandarin Chinese with phonetics (pinyin, zhuyin, etc.) and colors by tone. |
Latest Version: | 2021 |
Source Code: | src/ |
Architecture: |
|
Dependencies: |
|
Build Dependencies: |
|
Optional Dependencies: |
|
Arch Repositories: |
|
AUR Page: | hapi |
Arch Forum Thread: | 229810 |
Tags: |
$ hapi -h
usage: hapi [-h] [-c {dummit,pleco,sinosplice,hanping,mdbg}]
[--table {h2p,p2h}] [-f FILE] [--html <path>]
[-p {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...]]
[--reset] [--target {ansi,bbcode,html}] [--tw]
[args ...]
Map hanzi to pinyin and vice versa.
positional arguments:
args The hanzi or phonetic words to map, e.g. "人", "rén",
"ren2" or "ㄖㄣˊ", depending on mode.
options:
-h, --help show this help message and exit
-c {dummit,pleco,sinosplice,hanping,mdbg}, --color {dummit,pleco,sinosplice,hanping,mdbg}
Color tones by given scheme.
--table {h2p,p2h} h2p: map hanzi to pinyin; p2h: map pinyin to hanzi;
color: colorize strings of hanzi
-f FILE, --file FILE In "color" mode, read hanzi from a file. If "-", read
from STDIN.
--html <path> Generate full-page HTML output and save to the given
path. If the path is "-", print to STDOUT.
-p {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...], --phonetics {pinyin,zhuyin,wadeGiles,yale} [{pinyin,zhuyin,wadeGiles,yale} ...]
Display phonetics alongsize Hanzi in text output.
--reset Rebuild the database.
--target {ansi,bbcode,html}
Select target when printing to STDOUT. Default: ansi
--tw, --taiwanese Use preferred zh-Hant (TW) pinyin readings.
$ hapi-srv -h
usage: server.py [-h] [-a ADDRESS] [-p PORT]
Run a local HaPi server to input text via your browser.
options:
-h, --help show this help message and exit
-a ADDRESS, --address ADDRESS
The server address. Pass en empty string to listen on
all interfaces. Default: localhost
-p PORT, --port PORT The server port. Default: 8000
Do this to get started:
hapi-download_data -uy
hapi-srv
# open http://localhost:8000/ and post some Chinese input
I noticed that some sites use different colors to represent the tones of Mandarin when writing hanzi and pinyin. I like the idea as an aid for memorizing tones, so I started writing a script to generate colorized output. It eventually grew into the current Python library and scripts.
hapi.server
.
The database is created in $XDG_DATA_HOME/hapi.
Third-party data will be downloaded to $XDG_DATA_HOME/hapi/dat, or to $HAPI_DATA_DIR if it is set. Queries for the data directory use xdg.BaseDirectory and will look in system directories after checking user directories (including $HAPI_DATA_DIR), so you can put the data there after it has been downloaded. XDG_DATA_HOME defaults to ~/.local/share.
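The lookup order described above can be sketched with the standard library alone. This is only an illustration, not HaPi's actual code: the function names are hypothetical, and it assumes that system directories come from XDG_DATA_DIRS and use the same hapi/dat subpath as the user directory.

```python
import os
from pathlib import Path


def candidate_data_dirs(env=None):
    """Return directories to search for data, highest priority first.

    Hypothetical helper: the order and the "hapi/dat" subpath for system
    directories are assumptions based on the description above.
    """
    env = os.environ if env is None else env
    candidates = []

    # Explicit override, if set.
    override = env.get("HAPI_DATA_DIR")
    if override:
        candidates.append(Path(override))

    # User data directory; XDG_DATA_HOME defaults to ~/.local/share.
    data_home = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    candidates.append(Path(data_home) / "hapi" / "dat")

    # System data directories (XDG default: /usr/local/share:/usr/share).
    data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    for directory in data_dirs.split(":"):
        if directory:
            candidates.append(Path(directory) / "hapi" / "dat")
    return candidates


def find_data_dir(env=None):
    """Return the first candidate directory that exists, or None."""
    for path in candidate_data_dirs(env):
        if path.is_dir():
            return path
    return None
```

Passing a plain dict as env makes the order easy to inspect without touching the real environment.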
The script has no concept of compound hanzi and their meanings. Everything is based on the single-character Unihan file. As a result, it also ignores tone sandhi, i.e. the way tones change in juxtaposition.
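To make the limitation concrete, the sketch below contrasts a per-character lookup with the simplest third-tone sandhi rule (a third tone before another third tone is spoken as a second tone). The READINGS table and both functions are hypothetical and exist only for illustration. HaPi does the first step, not the second.

```python
# Hypothetical single-character readings in numbered pinyin.
READINGS = {"你": "ni3", "好": "hao3"}


def annotate(text):
    """Look up each character on its own, as a single-character table allows."""
    return [READINGS.get(ch, ch) for ch in text]


def apply_third_tone_sandhi(syllables):
    """Apply only the basic rule: tone 3 before tone 3 becomes tone 2."""
    out = list(syllables)
    for i in range(len(out) - 1):
        if out[i].endswith("3") and out[i + 1].endswith("3"):
            out[i] = out[i][:-1] + "2"
    return out


dictionary_form = annotate("你好")
spoken_form = apply_third_tone_sandhi(dictionary_form)
```

Here dictionary_form holds the per-character readings, while spoken_form reflects how the word is actually pronounced.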
I have a few ideas but no fixed plan:
Install the module with setuptools, then put the scripts on your PATH. Run the download script to get the required third-party data.
The following Python libraries are required (Arch Linux package names provided):
The Unihan files are required. The audio files are optional. Use hapi-download_data -uy to get everything. The script depends on wget and aria2c.
The script parses the Unihan readings file to build maps of hanzi, pinyin and their definitions.
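As a rough sketch of that parsing step, the function below reads lines in the tab-separated Unihan format (code point, field name, value) and builds the three maps. It is not HaPi's implementation: the function name is hypothetical, and it assumes that only the kMandarin and kDefinition fields are of interest.

```python
from collections import defaultdict


def parse_unihan_readings(lines):
    """Build hanzi->pinyin, pinyin->hanzi and hanzi->definition maps.

    Hypothetical helper; assumes lines like "U+4EBA<TAB>kMandarin<TAB>rén".
    """
    hanzi_to_pinyin = {}
    pinyin_to_hanzi = defaultdict(list)
    definitions = {}

    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        char = chr(int(codepoint[2:], 16))  # "U+4EBA" -> "人"

        if field == "kMandarin":
            readings = value.split()  # a field may list several readings
            hanzi_to_pinyin[char] = readings
            for reading in readings:
                pinyin_to_hanzi[reading].append(char)
        elif field == "kDefinition":
            definitions[char] = value

    return hanzi_to_pinyin, dict(pinyin_to_hanzi), definitions
```

In practice the lines would come from an open file handle on the downloaded readings file; any iterable of strings works.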
For HTML audio support, MP3 files for each pinyin and tone combination must be stored in dat/mp3. The files should be named by their numbered pinyin, e.g. the sound file for “rén” should be saved as ren2.mp3. The files should be accessible via the subdirectory audio in the data directory, either directly or via symlink (which can be used to easily switch between different collections of audio files).
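The naming convention can be illustrated with a small converter from accented to numbered pinyin. This is a sketch under stated assumptions, not part of HaPi: the function names are hypothetical, the neutral tone is written as 5, and ü is left unchanged because the text above does not say how it should appear in file names.

```python
import unicodedata
from pathlib import Path

# Combining diacritics for the four marked tones.
_TONE_MARKS = {
    "\u0304": 1,  # macron
    "\u0301": 2,  # acute accent
    "\u030c": 3,  # caron
    "\u0300": 4,  # grave accent
}


def numbered_pinyin(syllable, neutral=5):
    """Convert one accented syllable to numbered pinyin, e.g. "rén" -> "ren2"."""
    tone = neutral  # assumption: unmarked syllables get the neutral-tone number
    letters = []
    # Decompose so tone marks become separate combining characters.
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in _TONE_MARKS:
            tone = _TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that a remaining diaeresis rejoins its "u".
    base = unicodedata.normalize("NFC", "".join(letters))
    return f"{base}{tone}"


def audio_path(data_dir, syllable):
    """Return the expected MP3 path under the "audio" subdirectory."""
    return Path(data_dir) / "audio" / f"{numbered_pinyin(syllable)}.mp3"
```

For example, audio_path(data_dir, "rén") points at ren2.mp3 inside the audio subdirectory of data_dir.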