# Introduction

On a whim I began to explore the idea of creating a Pandoc writer for speech
synthesis. After several hours I ended up with this: a writer that converts
Pandoc's native format to Sable, with some tentative conventions for providing
audio metadata about the document structure. Others with paedagogical experience
in this area will no doubt have much better ideas for conveying this
information.

There is, of course, room for improvement in the code itself as well.

If you are not familiar with Pandoc, it is both a Haskell library and a file
conversion utility that supports a number of different input and output formats.
It is basically the Swiss army knife of markup conversion. With the addition of
this writer, all of the input formats recognized by Pandoc may be converted to
Sable for speech synthesis. These formats include

* docbook
* html
* json
* latex
* markdown
* mediawiki
* rst
* textile

Writing a dedicated converter for one format is trivial, as shown in the
accompanying script for converting Markdown to Sable. I expect that adding it to
a modified Pandoc executable would also be easy.

To convert the resulting Sable file to speech, you can use an application such
as Festival. The `festival` command can render the audio output directly with
the `--tts` flag. The Festival package also provides an application named
`text2wave` that accepts the Sable file as an argument and generates a .wav
file from it.

I know nothing about other existing markup languages for speech generation. I
discovered Sable only a few hours before writing this, and it was good enough
for my purposes that I did not continue to search for alternatives. It does the
trick, and nothing prevents adding a writer for another output format
alongside it.

# Links

See the following for more information:

* [Pandoc homepage](http://johnmacfarlane.net/pandoc/)
* [Sable specification](http://www.bell-labs.com/project/tts/sable.html)
* [Festival homepage](http://www.cstr.ed.ac.uk/projects/festival/)


# Technical Notes

A configuration type can be added later for setting the voice, volume, rate,
etc. used to convey metadata, as well as the terms used.
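
As a rough illustration of what such a configuration type might look like, here is a minimal sketch. Every name in it (`MetaStyle`, `SableConfig`, `defaultConfig`, `announceHeading`) and every default value is hypothetical; none of them exist in the writer yet.

```haskell
-- Hypothetical sketch: these types are not part of the current writer.

-- | How structural announcements should sound.
data MetaStyle = MetaStyle
  { metaVoice  :: String  -- speaker used for announcements (assumed value)
  , metaVolume :: String  -- volume level for announcements (assumed value)
  , metaRate   :: String  -- speaking rate for announcements (assumed value)
  } deriving (Show, Eq)

-- | Settings for conveying document structure, including the terms spoken.
data SableConfig = SableConfig
  { metaStyle   :: MetaStyle
  , headingTerm :: String  -- word spoken before a heading
  , codeTerm    :: String  -- word spoken before a code block
  } deriving (Show, Eq)

-- | Illustrative defaults only.
defaultConfig :: SableConfig
defaultConfig = SableConfig
  { metaStyle   = MetaStyle { metaVoice = "male1", metaVolume = "medium", metaRate = "-10%" }
  , headingTerm = "Heading"
  , codeTerm    = "Code"
  }

-- | Text announced before a heading of the given level.
announceHeading :: SableConfig -> Int -> String
announceHeading cfg level = headingTerm cfg ++ " level " ++ show level
```

With a record like this, changing a term is a one-field update, for example `announceHeading (defaultConfig { headingTerm = "Section" }) 1`.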

The literal reading of code and other raw data is clearly sub-optimal. One
approach would be to pass the data to a custom function or external application
for tokenizing, and then generate appropriate Sable output from that.
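
A minimal sketch of the custom-function approach, under the assumption that a very crude tokenizer is enough to show the idea. The names `tokenize`, `spoken`, and `readAloud` are hypothetical, and the symbol table is deliberately tiny; a real version would need to know the language being read.

```haskell
import Data.Char (isAlphaNum, isSpace)

-- | Split raw code into alphanumeric words and single-character symbols,
-- dropping whitespace.
tokenize :: String -> [String]
tokenize [] = []
tokenize s@(c:cs)
  | isSpace c    = tokenize cs
  | isAlphaNum c = let (word, rest) = span isAlphaNum s
                   in word : tokenize rest
  | otherwise    = [c] : tokenize cs

-- | Spoken name for a few symbols; illustrative, not exhaustive.
spoken :: String -> String
spoken "(" = "open paren"
spoken ")" = "close paren"
spoken "=" = "equals"
spoken "+" = "plus"
spoken tok = tok

-- | Turn a snippet of raw code into a phrase suitable for plain-text output.
readAloud :: String -> String
readAloud = unwords . map spoken . tokenize
```

For example, `readAloud "x = f(y+1)"` yields `"x equals f open paren y plus 1 close paren"`, which could then be wrapped in whatever Sable markup the writer uses for code.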

The output is not currently strict XML. Festival does not seem to fully
understand escaped XML, so the writer currently inserts raw text even if that
text contains characters that XML reserves, such as `<` and `&`.
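
For reference, strict output would require escaping the five characters that have predefined XML entities. The sketch below shows what that step could look like; `escapeXml` is a hypothetical helper and is not applied by the writer, for the Festival reason given above.

```haskell
-- | Replace the five characters with predefined XML entities.
-- Hypothetical helper: the writer does not currently call anything like this.
escapeXml :: String -> String
escapeXml = concatMap escapeChar
  where
    escapeChar :: Char -> String
    escapeChar '&'  = "&amp;"
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&apos;"
    escapeChar c    = [c]
```

If Festival's handling of entities turns out to be reliable for some subset (say, only `&amp;` and `&lt;`), a function like this could be narrowed to that subset and enabled through an option.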

The use of `<DIV>` tags to encapsulate plain text is a consequence of
Text.XML.Light's imposition of strict XML standards.

The code itself does a lot of list concatenation and should probably be
refactored with a different approach. Nevertheless, it works.
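
One possible direction, sketched here under assumptions rather than taken from the writer: build output by composing functions (the `ShowS` idiom from the Prelude) instead of repeatedly appending lists. The `Node` type and `render` function are hypothetical stand-ins for whatever structure the writer actually builds.

```haskell
-- Hypothetical miniature document tree, for illustration only.
data Node
  = Text String        -- raw text
  | Elem String [Node] -- tag name and children

-- | Render a tree to a string without nesting (++) to the left.
-- Each piece is a function that prepends its text, so composing pieces
-- avoids re-walking the output built so far.
render :: Node -> String
render node = go node ""
  where
    go :: Node -> ShowS
    go (Text s) = showString s
    go (Elem name kids) =
        showString "<" . showString name . showString ">"
      . foldr (.) id (map go kids)
      . showString "</" . showString name . showString ">"
```

For example, `render (Elem "DIV" [Text "a", Elem "EMPH" [Text "b"]])` produces `"<DIV>a<EMPH>b</EMPH></DIV>"`. Whether this is worth doing depends on how large the converted documents are; for short inputs plain concatenation may be perfectly adequate.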