The HyperText Markup Language, or HTML, has ruled the web for many years and
has been the starting point for many other fascinating technologies. Among the
most interesting of these are the speech markup languages, often abbreviated
SML. These languages annotate text with voice information. An SML relies on a
speech synthesizer, which gives the system the capability of audio output:
users and applications provide text to a speech synthesizer, which then
converts it into audio output.
Most speech markup languages rely on speech synthesizers whose sole purpose is
to convert input text into audio output. The markup languages themselves can be
categorized by the quality of their output, their understandability to end
users, and so on.
Let’s look at a few types of markup languages:
- Java Speech Markup Language (JSML)
JSML has been developed to support as many types of applications as possible
and as many different languages as possible. To make this possible, JSML marks
general information about the text and, whenever possible, uses cross-language
properties. Although JSML may be used for text in Japanese, Spanish, Tamil,
Thai, English and nearly all modern languages, a single JSML document should
contain text for only a single language. Applications are therefore responsible
for management and control of speech synthesizers if output of multiple
languages is required. JSML can be used by a wide range of applications to speak
text from equally varied sources, including email, database information, web
pages and word processor documents. The process works something like this: the
application is responsible for converting the source information to JSML text
using any special knowledge it has about content and format of the source
information. For example, an email application can provide the ability to read
messages aloud by converting them to JSML. This could involve rendering header
information (sender, subject, date, etc.) in spoken form, as well as special
processing of text in the body of the message (to handle attachments, indented
text, special abbreviations, etc.).
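To make this concrete, here is a hedged sketch of what such an email-to-JSML conversion might produce. The element names (`jsml`, `div`, `sayas`, `emp`, `pros`, `break`) follow the JSpeech Markup Language note as best recalled; the header values are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical output of an email reader converting a
     message header and body into JSML for the synthesizer -->
<jsml>
  <div type="para">
    Message from <emp>Jane Smith</emp>,
    received <sayas class="date">2000-06-05</sayas>.
  </div>
  <div type="para">
    Subject: <pros rate="-10%">Quarterly results</pros>.
    <break size="medium"/>
    The message body follows.
  </div>
</jsml>
```

The application, not the synthesizer, decides that "Subject:" should be read slowly or that the date string is a date; that division of labor is exactly the point made above.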
- Speech Synthesis Markup Language (SSML)
SSML is a speech synthesis markup language: it adds annotations to text that
guide pronunciation and prosody when the text is spoken by a synthesizer.
The concept behind SSML was developed by Paul Taylor in 1992. After many
delays, a first prototype was built by Amy Isard and Paul Taylor in the summer
of 1995. SSML version 1.0 is now fully incorporated into the speech synthesis
system, which is available for general use. SSML does not contain explicit
instructions on how documents should be processed; this is left to the
individual synthesis interpreters. At present only one such interpreter exists,
and it processes the tags in the way one would expect: short pauses are
inserted at breaks, for example. It is not envisaged that SSML will be used in
quite the same way as HTML or other document markup languages; the envisaged
applications lie more in the area of language-generation systems, which pass
the entire text for output to the synthesizer.
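As an illustration of the kind of annotation SSML provides, here is a small fragment using the element names from the W3C SSML 1.0 recommendation (`say-as`, `break`, `prosody`, `emphasis`); the sentence content is invented, and implementations other than W3C-conformant ones may differ.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Your flight departs at
  <say-as interpret-as="time">8:45am</say-as>.
  <break time="500ms"/>
  <prosody rate="slow" pitch="low">Please arrive early.</prosody>
  The gate number is <emphasis>42</emphasis>.
</speak>
```

Note that the markup says nothing about *how* the pause or the slow rate is realized; as described above, that is left entirely to the synthesis interpreter.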
- TalkML
TalkML is an experimental XML language for voice browsers being developed by
HP Labs for use in call centers and sales and support services, adding speech
recognition to today's touch-tone systems.
TalkML supports more natural conversations than dialog systems based on
keywords, while remaining simple to author. Other work is underway to
investigate how to author "dual-access" applications, where the same
application can be accessed by both voice browsers and conventional visual
browsers, perhaps by transforming HTML to TalkML.
- VoXML
Motorola designed VoXML to serve as a platform for voice applications, much
as HTML does for web-based applications. VoXML structures the voice interface
(in voice-enabled browsers, for example) as a series of dialogs. Navigation and
input are handled by recognizing the user's voice; output is produced via
text-to-speech synthesis. HTML, especially version 4.0, does a very good job of
allowing web page authors to visually lay out a document and include niceties
such as animation. However, HTML falls short when it comes to providing a
framework for structured speech dialogs, such as the automated responses of
interactive voice response (IVR) systems. The development environment for VoXML
applications comprises an HTTP server that hosts a VoXML voice application and
a desktop client for accessing it. A user calls the voice browser by telephone
and can then interact with the specified Internet voice applications.
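A hedged sketch of what such a dialog might look like follows. The element names (`DIALOG`, `STEP`, `PROMPT`, `INPUT`, `OPTION`) are drawn from Motorola's VoxML 1.x specification as best recalled, and the call-center menu itself is invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical VoXML dialog: a two-option voice menu.
     The user speaks an option; NEXT names the step to jump to. -->
<DIALOG>
  <STEP NAME="init">
    <PROMPT> Welcome. Say sales or support. </PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#sales"> sales </OPTION>
      <OPTION NEXT="#support"> support </OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="sales">
    <PROMPT> Connecting you to sales. </PROMPT>
  </STEP>
  <STEP NAME="support">
    <PROMPT> Connecting you to support. </PROMPT>
  </STEP>
</DIALOG>
```

The structure mirrors the IVR-style interaction described above: each step pairs a spoken prompt with the recognized inputs that move the caller to the next step.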
In conclusion, all these SML-based applications increase the interactivity of
the Internet as a medium.
The web today makes extensive use of text markup. But with better
implementations of speech synthesizers and advances in their ability to mimic
human speech, the web of the future will be marked by extensive use of speech
markup languages and a whole new world of interactive experience.