
Speech markup language: The next in line

CIOL Bureau

The Hyper Text Markup Language, or HTML, has ruled the web for many years and has been the starting point of many other fascinating technologies. The most interesting among them is a language termed SML, or speech markup language. This language has the inherent capability to annotate text so that it can be rendered as voice output. SML makes use of a speech synthesizer, which gives the system the capability of audio output. Users and applications provide text to a speech synthesizer, which then converts it into audio output.


Most of the speech markup languages make use of speech synthesizers whose sole purpose is to convert input text into audio output. There are different types of markup languages, categorized by the quality of the output, end-user understandability, and so on.
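As a simple illustration of this text-in, audio-out step, a minimal Python sketch using the pyttsx3 library (one of many possible synthesizer front ends, chosen here purely for illustration) might look like this:

```python
# A minimal sketch of the text-in, audio-out role of a speech synthesizer,
# using the pyttsx3 library as one possible offline synthesizer front end.
import pyttsx3

engine = pyttsx3.init()           # pick up the platform's default synthesizer
engine.setProperty("rate", 150)   # speaking rate in words per minute
engine.say("Hello from a speech synthesizer.")
engine.runAndWait()               # block until the audio has been rendered
```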

Let’s look at a few types of markup languages:

  • Java Speech Markup Language (JSML)

JSML has been developed to support as many types of applications as possible

and as many different languages as possible. To make this possible, JSML marks

general information about the text and, whenever possible, uses cross-language

properties. Although JSML may be used for text in Japanese, Spanish, Tamil,

Thai, English and nearly all modern languages, a single JSML document should

contain text for only a single language. Applications are therefore responsible

for management and control of speech synthesizers if output of multiple

languages is required. JSML can be used by a wide range of applications to speak

text from equally varied sources, including email, database information, web

pages and word processor documents. The process works something like this: the application is responsible for converting the source information to JSML text, using any special knowledge it has about the content and format of the source information. For example, an email application can provide the ability to read e-mail messages aloud by converting messages to JSML. This could involve the conversion of header information (sender, subject, date, etc.) to an audio format. It might also involve special processing of text in the body of the message (for handling attachments, indented text, special abbreviations, etc.).
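A rough sketch of such an email-to-JSML conversion is given below in Python. The element names used (JSML, PARA, EMP, BREAK) are patterned on the commonly documented JSML element set and should be treated as illustrative rather than authoritative:

```python
# Illustrative sketch: turn an email message into JSML-style markup
# that an application could hand to a speech synthesizer.
from xml.sax.saxutils import escape

def email_to_jsml(sender: str, subject: str, date: str, body: str) -> str:
    """Convert the header fields and body text of an email into JSML-style markup."""
    parts = ["<JSML>"]
    # Header information: wrap the key fields in emphasis so they stand out.
    parts.append(f"<PARA>Message from <EMP>{escape(sender)}</EMP>, dated {escape(date)}.</PARA>")
    parts.append(f"<PARA>Subject: <EMP>{escape(subject)}</EMP></PARA>")
    parts.append("<BREAK/>")
    # Body text: one spoken paragraph per blank-line-separated block.
    for block in body.split("\n\n"):
        if block.strip():
            parts.append(f"<PARA>{escape(block.strip())}</PARA>")
    parts.append("</JSML>")
    return "\n".join(parts)

print(email_to_jsml("Alice", "Lunch on Friday?", "3 March",
                    "Are you free at noon?\n\nLet me know."))
```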

  • Speech Synthesis Markup Language (SSML)

SSML is a speech synthesis markup language that adds annotations to text to guide pronunciation when the text is spoken by the synthesizer.

The concept behind SSML was developed by Paul Taylor in 1992. After many delays, a first prototype was developed by Amy Isard and Paul Taylor in the summer of 1995. SSML version 1.0 is now fully incorporated into the speech synthesis system, which is available for general use. SSML does not contain explicit instructions on how documents should be processed; this is left to the individual synthesis interpreters, and at present only one such interpreter exists. The current interpreter processes the tags in the way one would expect, inserting short pauses where the markup calls for them. It is not envisaged that SSML will be used in quite the same way as HTML or other document markup languages; the envisaged applications lie more in the area of language-generation programs, which pass the entire text for output to the synthesizer.
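The article does not list SSML's actual tags, so the sketch below uses a hypothetical <pause/> annotation simply to illustrate how an interpreter might insert the short pauses described above:

```python
# Purely illustrative sketch of an interpreter acting on pause annotations.
# The <pause/> tag is a hypothetical placeholder, not an SSML element
# taken from the article.
import time
import pyttsx3

def speak_annotated(text: str, pause_seconds: float = 0.4) -> None:
    """Speak text containing <pause/> annotations, inserting short pauses."""
    engine = pyttsx3.init()
    for chunk in text.split("<pause/>"):
        chunk = chunk.strip()
        if chunk:
            engine.say(chunk)
            engine.runAndWait()      # render this chunk before pausing
        time.sleep(pause_seconds)    # the short pause the markup asked for

speak_annotated("Welcome to the demonstration. <pause/> Please listen carefully.")
```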

  • TalkML

TalkML is an experimental XML language for voice browsers, developed by HP Labs for use in call centers and in sales and support services, adding speech recognition to today's touch-tone systems.


TalkML supports more natural conversations than dialog systems based on keywords, while remaining simple to author. Other work is underway to investigate how to author "dual-access" applications, where the same application can be accessed by both conventional visual browsers and voice browsers, perhaps by transforming HTML into TalkML.

  • VoXML


Motorola designed VoXML to serve as a platform for voice applications, much as HTML does for web-based applications. VoXML technology enables the application interface (in voice-enabled browsers and the like) to take the form of spoken dialogs. Navigation and input are handled via recognition of the user's voice; output is produced via text-to-speech synthesis. The HTML specification, especially version 4.0, does a very good job of allowing web page authors to visually lay out a document and include niceties such as animation. However, HTML falls short when it comes to providing a framework for structured speech dialogs, such as the automated responses of interactive voice response systems. The development environment for VoXML applications comprises an HTTP server that hosts a VoXML voice application and a desktop client for accessing it. A user calls the voice browser by telephone and can then interact with their chosen Internet voice applications.
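As a sketch of that development environment, the following Python program plays the role of the HTTP server hosting a voice-application document. The VoXML element names shown (DIALOG, STEP, PROMPT) are patterned on early VoXML examples and are illustrative only:

```python
# Minimal sketch of an HTTP server hosting a voice-application document,
# as in the VoXML development environment described above.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative document; element names are placeholders, not taken from the article.
VOXML_DOC = """<?xml version="1.0"?>
<DIALOG>
  <STEP NAME="init">
    <PROMPT>Welcome. Please say the name of the department you want.</PROMPT>
  </STEP>
</DIALOG>
"""

class VoiceAppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the voice-application document to the voice browser or client.
        payload = VOXML_DOC.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # The desktop client or telephone gateway would fetch the document from here.
    HTTPServer(("", 8080), VoiceAppHandler).serve_forever()
```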

In conclusion, all these SML-based applications increase the interactivity of the Internet as a medium. The present scenario makes extensive use of text markup, but with better implementations of speech synthesizers and advances in their ability to mimic human speech, the web of the future will be marked by extensive use of speech markup languages and a whole new world of interactive experiences.
