Adding Emotion to Text to Speech Output with the W3C Emotion Markup Language

A hands-on 45 minute workshop at the 2013 SpeechTEK Conference.

Latest change: 1st March 2013


Emotional state is an important component of interactions between humans, and the manner and tone of speech are among the most common means by which humans convey emotions to each other. Text to speech (TTS) output would be much more natural and lifelike if developers could easily configure TTS systems to express emotions. There has been considerable research on expressive TTS, but most existing systems use proprietary approaches for adding emotional expression, which makes it difficult to integrate different products into one system. The World Wide Web Consortium has recently published the Emotion Markup Language (EmotionML), which provides a standard for annotating text with emotion for TTS (as well as for annotating other system outputs such as facial expressions). This workshop will use the open-source, platform-independent Mary TTS system from DFKI to introduce the concepts of EmotionML. Participants will be able to use Mary and EmotionML to create and save their own short expressive narratives.

Workshop goal

This workshop will give

Required background knowledge to complete the project

There is some background knowledge and samples on emotional synthetic speech at

Summary of the project which the attendee will build during the workshop

Participants will get a famous text, such as the first few lines of the "To be or not to be" speech from Hamlet, and we'll hold a little contest to see who can come up with the best version using EmotionML and the Mary synthesizer.

Workshop requirements

Mary is fully open source: all code, including the German TTS, is released under the LGPL, and voices are distributed under Creative Commons or BSD licenses.
In case of problems or questions, contact workshop organizer Felix Burkhardt.

Workshop agenda

We suggest enriching (parts of) this extract from Alice in Wonderland with emotional TTS.

Very soon the Rabbit noticed Alice, and called out to her in an angry tone,
`Why, Mary Ann, what are you doing out here? Run home this moment, and fetch me a pair of gloves and a fan! Quick, now!'
`He took me for his housemaid,'
she said to herself as she ran.
`How surprised he'll be when he finds out who I am! But I'd better take him his fan and gloves.'
`How queer it seems,'
Alice said to herself,
`to be going messages for a rabbit! I suppose Dinah'll be sending me on messages next!'
And she began fancying the sort of thing that would happen:
`"Miss Alice! Come here directly, and get ready for your walk!"
"Coming in a minute, nurse! But I've got to see that the mouse doesn't get out."
Only I don't think,'
Alice went on,
`that they'd let Dinah stop in the house if it began ordering people about like that!'
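
As a possible starting point, individual lines of the extract can be wrapped in EmotionML `<emotion>` elements. The following is a minimal Python sketch, not workshop material: the helper name `build_emotionml` is hypothetical, and the use of the "big6" category vocabulary from the W3C vocabularies note (anger, disgust, fear, happiness, sadness, surprise) is an assumption; which categories a given Mary voice actually renders may differ.

```python
import xml.etree.ElementTree as ET

# EmotionML 1.0 namespace and the "big6" category vocabulary (assumed choice).
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
BIG6 = "http://www.w3.org/TR/emotion-voc/xml#big6"


def build_emotionml(segments):
    """Build an EmotionML document from (category_name, text) pairs.

    Hypothetical helper for illustration: each pair becomes one
    <emotion> element holding a <category> and the text to be spoken.
    """
    # Serialize the EmotionML namespace as the default namespace.
    ET.register_namespace("", EMOTIONML_NS)
    root = ET.Element(
        f"{{{EMOTIONML_NS}}}emotionml",
        {"version": "1.0", "category-set": BIG6},
    )
    for name, text in segments:
        emotion = ET.SubElement(root, f"{{{EMOTIONML_NS}}}emotion")
        category = ET.SubElement(
            emotion, f"{{{EMOTIONML_NS}}}category", {"name": name}
        )
        # The annotated text follows the <category> element inside <emotion>.
        category.tail = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = build_emotionml([
        ("anger", "Run home this moment, and fetch me a pair of gloves and a fan! Quick, now!"),
        ("surprise", "How surprised he'll be when he finds out who I am!"),
    ])
    print(doc)
```

The printed document can then be saved to a file and refined by hand, for example by trying different categories for the narrator's lines and for Alice's thoughts.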

The organizers

Deborah Dahl
Deborah Dahl has over 20 years of experience in speech and natural language technologies, including work on research, defense, and commercial systems. She is also active in speech and multimodal standards activities in the World Wide Web Consortium, serving as Chair of the Multimodal Interaction Working Group and Co-Chair of the Hypertext Coordination Group. She is an editor of the EMMA (Extensible MultiModal Annotation) specification.
Dr. Dahl received the prestigious "Speech Luminary" award from Speech Technology Magazine in 2012.

Felix Burkhardt
Felix Burkhardt does tutoring, consulting, research and development in the fields of human-machine dialog systems, text-to-speech synthesis, speaker classification, ontology-based natural language modeling, voice search and emotional human-machine interfaces. Originally an expert in speech synthesis at the Technical University of Berlin, he wrote his Ph.D. thesis on the simulation of emotional speech by machines, recorded the Berlin acted emotions database "EmoDB", and maintains several open source projects, including the emotional speech synthesizer "Emofilt" and the speech labeling and annotation tool "Speechalyzer". He has been working for Deutsche Telekom AG since 2000, currently at the Telekom Innovation Laboratories in Berlin. He was a member of the European Network of Excellence HUMAINE on emotion-oriented computing and is currently the editor of the W3C Emotion Markup Language specification.