Document Type
Conference Paper
Rights
Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Disciplines
1.2 COMPUTER AND INFORMATION SCIENCE
Abstract
Detecting emotional dimensions [1] in speech is an area of great research interest, notably as a means of improving human-computer interaction in areas such as speech synthesis [2]. In this paper, a method of obtaining high-quality emotional audio speech assets is proposed. The methods of obtaining emotional content are subject to considerable debate, with distinctions between acted [3] and natural [4] speech drawn on grounds of authenticity. Mood Induction Procedures (MIPs) [5] are often employed to stimulate emotional dimensions in a controlled environment. This paper details experimental procedures based around MIP 4, using performance-related tasks to engender activation and evaluation responses from the participant. Tasks are specified involving two participants, who must co-operate in order to complete a given task [6] within the allotted time. Experiments designed in this manner also allow for the specification of high-quality audio assets (notably 24-bit/192 kHz [7]) within an acoustically controlled environment [8], thus providing a means of reducing unwanted acoustic factors within the recorded speech signal. Once suitable assets are obtained, they will be assessed for the purposes of segregation into differing emotional dimensions. The most statistically robust method of evaluation involves the use of listening tests to determine the perceived emotional dimensions within an audio clip. In this experiment, the FeelTrace [9] rating tool is employed within user listening tests to specify the categories of emotional dimensions for each audio clip.
Recommended Citation
Cullen, C. et al. (2006) Generation of High Quality Audio Natural Emotional Speech Corpus using Task Based Mood Induction. International Conference on Multidisciplinary Information Sciences and Technologies Extremadura (InSciT), Mérida, Spain, 25th-28th October.
Funder
Salero Project
Publication Details
International Conference on Multidisciplinary Information Sciences and Technologies Extremadura (InSciT), Mérida, Spain, 2006.