Generation of a High-Quality Natural Emotional Audio Speech Corpus using Task-Based Mood Induction
Document Type: Conference Paper
International Conference on Multidisciplinary Information Sciences and Technologies Extremadura (InSciT), Merida, Spain, 2006.
Detecting emotional dimensions in speech is an area of great research interest, notably as a means of improving human-computer interaction in areas such as speech synthesis. In this paper, a method of obtaining high-quality emotional audio speech assets is proposed. Methods of obtaining emotional content are the subject of considerable debate, with distinctions drawn between acted and natural speech on the grounds of authenticity. Mood Induction Procedures (MIPs) are often employed to stimulate emotional dimensions in a controlled environment. This paper details experimental procedures based on MIP 4, using performance-related tasks to elicit activation and evaluation responses from participants. Tasks are specified involving two participants, who must co-operate in order to complete a given task within the allotted time. Experiments designed in this manner also allow for the specification of high-quality audio assets (notably 24-bit/192 kHz) within an acoustically controlled environment, thus reducing unwanted acoustic factors in the recorded speech signal. Once suitable assets are obtained, they will be assessed for the purpose of segregation into differing emotional dimensions. The most statistically robust method of evaluation involves listening tests to determine the perceived emotional dimensions within an audio clip. In this experiment, the FeelTrace rating tool is employed within user listening tests to specify the categories of emotional dimensions for each audio clip.
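To illustrate the final step, the sketch below shows one way listener ratings on the activation-evaluation plane could be aggregated into broad emotional categories. This is a hypothetical illustration only: the coordinate range of [-1, 1] per axis, the four quadrant labels, and the averaging of multiple listeners' ratings are assumptions for this example, not the procedure specified in the paper.

```python
# Hypothetical sketch: mapping FeelTrace-style (evaluation, activation)
# listener ratings to broad emotional quadrants. Axis ranges, label names,
# and the per-axis averaging are assumptions, not the paper's actual method.

def classify_rating(evaluation: float, activation: float) -> str:
    """Map one (evaluation, activation) point, each assumed in [-1.0, 1.0],
    to a broad quadrant of the activation-evaluation space."""
    if evaluation >= 0 and activation >= 0:
        return "positive-active"    # e.g. happy, excited
    if evaluation < 0 and activation >= 0:
        return "negative-active"    # e.g. angry, afraid
    if evaluation < 0:
        return "negative-passive"   # e.g. sad, bored
    return "positive-passive"       # e.g. content, relaxed

def classify_clip(ratings: list[tuple[float, float]]) -> str:
    """Aggregate several listeners' (evaluation, activation) ratings for a
    single audio clip by averaging each axis, then classify the mean point."""
    n = len(ratings)
    mean_eval = sum(e for e, _ in ratings) / n
    mean_act = sum(a for _, a in ratings) / n
    return classify_rating(mean_eval, mean_act)

# Example: three listeners rate one clip as mildly positive, strongly active.
print(classify_clip([(0.4, 0.8), (0.2, 0.6), (0.5, 0.9)]))  # positive-active
```

In practice, the continuous coordinates produced by a tool like FeelTrace could also be kept as-is for dimensional analysis rather than collapsed into discrete categories; the quadrant mapping above is merely the simplest segregation scheme.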