Document Type

Theses, Masters

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Abstract

Voice conversion is an active new branch of speech processing that deals with the transformation of natural speech, focusing on changing the characteristics of the speaker’s voice. Through voice conversion, the speaker’s identity can be changed so that the converted speech sounds as if it were uttered by a different speaker, or certain characteristics of the voice can be modified while the speaker’s identity is preserved. Voice-gender conversion (VGC) is a subset of voice conversion that focuses on transforming gender-specific voice characteristics: male speech is converted into female-sounding speech and vice versa.

A major application of voice conversion is speaker normalisation, in which a given voice is converted to a normalised voice. This allows speech recognition and speech compression methods to perform better, because their effective signal space is reduced significantly; in speech compression, the reduced signal space improves efficiency and yields higher compression rates. Another application is voice transformation to accommodate hearing impairments. A further, straightforward application is the use of voice-gender conversion to disguise voices, either to protect individuals such as witnesses or to deter nuisance calls.

The system presented in this thesis achieves voice-gender transformation by independently frequency-scaling the excitation and the formant spectrum of the speech signal, modelling the different voice-gender features from the voice-production perspective. The novelty of this research is the linearisation of the non-linear relationship between the male and female formant spectra. Frequency scaling is achieved with a time-scale modification (TSM) algorithm called adaptive overlap-add (AOLA), a recently developed method for efficiently changing the duration of time-based signals.
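The abstract states that frequency scaling is obtained from a TSM algorithm. A common way to do this, assumed here, is to stretch the signal in time by a factor β and then resample it so that it plays β times faster: the duration is restored and every frequency is multiplied by β. The sketch below is a minimal pure-Python illustration under that assumption. It substitutes a plain, fixed-hop overlap-add (OLA) for AOLA, whose adaptive details are not described in this record, and it scales the whole signal at once rather than the excitation and the formant spectrum separately. The function names `hann`, `ola_time_stretch`, `resample_linear` and `frequency_scale`, and the 256-sample frame, are hypothetical.

```python
import math

# Sketch only: a plain fixed-hop OLA stands in for the thesis's AOLA (assumption).


def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(x, alpha, frame=256):
    """Plain (non-adaptive) overlap-add time-scale modification.

    Output duration is roughly alpha times the input duration. Frames are
    read every frame/(2*alpha) samples and written every frame/2 samples;
    no waveform alignment is done between frames.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(x) < frame:
        raise ValueError("signal shorter than one frame")
    hs = frame // 2            # synthesis hop: 50% overlap in the output
    ha = hs / alpha            # analysis hop in the input
    w = hann(frame)
    n_frames = int((len(x) - frame) / ha) + 1
    out_len = (n_frames - 1) * hs + frame
    y = [0.0] * out_len
    norm = [0.0] * out_len     # running window sum, used to undo the windowing gain
    for k in range(n_frames):
        a = round(k * ha)      # where frame k is read from in the input
        s = k * hs             # where frame k is written to in the output
        for i in range(frame):
            y[s + i] += w[i] * x[a + i]
            norm[s + i] += w[i]
    # Divide out the window sum; samples with no window support become 0.
    return [v / g if g > 1e-8 else 0.0 for v, g in zip(y, norm)]


def resample_linear(x, factor):
    """Read x `factor` times faster using linear interpolation.

    factor > 1 shortens the signal and raises every frequency by `factor`.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int((len(x) - 1) / factor) + 1
    y = []
    for m in range(n_out):
        t = m * factor
        i = int(t)
        frac = t - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        y.append((1.0 - frac) * x[i] + frac * nxt)
    return y


def frequency_scale(x, beta, frame=256):
    """Scale all frequencies by beta while keeping the duration roughly
    unchanged: stretch the time axis by beta, then play back beta times faster."""
    return resample_linear(ola_time_stretch(x, beta, frame), beta)
```

For a sinusoid with a 32-sample period, `frequency_scale(x, 2.0)` returns a sinusoid with a 16-sample period, because the 64-sample difference between the analysis and synthesis hops is a whole number of periods, so overlapping frames add in phase. On general speech, plain OLA does not keep frames in phase and introduces audible distortion; variants such as SOLA (synchronised overlap-add) address this by aligning frames before adding them.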

DOI

https://doi.org/10.21427/D71164


Included in

Engineering Commons
