Author ORCID Identifier
Document Type
Conference Paper
Disciplines
Computer Sciences, Acoustics
Abstract
Controllable timbre synthesis has been a subject of research for several decades, and deep neural networks have been the most successful approach in this area. Deep generative models such as Variational Autoencoders (VAEs) can learn a high-level representation of audio while providing a structured latent space. Despite these advantages, the interpretability of such latent spaces in terms of human perception is often limited. To address this limitation and enhance control over timbre generation, we propose a VAE-based latent space regularized on timbre descriptors. Moreover, we suggest a more concise representation of sound based on its harmonic content, in order to reduce the dimensionality of the latent space.
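The regularization described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact penalty, so the form below is an assumption: a mean-squared-error term that ties one latent dimension to a timbre descriptor (e.g. a normalized spectral centroid), added to the usual VAE reconstruction and KL terms. All function names, weights, and the toy data are hypothetical.

```python
import numpy as np

def timbre_regularization(z, descriptor, dim=0):
    """Hypothetical alignment penalty: mean squared error between one
    latent dimension and a timbre descriptor scaled to a common range."""
    return float(np.mean((z[:, dim] - descriptor) ** 2))

def regularized_vae_loss(recon_loss, kl_loss, z, descriptor,
                         beta=1.0, gamma=10.0):
    # Total loss = reconstruction + beta * KL + gamma * descriptor alignment.
    # beta and gamma are illustrative weights, not values from the paper.
    return recon_loss + beta * kl_loss + gamma * timbre_regularization(z, descriptor)

# Toy batch: 4 latent codes of size 3; descriptor values already in [0, 1].
z = np.array([[0.1, 0.5, -0.2],
              [0.4, 0.0,  0.3],
              [0.9, -0.1, 0.2],
              [0.6, 0.2, -0.4]])
descriptor = np.array([0.1, 0.4, 0.9, 0.6])

loss = regularized_vae_loss(recon_loss=1.0, kl_loss=0.2,
                            z=z, descriptor=descriptor)
```

In this toy case the first latent dimension already matches the descriptor exactly, so the alignment term vanishes and the loss reduces to the reconstruction and KL parts.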
DOI
https://doi.org/10.21427/7MPD-6420
Recommended Citation
Natsiou, A., Longo, L., & O'Leary, S. (2023). Interpretable timbre synthesis using variational autoencoders regularized on timbre descriptors. In Proceedings of the 26th International Conference on Digital Audio Effects (DAFx23). DOI: 10.21427/7MPD-6420
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Publication Details
Proceedings of the 26th International Conference on Digital Audio Effects (DAFx23), Copenhagen, Denmark, 4 - 7 September 2023
https://www.dafx.de/