Conference Papers

An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms

Anastasia Natsiou, Technological University Dublin
Luca Longo, Technological University Dublin
Seán O'Leary, Technological University Dublin

Author ORCID Identifier

https://orcid.org/0000-0002-2916-0134

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences

Publication Details

IEEE 16th International Conference on Signal Image Technology & Internet-based Systems

Abstract

In audio processing applications, there is high demand for generating expressive sounds from high-level representations. These representations can be used to manipulate timbre and to influence the synthesis of creative instrumental notes. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on the compression of musical instrument timbre. Unsupervised deep learning methods can achieve audio compression by training a network to learn a mapping from waveforms or spectrograms to low-dimensional representations. This study investigates the use of stacked convolutional autoencoders for compressing time-frequency audio representations of a variety of instruments at a single pitch. Further exploration of hyper-parameters and regularization techniques is shown to enhance the performance of the initial design. In an unsupervised manner, the network is able to reconstruct a monophonic, harmonic sound from its latent representation. In addition, we introduce an evaluation metric that measures the similarity between the original and reconstructed samples. Evaluating a deep generative model for sound synthesis is a challenging task. Our approach is based on the accuracy of the generated frequencies, as this is a significant factor in the perception of harmonic sounds. This work is expected to accelerate future experiments on audio compression using neural autoencoders.
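
The abstract does not define the frequency-based evaluation metric, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical measure: find the strongest frequency in the original and in the reconstructed signal, then report the deviation between them in cents. The function names, the sample rate, the signal length, and the detuned "reconstruction" are all illustrative assumptions.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(signal)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def frequency_error_cents(original, reconstructed, sample_rate):
    """Absolute deviation between the two dominant frequencies, in cents.

    Hypothetical stand-in for a frequency-accuracy measure; the paper's
    actual metric is not specified in the abstract.
    """
    f_ref = dominant_frequency(original, sample_rate)
    f_rec = dominant_frequency(reconstructed, sample_rate)
    return abs(1200.0 * math.log2(f_rec / f_ref))


def sine(freq, sample_rate, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000  # assumed value, chosen so both tones land on exact DFT bins
N = 400             # bin spacing = 8000 / 400 = 20 Hz

original = sine(440.0, SAMPLE_RATE, N)       # stands in for a recorded note
reconstructed = sine(460.0, SAMPLE_RATE, N)  # stands in for a detuned model output

print(frequency_error_cents(original, reconstructed, SAMPLE_RATE))
```

A perfect reconstruction gives an error of zero under this measure. A real harmonic sound would call for comparing several partials rather than a single peak, and a fast Fourier transform would replace the naive DFT used here for brevity.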

DOI

https://doi.org/10.21427/FX59-W834

Recommended Citation

Natsiou, A., Longo, L., & O'Leary, S. (2022). An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms. Technological University Dublin. DOI: 10.21427/FX59-W834

Included in

Artificial Intelligence and Robotics Commons, Data Science Commons
