Sound, Machine Learning, and Deep Generative Models: A Practical Introduction


The slides are online -> https://ktatar.github.io/2026-02_linneaus_audio

Kıvanç Tatar, Associate Professor in Interactive AI


Basics of Sound Concepts

  • Waveform: dynamics, transients, envelopes
  • Frequency: pitch, overtones, timbre components
  • Time–frequency transforms: spectrograms, mel spectrograms
  • Energy and dynamics over time (envelope, loudness curves etc.)

McFee, Brian, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. "librosa: Audio and music signal analysis in python." In Proceedings of the 14th python in science conference, pp. 18-25. 2015.
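
The time–frequency transforms in the list above can be sketched without any audio library. The following is a minimal numpy-only magnitude spectrogram (librosa's `stft` and `feature.melspectrogram` do this properly, including mel filtering); the 440 Hz test tone and the frame/hop sizes are illustrative choices, not fixed conventions:

```python
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len//2 + 1)

# A one-second 440 Hz sine at a 22050 Hz sampling rate
sr = 22050
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t))

# The strongest frequency bin should sit within one bin of 440 Hz
peak_bin = spec.mean(axis=0).argmax()
```

Each row of `spec` is one time frame; converting the linear frequency axis to the mel scale is what turns this into a mel spectrogram.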


Waveforms and Their Features
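
Two waveform-domain features mentioned earlier, the loudness envelope and a crude pitch/noisiness proxy, can be sketched per frame as RMS energy and zero-crossing rate. The synthetic two-part signal below is an assumption for illustration:

```python
import numpy as np

def frame_features(signal, frame_len=512, hop=256):
    """Per-frame RMS energy (a loudness envelope) and zero-crossing rate."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    rms, zcr = [], []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        rms.append(np.sqrt(np.mean(frame ** 2)))
        # fraction of consecutive samples whose sign changes
        zcr.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return np.array(rms), np.array(zcr)

sr = 22050
t = np.arange(sr) / sr
quiet_low = 0.1 * np.sin(2 * np.pi * 110 * t)   # quiet, low-pitched second
loud_high = 0.9 * np.sin(2 * np.pi * 1760 * t)  # loud, high-pitched second
rms, zcr = frame_features(np.concatenate([quiet_low, loud_high]))
# Both curves should step upward at the halfway point
```

librosa provides the same ideas as `feature.rms` and `feature.zero_crossing_rate`.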


Timbre and Its Features

FFT-based
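
A representative FFT-based timbre feature is the spectral centroid, the magnitude-weighted mean frequency, often read as "brightness". A minimal sketch (librosa's `feature.spectral_centroid` computes this over a whole spectrogram); the two test tones are illustrative assumptions:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one frame: a rough brightness measure."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

sr = 22050
t = np.arange(2048) / sr
dark = np.sin(2 * np.pi * 220 * t)
bright = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 3520 * t)
# Adding a strong high partial pulls the centroid upward
```

Other FFT-based descriptors (spectral bandwidth, rolloff, flatness) follow the same pattern: a scalar statistic over the magnitude spectrum.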


Wavelet-based Timbre Features

Schörkhuber, Christian, and Anssi Klapuri. "Constant-Q transform toolbox for music processing." In Proceedings of the 7th Sound and Music Computing Conference (SMC 2010), p. 8. Barcelona, Spain, 2010.
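
The constant-Q transform cited above uses geometrically spaced frequency bins with a constant ratio of center frequency to bandwidth, so the analysis window shrinks as frequency grows. A single-frame numpy sketch of the idea (real toolboxes such as librosa's `cqt` use efficient kernels and hop along time); the fmin and bin-count choices are illustrative:

```python
import numpy as np

def cqt_frame(signal, sr, fmin=55.0, bins_per_octave=12, n_bins=48):
    """One constant-Q spectrum: window length is inversely proportional
    to each bin's center frequency, keeping Q = f / bandwidth constant."""
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    out = np.zeros(n_bins)
    for k in range(n_bins):
        fk = fmin * 2 ** (k / bins_per_octave)      # geometric spacing
        N = int(round(Q * sr / fk))                 # shorter window for higher fk
        n = np.arange(N)
        kernel = np.hanning(N) * np.exp(-2j * np.pi * fk * n / sr) / N
        out[k] = np.abs(np.sum(signal[:N] * kernel))
    return out

sr = 22050
t = np.arange(sr) / sr
spec = cqt_frame(np.sin(2 * np.pi * 220 * t), sr)
# 220 Hz is exactly two octaves above fmin=55 Hz, i.e. bin 24
```

The geometric spacing matches musical pitch (one bin per semitone here), which is why CQT-style features suit music better than a linear-frequency FFT.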


Music-Specific Features
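
A classic music-specific feature is the chroma (pitch-class profile): FFT magnitudes folded onto the 12 semitone classes, discarding octave information. A minimal numpy sketch (librosa's `feature.chroma_stft` is the practical version); the C4 reference frequency and the single A4 test tone are illustrative assumptions:

```python
import numpy as np

def chroma(frame, sr, fref=261.63):
    """12-bin pitch-class profile; fref is the frequency of C4, so index 0 = C."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    out = np.zeros(12)
    for f, m in zip(freqs[1:], mag[1:]):          # skip the DC bin
        pitch_class = int(round(12 * np.log2(f / fref))) % 12
        out[pitch_class] += m
    return out / out.max()

sr = 22050
t = np.arange(4096) / sr
a4 = np.sin(2 * np.pi * 440 * t)
# A is 9 semitones above C, so the profile should peak at index 9
```

Chroma underlies chord and key estimation precisely because it is invariant to octave and largely to timbre.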


Machine Learning for Sound and Music Computing


Importance of Feature Representations in Machine Learning


I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.


Dimensionality Reduction
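
The most common starting point is PCA: project feature vectors onto the directions of largest variance. A numpy sketch via the SVD (in practice `sklearn.decomposition.PCA` does this, plus centering and explained-variance bookkeeping); the synthetic 10-D data lying near a 2-D plane is an illustrative assumption:

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project rows of X onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # center the features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # coordinates in the top subspace

rng = np.random.default_rng(0)
# 200 points that live near a 2-D plane inside a 10-D feature space
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))
Z = pca_reduce(X)  # shape (200, 2); captures almost all the variance
```

For audio, the rows of `X` would typically be per-frame feature vectors (MFCCs, spectral descriptors), and `Z` a 2-D map you can plot or navigate.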


Machine Learning

  • Supervised Learning
  • Unsupervised Learning

Detailed resources: https://www.ibm.com/think/machine-learning#605511093


Supervised Learning - Regression


Image source: https://python.plainenglish.io/understanding-multiple-linear-regression-in-machine-learning-58e981ce7747
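
The multiple linear regression shown in the figure can be sketched in a few lines with numpy's least-squares solver; the synthetic data and the true weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                     # 3 input features, 100 samples
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.05 * rng.normal(size=100)      # targets with a little noise

# Append a column of ones so the last coefficient acts as the bias term
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(100)], y, rcond=None)
# w[:3] should recover true_w closely despite the noise
```

Regression maps continuous inputs to continuous outputs, which is exactly the setting of the interactive-audio mapping example on the next slide.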


Supervised Learning - Regression for Interactive Audio

https://www.youtube.com/watch?v=dPV-gCqy9j4


Supervised Learning - Classification


Image source: https://medium.com/data-science/image-classification-in-10-minutes-with-mnist-dataset-54c35b77a38d
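
Classification assigns each input a discrete label. One of the simplest possible classifiers, a nearest-centroid rule, fits on a slide; the two synthetic 2-D blobs stand in for e.g. feature vectors of two sound classes:

```python
import numpy as np

def nearest_centroid(X_train, y_train, X_test):
    """Label each test point with the class whose mean is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # pairwise distances: (n_test, n_classes)
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(2)
# Two well-separated 2-D blobs, 50 points each
X = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
pred = nearest_centroid(X, y, X)   # near-perfect on separable blobs
```

Real image or audio classifiers (like the MNIST network in the figure) replace the centroid rule with a learned model, but the input/label contract is the same.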


Supervised Learning - Audio Classification

Mediapipe: https://ai.google.dev/edge/mediapipe/solutions/audio/audio_classifier


Unsupervised Learning - Clustering

https://colah.github.io/posts/2014-10-Visualizing-MNIST/
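
Clustering groups unlabeled points by proximity. A compact numpy sketch of k-means (Lloyd's algorithm) with a simple farthest-point initialization; the two synthetic blobs are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, n_iter=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid update, after a farthest-point initialization."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[int(d.argmax())])      # pick the point farthest from all
    centroids = np.stack(centroids)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        centroids = np.stack([X[labels == c].mean(axis=0) for c in range(k)])
    return labels, centroids

rng = np.random.default_rng(3)
X = np.concatenate([rng.normal(0, 0.5, (60, 2)), rng.normal(4, 0.5, (60, 2))])
labels, centroids = kmeans(X, 2)   # the two blobs get two different labels
```

No labels were used anywhere, which is what makes this unsupervised.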


Unsupervised Learning - Clustering for Audio
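
Applied to audio, the same idea clusters short frames by their feature vectors. A sketch using two deliberately simple features (RMS and zero-crossing rate) and a minimal two-means loop; real systems would use richer features such as MFCCs, and the two synthetic tones are illustrative:

```python
import numpy as np

sr, frame = 22050, 512
t = np.arange(sr) / sr
# One quiet low tone followed by one loud high tone
audio = np.concatenate([0.2 * np.sin(2 * np.pi * 110 * t),
                        0.9 * np.sin(2 * np.pi * 2000 * t)])
frames = audio[: len(audio) // frame * frame].reshape(-1, frame)

# Feature vector per frame: [RMS energy, zero-crossing rate], standardized
feats = np.c_[np.sqrt((frames ** 2).mean(axis=1)),
              (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)]
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)

# Two-means with a deterministic far-apart initialization (first and last frame)
cents = feats[[0, -1]]
for _ in range(10):
    lab = np.linalg.norm(feats[:, None] - cents[None], axis=2).argmin(axis=1)
    cents = np.stack([feats[lab == c].mean(axis=0) for c in (0, 1)])
# Frames from the two tones should land in different clusters
```

This frame-level clustering is the basic move behind corpus exploration and concatenative-synthesis tools.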


Deep Learning

Two great books to go deeper:
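
Before going deeper, the core mechanics, forward pass, loss gradient, backpropagation, fit in a few lines of numpy. A minimal two-layer network trained on XOR (the classic problem a single linear layer cannot solve); the layer sizes, learning rate, and iteration count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# One hidden layer of 8 tanh units, sigmoid output, cross-entropy loss
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
lr = 0.5
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                    # forward: hidden activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))        # forward: output probability
    g_out = (p - y) / len(X)                    # backward: dLoss/dlogit
    g_h = (g_out @ W2.T) * (1 - h ** 2)         # backprop through tanh
    W2 -= lr * h.T @ g_out; b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * X.T @ g_h;   b1 -= lr * g_h.sum(axis=0)

pred = 1 / (1 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))
# After training, pred should be close to [0, 1, 1, 0]
```

Frameworks such as PyTorch automate exactly these gradient computations, which is what makes the deep architectures on the following slides practical.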


Deep Learning - Fun Visualizations


DL-based Feature Representations


DL-based Feature Representations - Audio Specific


Summary

  • Features matter.
  • The context determines the appropriate machine learning approach.
  • Several ML algorithms can often be applied to the same problem.
  • Simple ML approaches are often as applicable as more complex ones.

Thank you!

Feel free to reach out -> tatar@chalmers.se
