Decoding Music from Brain Activity: Exploring the Neural Correlates of Music Perception

Matteo Ferrante*, Matteo Ciferri*, Nicola Toschi

fMRI experiment

5 participants listen to 540 songs while brain activity is recorded with fMRI; audio features are extracted from each track with the CLAP model.
An encoding model was built to predict brain activity from audio features extracted with the CLAP model, thereby identifying audio-responsive regions. These responsive regions were then used as inputs to decoding models that decode music from brain activity.
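As a concrete illustration of the feature-extraction step, here is a minimal sketch of computing one CLAP embedding per track. It assumes the open-source laion_clap package (the poster names CLAP but not a specific implementation), and the file paths are illustrative:

```python
# Sketch of audio feature extraction, assuming the laion_clap package.
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()  # loads a pretrained CLAP checkpoint

# One embedding per 15s track; returns an (n_tracks, 512) numpy array.
tracks = ["stimuli/track_001.wav", "stimuli/track_002.wav"]  # illustrative paths
audio_embeddings = model.get_audio_embedding_from_filelist(x=tracks, use_tensor=False)
```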

Encoding Pipeline


Brain activity is decoded with a retrieval system that outputs a musical genre and a candidate song.

Decoding Pipeline

Abstract

This study investigates the relationship between music and brain activity patterns, aiming to bridge the gap between music perception and its neural representation. We leverage the GTZAN music fMRI dataset, encompassing 5 subjects who listened to 540 tracks (15s each) from 10 genres while undergoing 3T fMRI scans (TR = 1.5s). Despite the limitations of fMRI's temporal resolution, music elicits robust brain responses.

To explore this concept, we constructed a decoding pipeline capable of retrieving musical information from brain activity. Preprocessing was conducted using fMRIPrep. An encoding model was then built: CLAP transformed each audio track into a feature representation, and ridge regression mapped these features to brain activity (averaged across 15s listening blocks). This model served to identify brain regions responsive to music.
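A minimal sketch of this encoding step follows, assuming CLAP embeddings and block-averaged BOLD responses are already in memory; all array names, shapes, and the voxel-selection threshold are illustrative placeholders, not the study's actual data or criteria:

```python
# Sketch of the encoding model: CLAP audio embeddings -> voxel responses.
# X holds one CLAP embedding per 15s track, Y the block-averaged BOLD response
# of each voxel. All names and shapes below are illustrative placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((540, 512))     # placeholder CLAP embeddings (540 tracks)
Y = rng.standard_normal((540, 2000))    # placeholder voxel responses

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# Ridge regression with cross-validated regularization, one weight map per voxel.
enc = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)
Y_hat = enc.predict(X_te)

# Score each voxel by the correlation between predicted and measured responses;
# well-predicted voxels define the "music-responsive" regions used for decoding.
r = np.array([np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(Y.shape[1])])
responsive = np.argsort(r)[-500:]       # e.g., keep the 500 best-predicted voxels
```

Selecting voxels by held-out encoding performance, rather than raw activation, is one common way to define "responsive" regions; the poster does not specify the exact criterion used.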

Voxel values from these responsive regions were then used in three experiments:

  1. Predicting Audio Features: Ridge regression was applied to predict CLAP clusters in the audio feature space, achieving an accuracy of 37±4%, significantly above chance level.
  2. Music Genre Classification: Brain activity was used to directly predict music genre, attaining an accuracy of 51±3%, relative to the ceiling performance of the CLAP model (74.6%).
  3. Music Retrieval: A brain-to-latent model was combined with nearest-neighbor search to directly retrieve music tracks from brain activity (a minimal sketch of this step follows the list). Examples of retrieved music are shown below.
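The sketch below illustrates the retrieval experiment under the same illustrative assumptions as above: a ridge model maps responsive-voxel activity into the CLAP latent space, and nearest-neighbor search over candidate embeddings returns a track, whose genre label can serve as the genre output of the retrieval system:

```python
# Sketch of decoding: brain activity -> CLAP latent -> nearest-neighbor retrieval.
# B: block-averaged activity of the responsive voxels, Z: CLAP embeddings of the
# candidate tracks, genres: one genre label per track. All shapes are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
B = rng.standard_normal((540, 500))      # placeholder responsive-voxel activity
Z = rng.standard_normal((540, 512))      # placeholder CLAP embeddings
genres = rng.integers(0, 10, size=540)   # placeholder genre labels (10 genres)

train, test = np.arange(432), np.arange(432, 540)

# Brain-to-latent model: predict a CLAP embedding from brain activity.
b2l = Ridge(alpha=1.0).fit(B[train], Z[train])
Z_pred = b2l.predict(B[test])

# Retrieval: nearest neighbor in CLAP space over the candidate pool (cosine distance).
nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(Z)
_, idx = nn.kneighbors(Z_pred)
retrieved = idx[:, 0]                    # one candidate track per test trial
genre_pred = genres[retrieved]           # genre read off the retrieved track
genre_acc = (genre_pred == genres[test]).mean()
```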

These findings demonstrate the feasibility of decoding musical information from brain activity patterns. While further research is needed to refine the decoding process, this approach holds promise for advancing our understanding of music perception and its potential applications in music therapy and other domains.


Audio examples: Jazz, Metal, Disco, and Pop stimuli, each paired with its decoded (retrieved) track.