AIT Asian Institute of Technology


Music recommender system combining audio features with graph neural networks
Author	Tuladhar, Sunny Kumar
Call Number	AIT Thesis no.DSAI-23-03
Subject(s)	Recommender systems (Information filtering); Music--Data processing; Neural networks (Computer science)
Note	A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Data Science and Artificial Intelligence
Publisher	Asian Institute of Technology
Abstract	With the music industry growing faster than ever and listeners gaining easier and faster access to a huge collection of songs, music recommender systems are becoming more and more relevant. The majority of the views for major multimedia platforms such as YouTube have been attributed to recommended content. Many approaches to recommendation systems have been explored in the past decade, from content-based filtering to collaborative filtering and the combination of both. Graph neural networks (GNNs) have been shown to perform very well in recommendation systems using bipartite graphs. GNNs, however, have not been combined with content and thus perform poorly in cold-start problems. In this thesis, I incorporate audio features extracted from songs as the content fed into GNNs and find that it improves the recall@k value of the system's recommendations. I use LightGCN since it is the current state of the art and has only one layer of trainable embeddings. I extract the audio features using the musicnn library, which was originally designed for audio tagging. I use the Spotify Million Playlist Dataset (MPD), which is a dataset of playlists and the songs they contain. I take preview audio files of songs, extract audio embeddings from them, feed these into LightGCN as initial song embeddings, and then train for link prediction. The model predicts which playlist a given song would belong to. The results were evaluated using the recall@k metric and compared with the original model initialized with random normal embeddings. The random embeddings gave a recall@500 of 0.51, and the extracted embeddings gave a value of 0.57. The extracted audio embeddings showed a 5% increase in recall for both trained and fixed embedding settings for the songs, which shows that the embedding initializations improve the performance of LightGCN.
The model was also evaluated on particular cases of playlist prediction for songs and song prediction for playlists. It showed reasonable results for playlist continuation but unconvincing results for the item cold-start problem.
Year	2023
Type	Thesis
School	School of Engineering and Technology
Department	Department of Information and Communications Technologies (DICT)
Academic Program/FoS	Data Science and Artificial Intelligence (DSAI)
Chairperson(s)	Dailey, Matthew N.
Examination Committee(s)	Mongkol Ekpanyapong;Chaklam Silpasuwanchai
Scholarship Donor(s)	AIT Scholarships
Degree	Thesis (M. Eng.) - Asian Institute of Technology, 2023
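
The abstract reports results as recall@k (for example, recall@500). As a minimal sketch of how that metric is commonly computed, and not the thesis's own evaluation code, the Python below scores one ranked recommendation list against a held-out set. The function name `recall_at_k` and the toy playlist identifiers are hypothetical, introduced only for illustration.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant (held-out) items found in the top-k ranked items.

    ranked_items: recommendations ordered from most to least likely.
    relevant_items: ground-truth items that were held out for evaluation.
    """
    relevant = set(relevant_items)
    if not relevant:
        # Assumption: an empty ground-truth set scores 0.0 rather than raising.
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical toy data: playlists ranked for one song, and the playlists
# the song actually belongs to in the held-out split.
ranked = ["p3", "p7", "p1", "p9", "p4"]
held_out = ["p1", "p4", "p8"]

print(recall_at_k(ranked, held_out, k=3))  # one of three held-out playlists is in the top 3
print(recall_at_k(ranked, held_out, k=5))  # two of three held-out playlists are in the top 5
```

In practice the per-query scores would be averaged over all evaluated songs or playlists; that averaging step is omitted here.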