AIT Asian Institute of Technology

Music recommender system combining audio features with graph neural networks

Author: Tuladhar, Sunny Kumar
Call Number: AIT Thesis no. DSAI-23-03
Subject(s): Recommender systems (Information filtering); Music--Data processing; Neural networks (Computer science)
Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Data Science and Artificial Intelligence
Publisher: Asian Institute of Technology
Abstract: With the music industry growing faster than ever and listeners gaining easier and faster access to a huge collection of songs, music recommender systems are becoming more and more relevant. The majority of the views for major multimedia platforms such as YouTube have been attributed to recommended content. Many approaches to recommendation systems have been explored in the past decade, from content-based filtering to collaborative filtering and the combination of both. Graph neural networks (GNNs) have been shown to perform very well in recommendation systems using bipartite graphs. GNNs, however, have not been combined with content and thus perform poorly in cold-start problems. In this thesis, I incorporate audio features extracted from songs as the content fed into GNNs and find that it improves the recall@k value of the system's recommendations. I use LightGCN since it is the current state of the art and has only one layer of trainable embeddings. I extract the audio features using the musicnn library, which was originally designed for audio tagging. I use the Spotify Million Playlist Dataset (MPD), which is a dataset of playlists and the songs they contain. I use preview audio files of songs, extract audio embeddings, and feed these into the LightGCN as initial song embeddings, then perform training for link prediction. The model predicts which playlist a given song would belong to. The results were evaluated using the recall@k metric and compared with the original model initialized with random normal embeddings. The random embeddings gave a recall@500 of 0.51, and the extracted embeddings gave a value of 0.57. The extracted audio embeddings showed a 5% increase in recall for both trained and fixed embedding settings for the songs, which shows that the embedding initializations improve the performance of the LightGCN.
The model was also evaluated for results in particular cases of playlist prediction for songs and song prediction for playlists and it showed reasonable results for playlist continuation but unconvincing results for the item cold-start problem.
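To make the abstract's pipeline concrete, the following is a minimal NumPy sketch of LightGCN-style propagation on a playlist-song bipartite graph, followed by a recall@k computation. It is an illustration under stated assumptions, not the thesis's implementation: the function names `lightgcn_propagate` and `recall_at_k` are hypothetical, the random song embeddings stand in for the audio embeddings that the thesis extracts with musicnn, and no training step (link-prediction loss, optimizer) is shown.

```python
import numpy as np


def lightgcn_propagate(adj, playlist_emb, song_emb, num_layers=3):
    """LightGCN-style propagation (hypothetical helper, no training).

    adj: (num_playlists, num_songs) binary membership matrix.
    Returns the layer-averaged playlist and song embeddings.
    """
    adj = np.asarray(adj, dtype=float)
    # Symmetric normalization: each edge is scaled by 1 / sqrt(deg_p * deg_s).
    deg_p = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    deg_s = np.maximum(adj.sum(axis=0, keepdims=True), 1.0)
    norm = adj / np.sqrt(deg_p * deg_s)

    p_layers, s_layers = [playlist_emb], [song_emb]
    for _ in range(num_layers):
        # Each side aggregates the previous layer of the other side.
        p_next = norm @ s_layers[-1]
        s_next = norm.T @ p_layers[-1]
        p_layers.append(p_next)
        s_layers.append(s_next)
    # The final embedding is the mean over all layers, including layer 0.
    return np.mean(p_layers, axis=0), np.mean(s_layers, axis=0)


def recall_at_k(scores, held_out, k):
    """Mean over queries of (held-out items found in the top k) / (held-out items).

    scores: (num_queries, num_candidates) array of predicted scores.
    held_out: one set of relevant candidate indices per query.
    """
    top_k = np.argsort(-scores, axis=1)[:, :k]
    recalls = []
    for row, relevant in zip(top_k, held_out):
        if relevant:  # skip queries with nothing held out
            recalls.append(len(set(row.tolist()) & relevant) / len(relevant))
    return float(np.mean(recalls))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]])
    playlist_emb = rng.normal(size=(3, 8))
    # Assumption: placeholder for audio embeddings extracted from song previews.
    song_emb = rng.normal(size=(4, 8))
    p, s = lightgcn_propagate(adj, playlist_emb, song_emb, num_layers=2)
    # Songs are the queries and playlists the candidates, as in the abstract.
    scores = s @ p.T
    held_out = [{0}, {0, 1}, {1, 2}, {2}]
    print(recall_at_k(scores, held_out, k=2))
```

In this sketch the only place content enters the model is the initial `song_emb` matrix, which mirrors the abstract's comparison between random normal initialization and audio-derived initialization.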
Year: 2023
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Data Science and Artificial Intelligence (DSAI)
Chairperson(s): Dailey, Matthew N.
Examination Committee(s): Mongkol Ekpanyapong; Chaklam Silpasuwanchai
Scholarship Donor(s): AIT Scholarships
Degree: Thesis (M. Eng.) - Asian Institute of Technology, 2023

