An attention and concept hierarchy-based approach to dataset category and tag recommendation
Author | Natnaree Sornkongdang |
Call Number | AIT Thesis no.DSAI-22-05 |
Subject(s) | Machine learning; Information retrieval; Recommender systems (Information filtering)
Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science and Artificial Intelligence |
Publisher | Asian Institute of Technology |
Abstract | Data providers who publish a dataset on the ThOGD portal supply its data categories and tags, and the resulting tag organization is often improper. Providers are currently guided only by the portal's autocomplete function, which suggests data categories and tags based on historical data entered by previous data providers. As a result, the portal contains several datasets with similar content that are labeled with different tags of similar meaning. In addition, data consumers cannot retrieve information that matches their preferences when filtering by data category and tag. To overcome these challenges, this study makes two main contributions: an attention-based categorical identifier and topic hierarchy-based categorical concept hierarchies. Attentive Deep Supervision applies a weighted effect to the loss optimization of the categorical identifier. For the topic hierarchy, Latent Dirichlet Allocation (LDA) topic modeling is used to extract potential tag terms, Heterogeneous Evidences are exploited to identify relations, and Anytree is employed to construct the hierarchy. With these approaches, the macro-averaged precision and F1-score of the attention-based identifier improve by 0.6640% and 0.5570%, respectively, and the micro-averaged precision and F1-score improve by 0.8060% and 0.6980%, respectively. Meanwhile, the categorical concept hierarchies can provide comprehensive tags related to a dataset to be published, because the recommendation strategy assigns tags with the same highest importance weight to the same rank.
Year | 2022 |
Type | Thesis |
School | School of Engineering and Technology |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Data Science and Artificial Intelligence (DSAI) |
Chairperson(s) | Chutiporn Anutariya; |
Examination Committee(s) | Dailey, Matthew N.;Nuttapong Sanglerdsinlapachai; |
Scholarship Donor(s) | AIT Scholarships; |
Degree | Thesis (M. Sc.) - Asian Institute of Technology, 2022 |
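
Note on the ranking strategy mentioned in the abstract: the record does not say how ties are handled beyond stating that tags with the same highest importance weight share a rank. The sketch below illustrates one plausible reading, using dense ranking so that equal weights receive equal ranks. The function names, the tag terms, and the weights are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def rank_tags(weights: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Dense-rank tags by descending weight; equal weights share a rank.

    Assumption: dense ranking (1, 1, 2, 2, 3) is used here for illustration;
    the thesis may define ranks differently.
    """
    # Sort by weight (highest first), then alphabetically for a stable order.
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked: List[Tuple[int, str, float]] = []
    rank = 0
    previous = None
    for tag, weight in ordered:
        if weight != previous:
            # A new, lower weight starts the next rank.
            rank += 1
            previous = weight
        ranked.append((rank, tag, weight))
    return ranked


def top_ranked(weights: Dict[str, float]) -> List[str]:
    """Return every tag that shares the highest weight (rank 1)."""
    return [tag for rank, tag, _ in rank_tags(weights) if rank == 1]


# Hypothetical tag weights, for example as might come from a topic model.
example_weights = {
    "rainfall": 0.42,
    "weather": 0.42,
    "climate": 0.30,
    "flood": 0.30,
    "province": 0.11,
}

for rank, tag, weight in rank_tags(example_weights):
    print(rank, tag, weight)
```

Under this reading, both "rainfall" and "weather" would be recommended together at rank 1 instead of one being dropped arbitrarily, which is consistent with the abstract's claim that the strategy yields more comprehensive tag suggestions.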