AIT Asian Institute of Technology

Applying semantic suffix nets to clustering for search results

Author: Jongkol Janruang
Call Number: AIT Diss. no. CS-13-04
Subject(s): Information storage and retrieval
Electronic information resource searching
Semantics
Web search engines

Note: A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Engineering and Technology
Publisher: Asian Institute of Technology
Series Statement: Dissertation ; no. CS-13-04
Abstract: Search engines are an invaluable tool for retrieving information from the internet. However, they return an enormous number of results, which makes finding the relevant ones a time-consuming task. The most common solution to this problem is search results clustering (SRC), which works on snippets; a snippet is a short text summarizing the context of a search result. Search clustering engines aim to collect search results into groups of related results. Users can then find the information they need more easily because the cluster labels act as navigators. In addition, search results clustering can use both the snippets and the meaning of their words to identify clusters of search results.
This dissertation presents the semantic suffix net (SSN) and the generalized semantic suffix net (GSSN), which are new semantic search structures. They use both string matching and semantic similarity as conditions to construct a net that represents the suffixes of a string. In particular, a GSSN is derived by combining SSN pairs through suffix links and directed links, so it can represent the suffixes of a set of strings. Semantic suffix net clustering (SSNC) is then proposed as a new semantic search results clustering method that uses both string matching and semantic similarity as conditions to group semantically similar snippets. Semantic similarity is measured using the synsets of the WordNet database, whose synsets are built from the meanings of words.
The results of SSNC are compared with those of STC, MSRC, SSTC, STC+GSSN, and CFWMS to evaluate its performance. F-measure, precision, and recall are used to estimate effectiveness; execution time and data structure size are used to assess efficiency. The real-world test data were created from 30 queries on DMOZ.com, and SSNC was executed on them.
In terms of effectiveness, SSNC returned smaller clusters than all the other algorithms. Recall and F-measure for SSNC are lower because SSNC uses both labels and documents as conditions when calculating the overlap ratio used to identify final clusters. On the other hand, SSNC executes faster and its data structures are smaller. Thus, SSNC is more efficient than current semantic clustering methods for search results.
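The abstract names precision, recall, and F-measure as its effectiveness measures but does not give their formulas. The following is a minimal sketch of the standard balanced F-measure for one cluster against one reference category; the function name and the example snippet IDs are hypothetical, and the dissertation's own formulation may differ.

```python
def precision_recall_f(cluster, relevant):
    """Return (precision, recall, F1) for one cluster against one reference class.

    Both arguments are collections of snippet IDs (hypothetical representation).
    """
    cluster, relevant = set(cluster), set(relevant)
    overlap = len(cluster & relevant)

    # Precision: share of the cluster that belongs to the reference class.
    precision = overlap / len(cluster) if cluster else 0.0
    # Recall: share of the reference class that the cluster recovered.
    recall = overlap / len(relevant) if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: a cluster of three snippets, two of which are relevant,
# against a reference category of four snippets.
p, r, f = precision_recall_f({1, 2, 3}, {2, 3, 4, 5})
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Under this definition, a method that returns smaller clusters can keep precision high while recall, and with it the F-measure, drops, which is consistent with the trade-off the abstract reports for SSNC.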
Year: 2013
Corresponding Series Added Entry: Asian Institute of Technology. Dissertation ; no. CS-13-04
Type: Dissertation
School: School of Engineering and Technology (SET)
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Computer Science (CS)
Chairperson(s): Guha, Sumanta
Examination Committee(s): Vatcharaporn Esichaikul; Teerapat Sanguankotchakorn
Scholarship Donor(s): Rajamangala University of Technology Isan, Thailand
Degree: Thesis (Ph.D.) - Asian Institute of Technology, 2013

