Knowledge discovery and self-organizing systems | |
Author | Acharya, Sushil |
Call Number | AIT Diss. no.CS-97-05 |
Subject(s) | Neural networks (Computer science) Self-organizing systems |
Note | A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Engineering |
Publisher | Asian Institute of Technology |
Abstract | Given the increasing number of databases, their proliferation, their growing complexity and their manifestation as distributed knowledge bases, discovering useful knowledge is becoming more and more difficult. For the same reasons, research contributions made in this area would have a profound and global impact. This research investigates Data Clustering methodologies and Knowledge Discovery mechanisms in general and pays special attention to the possibility of formulating a Knowledge Discovery methodology using Self-Organization. Experimental studies have been conducted on databases using Kohonen's Self-Organizing Neural Network paradigm. Data Clustering through Analytical methods, Self-Organizing Maps, and Adaptive Resonance Theory, and Knowledge Discovery through the Regularities Search Method and the Evident Based Method, have been studied in detail. Possible application areas for such a methodology have also been examined, and the methodology has been used to design a system that promotes software reuse by grouping instructions that are functionally close to each other, which presents clear advantages for programmer productivity and program reliability. Two major issues must be addressed to facilitate Knowledge Discovery. The first pertains to the representation scheme of the prevailing databases. The second concerns visualization. While formulating our methodology, an interesting issue we encountered was the importance of the training data sets that Self-Organizing Maps use to train the networks. The training data sets should be generated in a manner that reduces data representation and network training complexity as well as network training costs. One important feature of our methodology is the use of a Training Data Set Generator that generates training data sets from the Case database’s attribute domain, supplemented by a statistical data analyzer to ensure that the generated data set resembles practical cases. 
Another interesting area we have researched and experimentally analyzed is the order of the input vectors provided to the Self-Organizing Map. Our experiments have shown that the order changes the distribution of the clusters. We have concentrated on two types of input distribution: Records Distribution and Attributes Distribution. The differences in the clusters formed are seen to be due mainly to: Data Representation, Data Input Order, Cluster Identification Scheme, and Random Numbers Generating Routine. The input statistical distribution is observed to be reflected in the output clusters; hence a relationship between the input distribution and the output distribution is seen to exist. This relationship supplements our experimental analysis of the clusters generated through statistical processes and through Kohonen's algorithm, which is seen to maintain the same type of distribution. The discovered knowledge is compared with the results obtained by the Regularities Search Model, 49er b, of Zytkow and Zembowicz. Our results and those obtained by the 49er model show similarities. This research has also led to further conclusions related to convergence and other network-related issues. Experimental analysis has been carried out on the convergence issue, where appropriate values for the Network Size, Neighborhood Reduction Value and Training Constants are presented and discussed. In addition, the effects on Network Training of Data Operations such as DELETE, APPEND and MODIFY on the Case Databases are analyzed. The Knowledge Discovery methodology has been tested on large standard databases. Comparisons have been made with other techniques on the same or similar large databases. Specifically, the experiments are carried out on the Glass and Breast Cancer Databases. |
Year | 1997 |
Type | Dissertation |
School | School of Engineering and Technology (SET) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Computer Science (CS) |
Chairperson(s) | Sadananda, R.; |
Examination Committee(s) | Qi, Yulu; Kaew Nualchawee; |
Degree | Thesis (Ph.D.) - Asian Institute of Technology, 1997 |