AIT Asian Institute of Technology

A machine learning approach for web caching

Author: Areerat Songwattana
Call Number: AIT Diss. no. CS-14-01
Subject(s): Cache memory; World Wide Web; Machine learning

Note: A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Engineering and Technology
Publisher: Asian Institute of Technology
Series Statement: Dissertation; no. CS-14-01
Abstract: Web caching has been widely used to alleviate Internet traffic congestion in World Wide Web (WWW) services. To reduce download time, an effective Web cache management strategy must exploit Web usage information to decide which unnecessary stored objects to evict when the cache saturates. Several strategies have recently been proposed, based on either replacement or prefetching techniques. However, a prefetching method normally incurs additional cost in finding suitable objects and preloading them into the cache. These extra actions may increase both the runtime for finding the objects to be prefetched and the storage needed to hold them, creating a tradeoff between optimizing resource utilization and maximizing quality of service. Cache replacement techniques, in turn, are characterized by the features each policy uses to prioritize cached objects: request response time, frequency of reference, size, and so on. Requested Web objects or documents are then evicted according to this preference ordering. Well-known policies include Least Recently Used (LRU), Least Frequently Used (LFU), Greedy-Dual-Size (GDS), and Greedy-Dual-Size-Frequency (GDSF). Several works have addressed the replacement/prefetching tradeoff and proposed improved techniques for integrating the two, and a number of researchers have applied statistical methods to learn usage patterns and improve cache management policy. This thesis proposes the Learning Based Replacement algorithm (LBR), a hybrid approach to an efficient replacement model for Web caching that combines an LRU replacement method with a machine-learning method (naïve Bayes) that predicts, from the access history in a Web log, the probability that a cached page will be revisited by a succeeding request.
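The replacement policies named above differ only in the priority key they assign to cached objects; a minimal sketch of their standard formulations follows (the object with the lowest priority is evicted first; `clock` is the Greedy-Dual aging value, and `cost` is the assumed retrieval cost of the object):

```python
# Toy comparison of the eviction keys behind LRU, LFU, GDS, and GDSF.
# Each function returns a priority; the LOWEST-priority object is evicted.

def lru_key(last_access_time):
    # Least Recently Used: the oldest access is evicted first.
    return last_access_time

def lfu_key(frequency):
    # Least Frequently Used: the least-referenced object is evicted first.
    return frequency

def gds_key(clock, cost, size):
    # Greedy-Dual-Size: H = L + c/s; cheap-to-refetch, large objects go first.
    return clock + cost / size

def gdsf_key(clock, frequency, cost, size):
    # Greedy-Dual-Size-Frequency: H = L + f*c/s; also rewards popularity.
    return clock + frequency * cost / size
```

In the Greedy-Dual family, `clock` is raised to the priority of each evicted object, so long-unreferenced entries age out even if their cost/size ratio was once favorable.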
The learned probabilities indicate which URL objects in the cache should be kept or evicted; the learning-based model predicts the re-reference probability. In experiments with three datasets, called MSN, bo2, and sj, LBR achieves up to 85.53% accuracy and 82.45% F-measure for MSN, 96.78% accuracy and 85.90% F-measure for bo2, and 96.14% accuracy and 67.30% F-measure for sj. In the simulation experiment, compared to the common methods GDSF, LFU, and LRU, LBR increases the hit rate by 1.03%-9.04% for MSN, 1.22%-10.21% for bo2, and 0.72%-8.51% for sj; the byte-hit-rate improvement is 0.34%-9.08% for MSN, 0.75%-13.19% for bo2, and 0.09%-10.68% for sj. From the viewpoint of computational cost, two costs are considered: (1) training cost and (2) replacement cost. Both have complexity O(n × log(C)); that is, for a fixed cache size C, the proposed Web cache replacement method consumes linear time to classify objects into revisited and non-revisited groups with efficient prediction results.
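The abstract does not specify the features or the exact LRU/naïve-Bayes integration used in the thesis, so the following is only an illustrative sketch of the general idea: a Gaussian naïve Bayes classifier over hypothetical per-object features (reference frequency, recency, size) estimates the revisit probability, and eviction scans the LRU order for the first object predicted unlikely to be revisited, falling back to plain LRU.

```python
import math
from collections import OrderedDict

class RevisitClassifier:
    """Gaussian naive Bayes over (frequency, recency, size) features.
    The feature set is an illustrative assumption, not the thesis's."""

    def fit(self, X, y):
        self.params = {}
        for c in set(y):
            rows = [x for x, yi in zip(X, y) if yi == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            var = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-6)
                   for col, m in zip(cols, means)]
            self.params[c] = (len(rows) / len(y), means, var)
        return self

    def prob_revisit(self, x):
        # Log-posterior per class, then softmax to get P(revisit | x).
        logp = {}
        for c, (prior, means, var) in self.params.items():
            lp = math.log(prior)
            for v, m, s2 in zip(x, means, var):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            logp[c] = lp
        if 1 not in logp:
            return 0.0
        top = max(logp.values())
        z = sum(math.exp(v - top) for v in logp.values())
        return math.exp(logp[1] - top) / z

class LBRCache:
    """LRU cache that preferentially evicts objects the model predicts
    will NOT be revisited; pure LRU is the fallback."""

    def __init__(self, capacity, model, threshold=0.5):
        self.capacity, self.model, self.threshold = capacity, model, threshold
        self.store = OrderedDict()  # url -> feature vector, LRU order

    def access(self, url, features):
        hit = url in self.store
        if hit:
            self.store.move_to_end(url)  # standard LRU bookkeeping
        else:
            if len(self.store) >= self.capacity:
                self._evict()
            self.store[url] = features
        return hit

    def _evict(self):
        # Scan from least to most recently used; drop the first object
        # whose predicted revisit probability falls below the threshold.
        for url, feats in self.store.items():
            if self.model.prob_revisit(feats) < self.threshold:
                del self.store[url]
                return
        self.store.popitem(last=False)  # pure LRU fallback
```

The scan keeps recently used objects that the classifier expects to be re-referenced, which is the behavior the hit-rate gains over plain LRU reported above would rely on.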
Year: 2014
Corresponding Series Added Entry: Asian Institute of Technology. Dissertation; no. CS-14-01
Type: Dissertation
School: School of Engineering and Technology (SET)
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Computer Science (CS)
Chairperson(s): Dailey, Matthew N.; Thanaruk Theeramunkong
Examination Committee(s): Guha, Sumanta; Kiyoshi, Honda; Tan, Yasuo
Scholarship Donor(s): Rangsit University, Thailand
Degree: Thesis (Ph.D.) - Asian Institute of Technology, 2014

