AIT Asian Institute of Technology

Bangkok taxi probe's big data processing for traffic hotspot analysis and visualization using the Apache Hadoop distributed system

Author: Ranjit, Saurav
Call Number: AIT Thesis no. RS-14-08
Subject(s): Big data--Data processing
Traffic flow--Thailand--Data processing

Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Remote Sensing and Geographic Information Systems
Publisher: Asian Institute of Technology
Series Statement: Thesis ; no. RS-14-08
Abstract: Probe taxis have been operated in Bangkok since July 2012 by Toyota Tsusho Electronics (Thailand) Co. Ltd. Approximately 10,000 probe taxis are used for real-time traffic monitoring, and they provide meaningful information about traffic conditions in the region, such as travel flow and best routes. GPS devices installed in the probe taxis collect spatial and temporal information every 3 to 5 seconds, along with other necessary information. Real-time traffic information is derived from the spatial and temporal information of these probe taxis. The spatial information consists of the latitude and longitude of each taxi, while the temporal information is the UNIX epoch time. In addition, other information such as device ID, speed, direction, taximeter, taxi engine state and dilution of precision is collected from the probe taxis. The device ID is the International Mobile Station Equipment Identity (IMEI), which is unique to each device. The main challenge of this study is handling big data. Approximately 50 million records are collected every day, with a file size of 3.5 gigabytes. Processing this big data takes a great deal of time and resources. Extracting relevant information from it is another challenge, as is filtering out irrelevant and erroneous data. The objective of this study is to find a suitable method to process the big data and produce relevant information. The Apache Hadoop distributed system is used to process the big data, with the operations implemented in Java. The Apache Hadoop software library is a framework for distributed computing of large data sets across clusters of computers using programming models. It is designed to scale up from one machine to hundreds of machines, each offering local computation and storage. The Hadoop library is designed to detect and handle failures.
The positioning errors of probe taxis depend upon the accuracy of the device itself and need to be filtered as much as possible.
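The record fields and the error filtering described in the abstract can be illustrated with a minimal Java sketch. It is not taken from the thesis: the comma-separated field order, the class name `ProbeRecord`, the bounding box around Bangkok, and the speed and dilution-of-precision thresholds are all assumptions chosen for illustration. In a Hadoop job, a check of this kind would typically run inside the map step so that malformed or implausible records are dropped before aggregation.

```java
import java.util.Optional;

/** One GPS observation from a probe taxi. Field order and thresholds are hypothetical. */
final class ProbeRecord {
    final String imei;        // device ID (an IMEI has 15 decimal digits)
    final double lat;         // latitude in degrees
    final double lon;         // longitude in degrees
    final long epochSeconds;  // UNIX epoch time
    final double speedKmh;    // speed (unit assumed to be km/h)
    final double dop;         // dilution of precision

    ProbeRecord(String imei, double lat, double lon,
                long epochSeconds, double speedKmh, double dop) {
        this.imei = imei;
        this.lat = lat;
        this.lon = lon;
        this.epochSeconds = epochSeconds;
        this.speedKmh = speedKmh;
        this.dop = dop;
    }

    /** Parses "imei,lat,lon,epoch,speed,dop" (assumed layout); empty if malformed. */
    static Optional<ProbeRecord> parse(String line) {
        String[] f = line.split(",");
        if (f.length != 6) {
            return Optional.empty();
        }
        try {
            return Optional.of(new ProbeRecord(
                    f[0].trim(),
                    Double.parseDouble(f[1].trim()),
                    Double.parseDouble(f[2].trim()),
                    Long.parseLong(f[3].trim()),
                    Double.parseDouble(f[4].trim()),
                    Double.parseDouble(f[5].trim())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Returns true if the record passes simple sanity checks (all limits are assumed). */
    boolean isPlausible() {
        boolean validImei = imei.matches("\\d{15}");
        // Rough box around the Bangkok region, an assumption for this sketch.
        boolean inRegion = lat >= 13.0 && lat <= 14.5 && lon >= 100.0 && lon <= 101.0;
        boolean validTime = epochSeconds > 0;
        boolean validSpeed = speedKmh >= 0.0 && speedKmh <= 200.0;
        boolean goodFix = dop > 0.0 && dop <= 5.0;
        return validImei && inRegion && validTime && validSpeed && goodFix;
    }
}
```

A caller would parse each input line and keep only records for which `isPlausible()` returns true. The limits above are placeholders that would need tuning against the actual device accuracy.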
Year: 2014
Corresponding Series Added Entry: Asian Institute of Technology. Thesis ; no. RS-14-08
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Remote Sensing (RS)
Chairperson(s): Nagai, Masahiko
Examination Committee(s): Dailey, Mathew N.; Nakamura, Shinichi
Degree: Thesis (M. Eng.) - Asian Institute of Technology, 2014

