AIT Asian Institute of Technology

A robust document layout analysis algorithm for Vietnamese documents

Author: Nguyen Duc Thanh
Call Number: AIT Thesis no. CS-05-26
Subject(s): Computer algorithms

Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science, School of Advanced Technologies
Publisher: Asian Institute of Technology
Series Statement: Thesis ; no. CS-05-26
Abstract: Document layout analysis is an important step in an OCR (Optical Character Recognition) system. However, the diversity of document forms (font size, font style, line spacing, physical layout structure, etc.) significantly affects the correctness of segmentation algorithms and makes it difficult to devise a general document layout analysis algorithm. In this thesis, we propose an algorithm for region segmentation and classification. For region segmentation, we construct a pyramidal quadtree structure corresponding to different resolutions of the image. Regions are segmented by analyzing the document image from the top level to the bottom level of the quadtree structure. The segmented regions are then classified by type using the density and distribution of each region's black pixels. We tested our method on a database of 100 Vietnamese document images; the correctness is 92.91% for text segmentation and 98.76% for text line segmentation.
Year: 2005
Corresponding Series Added Entry: Asian Institute of Technology. Thesis ; no. CS-05-26
Type: Thesis
School: School of Advanced Technologies (SAT)
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Computer Science (CS)
Chairperson(s): Afzulpurkar, Nitin V.
Examination Committee(s): Batanov, Dentcho N.; Guha, Sumanta
Scholarship Donor(s): MOET of Viet Nam Scholarship
Degree: Thesis (M.Sc.) - Asian Institute of Technology, 2005
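
The abstract describes a pyramidal quadtree built over several resolutions of the image and a top-down pass that segments regions by black-pixel density. The record does not give the thesis's actual procedure, so the following is only a minimal sketch of that general idea. The function names, the 2x2 summing reduction, and the `low`/`high` density thresholds are all illustrative assumptions, not the author's method.

```python
def build_pyramid(image):
    """Return pyramid levels from full resolution (index 0) up to a 1x1 root.

    `image` is a square list of rows of 0/1 values (1 = black pixel) whose
    side length is a power of two. Each coarser cell stores the number of
    black pixels covered by its four children.
    """
    levels = [[row[:] for row in image]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * r][2 * c] + prev[2 * r][2 * c + 1]
             + prev[2 * r + 1][2 * c] + prev[2 * r + 1][2 * c + 1]
             for c in range(half)]
            for r in range(half)
        ])
    return levels


def density(levels, level, r, c):
    """Fraction of black pixels under cell (r, c) at the given level."""
    side = 2 ** level  # a cell at this level covers side x side pixels
    return levels[level][r][c] / (side * side)


def segment(levels, level, r, c, low=0.05, high=0.95, out=None):
    """Top-down pass: keep nearly empty or nearly full cells, split the rest.

    `low` and `high` are hypothetical thresholds chosen for illustration.
    Returns a list of (level, row, col, density) leaf regions.
    """
    if out is None:
        out = []
    d = density(levels, level, r, c)
    if level == 0 or d <= low or d >= high:
        out.append((level, r, c, d))
    else:
        for dr in (0, 1):
            for dc in (0, 1):
                segment(levels, level - 1, 2 * r + dr, 2 * c + dc,
                        low, high, out)
    return out


# Toy 4x4 binary "document": a solid block top-left, one stray pixel.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
levels = build_pyramid(image)
leaves = segment(levels, len(levels) - 1, 0, 0)
```

In this sketch, each leaf's density is the kind of feature a later classification step could use. The abstract also mentions the distribution of black pixels, which this example does not model.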

