Title | Deep learning models for image enhancement and interpretation |
Author | Farooq, Muhammad |
Call Number | AIT Diss no.CS-21-01 |
Subject(s) | Deep learning (Machine learning); Image processing--Mathematical models |
Note | A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science |
Publisher | Asian Institute of Technology |
Abstract | Deep learning methods have produced remarkable results in solving many classic problems in the fields of computer vision and image processing in the last few years. Many previously unsolved problems have been solved using deep learning models that transform input images into output images or other representations such as distributions over a set of categories. Super-resolution, denoising, and colorization are examples from the image processing field; in these examples the input is a degraded image (low-resolution, noisy, or grayscale), whereas the output is a high-quality image. Flow estimation and change detection are examples from the computer vision field; in these examples the input is a pair of images and the output is a flow field or a mask indicating where the two input images differ. This dissertation explores the use of deep learning techniques to solve two specific image enhancement and interpretation problems: face super-resolution (SR) and vehicle change detection (VCD).

The goal of face SR is to enhance the quality and resolution of human face images in low-quality imagery such as surveillance video footage. Face SR has important applications in problems such as face recognition, 3D face reconstruction, face alignment, and face parsing. The majority of published face SR reconstruction work deals with synthetic data or with videos recorded under carefully controlled conditions, and there is relatively little published work on real-world reconstruction of LR human faces acquired under more challenging conditions such as surveillance camera settings. This dissertation focuses on SR reconstruction of human faces extracted from real-world, low-quality surveillance video.

Change detection is another important problem in computer vision that until now has been applied primarily in the area of geographic information systems, especially to satellite imagery. Change detection based on satellite images is somewhat simpler than change detection for arbitrary 3D objects, because successive images of the same area can be precisely aligned using planar homographies and then compared directly. The challenges are to ignore noise, cloud cover, shadows, and atmospheric differences when making the before/after comparison. 3D objects have similar issues but are also susceptible to out-of-plane rotation, which requires nonlinear warping rather than homography mapping for alignment. Change detection can therefore be much more challenging when images of arbitrary 3D objects acquired from different points of view must be compared. I deal with this more challenging problem in this dissertation.

Most super-resolution methods proposed to date do not use real ground-truth high-resolution (HR) and low-resolution (LR) image pairs to learn models; instead, the vast majority use synthetic LR images generated by undersampling the HR images. This approach yields excellent performance on similar synthetic LR data, but it degrades substantially on real-world, poor-quality surveillance video footage. A promising alternative is to apply recent advances in style transfer for unpaired datasets, but state-of-the-art work along these lines (Bulat et al., 2018) has used LR images and HR images from completely different datasets, introducing more variation between the HR and LR domains than strictly necessary. In this dissertation, I propose methods that overcome both of these limitations, applying unpaired style transfer learning methods to face SR using real-world HR and LR datasets that share important properties. The key is to acquire roughly paired training data from a high-quality main stream and a lower-quality sub stream of the same IP camera. Based on this principle, I have constructed four datasets comprising more than 400 people, with 1–15 weakly aligned real HR–LR pairs for each subject. I describe a style transfer CycleGAN approach that produces impressive super-resolved images for low-quality test images never seen during training.

The second problem I target is vehicle change detection (VCD), which aims to solve some of the problems arising in mobility applications related to inspection of a vehicle after it has been used. The vehicle owner's interest is to detect any damage that may have occurred to his or her vehicle while it was in use, while the user's interest is to document that any damage visible on the vehicle was pre-existing. To address this problem, I again turn to deep learning models: one for car masking, one for image alignment, one for damage image generation, and one for change detection. I design and implement a deep learning model that precisely aligns two images of the same car by warping one image onto the other. I utilize a separate deep learning model to generate sample damage images from undamaged images, and I also design and implement deep learning models that fuse damage patches onto car images in a realistic way. Finally, I apply deep learning models that detect changes on vehicles from a pair of aligned input images.

The contributions of the dissertation are: 1) I introduce a deep learning model called SRCGAN (an adapted CycleGAN) that produces impressive super-resolved images for real-world low-quality test images; 2) I introduce a new deep learning model called IACGAN that precisely aligns two images of the same vehicle by warping one image onto the other; 3) I demonstrate the feasibility of utilizing a deep learning model to generate sample damage images from undamaged images; 4) I develop new deep learning models that fuse damage patches onto car images in a realistic way; 5) I provide a baseline deep learning model that detects changes on vehicles from pairs of aligned input images. In this dissertation, I lay the groundwork for a longer-term research program on deep learning models for image enhancement and interpretation.

In summary, this dissertation addresses the use of deep generative models to solve difficult problems in multiple visual domains. In particular, I explore the use of deep generative models to solve the face super-resolution and vehicle change detection problems. The techniques I introduce could be applied to a wide variety of reconstruction and inference problems in visual domains. The proposed methods for face SR and VCD achieve better results than recent methods; the face SR methods achieve a 45.5% improvement in FID scores for reconstruction of high-resolution faces compared to the state of the art on my most degraded dataset. |
Year | 2021 |
Type | Dissertation |
School | School of Engineering and Technology |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Computer Science (CS) |
Chairperson(s) | Dailey, Matthew N.; |
Examination Committee(s) | Manukid Parnichkun; Mongkol Ekpanyapong; |
Scholarship Donor(s) | University of the Punjab (PU), Lahore, Pakistan; AIT Fellowship; |
Degree | Thesis (Ph.D.) - Asian Institute of Technology, 2021 |
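The abstract notes that satellite-image change detection is tractable because successive images of the same area can be aligned with a planar homography and then compared directly. The sketch below illustrates that general idea with OpenCV; it is not code from the dissertation, and the function name, feature detector, ratio, and threshold values are illustrative assumptions (grayscale inputs assumed).

```python
import cv2
import numpy as np

def align_and_diff(before, after, ratio=0.75, diff_thresh=30):
    """Warp `after` onto `before` with a planar homography, then return a crude change mask."""
    # Detect and describe keypoints in both grayscale images.
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(before, None)
    k2, d2 = orb.detectAndCompute(after, None)

    # Match descriptors and keep the more distinctive correspondences (Lowe ratio test).
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(d2, d1, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]

    src = np.float32([k2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # Robustly estimate the homography and warp the "after" image onto the "before" frame.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    warped = cv2.warpPerspective(after, H, (before.shape[1], before.shape[0]))

    # Direct per-pixel comparison; a real system must also suppress noise,
    # shadows, and atmospheric differences before thresholding.
    diff = cv2.absdiff(before, warped)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    return mask
```

As the abstract points out, this planar-alignment shortcut breaks down for arbitrary 3D objects such as vehicles, where out-of-plane rotation requires nonlinear warping instead; that is the role of the IACGAN alignment model.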
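The abstract also criticizes the common practice of training SR models on synthetic LR images obtained by undersampling HR images. A minimal sketch of that conventional degradation pipeline (bicubic downsampling; the scale factor is an illustrative assumption) shows how simple it is, and why the resulting data differs from real surveillance footage, which also carries compression artifacts, blur, and sensor noise that plain undersampling does not reproduce.

```python
import cv2

def synthetic_lr(hr_image, scale=4):
    """Create a synthetic LR image by bicubic undersampling of an HR image."""
    h, w = hr_image.shape[:2]
    return cv2.resize(hr_image, (w // scale, h // scale),
                      interpolation=cv2.INTER_CUBIC)
```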
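The reported 45.5% improvement is measured with the Fréchet Inception Distance (FID), which compares Gaussians fitted to Inception-v3 features of real and generated images: FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2(C1 C2)^{1/2}). The sketch below is a generic implementation of that formula (feature extraction omitted), not the dissertation's evaluation code.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * sqrtm(C1 @ C2))."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)

    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts introduced by the matrix square root

    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))
```

Lower FID indicates closer agreement with real-image statistics, so an improvement here corresponds to a reduction in the score relative to the prior state of the art.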