PERSPECTIVES ON FULL REFERENCE STEREOSCOPIC IMAGE QUALITY ASSESSMENT

Md, Sameeulla Khan and Channappayya, Sumohana (2018) PERSPECTIVES ON FULL REFERENCE STEREOSCOPIC IMAGE QUALITY ASSESSMENT. PhD thesis, Indian Institute of Technology Hyderabad.

Full text not available from this repository.

Abstract

In recent years, three-dimensional (3D) multimedia technologies have received wide attention, driven by strong impetus from industry and academia [1]. Since 3D multimedia combines two single views (or luminance images), its development and utility are on par with those of 2D multimedia technologies. The success of practical 3D applications can be attributed to the rapid development of 3D technologies, e.g., scene capture, reconstruction, compression, rendering, and display. This wave of 3D technology has enabled 3D capture and viewing on mobile phones, which in turn has made the broadcast of 3D content over the internet a reality [2]. These advancements make the monitoring and maintenance of the perceptual quality of 3D content an important problem. While 3D content includes both image and video data, the focus of this thesis is restricted to stereo 3D (S3D) image quality assessment (IQA) algorithms.

Since the ultimate consumer of 3D content is a human subject, it is appropriate to assess quality with respect to subjective opinions. However, obtaining subjective opinions on large volumes of data is time consuming and expensive. These shortcomings call for objective quality assessment, which can be classified into three types: (i) full-reference (FR), (ii) reduced-reference (RR) and (iii) no-reference (NR). In FR methods, the quality of a test S3D image is computed by comparing it with a reference (pristine, distortion-free) S3D image. In RR methods, the quality of an S3D image is computed from features of the reference and test S3D images. NR methods do not use the reference S3D image for the quality assessment task. In this thesis, we primarily focus on FR S3D image quality assessment (FRSIQA). S3D image quality measurement is a challenging task because it involves the analysis of several perceptual factors, such as the quality of both views and the quality of depth perception.
For both left and right views, conventional 2D IQA metrics partially solve the FRSIQA problem; improved performance is achieved only after including 3D features. Traditionally, the perceptual quality of single-view 2D images has been measured with 2D objective measures such as Peak Signal to Noise Ratio (PSNR) or Mean Squared Error (MSE). However, neither PSNR nor MSE has been shown to correlate well with subjective judgement of image quality ([3], [4]). The structural similarity (SSIM) index [5] and its derivatives ([6], [7] and [8]) paved the way for 2D FRIQA metrics that mostly agree with human judgement. They are based on the assumption that the human visual system (HVS) is highly sensitive to structural information. Other competitive 2D FRIQA methods include statistical and information theoretic approaches [9] [10], the phase congruency approach [11], sparsity based approaches [12] [13] and the saliency based approach [14]. These 2D FRIQA methods give only sub-optimal performance when applied to the FRSIQA problem ([3] [15] [16] [17]). Their performance is boosted by the addition of depth information, especially in the case of asymmetric distortions. The disparity map is the basic 3D feature that captures depth perception. The effectiveness of using disparity maps in conjunction with 2D FRIQA methods for solving the FRSIQA problem has been demonstrated by several authors in the literature [18], [19].

This thesis approaches the FRSIQA problem from four different perspectives, namely (i) a statistical approach, (ii) a structural similarity approach, (iii) a sparsity based approach and (iv) a saliency based approach. All of these approaches utilize disparity information. Apart from solutions to the FRSIQA problem, this thesis makes contributions to 2D FRIQA and S3D FRVQA.

In the statistical approach, a natural scene statistical model of stereoscopic images is employed.
Empirical studies of the joint statistics of luminance and disparity images (or wavelet coefficients) of natural stereoscopic scenes have resulted in three important findings: a) the marginal statistics are modeled well by the generalized Gaussian distribution (GGD), b) the joint distribution of luminance and disparity subband coefficients of natural stereoscopic scenes can be modeled using the bivariate generalized Gaussian distribution (BGGD), and c) there exists significant correlation between the coefficients of luminance and disparity. Inspired by these findings, we propose a full-reference image quality assessment algorithm dubbed STeReoscopic Image Quality Evaluator (STRIQE). We show that the parameters of the GGD fits of luminance wavelet coefficients, along with correlation values, form excellent features. Importantly, we demonstrate that the use of disparity information (via correlation) results in a consistent improvement in the performance of the algorithm. These features possess excellent distortion discrimination abilities that make them amenable to no-reference S3D IQA (NRSIQA). The performance of the algorithms is evaluated over popular databases and shown to be competitive with state-of-the-art reference algorithms. The efficacy of both algorithms is further highlighted by their near-linear relation with subjective scores, low root mean squared error (RMSE), and consistently good performance over both symmetric and asymmetric distortions.

The structural similarity approach is based on the intuition that both image and depth quality can be estimated by observing the variation of structural information in single views and disparity maps. The effect of distortion on luminance perception and on depth perception is usually different, even though depth is estimated from luminance images. Therefore, we present an FRSIQA algorithm that rates stereoscopic images in proportion to the quality of individual luminance image perception and the quality of depth perception.
The luminance and depth quality scores are obtained by applying the robust Multiscale-SSIM (MS-SSIM) index to the luminance images and disparity maps, respectively. We propose a novel multi-scale approach for combining the luminance and depth scores from the left and right images into a single quality score per stereo image. We also demonstrate that a small amount of distortion does not significantly affect depth perception, whereas heavy distortion in stereopairs results in a significant loss of depth perception. Our algorithm, called the 3D-MS-SSIM index, performs competitively over standard databases.

The sparsity based approach deals with the sparse representation of luminance images and disparity maps. The primary challenge lies in dealing with the sparsity of disparity maps in conjunction with the sparsity of luminance images. Although analysing the sparsity of images is sufficient to bring out the quality of luminance images, the effectiveness of sparsity in quantifying depth quality is yet to be fully understood. Although the principle for handling the sparsity of luminance images and disparity maps is ideally similar, the detailed implementation differs significantly. We present a full reference Sparsity-based Quality Assessment of Stereo Images (SQASI) algorithm that is aimed at this understanding.

The saliency approach is based on the intuition that S3D saliency can be segregated into two components, namely (i) image saliency and (ii) depth saliency, each of which can be studied individually to find luminance and depth quality. When viewing an S3D image, we hypothesize that while most of the contribution to saliency is provided by the 2D image, a small but significant contribution is provided by the depth component. Further, we claim that only a subset of image edges contributes to depth perception while viewing an S3D image.
In this thesis, we propose a systematic approach for depth saliency estimation, called Salient Edges with respect to Depth perception (SED), which localizes the depth-salient edges in an S3D image. We demonstrate the utility of SED in a full reference stereoscopic image quality assessment (FRSIQA) algorithm called the SED based Stereo Quality Index (SSQI). We consider gradient magnitude and inter-gradient maps for predicting structural similarity. A coarse quality estimate is derived first by comparing the 2D saliency and gradient maps of the reference and test stereopairs. We refine this estimate using SED maps to evaluate depth quality. Finally, we combine the luminance and depth quality estimates to obtain an overall stereo image quality. We perform a comprehensive evaluation of our metric on seven publicly available S3D IQA databases. The proposed metric shows competitive performance on all seven databases, with state-of-the-art performance on three of them.

The insights gained from FRSIQA have allowed us to propose improvements to other visual tasks, including FRIQA, FRSVQA and ocular dominance in stereo vision. We have made a contribution to 2D FRIQA that primarily focuses on full HD databases. Conventional image quality assessment metrics such as SSIM [5], FSIM [11] etc. perform well on standard definition (SD) and enhanced definition (ED) images. However, current camera devices support high resolution images such as high definition (HD), full HD and ultra HD. In our evaluation we have found that existing 2D FRIQA methods do not perform well over high resolution images, especially over full HD and higher resolution images. To address this shortcoming, we present a saliency based 2D FRIQA approach whose performance is on par with existing methods on SD databases but shows state-of-the-art results over full HD databases. We consider gradient magnitudes and inter-gradient maps for predicting structural similarity.
The choice of saliency and gradient methods results in the best performance over full HD databases. Our proposed algorithm computes quality at multiple spatial scales and combines these scores for overall quality prediction. The number of scales is made a function of image resolution.

We also propose a full reference stereoscopic video quality assessment (FRSVQA) algorithm that is based on the saliency based FRSIQA and 2D FRIQA algorithms. In this work, we present a spatio-depth saliency and motion strength based FRSVQA. Initially, we obtain a spatial distortion map on every video frame, using two different methods, to estimate spatial quality. The spatial distortion map is then refined by the depth salient maps to estimate depth quality. We also estimate the temporal quality by refining the spatial distortion map with the inter-frame difference map at the locations specified by motion edges. The spatial, depth and temporal qualities are systematically combined and averaged over the frames to estimate the overall stereo video quality metric.

In this thesis, we also contribute a measure of ocular dominance (OD), which is one of the properties of stereo vision. In all of our FRSIQA perspectives, we pool the left and right features or quality scores of the test stereopair using a weighting strategy. These weights reflect the idea of OD. For a given stereopair, both single views look alike perceptually. But when viewed stereoscopically, one can experience the dominance of one view over the other. For pristine stereopairs the dominance may be negligible, but asymmetrically distorted stereopairs exhibit significant OD. To the best of our knowledge, there do not exist metrics that measure the OD of stereopairs. To address this lacuna, we propose a metric whose performance is validated first over pristine stereopairs and then over asymmetrically distorted stereopairs. A potential application of OD is in prioritizing the left or right view for stereoscopic image encoding.
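
For readers unfamiliar with the baseline measures named in the abstract, the following is a minimal sketch of MSE and PSNR for a single view, plus a naive weighted pooling of the left and right views. It is not the method of the thesis. The function names, the 8-bit peak value of 255, the list-of-rows image representation and the weight `w_left` are all assumptions made for illustration; the abstract does not specify how the thesis derives its pooling weights.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized grayscale images,
    each given as a list of rows of pixel values."""
    if len(ref) != len(test) or any(len(r) != len(t) for r, t in zip(ref, test)):
        raise ValueError("images must have the same shape")
    total = 0.0
    count = 0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr_from_mse(err, peak=255.0):
    """PSNR in dB for a given MSE; infinite when the error is zero."""
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def psnr(ref, test, peak=255.0):
    """PSNR in dB between a reference and a test image."""
    return psnr_from_mse(mse(ref, test), peak)


def pooled_stereo_psnr(ref_left, ref_right, test_left, test_right,
                       w_left=0.5, peak=255.0):
    """Hypothetical baseline: weight the per-view MSEs, then convert to PSNR.
    w_left is an assumed weight in [0, 1]; the right view gets 1 - w_left."""
    if not 0.0 <= w_left <= 1.0:
        raise ValueError("w_left must lie in [0, 1]")
    err = (w_left * mse(ref_left, test_left)
           + (1.0 - w_left) * mse(ref_right, test_right))
    return psnr_from_mse(err, peak)
```

With `w_left=0.5` both views count equally. Moving the weight towards one view mimics, very crudely, the idea that one view can dominate perception in an asymmetrically distorted stereopair. This baseline ignores disparity entirely, which the abstract identifies as the reason 2D metrics alone are sub-optimal for S3D images.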

IITH Creators: