Md, Sameeulla Khan and Channappayya, Sumohana
(2018)
PERSPECTIVES ON FULL REFERENCE STEREOSCOPIC IMAGE QUALITY ASSESSMENT.
PhD thesis, Indian Institute of Technology Hyderabad.
Full text not available from this repository.
Abstract
In recent years, three-dimensional (3D) multimedia technologies have received wide
attention as a result of a great impetus from industry and academia [1]. Since 3D
multimedia combines two single views (luminance images), its development and utility
are on par with those of 2D multimedia technologies. The success of practical
3D applications can be attributed to the rapid development of 3D technologies, e.g.,
scene capture, reconstruction, compression, rendering, and display. This wave of 3D
technology has enabled 3D capture and viewing on mobile phones, which
in turn has made the broadcast of 3D content over the internet a reality [2]. These
advancements make the monitoring and maintenance of the perceptual quality of 3D
content an important problem. While 3D content includes both image and video data,
this thesis focuses on stereo 3D (S3D) image quality assessment
(IQA) algorithms. Since the ultimate consumer of 3D content is a human subject,
it is appropriate to assess quality with respect to subjective opinions.
However, obtaining subjective opinions on large volumes of data is time-consuming
and expensive. These shortcomings motivate objective quality assessment.
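Objective metrics are usually validated by comparing their predictions with subjective scores, such as mean opinion scores (MOS). To illustrate how agreement with subjective opinion is commonly quantified, the Python sketch below computes three standard criteria: the Pearson linear correlation coefficient (PLCC), Spearman's rank-order correlation coefficient (SROCC), and the root mean squared error (RMSE). This is a minimal sketch and not the evaluation code used in this thesis. It omits the nonlinear (e.g., logistic) mapping that is often fitted before PLCC and RMSE are computed, and the score arrays in the usage block are made-up values.

```python
import numpy as np


def rank_with_ties(x: np.ndarray) -> np.ndarray:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=np.float64)
    i = 0
    while i < len(x):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def plcc(pred, mos) -> float:
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(pred, mos)[0, 1])


def srocc(pred, mos) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(rank_with_ties(np.asarray(pred, dtype=np.float64)),
                rank_with_ties(np.asarray(mos, dtype=np.float64)))


def rmse(pred, mos) -> float:
    """Root mean squared error between predictions and subjective scores."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(mos, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    mos = np.array([1.2, 2.5, 3.1, 4.0, 4.8])   # made-up subjective scores
    pred = np.array([1.0, 2.7, 3.0, 4.2, 4.9])  # made-up metric outputs
    print(plcc(pred, mos), srocc(pred, mos), rmse(pred, mos))
```

SROCC only checks monotonic agreement. PLCC and RMSE additionally reward the near-linear relation with subjective scores that is reported for the proposed algorithms.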
Objective quality assessment can be classified into three types: (i) full-reference
(FR), (ii) reduced-reference (RR), and (iii) no-reference (NR). In FR methods, the
quality of a test S3D image is computed by comparing it with a reference (pristine/
distortionless) S3D image. In RR methods, the quality of a test S3D image is computed
by comparing features extracted from the reference and test S3D images, rather than the full
reference image. NR methods do not use the reference S3D image at all.
In this thesis, we primarily focus on FR S3D image quality assessment (FRSIQA).
S3D image quality measurement is a challenging task because it involves the analysis of
several perceptual factors, such as the quality of both views and the quality
of depth perception. Applying conventional 2D IQA metrics to the left and right views
only partially solves the FRSIQA problem; improved performance is achieved only after
including 3D features. Traditionally, the perceptual quality of single-view 2D images
has been measured with objective measures such as Peak Signal to Noise Ratio (PSNR)
or Mean Squared Error (MSE). However, neither PSNR nor MSE has been
shown to correlate well with subjective judgment of image quality ([3], [4]). The
structural similarity (SSIM) index [5] and its derivatives ([6], [7], [8]) paved the
way for 2D FRIQA metrics that largely agree with human judgment. They are
based on the assumption that the human visual system (HVS) is highly sensitive to
structural information. Other competitive 2D FRIQA methods include statistical and
information-theoretic approaches [9], [10], the phase congruency approach [11], sparsity-based
approaches [12], [13], and the saliency-based approach [14].
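MSE and PSNR reduce to a few lines of pixel-wise arithmetic. This also shows why they ignore structure: every pixel error counts the same regardless of where it occurs. The following is a minimal Python/NumPy sketch, assuming 8-bit images with a peak value of 255.

```python
import numpy as np


def mse(ref: np.ndarray, test: np.ndarray) -> float:
    """Mean squared error between two equally sized images."""
    diff = ref.astype(np.float64) - test.astype(np.float64)  # avoid uint8 wrap-around
    return float(np.mean(diff ** 2))


def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mse(ref, test)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))
```

Because both functions depend only on the squared error values, two distortions with the same MSE receive the same PSNR even when one is far more visible than the other.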
These 2D FRIQA methods give only sub-optimal performance when applied to
the FRSIQA problem ([3], [15], [16], [17]). Their performance is boosted by the addition
of depth information, especially in the case of asymmetric distortions. The disparity
map is the basic 3D feature for handling depth perception. Accordingly,
the effectiveness of using disparity maps in conjunction with 2D FRIQA methods
for solving the FRSIQA problem has been demonstrated by several authors in the
literature [18], [19].
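The abstract does not specify how disparity maps are estimated. As an illustration only, the sketch below uses plain sum-of-absolute-differences (SAD) block matching on a rectified stereopair. This is a textbook method assumed for the example, not one taken from this work. It uses the usual convention that a point at column x in the left view appears at column x - d in the right view. The parameters `max_disp` and `half` (half the block width) are hypothetical.

```python
import numpy as np


def block_matching_disparity(left: np.ndarray, right: np.ndarray,
                             max_disp: int = 16, half: int = 3) -> np.ndarray:
    """Left-view disparity map by exhaustive SAD block matching (border pixels stay 0)."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int64)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # Only test shifts that keep the candidate block inside the right image.
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Practical estimators add smoothness constraints, sub-pixel refinement, and occlusion handling. The sketch keeps only the core matching step.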
This thesis approaches the FRSIQA problem from four perspectives, namely
(i) the statistical approach, (ii) the structural similarity approach, (iii) the sparsity-based
approach, and (iv) the saliency-based approach. All of these approaches utilize disparity
information. Apart from solutions to the FRSIQA problem, this thesis makes
contributions to 2D FRIQA, full-reference stereoscopic video quality assessment (FRSVQA),
and the measurement of ocular dominance.
In the statistical approach, a natural scene statistics model of stereoscopic images
is employed. Empirical studies of the joint statistics of luminance and disparity
images (or their wavelet coefficients) of natural stereoscopic scenes have resulted in three
important findings: (a) the marginal statistics are modeled well by the generalized
Gaussian distribution (GGD); (b) the joint distribution of luminance and disparity
subband coefficients of natural stereoscopic scenes can be modeled using a bivariate
generalized Gaussian distribution (BGGD); and (c) there exists significant correlation
between the coefficients of luminance and disparity. Inspired by these findings, we propose a
full-reference image quality assessment algorithm dubbed STeReoscopic Image Quality
Evaluator (STRIQE). We show that the parameters of the GGD fits of luminance
wavelet coefficients, along with the correlation values, form excellent features. Importantly,
we demonstrate that the use of disparity information (via correlation) results in a
consistent improvement in the performance of the algorithm. These features possess
excellent distortion discrimination abilities that also make them amenable to no-reference
SIQA (NRSIQA). The performance of the proposed algorithms is evaluated over popular databases
and shown to be competitive with state-of-the-art algorithms. The efficacy of both
algorithms is further highlighted by their near-linear relation with subjective scores,
low root mean squared error (RMSE), and consistently good performance over both
symmetric and asymmetric distortions.
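The GGD features can be illustrated with a moment-matching fit, which is a common way to estimate GGD parameters. The sketch below assumes that the subband coefficients have already been computed by some wavelet decomposition (not shown). It models them as zero-mean GGD with density proportional to exp(-(|x|/alpha)^beta). The correlation feature is assumed to be the plain Pearson correlation between co-located luminance and disparity coefficients. This is not the STRIQE implementation.

```python
import math
import numpy as np


def ggd_ratio(beta: float) -> float:
    """E[x^2] / (E|x|)^2 of a zero-mean GGD with shape beta."""
    return math.gamma(1.0 / beta) * math.gamma(3.0 / beta) / math.gamma(2.0 / beta) ** 2


def fit_ggd(coeffs, lo: float = 0.2, hi: float = 10.0, iters: int = 60):
    """Moment-matching GGD fit; returns (scale alpha, shape beta)."""
    x = np.asarray(coeffs, dtype=np.float64).ravel()
    target = np.mean(x ** 2) / np.mean(np.abs(x)) ** 2
    # ggd_ratio decreases monotonically in beta, so bisection finds the matching shape.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_ratio(mid) > target:
            lo = mid  # ratio too large: the shape must be larger
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    sigma = math.sqrt(np.mean(x ** 2))
    # For a GGD, sigma^2 = alpha^2 * Gamma(3/beta) / Gamma(1/beta).
    alpha = sigma * math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return alpha, beta


def coefficient_correlation(lum_coeffs, disp_coeffs) -> float:
    """Pearson correlation of co-located luminance and disparity coefficients (assumed form)."""
    return float(np.corrcoef(np.ravel(lum_coeffs), np.ravel(disp_coeffs))[0, 1])
```

As a sanity check, Gaussian samples should yield a beta close to 2 and Laplacian samples a beta close to 1. Distortions such as blur or noise shift these parameters away from the values observed on pristine images.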
The structural similarity approach is based on the intuition that both image and
depth quality can be estimated by observing the variation of structural information
in single views and disparity maps. The effect of distortion on luminance perception
and on depth perception is usually different, even though depth is estimated from
luminance images. Therefore, we present an FRSIQA algorithm that rates stereoscopic
images in proportion to the quality of individual luminance image perception
and the quality of depth perception. The luminance and depth qualities are obtained
by applying the robust multi-scale SSIM (MS-SSIM) index to the luminance images and
disparity maps, respectively. We propose a novel multi-scale approach for combining
the luminance and depth scores from the left and right images into a single quality
score per stereo image. We also demonstrate that a small amount of distortion does
not significantly affect depth perception, whereas heavy distortion in stereopairs
results in a significant loss of depth perception. The resulting algorithm, called the
3D-MS-SSIM index, performs competitively over standard databases.
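The following is a rough sketch of applying an MS-SSIM-style index to both luminance images and disparity maps. Several simplifications are assumptions: a 7x7 uniform window instead of the usual Gaussian window, 2x2 average pooling between scales, and the five scale weights commonly used with MS-SSIM. Inputs therefore need at least 112 pixels per side. The per-view fusion `view_quality` is a weighted geometric mean with a hypothetical weight `w_lum`. It is only a placeholder and not the multi-scale combination proposed for the 3D-MS-SSIM index.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Scale weights commonly used with MS-SSIM (finest to coarsest).
MS_WEIGHTS = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])


def _local_mean(img, win=7):
    """Mean over every win x win window ('valid' region only)."""
    return sliding_window_view(img, (win, win)).mean(axis=(-2, -1))


def ssim_terms(x, y, peak=255.0, win=7):
    """Mean SSIM and mean contrast-structure (cs) term with a uniform window."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = _local_mean(x, win), _local_mean(y, win)
    vx = _local_mean(x * x, win) - mx ** 2
    vy = _local_mean(y * y, win) - my ** 2
    cxy = _local_mean(x * y, win) - mx * my
    cs = (2 * cxy + c2) / (vx + vy + c2)
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    return float(np.mean(lum * cs)), float(np.mean(cs))


def _downsample(img):
    """2x2 average pooling (drops an odd trailing row/column)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def ms_ssim(x, y, peak=255.0, weights=MS_WEIGHTS):
    """Simplified MS-SSIM: cs at every scale, full SSIM (with luminance) at the coarsest."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    score = 1.0
    for j, w in enumerate(weights):
        s, cs = ssim_terms(x, y, peak)
        last = j == len(weights) - 1
        score *= max(s if last else cs, 0.0) ** w  # clamp negatives before the power
        if not last:
            x, y = _downsample(x), _downsample(y)
    return score


def view_quality(lum_ref, lum_test, disp_ref, disp_test, disp_peak, w_lum=0.5):
    """HYPOTHETICAL fusion of luminance and depth quality for one view."""
    q_lum = ms_ssim(lum_ref, lum_test)
    q_depth = ms_ssim(disp_ref, disp_test, peak=disp_peak)
    return q_lum ** w_lum * q_depth ** (1.0 - w_lum)
```

A stereo score could then be formed, for instance, by averaging `view_quality` over the left and right views. This pooling choice is likewise hypothetical.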
The sparsity-based approach deals with the sparse representation of luminance images and
disparity maps. The primary challenge lies in dealing with the sparsity of disparity
maps in conjunction with the sparsity of luminance images. Although analysing the
sparsity of images is sufficient to bring out the quality of luminance images, the
effectiveness of sparsity in quantifying depth quality is yet to be fully understood.
Although the principles for dealing with the sparsity of luminance images and disparity
maps are ideally similar, they differ significantly in their detailed implementation. We present
a full-reference algorithm, Sparsity-based Quality Assessment of Stereo Images (SQASI), that is
aimed at developing this understanding.
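Sparse representations of image or disparity patches are often computed with a greedy solver such as orthogonal matching pursuit (OMP). The sketch below implements OMP over a given dictionary `D`, assumed to have unit-norm columns. It then adds a hypothetical SSIM-like similarity between the sparse codes of a reference patch and a test patch. The dictionary, the sparsity level `k`, and the similarity formula are illustrative assumptions, not SQASI's formulation.

```python
import numpy as np


def omp(D: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal matching pursuit: approximate y with at most k columns (atoms) of D."""
    residual = y.astype(np.float64).copy()
    support: list[int] = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # no new atom helps (residual is numerically orthogonal to D)
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares (the "orthogonal" step).
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef[support] = sol
    return coef


def sparse_code_similarity(D, ref_patch, test_patch, k=8, c=1e-6):
    """HYPOTHETICAL similarity of the two sparse codes; equals 1 for identical codes."""
    a, b = omp(D, ref_patch, k), omp(D, test_patch, k)
    return float((2 * a @ b + c) / (a @ a + b @ b + c))
```

In practice, dictionaries are usually learned from pristine patches (for example, with K-SVD) rather than fixed in advance. Luminance and disparity patches would likely need separate dictionaries.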
The saliency-based approach is based on the intuition that S3D saliency can be segregated
into two components, namely (i) image saliency and (ii) depth saliency, each
of which can be studied individually to find luminance and depth quality. When
viewing an S3D image, we hypothesize that while most of the contribution to saliency
is provided by the 2D image, a small but significant contribution is provided by the
depth component. Further, we claim that only a subset of image edges contributes
to depth perception while viewing an S3D image. In this thesis, we propose a systematic
approach for depth saliency estimation, called Salient Edges with respect to
Depth perception (SED), which localizes the depth-salient edges in an S3D image.
We demonstrate the utility of SED in an FRSIQA algorithm called the SED-based
Stereo Quality Index (SSQI). We consider
gradient magnitude and inter-gradient maps for predicting structural similarity. A
coarse quality estimate is first derived by comparing the 2D saliency and gradient
maps of the reference and test stereo pairs. We refine this estimate using SED maps to
evaluate depth quality. Finally, we combine the luminance and depth qualities to
obtain an overall stereo image quality. We perform a comprehensive evaluation of our
metric on seven publicly available S3D IQA databases. The proposed metric shows
competitive performance on all seven databases, with state-of-the-art performance on
three of them.
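The gradient-similarity step and the saliency-guided pooling can be sketched as follows. The sketch rests on several assumptions:

- a Sobel operator for gradient magnitude;
- the similarity form (2 g_r g_t + c)/(g_r^2 + g_t^2 + c) with a hypothetical constant `c`;
- a 2D saliency map and a binary depth-salient-edge mask supplied by the caller (their estimation, including SED itself, is not shown);
- a hypothetical linear blend weight `w_depth`.

The inter-gradient maps used in SSQI are not reproduced.

```python
import numpy as np


def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude on the interior ('valid' region) of a 2D image."""
    p = img.astype(np.float64)
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def gms_map(ref: np.ndarray, test: np.ndarray, c: float = 100.0) -> np.ndarray:
    """Gradient magnitude similarity: 1 where gradients agree, lower where they differ."""
    gr, gt = gradient_magnitude(ref), gradient_magnitude(test)
    return (2 * gr * gt + c) / (gr ** 2 + gt ** 2 + c)


def weighted_pool(sim: np.ndarray, weight: np.ndarray) -> float:
    """Average a similarity map using a nonnegative weight map (e.g. saliency)."""
    w = np.asarray(weight, dtype=np.float64)
    return float((sim * w).sum() / max(w.sum(), 1e-12))


def stereo_view_quality(ref, test, saliency, sed_mask, w_depth=0.2):
    """HYPOTHETICAL: saliency-weighted 2D quality blended with quality on depth-salient edges."""
    sim = gms_map(ref, test)
    # Crop the full-size maps to the 'valid' region produced by the 3x3 Sobel operator.
    q_2d = weighted_pool(sim, saliency[1:-1, 1:-1])
    q_depth = weighted_pool(sim, sed_mask[1:-1, 1:-1])
    return (1.0 - w_depth) * q_2d + w_depth * q_depth
```

The small default `w_depth` mirrors the hypothesis that depth contributes a small but significant part of S3D saliency. Its value here is arbitrary.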
The insights gained from FRSIQA have allowed us to propose improvements to
other visual tasks, including 2D FRIQA, FRSVQA, and the measurement of ocular dominance in
stereo vision.
Our contribution to 2D FRIQA focuses primarily on full HD
databases. Conventional image quality assessment metrics such as SSIM [5] and FSIM
[11] perform well on standard definition (SD) and enhanced definition (ED) images.
However, current camera devices support high-resolution images such as high
definition (HD), full HD, and ultra HD. In our evaluation, we found that existing
2D FRIQA methods do not perform well on high-resolution images, especially
full HD and higher-resolution images. To address this shortcoming, we present
a saliency-based 2D FRIQA approach whose performance is on par with existing
methods on SD databases and achieves state-of-the-art results on full HD databases.
We consider gradient magnitudes and inter-gradient maps for predicting structural
similarity. This choice of saliency and gradient methods yields the best performance
on full HD databases. Our proposed algorithm computes quality at multiple spatial
scales and combines these scores for overall quality prediction. The number of scales
is made a function of image resolution.
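Making the number of scales a function of resolution can be sketched with a simple rule. The rule in `num_scales` adds one dyadic scale each time the shorter image side doubles beyond a hypothetical `base` of 256 pixels. This rule and the equal-weight averaging of per-scale scores are assumptions for illustration. The per-scale quality measure is passed in as a function.

```python
import math
from typing import Callable

import numpy as np


def num_scales(height: int, width: int, base: int = 256) -> int:
    """HYPOTHETICAL rule: one extra dyadic scale each time the short side doubles beyond base."""
    short = min(height, width)
    if short < base:
        return 1
    return 1 + int(math.floor(math.log2(short / base)))


def downsample2(img: np.ndarray) -> np.ndarray:
    """2x2 average pooling; drops an odd trailing row/column."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def multiscale_quality(ref: np.ndarray, test: np.ndarray,
                       per_scale_quality: Callable[[np.ndarray, np.ndarray], float],
                       base: int = 256) -> tuple[float, int]:
    """Average a per-scale score over a resolution-dependent number of scales."""
    n = num_scales(ref.shape[0], ref.shape[1], base=base)
    scores = []
    for _ in range(n):
        scores.append(per_scale_quality(ref, test))
        ref, test = downsample2(ref), downsample2(test)
    return float(np.mean(scores)), n
```

Under this rule, a 1920x1080 image would be analyzed at three scales and a 3840x2160 image at four. Coarser scales thus keep structures at a roughly comparable size across resolutions.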
We also propose a full-reference stereoscopic video quality assessment (FRSVQA)
algorithm that is based on the saliency-based FRSIQA and 2D FRIQA algorithms. This
FRSVQA algorithm relies on spatio-depth saliency and motion strength. First, we
compute a spatial distortion map for every video frame, using two different methods,
to estimate spatial quality. The spatial distortion map is then refined by the depth-salient
maps to estimate depth quality. We also estimate temporal quality by
refining the spatial distortion map with the inter-frame difference map at the locations
specified by motion edges. The spatial, depth, and temporal qualities are systematically
combined and averaged over the frames to obtain the overall stereo video
quality score.
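The frame-level pooling can be illustrated with a simplified, single-view sketch. Each component is a stand-in chosen for the example:

- normalized absolute error in place of the spatial distortion map;
- a caller-supplied binary mask in place of the depth-salient map;
- thresholded inter-frame differences of the reference video (hypothetical `motion_thresh`) in place of motion edges;
- fixed, hypothetical weights `w` for combining the three qualities.

```python
import numpy as np


def video_quality(ref_frames, test_frames, depth_masks,
                  w=(0.6, 0.2, 0.2), motion_thresh=10.0) -> float:
    """HYPOTHETICAL spatial/depth/temporal pooling over the frames of one view."""
    per_frame = []
    prev_ref = None
    for ref, test, mask in zip(ref_frames, test_frames, depth_masks):
        ref, test = ref.astype(np.float64), test.astype(np.float64)
        # Stand-in spatial distortion map: absolute error scaled by the 8-bit peak.
        dist = np.abs(ref - test) / 255.0
        q_spatial = 1.0 - dist.mean()
        # Depth quality: distortion restricted to depth-salient locations.
        m = np.asarray(mask, dtype=bool)
        q_depth = 1.0 - (dist[m].mean() if m.any() else 0.0)
        # Temporal quality: distortion where the reference changes strongly between frames.
        if prev_ref is None:
            q_temporal = q_spatial  # no previous frame for the first frame
        else:
            moving = np.abs(ref - prev_ref) > motion_thresh
            q_temporal = 1.0 - (dist[moving].mean() if moving.any() else 0.0)
        prev_ref = ref
        per_frame.append(w[0] * q_spatial + w[1] * q_depth + w[2] * q_temporal)
    # Average the per-frame scores over the whole sequence.
    return float(np.mean(per_frame))
```

A stereo video score could, for example, average `video_quality` over the left and right views. This choice is also an assumption.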
This thesis also contributes a method to measure ocular dominance (OD), one
of the properties of stereo vision. In all of our FRSIQA perspectives, we pool the left
and right features or quality scores of the test stereopair using a weighting strategy. These
weights reflect the idea of OD. For a given stereopair, both single views look
perceptually alike. But when viewed stereoscopically, one can experience the dominance
of one view over the other. For pristine stereopairs, the dominance may
be negligible, but it becomes significant in asymmetrically distorted stereopairs.
To the best of our knowledge, no existing metric measures the OD of
stereopairs. To address this lacuna, we propose a metric whose performance is first
validated on pristine stereopairs and then on asymmetrically distorted stereopairs.
A potential application of OD is in prioritizing the left or right view for stereoscopic
image encoding.
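OD-motivated weighting of left and right scores can be sketched as follows. The sketch assumes, hypothetically, that the view with stronger gradient energy dominates. Each view's weight is therefore proportional to its mean squared gradient magnitude. This stand-in is for illustration only and is not the OD metric proposed in this thesis.

```python
import numpy as np


def view_energy(img: np.ndarray) -> float:
    """Mean squared gradient magnitude, used as a stand-in for stimulus strength."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(gx ** 2 + gy ** 2))


def dominance_weights(left: np.ndarray, right: np.ndarray, eps: float = 1e-12):
    """HYPOTHETICAL OD-inspired weights: the view with more gradient energy gets more weight."""
    el, er = view_energy(left), view_energy(right)
    total = el + er + 2 * eps
    return (el + eps) / total, (er + eps) / total


def pooled_quality(q_left: float, q_right: float,
                   left_test: np.ndarray, right_test: np.ndarray) -> float:
    """Combine per-view quality scores with the dominance weights of the test views."""
    wl, wr = dominance_weights(left_test, right_test)
    return wl * q_left + wr * q_right
```

For identical views the weights are equal (0.5 each). Blurring one view lowers its gradient energy and shifts weight toward the sharper view, which loosely mirrors the dominance experienced with asymmetric blur.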