Appina, Balasubramanyam and Channappayya, Sumohana
(2019)
Subjective and Objective Methods for
Stereoscopic Video Quality Assessment.
PhD thesis, Indian Institute of Technology Hyderabad.
Full text not available from this repository.
Abstract
Stereoscopic 3D (3D or S3D) digital technology has received considerable attention
because of its ability to render depth. As a result, several industries, such as film, gaming and education, have invested significant research resources in using 3D
visualization in their work. Developments and advancements in S3D technology have made content creation easier, and these improvements have led to widespread
consumer acceptance. S3D content refers to both stereoscopic images and videos. In
this thesis, the focus is exclusively on S3D videos. An S3D video is a combination of
spatial, temporal and depth components and the dependencies among them.
Like 2D digital video content, 3D content also undergoes several processing stages
such as sampling, quantization, synchronization, visualization/rendering etc., for creation and utilization. These steps lead to a degradation in the quality of the S3D
video which in turn results in poor user experience. Subjective and objective quality assessment techniques provide a systematic framework for assessing perceptual
quality.
In subjective assessment, human subjects or observers perform the quality assessment task, which is a cumbersome and time-consuming process for a
number of video quality assessment (VQA) applications. However, subjective assessment is very important
because most content is meant for human consumption, and it therefore serves as a benchmark for objective assessment algorithms. Objective assessment is typically classified
into full-reference (FR), reduced-reference (RR) and no-reference (NR) methods. FR
QA methods use all the information in the pristine or reference content, RR QA
methods use partial information about the pristine content, and NR QA models use no information about the pristine content. NR QA models are further
classified into supervised and unsupervised QA algorithms.
This thesis presents both subjective studies and objective algorithms for stereoscopic VQA.
The subjective study experiments are performed to explore the effect of spatial, temporal and depth distortions on the perceptual video quality. In objective assessment,
the dependencies between the motion and depth/disparity components of an S3D
video are explored, and considered as primitive features to estimate the quality of an
S3D video. Apart from solutions to the S3D VQA problem, this thesis has made contributions to objective QA of Super-multiview content with high angular resolution
images.
S3D video projection is classified into anaglyph 3D (color-coded display) and polarized 3D display views. In this thesis, subjective studies are carried out on both
anaglyph 3D and polarized 3D displays. The dataset used for the subjective study on anaglyph 3D projection consists of 6 pristine and 144 distorted videos.
We limit our attention to H.264 compression artifacts to generate the test stimuli.
The reference video sequences contain a good combination of texture, motion and depth
information, and we divided these videos into 2 groups based on depth information.
Further, 19 subjects participated in the subjective assessment task. Based on the
subjective study, we have formulated a conditional relationship between the 2D and
stereoscopic subjective scores as a function of compression rate and depth range. We
call this database the LFOVIAPh1 S3D video database.
In the polarized 3D projection subjective study, we conduct a subjective evaluation
of full high-definition (full HD) stereoscopic video content. This study is comprehensive in terms of the variety of video content, the types of distortions considered and the
number of test stimuli used. Specifically, we consider 12 reference videos that cover
a wide range of texture, motion and depth. These reference videos are subjected to
four distortions: three commonly occurring ones, viz. H.264 compression, H.265 compression and
blur, and a new temporal distortion called ‘frame freeze’. We generated a total of 288
symmetrically and asymmetrically distorted test stimuli by applying varying levels
of these distortions to the pristine videos. A total of 20 subjects participated in our
study. We call this database the LFOVIAPh2 S3D video database.
In the objective assessment, the thesis presents full reference, and both supervised
and unsupervised no reference objective quality assessment algorithms. In FR S3D
video QA (VQA), we propose two objective quality assessment algorithms,
FLOSIM3D and DeMo3D. In FLOSIM3D, we exploit the separable
representation of motion and binocular disparity in the visual cortex and develop
a four stage algorithm to measure the S3D video quality. First, we compute the
temporal features by using an existing 2D VQA metric that measures temporal
annoyance based on patch-level statistics such as the mean, variance and minimum
eigenvalue, and pools them with a frame-categorization-based non-linear pooling strategy.
Second, a structure based 2D Image Quality Assessment (IQA) metric is used to
compute the spatial quality of the frames. Next, the loss in depth cues is measured
using a structure based metric. Finally, the features for each of the stereo views are
pooled to obtain the final stereo video quality score. Our algorithm is an extension
of the 2D VQA metric FLOSIM [1], and we therefore call it FLOSIM3D.
The generalized Gaussian density (GGD) and the Gaussian scale mixture (GSM)
density are two models for 2D natural scene statistics [2, 3] that are very popular
and have been widely employed in 2D IQA. Inspired by these approaches, in our
previous 3D IQA work [4], we modeled the joint dependencies of luminance and
depth subband coefficients using a Bivariate GGD (BGGD). We also showed
that the BGGD captures these dependencies well, and used the BGGD coefficients
to estimate the quality of an S3D image. Motivated by these statistical studies, we
extend the BGGD model to S3D video quality computation. In this thesis, we
propose FR and NR (supervised and unsupervised) QA algorithms for S3D
videos using the BGGD parameters as primitive features.
In DeMo3D, we rely on an empirical model for the joint statistics of motion and
depth subband coefficients of an S3D video frame. Specifically, we use a Bivariate Generalized Gaussian Distribution (BGGD) model for the joint statistics. We
compute the coherence scores (Ψ) from the eigenvalues of the covariance matrix to
estimate the amount of directional dependency between the motion and depth components. We show that the coherence scores discriminate between distortion types and levels.
To estimate the overall spatial quality score, we apply off-the-shelf 2D FR image
QA metrics on a frame-by-frame basis on both the views and average the frame-wise
scores. Finally, we pool the coherence and spatial quality scores to derive the overall
quality for the S3D video. The proposed algorithm is called Depth and Motion based
3D video quality evaluator (DeMo3D).
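The coherence score described above can be illustrated with a minimal sketch. The abstract does not state the formula for Ψ, so the definition used here, the squared ratio of the eigenvalue difference to the eigenvalue sum, is an assumption borrowed from common practice. The sample covariance of two coefficient lists also stands in for the covariance matrix of the fitted BGGD, and the function names are hypothetical.

```python
import math

def covariance_2x2(xs, ys):
    """Sample covariance entries (sxx, sxy, syy) of two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def coherence(motion, depth):
    """Assumed coherence: ((l_max - l_min) / (l_max + l_min)) ** 2."""
    sxx, sxy, syy = covariance_2x2(motion, depth)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_max = half_trace + radius
    lam_min = half_trace - radius
    total = lam_max + lam_min
    if total == 0.0:
        return 0.0  # both inputs are constant, so there is no dependency to measure
    return ((lam_max - lam_min) / total) ** 2
```

Under this assumed definition, the score approaches 1 when the two components vary along a single direction and approaches 0 when their joint spread is isotropic.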
The performance of the proposed algorithms is evaluated on popular S3D video
databases and shown to be robust and competitive with state-of-the-art QA algorithms. Performance is measured using the linear correlation coefficient (LCC),
Spearman’s rank order correlation coefficient (SROCC) and root mean squared
error (RMSE) between difference mean opinion scores (DMOS) and estimated objective quality scores.
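As a rough illustration of the three performance measures, the following sketch computes LCC, SROCC and RMSE between two score lists in plain Python. The function names are hypothetical. VQA evaluations often apply a non-linear (logistic) mapping to the objective scores before computing LCC and RMSE; the abstract does not say whether that step is used, so it is omitted here.

```python
import math

def lcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def srocc(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return lcc(ranks(x), ranks(y))

def rmse(x, y):
    """Root mean squared error between two score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

For example, `srocc(dmos, predicted)` depends only on the ordering of the scores, while `lcc` and `rmse` also depend on their scale.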
In NR S3D VQA, we propose supervised and unsupervised objective algorithms
for stereoscopic videos. The algorithms are called VQUEMODES and MoDi3D.
These works are motivated by our previous empirical findings that motion and depth
statistical dependencies can be accurately modeled using a BGGD. VQUEMODES
is a supervised S3D NR VQA algorithm, and we demonstrate that the parameters (α, β) of
the BGGD model possess the ability to discern quality variations in S3D videos.
Therefore, the BGGD model parameters are employed as motion and depth quality
features. In addition to these features, we rely on a frame-level spatial quality feature
that is computed using a robust off-the-shelf NR image quality assessment (IQA)
algorithm. These frame-level motion, depth, and spatial features are consolidated
and used with the corresponding S3D video’s DMOS labels for supervised learning
using support vector regression (SVR). The overall quality of an S3D video is computed by averaging the frame-level quality predictions of the constituent video frames.
We call this algorithm Video QUality Evaluation using MOtion and DEpth Statistics
(VQUEMODES).
MoDi3D is an unsupervised (or completely blind) NR S3D VQA algorithm. As in
VQUEMODES, we model the joint statistical dependencies between motion and
disparity components using a BGGD model, and compute the BGGD model parameters α, β and coherence measure Ψ from the eigenvalues of the covariance matrix of
the BGGD. In turn, we model the BGGD parameters (α, β and Ψ) of pristine S3D
videos using a Multivariate Gaussian (MVG) distribution. The likelihood of a test
video’s MVG model parameters coming from the pristine MVG model is computed
and shown to play a key role in the overall quality estimation. We also estimate the
global motion content of each video by averaging the SSIM scores between pairs of
successive video frames. To estimate the test S3D video’s spatial quality, we apply
the popular 2D NR unsupervised NIQE image QA model on a frame-by-frame basis
on both views. The overall quality of a test S3D video is finally computed by pooling
the test S3D video’s likelihood estimates, global motion strength and spatial quality
scores. The proposed algorithm, which is unsupervised (or ‘completely blind’, requiring no reference videos or training on subjective scores), is called the Motion and
Disparity based 3D video quality evaluator (MoDi3D). The proposed S3D NR VQA
algorithms show robust performance on the popular databases and are competitive
with the state-of-the-art FR and NR algorithms.
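The global motion estimate mentioned for MoDi3D, averaging SSIM scores between successive frames, can be sketched as follows. This is a simplified illustration and not the thesis implementation: SSIM is computed here from whole-frame statistics rather than local windows, frames are assumed to be flat lists of grayscale values in the range 0 to 255, and the function names are hypothetical.

```python
def global_ssim(frame_a, frame_b, dynamic_range=255.0):
    """Single-window SSIM computed from whole-frame statistics (a simplification)."""
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and of equal size")
    c1 = (0.01 * dynamic_range) ** 2  # usual SSIM stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    n = len(frame_a)
    mu_a = sum(frame_a) / n
    mu_b = sum(frame_b) / n
    var_a = sum((p - mu_a) ** 2 for p in frame_a) / n
    var_b = sum((q - mu_b) ** 2 for q in frame_b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(frame_a, frame_b)) / n
    numerator = (2.0 * mu_a * mu_b + c1) * (2.0 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator

def mean_successive_ssim(frames):
    """Average SSIM over all pairs of consecutive frames in a video."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    scores = [global_ssim(f0, f1) for f0, f1 in zip(frames, frames[1:])]
    return sum(scores) / len(scores)
```

Values near 1 suggest little change from frame to frame, while lower values suggest stronger motion. How this quantity is pooled with the likelihood and spatial quality scores is not specified in the abstract.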
In this thesis, we also contribute an FR QA algorithm for super-multiview content with high angular resolution images. Super-multiview content is a combination of spatial and depth information at a given 3D view. The projection of 3D multiview content
differs from regular 2D perception, so existing 2D
FR IQA algorithms do not perform robustly on these images. To fill this gap,
we propose an FR objective quality metric. For every 3D view, the proposed metric
combines spatial information from each constituent image and angular information
(depth cues) from consecutive images. Finally, we show that the proposed metric correlates significantly with subjective scores, outperforming existing 2D metrics. The
efficacy of pooling spatial and angular information highlights the fact that angular
information plays a crucial role in 3D perception.