Singh, D and C, Krishna Mohan
(2019)
SCALABLE AND DISTRIBUTED METHODS FOR
LARGE-SCALE VISUAL COMPUTING.
PhD thesis, Indian Institute of Technology Hyderabad.
Abstract
The objective of this research is to develop efficient, scalable, and distributed methods to meet the challenges posed by the immense growth of visual data such as images and videos. The motivation stems from the fact that existing computer vision approaches are computationally intensive: they cannot scale up to analyze large data collections, nor can they perform real-time inference on resource-constrained devices. The issues encountered include: 1) the high computation time of building high-level representations from low-level features, 2) the long training time of classification methods, and 3) the difficulty of analyzing live video streams in real time across a city-scale surveillance network. Scalability can be addressed through model approximation and distributed implementation of computer vision algorithms, but existing scalable approaches suffer from high approximation loss and high communication overhead. In this thesis, we address some of these issues by proposing efficient methods for reducing training time on large datasets in a distributed environment, and for real-time inference on resource-constrained devices by scaling up computation-intensive methods through model approximation.

A scalable method, Fast-BoW, is presented for reducing the computation time of bag-of-visual-words (BoW) feature generation for both hard and soft vector quantization, with time complexities O(|h| log₂ k) and O(|h| k), respectively, where |h| is the size of the hash table used in the proposed approach and k is the vocabulary size. We replace the process of finding the closest cluster center with a softmax classifier, which improves the cluster boundaries over k-means and can be used for both hard and soft BoW encoding. To make the model compact and faster, the real-valued weights are quantized into integer weights that can be represented using only a few bits (2 to 8). Hashing is then applied to the quantized weights to reduce the number of multiplications, which accelerates the entire process. Further, the effectiveness of the video representation is improved by exploiting the structural information among different entities, or within the same entity over time, which BoW representations generally ignore. The interactions of the entities in a video are formulated as a graph of geometric relations among space-time interest points. Activities represented as graphs are recognized using an SVM with low-complexity graph kernels, namely the random walk kernel (O(n³)) and the Weisfeiler-Lehman kernel (O(n)). Graph kernels provide robustness to slight topological deformations, which may occur due to noise and viewpoint variation in the data. Further issues, such as the computation and storage of the large kernel matrix, are addressed using the Nyström method for kernel linearization.
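
To show why integer weights can cut the number of multiplications, here is a minimal NumPy sketch under stated assumptions; it is not the thesis's Fast-BoW implementation. It assumes a trained linear softmax layer with weights `W` (descriptor dimension d by vocabulary size k), symmetric uniform quantization to `bits` bits, and, as a stand-in for the hashing step, a lookup table of precomputed products: since every quantized weight takes one of only 2^bits - 1 integer values, each descriptor needs d * (2^bits - 1) multiplications instead of d * k. All function names are hypothetical, and the sketch does not reproduce the O(|h| log₂ k) bound.

```python
import numpy as np


def quantize_weights(W, bits):
    """Symmetric uniform quantization of real weights to signed integers (bits >= 2).

    Assumption: one global scale; integers lie in [-qmax, qmax], qmax = 2**(bits-1) - 1.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    Q = np.round(W / scale).astype(np.int64)
    return Q, scale


def lookup_scores(x, Q, scale, bias, bits):
    """Scores scale * (x @ Q) + bias via a product table (stand-in for the hashing step)."""
    qmax = 2 ** (bits - 1) - 1
    levels = np.arange(-qmax, qmax + 1)   # every integer value a weight can take
    table = np.outer(x, levels)           # d * (2**bits - 1) multiplications in total
    rows = np.arange(len(x))[:, None]
    gathered = table[rows, Q + qmax]      # (d, k) table lookups, no multiplications
    return scale * gathered.sum(axis=0) + bias


def softmax(z):
    e = np.exp(z - z.max())               # shift for numerical stability
    return e / e.sum()


def bow_histogram(X, Q, scale, bias, bits, soft=False):
    """Normalized BoW histogram of descriptors X (n, d): hard = arg-max, soft = softmax."""
    hist = np.zeros(Q.shape[1])
    for x in X:
        s = lookup_scores(x, Q, scale, bias, bits)
        if soft:
            hist += softmax(s)
        else:
            hist[np.argmax(s)] += 1.0
    return hist / len(X)


# Usage with random stand-in weights (a real model would be trained first).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(16, 64)), rng.normal(size=64)
Q, scale = quantize_weights(W, bits=4)
X = rng.normal(size=(10, 16))
print(bow_histogram(X, Q, scale, b, bits=4))
```

Hard encoding takes the arg-max score per descriptor and soft encoding accumulates softmax probabilities, matching the two modes described above; the product table only pays off when 2^bits - 1 is much smaller than the vocabulary size k.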

The second major contribution is reducing the time taken to train a kernel support vector machine (SVM) on large datasets through distributed implementation while sustaining classification performance. We propose Genetic-SVM, which uses a distributed genetic algorithm to reduce the time taken to solve the SVM objective function. Data partitioning approaches achieve better speed-up than distributed algorithm approaches, but they invariably lose classification accuracy because global support vectors may not be chosen as local support vectors in their respective partitions. Hence, we propose DiP-SVM, a distribution-preserving kernel SVM in which the first- and second-order statistics of the entire dataset are retained in each partition. This yields local decision boundaries that agree with the global decision boundary, thereby reducing the chance of missing important global support vectors. Combining the local SVMs, however, slows down training. To address this issue, we propose Projection-SVM, which uses subspace partitioning: a decision tree is constructed on a projection of the data along the direction of maximum variance to obtain smaller partitions of the dataset. A kernel SVM is trained independently on each partition, reducing the overall training time. It also significantly reduces prediction time.
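
The subspace partitioning behind Projection-SVM can be sketched as a recursive split of the data on its projection onto the direction of maximum variance (the first principal component). This minimal NumPy sketch rests on assumptions of ours: the split threshold is the median projection and splitting stops at a hypothetical `leaf_size`; the thesis's actual decision-tree construction may choose thresholds differently. Training a kernel SVM on each leaf is omitted, and all function names are hypothetical.

```python
import numpy as np


def top_direction(X):
    """Unit vector of maximum variance (first principal component) of X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def build_tree(X, idx, leaf_size):
    """Recursively split rows idx of X on their projection onto the top direction.

    Assumption: the threshold is the median projection (a balanced split).
    """
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    w = top_direction(X[idx])
    proj = X[idx] @ w
    t = np.median(proj)
    left, right = idx[proj <= t], idx[proj > t]
    if len(left) == 0 or len(right) == 0:  # degenerate (all projections equal): stop
        return {"leaf": idx}
    return {"w": w, "t": t,
            "left": build_tree(X, left, leaf_size),
            "right": build_tree(X, right, leaf_size)}


def route(tree, x):
    """Follow one dot product per level to the partition (leaf) that owns x."""
    while "leaf" not in tree:
        tree = tree["left"] if x @ tree["w"] <= tree["t"] else tree["right"]
    return tree["leaf"]


def leaves(tree):
    if "leaf" in tree:
        return [tree["leaf"]]
    return leaves(tree["left"]) + leaves(tree["right"])


# Usage: partition 1000 random points; a kernel SVM would then be fit on each leaf.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
tree = build_tree(X, np.arange(len(X)), leaf_size=128)
parts = leaves(tree)
print(len(parts), max(len(p) for p in parts))
```

At prediction time a point is routed down the tree with one dot product per level, so only the SVM of a single leaf, trained on at most `leaf_size` points, has to be evaluated; this is one way to see why partitioning can also shorten prediction.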

Another issue addressed is the real-time recognition of traffic violations and incidents in a city-scale surveillance scenario. The major challenges are accurate detection and real-time inference. Centralized computing infrastructures cannot operate in real time because of the large network delay from the video sensors to the central computing server. We propose an efficient edge-computing framework for deploying large-scale visual computing applications, which reduces latency and communication overhead in a camera network. This framework is implemented for two surveillance applications, namely, detection of motorcyclists without helmets and detection of accidents. An efficient cascade of convolutional neural networks (CNNs) is proposed for incrementally detecting motorcyclists and their helmets in both sparse and dense traffic. The CNNs in the cascade share a common representation to avoid extra computation and over-fitting. Vehicle accidents are modeled as unusual incidents. A deep representation is extracted using denoising stacked autoencoders trained on spatio-temporal video volumes of normal traffic. The possibility of an accident is determined from the reconstruction error and the likelihood of the deep representation; for the latter, an unsupervised model is trained using a one-class SVM. In addition, the intersection points of vehicle trajectories are used to reduce the false alarm rate and increase the reliability of the overall system. Both approaches are evaluated on real traffic videos collected from the video surveillance network of Hyderabad, India, and the experiments demonstrate the efficacy of the proposed approaches.
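
Scoring a clip by both its reconstruction error and the likelihood of its learned representation can be sketched as follows. To stay self-contained, this NumPy sketch swaps in simpler components than the thesis uses: a linear autoencoder obtained by PCA stands in for the denoising stacked autoencoders, and the squared Mahalanobis distance of the code stands in for the one-class SVM. The class name `LinearAnomalyScorer`, the 99th-percentile thresholds, and the rule that either score alone raises an alarm are all assumptions; the trajectory-intersection check is not included.

```python
import numpy as np


class LinearAnomalyScorer:
    """Flags clips whose reconstruction error or code novelty exceeds thresholds fit on normal data.

    Stand-ins (assumptions): PCA as a linear autoencoder instead of denoising stacked
    autoencoders, and squared Mahalanobis distance of the code instead of a one-class SVM.
    """

    def __init__(self, n_components=8, quantile=0.99):
        self.n_components = n_components
        self.quantile = quantile

    def fit(self, V):
        # V: (n, d) flattened spatio-temporal volumes from normal traffic only.
        self.mean = V.mean(axis=0)
        _, _, vt = np.linalg.svd(V - self.mean, full_matrices=False)
        self.P = vt[: self.n_components]            # tied encoder/decoder weights
        Z = (V - self.mean) @ self.P.T              # codes of normal data (zero mean)
        self.cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
        err, nov = self.scores(V)
        self.err_thr = np.quantile(err, self.quantile)
        self.nov_thr = np.quantile(nov, self.quantile)
        return self

    def scores(self, V):
        Vc = V - self.mean
        Z = Vc @ self.P.T                                    # encode
        err = ((Vc - Z @ self.P) ** 2).sum(axis=1)           # squared reconstruction error
        nov = np.einsum("ij,jk,ik->i", Z, self.cov_inv, Z)   # large distance = low likelihood
        return err, nov

    def predict(self, V):
        err, nov = self.scores(V)
        return (err > self.err_thr) | (nov > self.nov_thr)


# Usage on synthetic data: "normal" clips lie near a 4-dimensional subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(4, 50))
normal = rng.normal(size=(1000, 4)) @ basis + 0.05 * rng.normal(size=(1000, 50))
scorer = LinearAnomalyScorer(n_components=4).fit(normal)
unusual = 3.0 * rng.normal(size=(1, 50))
print(scorer.predict(normal).mean(), scorer.predict(unusual))
```

Because both thresholds are fit on normal traffic only, no labeled accidents are needed, which mirrors the unsupervised setup described above.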