Krishna, Lolla Sai and Natarajan, Lakshmi Prasad
(2019)
Distributed Inference With Straggler Mitigation.
Master's thesis, Indian Institute of Technology Hyderabad.
Abstract
In today's world, machine learning has major applications in a wide variety of tasks such as image classification, object detection and natural language processing. Machine learning models are trained and deployed in prediction-based cloud services, most of which are prediction serving systems. These systems take input requests from users and return predictions by performing inference on the trained model. Such services serve user requests with a distributed architecture consisting of many interconnected nodes. These nodes face several kinds of unavailability, such as temporary slowdowns and failures. Nodes facing temporary unavailability are known as stragglers, and they delay the entire computation.

The objective of this thesis is to design a framework for inference in a distributed setup that is robust to stragglers. The distributed setup is trained so that it classifies images with good accuracy even when some nodes straggle during inference. It consists of many neural networks (nodes) operating in parallel, together with a master neural network, also known as the decoder, which collects the prediction vectors from all nodes. The image to be classified is partitioned into as many parts as there are nodes, and each part is given as input to one node. The final prediction is taken from the decoder. Two neural network architectures are considered when implementing the distributed setup: a base MLP model and a CNN model. The setup is trained for the various possible straggler scenarios. During the training phase, the decoder and the nodes are learned by backpropagating the error and updating the weights with gradient descent. During inference, the trained setup is tested by varying the number of stragglers in the test set. Because it is trained on different possible straggler scenarios, the distributed setup classifies images with good accuracy during inference even in the presence of stragglers.
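The training and inference scheme described in the abstract can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation. Each node is a single tanh layer (the thesis uses MLP and CNN models). Images are stand-ins: flattened synthetic feature vectors split into contiguous chunks. A straggler is modelled by replacing its prediction vector with zeros before it reaches a linear decoder. During training, a random straggler pattern with up to `max_stragglers` nodes is sampled at each full-batch gradient-descent step. The names `split_input`, `sample_mask`, `softmax_xent` and `StragglerRobustNet`, and all sizes and hyperparameters, are hypothetical.

```python
import numpy as np

def split_input(x, num_nodes):
    """Partition each flattened input (one row of x) into contiguous chunks, one per node."""
    return np.array_split(x, num_nodes, axis=1)

def sample_mask(num_nodes, max_stragglers, rng):
    """Hypothetical straggler pattern: 1 = node responded, 0 = node straggled."""
    s = rng.integers(0, max_stragglers + 1)          # number of stragglers this step
    mask = np.ones(num_nodes)
    mask[rng.choice(num_nodes, size=s, replace=False)] = 0.0
    return mask

def softmax_xent(logits, y):
    """Mean cross-entropy loss and softmax probabilities."""
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y]).mean(), p

class StragglerRobustNet:
    """Parallel nodes (one tanh layer each, an assumption) feeding a linear decoder."""

    def __init__(self, input_dim, num_nodes, num_classes, rng):
        self.k, self.C = num_nodes, num_classes
        sizes = [len(c) for c in np.array_split(np.arange(input_dim), num_nodes)]
        self.W = [rng.normal(0, 0.1, (d, num_classes)) for d in sizes]
        self.b = [np.zeros(num_classes) for _ in sizes]
        self.V = rng.normal(0, 0.1, (num_nodes * num_classes, num_classes))
        self.c = np.zeros(num_classes)

    def forward(self, x, mask):
        parts = split_input(x, self.k)
        # Each node emits a prediction vector from its own part of the input.
        h = [np.tanh(p @ W + b) for p, W, b in zip(parts, self.W, self.b)]
        # Assumption: a straggler's prediction vector reaches the decoder as zeros.
        z = np.concatenate([m * hk for m, hk in zip(mask, h)], axis=1)
        return parts, h, z, z @ self.V + self.c

    def loss(self, x, y, mask):
        return softmax_xent(self.forward(x, mask)[3], y)[0]

    def accuracy(self, x, y, mask):
        return (self.forward(x, mask)[3].argmax(axis=1) == y).mean()

    def step(self, x, y, mask, lr):
        """One full-batch gradient-descent step through the decoder and the nodes."""
        parts, h, z, logits = self.forward(x, mask)
        loss, p = softmax_xent(logits, y)
        dlogits = p.copy()
        dlogits[np.arange(len(y)), y] -= 1.0
        dlogits /= len(y)
        dz = dlogits @ self.V.T                       # computed before V is updated
        self.V -= lr * z.T @ dlogits
        self.c -= lr * dlogits.sum(axis=0)
        for k in range(self.k):
            dh = mask[k] * dz[:, k * self.C:(k + 1) * self.C]  # stragglers get no gradient
            da = dh * (1.0 - h[k] ** 2)                        # derivative of tanh
            self.W[k] -= lr * parts[k].T @ da
            self.b[k] -= lr * da.sum(axis=0)
        return loss

# Usage on synthetic data (a stand-in for flattened images).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 16))
y = (X.sum(axis=1) > 0).astype(int)
net = StragglerRobustNet(16, num_nodes=4, num_classes=2, rng=rng)
no_stragglers = np.ones(4)
loss_before = net.loss(X, y, no_stragglers)
for _ in range(1500):
    net.step(X, y, sample_mask(4, max_stragglers=1, rng=rng), lr=0.2)
loss_after = net.loss(X, y, no_stragglers)
acc_full = net.accuracy(X, y, no_stragglers)
acc_one_straggler = net.accuracy(X, y, np.array([0.0, 1.0, 1.0, 1.0]))
```

Sampling a fresh straggler mask at every step exposes the decoder to the different subsets of prediction vectors it may receive. This mirrors, in miniature, the idea of training over the possible straggler scenarios. `acc_one_straggler` evaluates the same trained network with node 0 treated as a straggler.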