Staleness and Stragglers in Distributed Deep Image Analytics
A. Ravikumar
Published in Institute of Electrical and Electronics Engineers Inc.
2021
Pages: 848 - 852
Abstract
Deep learning for image analytics is widely used in many real-world applications. Due to the rapid growth in data and model size, models need to be distributed across multiple nodes. Distributed computation improves scalability, training time, and cost effectiveness. However, distribution can lead to longer computation times when some nodes are stale. The computation time of distributed nodes is affected by many factors, such as communication latency, network connectivity, resource sharing, and computational power. The main problem in a distributed setting is staleness among the worker nodes. The effect of stragglers cannot be completely avoided in distributed clusters. Storage and disk failures, imbalanced workloads, and resource sharing are the main causes of stragglers. Stragglers can cause longer computation times and reduce the performance of the model. The different methods used to address this issue are described in detail in the paper, and the open research problems in this field are highlighted. © 2021 IEEE.
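As a rough illustration of the two effects named in the abstract, the sketch below is a toy model and not the method of the paper. It assumes fixed per-worker compute times, a synchronous step that waits for every worker, and an asynchronous parameter server where staleness is counted as the number of global updates applied between a worker's read and its write. The function names `sync_step_time` and `async_staleness` are hypothetical.

```python
import heapq


def sync_step_time(worker_times):
    """Synchronous step: everyone waits, so the slowest worker sets the time."""
    return max(worker_times)


def async_staleness(worker_times, horizon):
    """Toy asynchronous simulation (an assumption, not the paper's model).

    Each worker reads the current model version, computes for its fixed
    time, then pushes an update. Staleness of an update is the number of
    global updates applied since that worker read the model.
    Returns a list of (worker_id, staleness) in the order updates arrive.
    """
    version = 0
    # Each event is (finish_time, worker_id, version_read_at_start).
    events = [(t, w, 0) for w, t in enumerate(worker_times)]
    heapq.heapify(events)
    log = []
    while events:
        finish, worker, read_version = heapq.heappop(events)
        if finish > horizon:
            break
        log.append((worker, version - read_version))
        version += 1
        # The worker immediately reads the new version and starts again.
        heapq.heappush(events, (finish + worker_times[worker], worker, version))
    return log


if __name__ == "__main__":
    times = [1.0, 1.0, 1.0, 4.0]  # worker 3 is the straggler
    print("synchronous step time:", sync_step_time(times))
    for worker, staleness in async_staleness(times, horizon=4.0):
        print(f"worker {worker} pushed an update with staleness {staleness}")
```

In this toy setting the synchronous step takes as long as the straggler, while in the asynchronous run the straggler's single update is based on a model that many faster updates have already moved past.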