The 3D-DTW custom IP based FPGA hardware acceleration for action recognition

Vidhyapathi C-M; A.N. Joseph Raj; Sundar S

doi:10.2352/J.IMAGINGSCI.TECHNOL.2021.65.1.010401

This article proposes an implementation of an action recognition system, which allows the user to perform operations in real time. The Microsoft Kinect (RGB-D) sensor plays a central role in this system, which provides the skeletal joint information of humans directly. Computationally efficient skeletal joint position features are considered for describing each action. The dynamic time warping algorithm (DTW) is a widely used algorithm in many applications such as similarity sequence search, classification, and speech recognition. It provides the highest accuracy compared to all other algorithms. However, the computational time of the DTW algorithm is a major drawback in real world applications. To speed up the basic DTW algorithm, a novel three-dimensional dynamic time warping (3D-DTW) classification algorithm is proposed in this work. The proposed 3D-DTW algorithm is implemented in both software and field programmable gate array (FPGA) hardware modeling techniques. The performance of the 3D-DTW algorithm is evaluated for 12 actions in which each action is described with the feature vector size of 576 over 32 frames. From our software modeling results, it has been shown that the proposed algorithm performs the action classification accurately. However, the computation time of the 3D-DTW algorithm increases linearly when we increase either the number of actions or the feature vector size of each action. For further speedup, an efficient custom 3D-DTW intellectual property (IP) core is developed using the Xilinx Vivado high-level synthesis (HLS) tool to accelerate the 3D-DTW algorithm in FPGA hardware. The CPU centric software modeling of the 3D-DTW algorithm is compared with its hardware accelerated custom IP core. It has been shown that the developed 3D-DTW Custom IP core computation time is 40 times faster than its software counterpart. As the hardware results are promising, a parallel hardware software co-design architecture is proposed for the Xilinx Zynq-7020 System on Chip (SoC) FPGA for action recognition. The HLS simulation and synthesis results are provided to support the practical implementation of the proposed architecture. Our proposed approach outperforms many of the existing state-of-the-art DTW based action recognition techniques by providing the highest accuracy of 97.77%. © 2021 Society for Imaging Science and Technology.

Journal	Journal of Imaging Science and Technology
Publisher	Society for Imaging Science and Technology
ISSN	10623701