A Compute Unified Device Architecture (CUDA) implementation of a deep convolutional neural network (DCNN) for digit recognition is proposed to reduce the computation time of artificial neural networks (ANNs) while achieving high accuracy. The network consists of three convolutional layers followed by two fully connected layers, built from input, hidden, and output neurons. It is parallelized on a dedicated GPU on the CUDA platform using the TensorFlow library. Accuracy and computation time are compared for sequential and parallel execution of the network on three systems: a dual-core CPU (4 logical processors), an octa-core CPU (16 logical processors) alone, and an octa-core CPU (16 logical processors) with a GPU. The MNIST (Modified National Institute of Standards and Technology) and EMNIST (Extended MNIST) databases are used for both training and testing. MNIST provides 55,000 training, 10,000 testing, and 5,000 validation samples; EMNIST provides 235,000 training, 40,000 testing, and 5,000 validation samples. Because the network is computationally demanding, parallelizing it yields a significant reduction in execution time. © 2017 IEEE.
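
To illustrate the kind of architecture described above (three convolutional layers followed by two fully connected layers), the following is a minimal sketch that traces tensor shapes and counts trainable parameters. It does not use TensorFlow and is not the authors' implementation. The 28x28 grayscale input, 3x3 kernels with "same" padding, 2x2 max pooling after each convolution, the filter counts (32, 64, 128), and the hidden width of 256 are all assumptions chosen for illustration; the source does not state these hyperparameters. The function names are likewise hypothetical.

```python
from math import ceil


def conv_same_pool(h, w, c_in, c_out, k=3, pool=2):
    """Shape and parameter count of a 'same'-padded k x k convolution
    followed by pool x pool max pooling (also 'same'-padded, so sizes round up)."""
    params = k * k * c_in * c_out + c_out  # kernel weights plus one bias per filter
    return ceil(h / pool), ceil(w / pool), c_out, params


def dense_params(n_in, n_out):
    """Parameter count of a fully connected layer (weights plus biases)."""
    return n_in * n_out + n_out


def describe_network(h=28, w=28, c=1, filters=(32, 64, 128), hidden=256, classes=10):
    """Return a list of (kind, output_shape, params) for an assumed
    3-conv + 2-FC network. All hyperparameters are illustrative assumptions."""
    layers = []
    for f in filters:
        h, w, c, p = conv_same_pool(h, w, c, f)
        layers.append(("conv", (h, w, c), p))
    flat = h * w * c  # flatten the last feature map before the dense layers
    layers.append(("fc", (hidden,), dense_params(flat, hidden)))
    layers.append(("fc", (classes,), dense_params(hidden, classes)))
    return layers


if __name__ == "__main__":
    for kind, shape, params in describe_network():
        print(kind, shape, params)
```

Under these assumed settings, most of the parameters sit in the first fully connected layer, while most of the arithmetic sits in the convolutions, which is the part that typically benefits most from GPU parallelism.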