Human activity recognition and video analysis are relevant to a number of research areas. Applications such as (active) video surveillance, human-computer interaction, medical analysis, sports video interpretation, and video retrieval for content-based search engines depend on extensive video understanding. Determining human actions in highly controlled environments is quite simple and considered solved; however, the task remains a challenge in unconstrained environments. In this paper, a temporal super-pixel based convolutional neural network (TS-CNN) is proposed to recognize human activity, enabling real-time monitoring of patients, children, and elderly persons in an unconstrained environment. For every video segment, temporal super-pixels are extracted, and the outputs of these pathways are combined with a CNN architecture to recognize the human activity. The performance of the proposed method, measured by metrics such as accuracy and the confusion matrix, is compared with that of the existing model. © 2019, Springer Nature Singapore Pte Ltd.