Existing 2-D median filters in the literature are computationally intensive. This paper proposes reducing the amount of data handled in the architecture-level realization of the basic median filtering operation on images. The proposed architecture reads four pixels at a time from the input image, the four pixels forming one word on a 32-bit hardware processing system; subsequent processing is carried out by a parallel, pipelined median filter architecture. Two read operations bring in eight input pixels, from which four output pixels are generated after an initial latency. The proposed architecture thus requires fewer read operations and offers increased speed. © 2018 IEEE.
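
The word-wise read scheme can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware architecture itself: it assumes 8-bit pixels packed lowest byte first into a 32-bit word, and it treats the window size k as a free parameter because the abstract does not state it. The function names unpack_word, pack_word, and medians_from_two_reads are hypothetical, and the sort-based median stands in for whatever parallel, pipelined network the architecture actually uses.

```python
def unpack_word(word):
    """Split a 32-bit word into four 8-bit pixels (assumed: lowest byte first)."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def pack_word(pixels):
    """Pack four 8-bit pixels into one 32-bit word (inverse of unpack_word)."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word


def medians_from_two_reads(row_words, k):
    """Median-filter an 8-column strip fetched with two word reads per row.

    row_words: k rows, each a (word0, word1) pair, i.e. 8 pixels per row.
    k: odd window size (an assumption; not given in the abstract).
    Returns the median of every k x k window that fits inside the strip.
    """
    # Two word reads per row deliver eight pixels instead of eight byte reads.
    rows = [unpack_word(w0) + unpack_word(w1) for (w0, w1) in row_words]
    out = []
    for c in range(8 - k + 1):
        window = sorted(p for r in rows for p in r[c:c + k])
        out.append(window[len(window) // 2])  # middle element of odd-sized window
    return out


# Example: five identical rows holding the pixel values 0..7.
rows = [(pack_word([0, 1, 2, 3]), pack_word([4, 5, 6, 7]))] * 5
print(medians_from_two_reads(rows, 5))
```

In this model, an eight-column strip with k = 5 contains exactly four full windows, while k = 3 would give six. That is a property of the sketch's arithmetic only and says nothing about the window size used in the proposed architecture.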