Processors based on modular arithmetic are widely used in digital signal processing, image processing, cryptography, and related fields. All major cryptosystems require modular multipliers in one or more stages of computation. A considerable amount of work has been done to optimize the modular multiplication algorithm and to improve its implementation to meet various constraints and requirements. This paper presents an implementation of a radix-4 interleaved modular multiplier for 256-bit operands. Further, the existing algorithm is modified to obtain a new algorithm with a corresponding hardware circuit, and both designs are implemented on a Xilinx Virtex-7 (XC7VX485T-FFG1761-2) FPGA. The proposed 256-bit modular multiplier achieves a computation time of about 1 μs at 122.6 MHz and occupies 858 slices. © 2019 IEEE.
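As an illustration of the radix-4 interleaved approach named above, the following is a minimal software sketch of the textbook scheme, not the paper's modified algorithm or its hardware circuit. The function name `interleaved_modmul_radix4`, its parameters, and the repeated-subtraction reduction are assumptions made for this example. The multiplier is scanned two bits at a time from the most significant end, and the partial result is reduced after every step so that it never grows far beyond the modulus.

```python
def interleaved_modmul_radix4(x, y, m, bits=256):
    """Return (x * y) mod m using radix-4 interleaved multiplication.

    Hypothetical helper for illustration: x is scanned two bits per
    iteration, most significant digit first, with a reduction interleaved
    after each shift-and-add step.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    if bits % 2:
        bits += 1  # pad to a whole number of radix-4 digits
    if not 0 <= x < (1 << bits):
        raise ValueError("x does not fit in the given bit width")

    y %= m  # ensures the bound p < 7*m after each shift-and-add
    p = 0
    for i in range(bits - 2, -2, -2):
        digit = (x >> i) & 0b11      # next radix-4 digit of x: 0..3
        p = (p << 2) + digit * y     # p = 4*p + digit*y
        # Since p < m and y < m beforehand, the new p is below 7*m,
        # so at most six subtractions bring it back under m.
        while p >= m:
            p -= m
    return p


if __name__ == "__main__":
    modulus = 2**255 - 19
    a = 2**200 + 12345
    b = 2**250 + 6789
    print(interleaved_modmul_radix4(a, b, modulus) == (a * b) % modulus)
```

For 256-bit operands this loop runs 256 / 2 = 128 times. If a hardware design handled one radix-4 digit per clock cycle (an assumption; the abstract does not state the cycle count), 128 cycles at 122.6 MHz would take roughly 1.04 μs, which is consistent with the reported figure of about 1 μs.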