Background: Multiplication is a basic operation in signal processing systems and financial applications, all of which require it to be performed quickly and efficiently on a silicon chip. Methods: This paper describes the algorithm and architecture of a BCD parallel multiplier. The design exploits the properties of two redundant BCD codes to speed up computation: the redundant BCD excess-3 code (XS-3) and the overloaded BCD representation (ODDS). In addition, a number of new techniques are used to significantly reduce latency and area, and to suit FPGA implementation, compared with existing implementations. Findings: A parallel architecture generates the partial products using a signed-digit radix-10 recoding of the BCD multiplier, with digits in the range [-5, 5], and a set of positive multiplicand multiples coded in XS-3. This encoding has several advantages. Because XS-3 is a self-complementing code, the negative of a multiple is obtained simply by complementing its bits. The redundancy of the XS-3 code is also used to generate the multiplicand multiples in a simple, fast, and carry-free way. The implemented design has three stages: partial product generation, partial product reduction, and final conversion to BCD. To implement the design in hardware, the partial product reduction architecture is modified to use a bank of ripple-carry adder trees. Because the ODDS representation uses a 4-bit binary encoding similar to that of the non-redundant BCD code, conventional VLSI circuit techniques such as carry-save adders and compression trees can be used effectively to operate on decimal numbers. Conclusion: To show the advantages of the resulting design, RTL models for 8 × 8-digit and 16 × 16-digit multiplication have been synthesized and implemented on a Virtex-5 FPGA device.
Results show that, compared with existing work, the multiplier is about 10-15% more delay efficient and about 14-18% more area efficient.
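
The two coding ideas named in the abstract can be illustrated with a small software model. The sketch below is not the hardware design described in the paper. All function names (`to_xs3`, `xs3_nines_complement`, `sd_radix10_recode`, `multiply_via_sd`) are hypothetical, and the recoding rule shown is the common transfer-digit scheme (a digit of 5 or more sends a transfer of 1 to the next position), which is assumed here and may differ in detail from the authors' circuit. It shows that inverting the four bits of an XS-3 digit yields its nine's complement, and that a multiplier recoded into digits in [-5, 5] needs only the positive multiplicand multiples 0X to 5X plus a sign.

```python
def to_xs3(digit: int) -> int:
    """Encode a decimal digit 0..9 as its 4-bit excess-3 (XS-3) code."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be in 0..9")
    return digit + 3


def from_xs3(code: int) -> int:
    """Decode a 4-bit XS-3 code back to its decimal digit."""
    return code - 3


def xs3_nines_complement(code: int) -> int:
    """Invert the 4 bits of an XS-3 code; the result encodes 9 - digit."""
    return ~code & 0xF


def sd_radix10_recode(digits_lsd_first: list[int]) -> list[int]:
    """Recode BCD digits (least significant first) into signed digits in [-5, 5].

    Each output digit depends only on its own input digit and the transfer
    from the next lower position, so all digits can be recoded in parallel.
    """
    recoded = []
    transfer_in = 0
    for y in digits_lsd_first:
        transfer_out = 1 if y >= 5 else 0
        recoded.append(y - 10 * transfer_out + transfer_in)
        transfer_in = transfer_out
    recoded.append(transfer_in)  # extra most significant digit (0 or 1)
    return recoded


def multiply_via_sd(x: int, y: int) -> int:
    """Multiply non-negative integers using only the multiples 0X..5X of x."""
    y_digits = [int(c) for c in reversed(str(y))]
    multiples = [k * x for k in range(6)]  # positive multiples 0X..5X
    total = 0
    for position, d in enumerate(sd_radix10_recode(y_digits)):
        partial = multiples[abs(d)]
        total += (-partial if d < 0 else partial) * 10 ** position
    return total
```

For example, `sd_radix10_recode([9, 8, 7])` (the number 789) gives `[-1, -1, -2, 1]`, which evaluates to -1 - 10 - 200 + 1000 = 789. Note that bit inversion alone produces the nine's complement of a digit; a full ten's-complement negation also needs a +1 at the least significant position, which the sketch sidesteps by subtracting the partial product directly.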