The integration of computer science with the biological sciences has given rise to the field of computational biology, which offers an opportunity to speed up the analysis of biological data. DNA sequence analysis, and in particular finding base pairs, helps identify the order of nucleotides present in living organisms; it also supports forensic applications such as DNA profiling and parentage testing. Sequence analysis remains a challenging task in computational biology because of the large volumes of data involved and the computational resources they demand. Combining a distributed file system with distributed computation of tasks is one possible solution to this problem. In this paper, the authors use Spark, an engine for large-scale data processing, to analyze DNA sequences and extract base pairs, and they attempt to improve base pair extraction with improved algorithms.
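
As a rough illustration of the distributed idea described above, the following minimal sketch counts nucleotide bases by splitting a sequence into chunks, counting each chunk independently, and merging the partial results. It is plain Python rather than Spark, and it is not the authors' algorithm: the abstract does not specify how base pairs are extracted, so the function names (`split_into_chunks`, `count_bases`, `merge_counts`, `base_counts`), the chunk size, and the choice to count only A, C, G, and T are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(sequence, chunk_size):
    """Split a sequence into fixed-size chunks (a stand-in for data partitions)."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_bases(chunk):
    """Map step: count the bases A, C, G, T in one chunk, ignoring other symbols."""
    return Counter(base for base in chunk.upper() if base in "ACGT")


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def base_counts(sequence, chunk_size=4):
    """Count bases chunk by chunk, then merge the partial counts."""
    chunks = split_into_chunks(sequence, chunk_size)
    partial_counts = map(count_bases, chunks)
    return dict(reduce(merge_counts, partial_counts, Counter()))


if __name__ == "__main__":
    # Hypothetical input; 'N' marks an unknown base and is skipped.
    print(base_counts("ACGTACGGTN"))
```

Because each chunk is counted independently and the merge step is associative, the same structure could in principle be expressed as map and reduce operations on a distributed engine, with the chunks held on different nodes.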