Promoter Prediction in DNA Sequences of Escherichia Coli Using Machine Learning Algorithms

S Anveshrithaa; A Balamurugan; Jaisankar N

Advances in artificial intelligence and machine learning have driven considerable progress in computational biology. Machine learning lets systems learn from data without explicit programming and improve at solving complex problems through experience, which has opened opportunities in demanding fields. Bioinformatics is one such application, where machine learning is widely used for classification and for identifying patterns in DNA in genomics. The purpose of this research is to implement and improve several machine learning models for predicting promoter regions, which mark transcription start sites, in the DNA sequences of a common bacterium, Escherichia coli. The models include ensemble learning methods (boosting and bootstrap aggregation), neural network-based methods, Support Vector Machine, Naïve Bayes, k-nearest neighbors and decision tree. The performance of the models is optimized through hyperparameter tuning for improved prediction. This paper also compares these classification models to determine which one best predicts promoters in the DNA sequences.
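As a rough illustration of one of the listed models together with hyperparameter tuning, the sketch below implements a k-nearest neighbors classifier over DNA strings and picks k by leave-one-out accuracy. It is not the authors' implementation: the abstract does not specify the dataset, the feature encoding, or the tuning procedure, so the Hamming distance, the leave-one-out scheme, the function names, and the eight toy sequences in `TOY` are all assumptions made up for this example.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k):
    """Majority label among the k training sequences closest to query."""
    neighbors = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on data for a given k."""
    hits = 0
    for i, (seq, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out the i-th example
        if knn_predict(train, seq, k) == label:
            hits += 1
    return hits / len(data)


def tune_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


# Hypothetical toy data, NOT real E. coli sequences: "+" examples embed
# the TATAAT motif, "-" examples are arbitrary GC-rich strings.
TOY = [
    ("GTATAATG", "+"), ("CTATAATC", "+"),
    ("ATATAATG", "+"), ("GTATAATC", "+"),
    ("GCGCGCCG", "-"), ("CCGGCGCG", "-"),
    ("GCCGGCGC", "-"), ("CGCGCCGG", "-"),
]

if __name__ == "__main__":
    best_k = tune_k(TOY, [1, 3])
    print("selected k:", best_k)
    print("prediction:", knn_predict(TOY, "TTATAATT", best_k))
```

On real data the same pattern would apply with a proper train/test split and a wider grid of hyperparameters; the other models named in the abstract would each need their own tuning grid.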

Journal	International Journal of Scientific and Technology Research
Publisher	IJSTR
ISSN	2277-8616
Open Access	No