An approach to predict protein secondary structure using Deep Learning in Spark based Big Data computing framework
X.L. Dencelin,
Published in Institute of Electrical and Electronics Engineers Inc.
2018
Pages: 25 - 30
Abstract
Computational Intelligence techniques play, and continue to play, a major role in Data Analytics, including extracting knowledge from unstructured sequence data. Deriving valuable insights from biological sequences is an important problem in Computational Biology. Classifying the structure of a protein from its sequence, especially its secondary structures, is crucial because the structure in turn determines the function, and this is considered an important problem in proteomics. Earlier approaches such as Machine Learning, Statistical and Probabilistic techniques were widely applied in proteomics to extract knowledge from amino acid sequences. However, hand-crafted feature extraction is a tedious task and may degrade accuracy. Our approach focuses on Deep Learning implemented in a distributed framework for improved accuracy and performance, providing an efficient solution to the structure prediction problem. We used the Skip-Gram method to translate the amino acid sequence into words without losing the position information of each amino acid. The resulting vector is then fed into a Stacked Autoencoder for classification, and the classifier output predicts the presence of secondary structures. The approach is tested on GenBank proteins, and the entire experiment is implemented in the Spark framework. © 2018 IEEE.
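The sequence-to-words step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how words are formed, so the overlapping k-mer tokenization, the word length `k = 3`, the context `window`, the helper names `kmer_words` and `skipgram_pairs`, and the example sequence are all assumptions made for illustration. The sketch only produces the (target, context) training pairs that a Skip-Gram model would consume; it does not train embeddings or run on Spark.

```python
from typing import List, Tuple


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers ("words").

    Overlapping windows keep the words in sequence order, so the
    position of each residue is still recoverable. k = 3 is an assumption.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(words: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (target, context) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, words[j]))
    return pairs


# Hypothetical short amino acid sequence, used only for illustration.
words = kmer_words("MKTAYIAK")
pairs = skipgram_pairs(words, window=1)
print(words)
print(pairs[:3])
```

In a distributed setting, the same tokenization could be applied per sequence before the words are passed to an embedding model; the choice of `k` and `window` would need to be tuned against the data.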