In an e-learning system, learners achieve their outcomes through irreversible, continuous activities that form a successive learning progression. The learning sequence therefore depends on the quality of the learning material (learning resources) and on each learner's situation. Any out-of-sequence activity increases learning time and makes it harder for the learner to understand the concepts well. Reducing the delay in completing learning activities is thus a critical issue when sequencing learning tasks toward the learning objective. In this paper, a multi-agent Markov decision process (MDP) with Q-learning is proposed to control the learning sequence: the agent is trained on the MDP to choose the next learning activity based on the learner's current state. The agent is equipped with an adaptive learning mechanism whose behavior is gradually shaped by the instantaneous learning outcome of each state in the specified learning sequence. To this end, the Q-learning method is combined with a critical path mechanism to improve the performance of learning-sequence prediction. © 2019, Institute of Advanced Scientific Research, Inc. All rights reserved.
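The abstract does not specify the states, actions, or rewards of the MDP, so the following is only a minimal single-agent sketch of how tabular Q-learning can learn to choose the next learning activity. It assumes a hypothetical toy model: activities must be completed in a fixed prerequisite order, the state is the number of activities already mastered, and an out-of-sequence choice costs extra time without progress. The names `N_ACTIVITIES`, `IN_ORDER_COST`, `OUT_OF_ORDER_COST`, `step`, `train_q_table`, and `greedy_sequence`, as well as all cost values, are illustrative assumptions; the multi-agent setting and the critical path mechanism are not modeled.

```python
import random

# Hypothetical toy model (not the authors' formulation): activities
# 0..N_ACTIVITIES-1 must be completed in prerequisite order. The state is the
# number of activities already mastered; N_ACTIVITIES is the terminal state.
N_ACTIVITIES = 5
IN_ORDER_COST = -1.0      # assumed time cost of an in-sequence activity
OUT_OF_ORDER_COST = -3.0  # assumed extra time cost of an out-of-sequence activity


def step(state: int, action: int) -> tuple[int, float]:
    """Return (next_state, reward) for taking activity `action` in `state`."""
    if action == state:
        # The chosen activity is the next prerequisite, so the learner progresses.
        return state + 1, IN_ORDER_COST
    # Out-of-sequence activity: time is spent but the learner does not progress.
    return state, OUT_OF_ORDER_COST


def train_q_table(episodes: int = 2000, alpha: float = 0.1,
                  gamma: float = 0.95, epsilon: float = 0.2,
                  seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # One row per state (including the terminal state), one column per activity.
    q = [[0.0] * N_ACTIVITIES for _ in range(N_ACTIVITIES + 1)]
    for _ in range(episodes):
        state, steps = 0, 0
        while state < N_ACTIVITIES and steps < 100:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIVITIES)  # explore
            else:
                action = max(range(N_ACTIVITIES), key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action)
            # The terminal state has no future value.
            future = 0.0 if next_state == N_ACTIVITIES else max(q[next_state])
            # Q-learning update toward the one-step bootstrapped target.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_sequence(q: list[list[float]]) -> list[int]:
    """Recommended next activity for each non-terminal state."""
    return [max(range(N_ACTIVITIES), key=lambda a: q[s][a])
            for s in range(N_ACTIVITIES)]


if __name__ == "__main__":
    print(greedy_sequence(train_q_table()))
```

Under these assumed costs, the learned greedy policy recommends the in-order activity in every state, because each out-of-sequence choice adds time without advancing the learner.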