Transitions (MDP)
In a learning problem you are not given the Markov decision process itself. Instead you are given transitions, which are the outcomes of previous actions, i.e. tuples $(s, a, r, s')$
where $s$ is the start state, $a$ is the action taken, $r$ is the reward received and $s'$ is the state you ended up in. Transitions can either be pre-computed, or generated interactively: given a state, the learner provides an action and observes the resulting transition.
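
As a rough illustration of the two ways transitions can arrive, here is a minimal Python sketch. The `Transition` type, the `toy_step` environment (a walk along states 0 to 3 with reward 1.0 for reaching state 3) and the `collect` helper are all hypothetical names invented for this example; they are not part of any particular library.

```python
import random
from typing import Callable, List, NamedTuple


class Transition(NamedTuple):
    """One observed outcome: start state, action, reward, resulting state."""
    state: int
    action: int
    reward: float
    next_state: int


def toy_step(state: int, action: int) -> Transition:
    """Hypothetical environment, hidden from the learner in a real problem.

    States are 0..3, actions are -1 or +1, and reaching state 3 pays 1.0.
    """
    next_state = max(0, min(3, state + action))
    reward = 1.0 if next_state == 3 else 0.0
    return Transition(state, action, reward, next_state)


# Mode 1: transitions are pre-computed and handed to the learner as a batch.
precomputed: List[Transition] = [toy_step(0, 1), toy_step(1, 1), toy_step(2, 1)]


# Mode 2: the learner supplies an action for the current state and
# observes the transition that results.
def collect(
    step: Callable[[int, int], Transition],
    policy: Callable[[int], int],
    start: int,
    n: int,
) -> List[Transition]:
    transitions = []
    state = start
    for _ in range(n):
        transition = step(state, policy(state))
        transitions.append(transition)
        state = transition.next_state
    return transitions


rng = random.Random(0)  # seeded so the run is repeatable
sampled = collect(toy_step, lambda s: rng.choice([-1, 1]), start=0, n=5)
```

In both modes the learner only ever sees the tuples; the dynamics inside `toy_step` stand in for the unknown Markov decision process.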