Return (RL)

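In a Markov decision process, the return $G_t$ at time step $t$ is defined to be the discounted sum of future rewards:

$$G_t = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1} = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots + \gamma^{T-t-1} R_T$$

where $\gamma \in [0, 1]$ is the discount factor and the sum runs up to the final time step $T$, the length of the run. (Note that $G_t$ is a random variable, as is each reward $R_t$.) From this definition, the return has a nice recursive property:

$$G_t = R_{t+1} + \gamma G_{t+1}$$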
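
The recursion can be checked numerically on a single finished run. The following is a minimal sketch. It assumes the indexing convention in which the return at step t equals the reward received after step t plus the discount factor times the return at the next step, with the return after the final step taken to be zero. The function names `discounted_returns` and `discounted_return_direct`, and the choice to store the reward R_{t+1} at list position `t`, are illustrative assumptions rather than part of any library.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_{T-1}] for one finished run.

    Assumption: rewards[t] holds R_{t+1}, the reward received after step t,
    and the return after the last step is 0.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # return after the final step
    # Sweep backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # G_t = R_{t+1} + gamma * G_{t+1}
        returns[t] = g
    return returns


def discounted_return_direct(rewards, gamma, t=0):
    """Compute G_t straight from the discounted-sum definition."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0]  # made-up rewards for illustration
    print(discounted_returns(rewards, 0.5))        # [2.75, 3.5, 3.0]
    print(discounted_return_direct(rewards, 0.5))  # 2.75
```

The backward sweep computes every return in one pass over the run, whereas evaluating the sum directly for each time step repeats work. Both agree on the example above.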