Return (RL)
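In a Markov decision process the return $G_t$ at time step $t$ is defined to be the discounted sum of rewards:

$$G_t = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}$$

where $\gamma \in [0, 1]$ is our discount factor and the summation goes as high as the length of the run $T$ (which may be infinite). (Note here $G_t$ is a random variable, as is $R_t$.) Because each later reward picks up exactly one more factor of $\gamma$, the return has a nice recursive property:

$$G_t = R_{t+1} + \gamma G_{t+1}$$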
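As a rough illustration, the recursive property lets every return in a finite run be computed in a single backward pass rather than by re-summing from each time step. The sketch below assumes the rewards of one episode are already stored in a list, with `rewards[t]` standing for the reward received after step `t`; the function names and this indexing are illustrative choices, not part of the definition above.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Compute the return at every time step of one finite episode.

    Assumption: rewards[t] is the reward received after step t,
    and the return after the final step is 0.
    """
    returns = [0.0] * len(rewards)
    future = 0.0  # return from the step after the current one
    # Walk backwards so each step reuses the return of the following step:
    # return[t] = rewards[t] + gamma * return[t + 1]
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns


def discounted_return_direct(rewards: Sequence[float], gamma: float, t: int = 0) -> float:
    """Compute one return straight from the discounted-sum definition."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


if __name__ == "__main__":
    episode = [1.0, 2.0, 3.0]
    print(discounted_returns(episode, gamma=0.5))
    print(discounted_return_direct(episode, gamma=0.5, t=0))
```

Both functions should agree at every time step; the backward pass does so in time linear in the episode length, whereas calling the direct sum at each step is quadratic.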