Alex's Notes


Return (RL)

May 22, 2025 · 1 min read

reinforcement-learning


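In a Markov decision process, the return $G_t$ at time step $t$ is defined to be the discounted sum of future rewards:

$$
G_t = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}
$$

where $\gamma \in [0, 1]$ is the discount factor and the sum runs up to the final time step $T$ of the run (with $T = \infty$ for a run that never terminates). This uses the common convention that $R_{t+1}$ is the reward received after time step $t$. Note that $G_t$ is a random variable, as is each reward $R_t$. Because every reward after the first is shared with the next return, the definition has a nice recursive property:

$$
G_t = R_{t+1} + \gamma \left( R_{t+2} + \gamma R_{t+3} + \dots \right) = R_{t+1} + \gamma G_{t+1}
$$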
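As a minimal sketch, the return can be computed either directly from the discounted sum or by working backwards through a finished run with the recursion $G_t = R_{t+1} + \gamma G_{t+1}$. The function names and the example rewards here are hypothetical. The sketch assumes `rewards[k]` holds the reward received after step `k`, and that the return after the final step is zero.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return from the first step, computed directly as sum of gamma**k * rewards[k]."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def returns_by_recursion(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return at every step, using G_t = R_{t+1} + gamma * G_{t+1}.

    Assumes the return after the final step is 0 (the run has ended).
    """
    returns = [0.0] * len(rewards)
    running = 0.0  # return following the last step
    # Walk backwards so each G_{t+1} is known before G_t is computed.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-step run with discount factor 0.5.
rewards = [1.0, 0.0, 2.0]
gamma = 0.5
print(discounted_return(rewards, gamma))     # 1.5
print(returns_by_recursion(rewards, gamma))  # [1.5, 1.0, 2.0]
```

The backward pass visits each reward once, so it yields the return for every time step in linear time, whereas applying the direct sum separately at each step would repeat work.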
