Week 1 - Smoov & Curly’s Bogus Journey
Review
The first part of this course is a review of:
Week 12 - Reinforcement learning
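There is a change of notation between this course and the referenced lecture. Instead of using the notation of that lecture, this course writes the Bellman equation as

$$V(s) = \max_{a \in A_s} \left( R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') V(s') \right)$$

where $V(s)$ is the value of state $s$ and $R(s, a)$ is the reward for taking action $a$ in state $s$. A reminder of the rest of the notation is below.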
- Discount factor: $\gamma$ with $0 \le \gamma < 1$.
- Transition probability: given you are in state $s$ and you take action $a$, $T(s, a, s')$ is the probability you end up in state $s'$.
- States: $S$ is the set of all states.
- Actions: $A$ is the set of all actions. It could depend on the state, therefore we talk about $A_s$ for the actions at state $s$.
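To make the notation concrete, here is a minimal sketch in Python with NumPy. The two-state, two-action MDP is a hypothetical toy example, not one from the lecture, and the array names `T` and `R` and the helper `value_iteration` are assumptions made for illustration. It stores the transition probabilities as `T[s, a, s2]` and a per-(state, action) reward as `R[s, a]`, then repeatedly applies the Bellman backup until the values stop changing.

```python
import numpy as np

# Hypothetical toy MDP (an assumption, not from the lecture): 2 states, 2 actions.
gamma = 0.9  # discount factor

# T[s, a, s2] = probability of ending up in state s2 after taking action a in state s.
T = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])

# R[s, a] = immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])


def value_iteration(T, R, gamma, tol=1e-10, max_iters=10_000):
    """Iterate V(s) <- max_a (R(s, a) + gamma * sum_s2 T(s, a, s2) * V(s2))."""
    V = np.zeros(T.shape[0])
    for _ in range(max_iters):
        # T @ V sums over the next state s2, giving an array indexed by (s, a).
        V_new = (R + gamma * T @ V).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


V = value_iteration(T, R, gamma)
print(V)
```

Here every action is available in every state, so $A_s = A$ for all $s$. A state-dependent action set would need a mask or a per-state list instead of a dense array.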
Quality
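Within the Bellman equation, if we take what is within the brackets and set it to a new function $Q(s, a)$, so that $V(s) = \max_{a} Q(s, a)$, we get a second form:

$$Q(s, a) = R(s, a) + \gamma \sum_{s'} T(s, a, s') \max_{a'} Q(s', a')$$

The motivation for doing this will come later. Intuitively, however, this form will be more useful when you do not have access to the reward function $R$ and the transition probabilities $T$ directly.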
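The quality form can be sketched the same way. The toy MDP below is again hypothetical, and `q_iteration` and `sampled_q_update` are illustrative names rather than anything defined in the lecture. The first function solves for the quality function when the transition probabilities and rewards are known. The second shows the intuition from the paragraph above: a single observed transition (state, action, reward, next state) is enough to nudge the estimate, with no reference to the model.

```python
import numpy as np

# Hypothetical toy MDP (an assumption, not from the lecture): 2 states, 2 actions.
gamma = 0.9
T = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])  # T[s, a, s2]
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])  # R[s, a]


def q_iteration(T, R, gamma, tol=1e-10, max_iters=10_000):
    """Iterate Q(s, a) <- R(s, a) + gamma * sum_s2 T(s, a, s2) * max_a2 Q(s2, a2)."""
    Q = np.zeros_like(R)
    for _ in range(max_iters):
        Q_new = R + gamma * T @ Q.max(axis=1)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


def sampled_q_update(Q, s, a, r, s2, alpha, gamma):
    """Move Q[s, a] towards r + gamma * max_a2 Q[s2, a2] using one observed transition.

    Uses only the sample (s, a, r, s2); neither T nor R is needed.
    """
    Q = Q.copy()
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return Q


Q = q_iteration(T, R, gamma)
greedy_policy = Q.argmax(axis=1)  # choosing actions needs only Q
print(Q, greedy_policy)
```

Note that picking the best action from `Q` is a plain `argmax`, whereas picking it from the state values alone would require a one-step lookahead through the model.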
Continuations
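We can apply a similar trick to derive a third form of the Bellman equation. This time we set the continuation $C(s, a)$ to the discounted expected value of the next state, $C(s, a) = \gamma \sum_{s'} T(s, a, s') V(s')$, which gives

$$C(s, a) = \gamma \sum_{s'} T(s, a, s') \max_{a'} \left( R(s', a') + C(s', a') \right)$$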
Each of these will enable us to do reinforcement learning in different circumstances - but notice how they relate to one another.
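One way to see how the three forms relate is to solve for the continuation function numerically and recover the other two from it. This is a sketch on a hypothetical toy MDP, and `c_iteration` is an illustrative name, not part of the lecture. It relies only on the definitions: the quality is the immediate reward plus the continuation, and the value is the best quality over actions.

```python
import numpy as np

# Hypothetical toy MDP (an assumption, not from the lecture): 2 states, 2 actions.
gamma = 0.9
T = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])  # T[s, a, s2]
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])  # R[s, a]


def c_iteration(T, R, gamma, tol=1e-10, max_iters=10_000):
    """Iterate C(s, a) <- gamma * sum_s2 T(s, a, s2) * max_a2 (R(s2, a2) + C(s2, a2))."""
    C = np.zeros_like(R)
    for _ in range(max_iters):
        C_new = gamma * T @ (R + C).max(axis=1)
        if np.max(np.abs(C_new - C)) < tol:
            return C_new
        C = C_new
    return C


C = c_iteration(T, R, gamma)

# Recover the other two forms from C.
Q = R + C          # Q(s, a) = R(s, a) + C(s, a)
V = Q.max(axis=1)  # V(s) = max_a Q(s, a) = max_a (R(s, a) + C(s, a))

# Going the other way: C(s, a) = gamma * sum_s2 T(s, a, s2) * V(s2).
print(np.allclose(C, gamma * T @ V))
```

Going from the continuation to the quality needs the reward function, and going from the value to either of the others needs the transition probabilities, which is why the three forms suit different circumstances.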
If we find