Finite Markov Decision Process
This summarizes the environment that an actor in a discrete Markovian universe experiences. It is given by:
- States: A finite set of states $\mathcal{S}$ that the actor can be in.
- Actions: For each state $s \in \mathcal{S}$, a finite set of actions $\mathcal{A}(s)$; sometimes it is convenient to refer to this as $\mathcal{A}$.
- Rewards: The value the actor receives for taking each action within a state; these are real values $r \in \mathcal{R} \subset \mathbb{R}$.
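To ground these three ingredients, here is a minimal sketch in Python; the two-state "home"/"work" MDP and its action names are invented purely for illustration:

```python
# Toy MDP (hypothetical example): the names and values are made up.
states = {"home", "work"}                # finite state set S

actions = {                              # A(s): actions available in each state
    "home": {"stay", "commute"},
    "work": {"stay", "commute"},
}

rewards = {-1.0, 0.0, 1.0}               # R: the real reward values that can occur
```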
We assume the actor works on discrete time steps $t = 0, 1, 2, \dots$; at time $t$ it is in state $S_t$, takes action $A_t$, and receives reward $R_{t+1}$. The actor deterministically chooses $A_t$ when in state $S_t$, but we have a probability distribution that determines the reward and next state:

$$p(s', r \mid s, a) \doteq \Pr\{S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a\}.$$

Read this as: the probability of ending up in state $s'$ with reward $r$ given the actor is in state $s$ and takes action $a$. This is what determines how the world progresses. Notice it is Markovian, as it does not depend on the earlier history $S_{t-1}, A_{t-1}, \dots, S_0, A_0$. It is sometimes useful to think of the state you are going to be in at time step $t+1$ as a random variable, which we refer to as $S_{t+1}$, and similarly for the reward as $R_{t+1}$.
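To make the dynamics concrete, here is a sketch continuing the toy MDP above: the table `p` is an invented dynamics function, where `p[(s, a)]` enumerates the $(s', r)$ outcomes of $p(s', r \mid s, a)$ with their probabilities, and `step` samples one transition:

```python
import random

# Hypothetical dynamics for the toy MDP: p[(s, a)] is a list of
# (next_state, reward, probability) triples; probabilities per (s, a) sum to 1.
p = {
    ("home", "stay"):    [("home", 0.0, 1.0)],
    ("home", "commute"): [("work", -1.0, 0.9), ("home", -1.0, 0.1)],
    ("work", "stay"):    [("work", 1.0, 1.0)],
    ("work", "commute"): [("home", 0.0, 0.9), ("work", 0.0, 0.1)],
}

def step(s, a):
    """Sample (S_{t+1}, R_{t+1}) from p(s', r | s, a).

    Only the current state s and action a are used -- no earlier
    history -- which is exactly the Markov property."""
    outcomes = p[(s, a)]
    weights = [q for (_, _, q) in outcomes]
    s_next, r, _ = random.choices(outcomes, weights=weights, k=1)[0]
    return s_next, r

# A deterministic choice of action in each state, then a few sampled steps.
s = "home"
for t in range(3):
    a = "commute" if s == "home" else "stay"
    s, r = step(s, a)
    print(t, s, r)
```

Note that even though the action is chosen deterministically from the state, the trajectory is still random, because the world's response to each action is drawn from $p(s', r \mid s, a)$.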