Finite Markov Decision Process

This summarizes the environment that an actor in a discrete Markovian universe experiences. It is given by:

  • States: A finite set $\mathcal{S}$ of states that the actor can be in.
  • Actions: For each state $s$, a finite set of actions $\mathcal{A}(s)$; sometimes it is convenient to refer to this as $\mathcal{A}$.
  • Rewards: The value the actor gets from doing each action within a state; these are real values $r \in \mathcal{R} \subset \mathbb{R}$.
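
To make these ingredients concrete, here is a minimal sketch of how the sets might be written down in Python (the two states and the action names are invented for illustration, not taken from any particular problem):

```python
# Invented two-state example: the finite sets S and A(s) as plain containers.
states = ["low", "high"]              # the finite state set S
actions = {                           # A(s): the actions available in each state
    "low":  ["recharge", "search"],
    "high": ["wait", "search"],
}
# The combined action set A is the union over all states:
all_actions = sorted({a for acts in actions.values() for a in acts})
print(all_actions)                    # ['recharge', 'search', 'wait']
```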

We assume the actor works in discrete time steps $t = 0, 1, 2, \ldots$; at time $t$ it is in state $S_t$, takes action $A_t$, and gets reward $R_{t+1}$. The actor deterministically chooses $A_t$ when in state $S_t$, but we have a probability distribution that determines the reward and next state:

$$p(s', r \mid s, a) \doteq \Pr\{S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a\}$$

Read this as: the probability of ending up in state $s'$ with reward $r$, given the actor is in state $s$ and takes action $a$. This is what determines how the world progresses. Notice it is Markovian, as it does not depend on the earlier history $S_0, A_0, \ldots, S_{t-1}, A_{t-1}$.
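
As a sketch of how this dynamics function might look in code (a hypothetical table for the same invented two-state example; every probability and reward is made up), $p$ can be stored as a mapping from $(s, a)$ to a list of $(\text{probability}, s', r)$ outcomes, checked to be a valid distribution, and sampled from:

```python
import random

# Hypothetical dynamics table p(s', r | s, a), keyed by (state, action).
# Each entry is a list of (probability, next_state, reward); numbers are invented.
p = {
    ("low", "search"):   [(0.7, "low", 1.0), (0.3, "high", -3.0)],
    ("low", "recharge"): [(1.0, "high", 0.0)],
    ("high", "search"):  [(0.8, "high", 2.0), (0.2, "low", 2.0)],
    ("high", "wait"):    [(1.0, "high", 1.0)],
}

# Sanity check: for each (s, a), the probabilities form a distribution.
for (s, a), outcomes in p.items():
    assert abs(sum(prob for prob, _, _ in outcomes) - 1.0) < 1e-9

def step(s, a):
    """Sample (next_state, reward) from p(s', r | s, a)."""
    outcomes = p[(s, a)]
    weights = [prob for prob, _, _ in outcomes]
    _, s_next, r = random.choices(outcomes, weights=weights)[0]
    return s_next, r

print(step("low", "search"))  # e.g. ('low', 1.0)
```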

It is sometimes useful to think of the state you are going to be in at time step $t$ as a random variable, which we refer to as $S_t$; similarly for rewards as $R_t$.
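
Rolling the dynamics forward makes this random-variable view tangible: each run of the sketch below (which reuses a cut-down version of the invented table above, together with a deterministic policy) produces one realization of the sequence $S_0, A_0, R_1, S_1, \ldots$:

```python
import random

# Cut-down invented dynamics and a deterministic action choice per state.
p = {
    ("low", "search"):  [(0.7, "low", 1.0), (0.3, "high", -3.0)],
    ("high", "search"): [(0.8, "high", 2.0), (0.2, "low", 2.0)],
}
policy = {"low": "search", "high": "search"}

s = "low"                                  # the realized value of S_0
for t in range(5):
    a = policy[s]                          # A_t, chosen deterministically from S_t
    outcomes = p[(s, a)]
    _, s_next, r = random.choices(outcomes, weights=[o[0] for o in outcomes])[0]
    print(f"t={t}: S_t={s}, A_t={a}, R_(t+1)={r}, S_(t+1)={s_next}")
    s = s_next                             # the realized value of S_(t+1)
```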