Policy (MDP)
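In a Markov decision process, a policy specifies how an actor behaves in each situation. Writing $S$ for the set of states, $A$ for the set of actions, and $A(s) \subseteq A$ for the actions available in state $s$, a (deterministic) policy is given by $\pi : S \to A$
where $\pi(s) \in A(s)$ for every $s \in S$. This concept extends to a probabilistic policy. Let $\mathcal{P}(A)$ be the set of probability distributions over $A$. Then a probabilistic policy is given by $\pi : S \to \mathcal{P}(A)$, where if $\pi(s)(a)$ is non-zero then $a \in A(s)$.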
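The two kinds of policy can be illustrated with a small sketch. It assumes a finite toy problem and writes A(s) for the set of actions available in state s. The state names, action names, and helper functions below are hypothetical and chosen only for illustration. A deterministic policy is stored as a mapping from states to actions, and a probabilistic policy as a mapping from states to probability distributions over actions.

```python
import random

# Hypothetical toy problem: admissible actions A(s) for each state s.
admissible = {
    "low": ["wait", "recharge"],
    "high": ["wait", "search"],
}

# Deterministic policy: one admissible action per state.
deterministic_policy = {"low": "recharge", "high": "search"}

# Probabilistic policy: one distribution over actions per state.
stochastic_policy = {
    "low": {"wait": 0.25, "recharge": 0.75},
    "high": {"search": 1.0},
}


def is_valid_deterministic(policy, admissible):
    """Check that the policy picks an admissible action in every state."""
    if policy.keys() != admissible.keys():
        return False
    return all(policy[s] in admissible[s] for s in admissible)


def is_valid_stochastic(policy, admissible, tol=1e-9):
    """Check that each state maps to a distribution supported on A(s)."""
    if policy.keys() != admissible.keys():
        return False
    for s, dist in policy.items():
        if any(p < 0 for p in dist.values()):
            return False  # probabilities must be non-negative
        if abs(sum(dist.values()) - 1.0) > tol:
            return False  # probabilities must sum to one
        if any(p > 0 and a not in admissible[s] for a, p in dist.items()):
            return False  # non-zero probability only on admissible actions
    return True


def sample_action(policy, state, rng):
    """Draw one action from the distribution the policy assigns to state."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(is_valid_deterministic(deterministic_policy, admissible))
    print(is_valid_stochastic(stochastic_policy, admissible))
    print(sample_action(stochastic_policy, "low", rng))
```

A deterministic policy is the special case of a probabilistic one in which every state's distribution puts all of its mass on a single action, as the "high" entry of `stochastic_policy` does here.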