Q-learning

Q-learning is a class of value-function-based reinforcement learning algorithms. It learns the Q-function (RL) incrementally. We use the transition model in which the learner supplies an action at each iteration and then observes the resulting reward and next state.

Pick an initial estimate of the Q-function (RL).

We need to pick a learning rate $\alpha_t \in [0, 1]$ such that $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$.
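
As a worked example, assume the usual stochastic-approximation requirement on the learning rate: the rates must sum to infinity while their squares have a finite sum. The schedule $\alpha_t = 1/t$ is one common choice that meets both conditions, since the harmonic series diverges and the series of inverse squares converges:

$$
\sum_{t=1}^{\infty} \frac{1}{t} = \infty, \qquad \sum_{t=1}^{\infty} \frac{1}{t^2} = \frac{\pi^2}{6} < \infty.
$$

A constant learning rate, by contrast, fails the second condition, because the sum of its squares diverges.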

Lastly, pick a rule for choosing an action in a given state.
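
The note does not say which action-selection rule to use. One widely used option is epsilon-greedy, sketched below under that assumption. The function name `epsilon_greedy`, the dictionary layout `q[(state, action)]`, and the parameter `epsilon` are illustrative choices, not part of the original note.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else a greedy one.

    q is assumed to be a mapping from (state, action) pairs to current estimates.
    """
    if rng.random() < epsilon:
        # Explore: ignore the estimates and pick any action.
        return rng.choice(actions)
    # Exploit: pick an action with the highest current estimate.
    return max(actions, key=lambda a: q[(state, a)])
```

Keeping `epsilon` above zero means every action keeps being tried in every state that is reached, which matters for the visiting condition in the correctness result.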

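Then we incrementally learn from our choices. After taking action $a_t$ in state $s_t$, receiving reward $r_t$, and moving to state $s_{t+1}$, we update

$$Q_{t+1}(s_t, a_t) = (1 - \alpha_t)\, Q_t(s_t, a_t) + \alpha_t \left( r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \right)$$

where $\gamma$ is the discount factor and all other entries of the estimate are left unchanged.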

Note that as time advances, the state we update changes, and so may the action we choose there.
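
The steps above can be put together in a minimal tabular sketch. It assumes the standard Q-learning update, in which the old estimate is blended with the reward plus the discounted best estimate at the next state. It also assumes a learning rate of one over the visit count of each state-action pair and an epsilon-greedy action choice. The names `q_learning`, `toy_step`, and the two-state toy problem are hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict


def q_learning(step, states, actions, gamma, num_steps, epsilon=1.0, seed=0):
    """Tabular Q-learning sketch; step(state, action, rng) -> (reward, next_state)."""
    rng = random.Random(seed)
    q = defaultdict(float)      # initial estimate: 0 for every (state, action)
    visits = defaultdict(int)   # visit counts, used for the learning rate
    state = rng.choice(states)
    for _ in range(num_steps):
        # Choose an action (epsilon-greedy is an assumption of this sketch).
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        reward, next_state = step(state, action, rng)
        # Learning rate 1/n for the n-th visit to this (state, action) pair.
        visits[(state, action)] += 1
        alpha = 1.0 / visits[(state, action)]
        # Blend the old estimate with the one-step lookahead target.
        target = reward + gamma * max(q[(next_state, a)] for a in actions)
        q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * target
        # Time moves on: the next update looks at the new state.
        state = next_state
    return q


def toy_step(state, action, rng):
    """Hypothetical toy problem: the action is the next state; landing in 1 pays 1."""
    next_state = action
    reward = 1.0 if next_state == 1 else 0.0
    return reward, next_state


q = q_learning(toy_step, states=[0, 1], actions=[0, 1], gamma=0.5, num_steps=20000)
```

In this toy problem the exact values can be worked out by hand: with discount 0.5, moving to state 1 is worth 2 and moving to state 0 is worth 1 from either state, so the learned table can be compared against those numbers.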

Correctness

There is a theorem stating that, for a Markov decision process, if we apply this learning procedure so that every state-action pair is visited infinitely often, the next states are sampled according to the transition probabilities, and the rewards are drawn from the correct distribution, then the estimate converges to the true Q-function, so Q-learning converges correctly.