Optimistic exploration
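Optimistic exploration is a way of choosing actions in Q-learning. We initialise every Q-value Q(s, a)
to a very high value and always pick the action that maximises Q(s, a). Each time an action is tried, its estimate is pulled down towards the reward actually observed, so actions that have not been tried yet keep their high values and look the most attractive. Under uncertainty, the agent therefore explores the actions it does not yet know about.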
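
The idea can be sketched with a small tabular example. This is only an illustration under stated assumptions, not a reference implementation: the function name `greedy_q_learning`, the toy environment `bandit_step` (one state, three actions with fixed rewards), and the values of `alpha`, `gamma` and `q_init` are all made up for the example.

```python
import numpy as np


def greedy_q_learning(step, n_states, n_actions, q_init,
                      n_steps=200, alpha=0.5, gamma=0.9, start_state=0):
    """Tabular Q-learning with purely greedy action selection.

    `step(s, a)` is a hypothetical environment function that returns
    (reward, next_state, done). Exploration comes only from `q_init`.
    """
    Q = np.full((n_states, n_actions), q_init, dtype=float)
    visits = np.zeros((n_states, n_actions), dtype=int)
    s = start_state
    for _ in range(n_steps):
        # Always take the action with the highest current estimate
        # (np.argmax breaks ties in favour of the lowest index).
        a = int(np.argmax(Q[s]))
        r, s_next, done = step(s, a)
        # Standard Q-learning target; no bootstrapping on terminal steps.
        target = r if done else r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        visits[s, a] += 1
        s = start_state if done else s_next
    return Q, visits


def bandit_step(s, a):
    """Toy one-state environment (assumed): action 2 pays the most."""
    rewards = [0.2, 0.5, 1.0]
    return rewards[a], 0, True


# Optimistic start: every action looks better than it can really be.
Q_opt, visits_opt = greedy_q_learning(bandit_step, 1, 3, q_init=10.0)
# Zero start: the first action tried looks best and is never abandoned.
Q_zero, visits_zero = greedy_q_learning(bandit_step, 1, 3, q_init=0.0)

print("optimistic visits:", visits_opt[0])
print("zero-init visits: ", visits_zero[0])
```

In this toy setting the optimistic agent tries every action before settling on the best one, while the zero-initialised agent never leaves the first action it tries. The initial value has to be higher than any return the agent could actually obtain for this to work; here 10.0 is comfortably above the maximum reward of 1.0.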