Epsilon-greedy exploration

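$\epsilon$-greedy exploration is a way of choosing actions in Q-learning. For a sequence $s_t$ at time step $t$, you choose a uniformly random action with probability $\epsilon$, and otherwise the greedy action $a_t = \arg\max_a Q(s_t, a)$.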
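
As an illustration, the usual epsilon-greedy rule (explore with probability epsilon, otherwise take the action with the highest estimated Q-value) can be sketched in Python. This is a minimal sketch, assuming a small discrete action set and a plain list of Q-value estimates for the current state; the function name `epsilon_greedy` and its parameters `q_values`, `epsilon`, and `rng` are hypothetical names chosen for this example, not taken from the source.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given Q-value estimates for the current state.

    q_values: list of floats, one estimate per action (assumed layout).
    epsilon:  exploration probability, expected to lie in [0, 1].
    rng:      any object with random() and randrange(), e.g. random.Random(seed).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Example call with a seeded generator so runs are repeatable.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=random.Random(0))
```

With `epsilon=0.0` the function always returns the greedy action, and with `epsilon=1.0` it always samples uniformly. When several actions share the top estimate, this sketch picks the one with the lowest index; random tie-breaking is another reasonable choice.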