Minimax-Q

This is a generalisation of Q-learning to stochastic games, but the learned value is defined separately for each player. Each player learns this value incrementally.
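
To make the incremental update concrete, here is a minimal sketch of one learning step for a single player in a two-player zero-sum game. It assumes the commonly cited form of the update, Q(s, a, o) <- (1 - alpha) Q(s, a, o) + alpha (r + gamma V(s')), with V(s) the maximin value of the matrix Q(s, ., .) over the player's mixed strategies. The function names, the dictionary layout of Q and V, and the default alpha and gamma are illustrative assumptions, not taken from these notes. To stay dependency-free, the sketch also assumes the learner has exactly two actions, so the maximin can be found by checking endpoints and line crossings; the general case would normally use a linear program.

```python
from itertools import combinations


def maximin_two_actions(q_rows):
    """Maximin value and policy for a zero-sum matrix game in which the
    maximising player has exactly two actions (an assumption of this sketch).

    q_rows[a][o] is the payoff for own action a and opponent action o.
    Returns (value, p), where p is the probability of playing action 0.
    """
    row0, row1 = q_rows

    def worst_case(p):
        # Payoff against the opponent's best reply to the mixed strategy p.
        return min(p * a + (1 - p) * b for a, b in zip(row0, row1))

    # Against opponent action o the payoff is linear in p, so the minimum
    # over o is concave and piecewise linear. Its maximum lies at p = 0,
    # p = 1, or at a point where two of the lines cross.
    candidates = [0.0, 1.0]
    for i, j in combinations(range(len(row0)), 2):
        denom = (row0[i] - row1[i]) - (row0[j] - row1[j])
        if denom != 0:
            p = (row1[j] - row1[i]) / denom
            if 0.0 <= p <= 1.0:
                candidates.append(p)

    best_p = max(candidates, key=worst_case)
    return worst_case(best_p), best_p


def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one minimax-Q step after observing (s, a, o, r, s_next).

    Q maps state -> 2 x n_opponent_actions list of lists; V maps state -> float.
    Both are updated in place. Returns the new probability of action 0 in s.
    """
    # Blend the old estimate with the bootstrapped target r + gamma * V(s').
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (r + gamma * V[s_next])
    # Recompute the state value as the maximin of the updated payoff matrix.
    V[s], p = maximin_two_actions(Q[s])
    return p
```

As a usage check, calling `maximin_two_actions([[1, -1], [-1, 1]])` on matching pennies gives a value of 0 with both actions played with probability 0.5, which is the known solution of that game.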