Minimax-Q
This is a generalisation of Q-learning to stochastic games, in which a separate Q-function is defined for each player.
Each player learns its Q-value incrementally from experience.
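As a rough illustration of the incremental update, here is a minimal sketch that assumes the two-player zero-sum setting in which minimax-Q is usually stated, and further assumes the learning agent has exactly two actions so the maximin value of a state can be found without a linear-programming solver. The names maximin_value and minimax_q_update, the layout of Q, and the default alpha and gamma are hypothetical choices made for this example, not taken from the text above.

```python
from itertools import combinations


def maximin_value(q_rows):
    """Return max over p in [0, 1] of min over opponent actions o of
    p * q_rows[0][o] + (1 - p) * q_rows[1][o].

    Assumption: the agent has exactly two actions (two rows).
    """
    top, bottom = q_rows
    # Each opponent action gives a line in p: intercept + slope * p.
    lines = [(b, t - b) for t, b in zip(top, bottom)]
    # The lower envelope of the lines is concave and piecewise linear, so its
    # maximum lies at an endpoint or where two lines cross.
    candidates = [0.0, 1.0]
    for (c1, m1), (c2, m2) in combinations(lines, 2):
        if m1 != m2:
            p = (c2 - c1) / (m1 - m2)
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    return max(min(c + m * p for c, m in lines) for p in candidates)


def minimax_q_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one incremental update to Q[s][a][o].

    Q maps each state to a 2 x n table indexed by [agent action][opponent action].
    alpha is the learning rate and gamma the discount factor (assumed values).
    """
    v_next = maximin_value(Q[s_next])
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (r + gamma * v_next)
    return Q[s][a][o]


# Tiny usage example: state 1 looks like matching pennies.
Q = {
    0: [[0.0, 0.0], [0.0, 0.0]],
    1: [[1.0, -1.0], [-1.0, 1.0]],
}
updated = minimax_q_update(Q, s=0, a=0, o=1, r=1.0, s_next=1, alpha=0.5)
```

The only change from ordinary Q-learning in this sketch is the bootstrap target: the max over the agent's own actions is replaced by the maximin value of the next state's Q-table. With more than two agent actions, that value would normally be computed with a linear program instead.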