Week 13 - Game theory
Game theory
Link to originalGame theory
Game theory is the study of systems with more than one rational player.
2-player zero-sum finite game of perfect information
This is a lot of words, but these games are like tic-tac-toe (noughts and crosses). Whilst these are fun, they all boil down to a payoff matrix for the different strategies each player can take, like the following.
To read this table: the first player's actions run along the top, the second player's down the side, and each entry gives the payoff when the players pick the corresponding pair of actions.
Zero-sum game
Link to originalZero-sum game
A game is zero-sum if the total gains and losses across all participants sum to zero.
We call the options available to each player their strategies.
Minmax decision
Link to originalMinmax decision
If you choose to minmax, you want to minimise the maximum loss. In terms of gain, you want to maximise the minimum gain, which is why it is sometimes referred to as maximin.
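As a quick illustration, here is a minimal Python sketch of a minmax (maximin) decision over a payoff matrix; the matrix values are made up for illustration, not taken from the lecture.

```python
# Minmax (maximin) decision for a payoff matrix, a minimal sketch.
# Rows are our actions, columns the opponent's; entries are our gain.
# The payoff values below are illustrative, not from the lecture.

def maximin(payoff):
    """Pick the row whose worst-case (minimum) payoff is largest."""
    worst_cases = [min(row) for row in payoff]
    best = max(range(len(payoff)), key=lambda i: worst_cases[i])
    return best, worst_cases[best]

payoff = [
    [3, -2],
    [1,  1],
]

action, value = maximin(payoff)
print(action, value)  # 1 1: row 1 guarantees at least 1, row 0 only -2
```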
Optimum play exists for 2-player zero-sum games with perfect information
Statement
Lemma
In a 2-player zero-sum game with perfect information there exists an optimal pure strategy for each player.
Proof
Link to original
This means that, with both players following their best strategy, we can work out the value of the game.
Warning
I don’t really understand what happens in the case
What would be the optimal pure strategy?
No more perfect information
Perfect information means that everything about the game is known to all players from the start. A good example where this is not the case is poker.
Mini-poker
Suppose we are in a game where the following happens:
- Player one is dealt a card, 50% of the time it is red and the other 50% it is black.
- If player one is dealt a black card they can resign or keep playing; if they are dealt a red card they have to keep playing. If they resign they get a score of -20.
- Player two then gets to choose to resign or for player one to show them their card.
- If player two resigns player one gets 10 points.
- If player one shows their card and it is black they get -40 points.
- If player one shows their card and it is red they get 30 points.
What is the optimum play for either party?
In this case we can still calculate the payoff matrix. Note that even if player one adopts the resign strategy, when they are dealt a red card they still have to keep playing.
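As a sanity check, the expected payoff matrix can be computed directly from the rules above; a Python sketch (the strategy names are my own labels, not the lecture's):

```python
# Expected payoff matrix for mini-poker from the stated rules, a sketch.
# Player one's strategies: "resign" = resign when dealt black,
# "hold" = keep playing on black (a bluff). Red cards must be played.
# Player two's strategies: "resign" or "see". Entries are player one's
# expected score, averaging over the 50/50 card deal.

def p1_payoff(p1_holds_black, p2_sees):
    total = 0.0
    for card in ("red", "black"):
        if card == "black" and not p1_holds_black:
            total += -20          # player one resigns on black
        elif not p2_sees:
            total += 10           # player two resigns
        elif card == "black":
            total += -40          # bluff called
        else:
            total += 30           # red card shown
    return total / 2              # each card is dealt with probability 1/2

matrix = {(p1, p2): p1_payoff(p1 == "hold", p2 == "see")
          for p1 in ("resign", "hold") for p2 in ("resign", "see")}
print(matrix)
# {('resign', 'resign'): -5.0, ('resign', 'see'): 5.0,
#  ('hold', 'resign'): 10.0, ('hold', 'see'): -5.0}
```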
Don't understand
I don’t really get how this situation is any different to the above ones. The lecturers say the players now value the game differently, but it still boils down to a matrix you could have gotten from a normal game?
Mixed strategies
Pure strategy
Link to originalPure strategy
In Game theory a pure strategy is deterministic, i.e. it is a map $\pi \colon S \to A$ from states of the game to actions.
However you can have non-deterministic strategies.
Mixed strategy
Link to originalMixed strategy
In Game theory a mixed strategy is non-deterministic, i.e. you have access to a random variable, and your strategy is a map from states of the game and the outcome of that random variable to actions.
In the game above, player one can hold on a black card with a probability $p$ chosen to make player two indifferent between their two responses. If you do the same for player two, picking to see with a probability $q$ that makes player one indifferent, this pair of mixed strategies fixes the value of the game.
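These mixing probabilities can be found by the indifference principle; a Python sketch, where the 2x2 payoff matrix is my own derivation from the stated rules:

```python
# Solving the 2x2 mini-poker game by the indifference principle, a sketch.
# Payoff matrix for player one (rows: resign-on-black, hold-on-black;
# columns: player two resigns, player two sees), derived from the rules.
R = [[-5, 5],
     [10, -5]]

# Player one holds with probability p, chosen so that player two is
# indifferent between resigning and seeing:
#   (1-p)*(-5) + p*10  =  (1-p)*5 + p*(-5)
p = (R[0][1] - R[0][0]) / (R[0][1] - R[0][0] + R[1][0] - R[1][1])
# Player two sees with probability q, making player one indifferent:
q = (R[1][0] - R[0][0]) / (R[1][0] - R[0][0] + R[0][1] - R[1][1])
value = R[0][0] * (1 - q) + R[0][1] * q

print(p, q, value)  # 0.4 0.6 1.0
```

So player one should bluff on black 40% of the time, player two should call 60% of the time, and the game is worth 1 point per hand to player one.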
Removing zero-sum
Prisoner's dilemma
Link to originalPrisoner's dilemma
Two people are caught mid crime and put in separate cells where they can’t talk to each other. They are both offered a plea deal if they testify against their companion, where:
- If they both cooperate with each other and stay silent they both get one month in jail,
- If one of them defects and testifies while the other stays silent, the one who testifies goes free whilst the other stays in jail for 3 months, however
- If they both defect and testify they both get sent away for 2 months.
This can be summarised in the following matrix
If the other player is equally likely to stay silent or testify, staying silent costs you an expected 2 months whereas testifying costs an expected 1 month, so the optimum strategy is to testify. However, by both testifying they together pay the largest combined cost.
In the Prisoner’s dilemma the dominating strategy is for both of them to testify. Even if either player could change their decision after finding out the other's choice, they wouldn’t want to.
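The dominance argument can be checked mechanically; a Python sketch using the jail-time costs above:

```python
# Checking that "testify" strictly dominates "silent" in the prisoner's
# dilemma, a sketch. Entries are months in jail (costs) for the row player.
cost = {
    ("silent", "silent"): 1, ("silent", "testify"): 3,
    ("testify", "silent"): 0, ("testify", "testify"): 2,
}

def strictly_dominates(a, b):
    """True if action a is strictly cheaper than b against every reply."""
    return all(cost[(a, other)] < cost[(b, other)]
               for other in ("silent", "testify"))

print(strictly_dominates("testify", "silent"))  # True: 0 < 1 and 2 < 3
```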
Nash equilibrium
Link to originalNash equilibrium
In a game with $n$ players where player $i$ has a choice of strategies in $S_i$, the state $(s_1^*, \ldots, s_n^*)$ is in Nash equilibrium if for all $i$ and all $s_i \in S_i$ we have
$$u_i(s_1^*, \ldots, s_i^*, \ldots, s_n^*) \geq u_i(s_1^*, \ldots, s_i, \ldots, s_n^*),$$
where $u_i$ is the utility function for player $i$.
This works for both pure or mixed strategies.
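For pure strategies the equilibrium condition can be checked by brute force over all strategy combinations; a sketch using the prisoner's dilemma (utilities taken as minus months in jail):

```python
# Finding pure-strategy Nash equilibria of a 2-player game by brute
# force, a minimal sketch. u[i][r][c] is player i's utility when the
# row player picks r and the column player picks c.
u = [
    [[-1, -3], [0, -2]],   # row player's utilities
    [[-1, 0], [-3, -2]],   # column player's utilities
]

def pure_nash(u):
    rows, cols = len(u[0]), len(u[0][0])
    eqs = []
    for r in range(rows):
        for c in range(cols):
            row_ok = all(u[0][r][c] >= u[0][r2][c] for r2 in range(rows))
            col_ok = all(u[1][r][c] >= u[1][r][c2] for c2 in range(cols))
            if row_ok and col_ok:   # no player gains by deviating alone
                eqs.append((r, c))
    return eqs

print(pure_nash(u))  # [(1, 1)]: both testify
```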
Dominating strategies
Strictly dominated strategy
Link to originalStrictly dominated strategy
In a game with $n$ players, where player $i$ has a choice of strategies in $S_i$ and has utility function $u_i$: for a player $i$, strategy $s_i$ strictly dominates $s_i'$ if and only if for every choice $s_{-i}$ of the other players' strategies we have
$$u_i(s_i, s_{-i}) > u_i(s_i', s_{-i}).$$
This can be used to find a Nash equilibrium.
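Iterated elimination of strictly dominated strategies can be sketched as follows, again using the prisoner's dilemma with utilities as minus months in jail:

```python
# Iterated elimination of strictly dominated strategies in a 2-player
# game, a sketch. u_row/u_col give the row/column player's utilities.

def eliminate(u_row, u_col):
    """Repeatedly delete strictly dominated rows/columns; return survivors."""
    rows = list(range(len(u_row)))
    cols = list(range(len(u_row[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            if any(all(u_row[r2][c] > u_row[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r); changed = True
        for c in cols[:]:
            if any(all(u_col[r][c2] > u_col[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c); changed = True
    return rows, cols

# Prisoner's dilemma: testify (index 1) dominates silent (index 0)
u_row = [[-1, -3], [0, -2]]
u_col = [[-1, 0], [-3, -2]]
print(eliminate(u_row, u_col))  # ([1], [1]): only (testify, testify) survives
```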
Elimination and Nash Equilibrium
Statement
Lemma
In an $n$-player pure strategy game, if iterated elimination of Strictly dominated strategy eliminates all but one combination of strategies, that combination is the unique Nash equilibrium. Similarly, a Nash equilibrium can never be strictly dominated. Proof
Link to original
Though this procedure doesn’t guarantee finding one, there is always a Nash equilibrium in finite games.
Existence of Nash equilibrium
Statement
Lemma
In an $n$-player game where $n$ is finite and each player has a finite choice of pure strategies, there exists a (possibly mixed) Nash equilibrium. Proof
Link to original
Note
Normally the best way to get out of the prisoner’s dilemma is to change the structure of the game.
Multiple rounds
Suppose now you are in a situation where you are playing repeated rounds of a game against the same player.
Known number of rounds
One approach to solving this is to combine each possible series of per-round strategies into a single strategy of a new composite game and investigate that. Though an easier solution can be found by thinking of the rounds separately.
If the number of rounds is known beforehand, consider the last round. As there are no games afterwards, there is no incentive not to behave rationally, so both players will use a Nash equilibrium strategy.
Given this last game is already decided, we might as well consider the game with one fewer round. Inductively, this gives that each player will play a Nash equilibrium strategy at each step of the game.
However, which Nash equilibrium is an interesting problem!
Unknown number of rounds
At the end of a round suppose there is a probability $\gamma$ that another round is played.
Note
Discounted rewards are perfectly built for this. Suppose
$V$ is the random variable for the value of playing this game, and let $r$ be the reward for the round; then
$$\mathbb{E}[V] = r + \gamma \mathbb{E}[V], \quad \text{so} \quad \mathbb{E}[V] = \frac{r}{1-\gamma}.$$
Therefore if we pick a strategy that would pay off $r_i$ in round $i$ if it were played, then the total expected value of this strategy would be
$$\sum_{i \geq 0} \gamma^i r_i.$$
The expected number of rounds in this set-up is $\frac{1}{1-\gamma}$.
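This mean can be checked by simulation: the number of rounds is geometric with mean $\frac{1}{1-\gamma}$. A Python sketch (the value $\gamma = 0.9$ is illustrative):

```python
# If after each round the game continues with probability gamma, the
# number of rounds is geometric with mean 1/(1-gamma). A quick Monte
# Carlo check of that claim (a sketch; gamma = 0.9 is illustrative).
import random

random.seed(0)
gamma = 0.9

def rounds_played():
    n = 1
    while random.random() < gamma:   # continue with probability gamma
        n += 1
    return n

samples = [rounds_played() for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to 1/(1-0.9) = 10
```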
Tit for Tat
When in the above situation there are some meta-strategies you can adopt. For example you can always choose one option, like in the known-number-of-rounds example. However, you can now also adopt strategies that depend on the opponent’s moves.
Tit for Tat
Link to originalTit for Tat
In a symmetric two-player game, the tit for tat strategy just copies the opponent’s previous move. In the first round it does something random.
Suppose we do this with the prisoner’s dilemma. Now let’s consider 4 strategies: always stay silent (silent), always testify (testify), and tit for tat (tft) starting either silent or testifying. Let’s consider the strategy matrix (to simplify things a factor of $\frac{1}{1-\gamma}$ has been taken out).
Let’s consider the preferred strategy when your opponent has already picked (this depends on the discount $\gamma$):
| Opponent | $\gamma$ below threshold | $\gamma$ at threshold | $\gamma$ above threshold |
|---|---|---|---|
| Silent | Testify | Testify | Testify |
| Testify | Testify, tft-testify | Testify, tft-testify | Testify, tft-testify |
| tft-silent | Testify | Any | tft-silent, silent |
| tft-testify | Testify, tft-testify | Any | Silent |
Here, no matter what $\gamma$ is, both players testifying is a Nash equilibrium. Additionally:
- for large enough $\gamma$, tft-testify is also a Nash equilibrium, and
- for large enough $\gamma$, tft-silent is also a Nash equilibrium.
Picturing Mixed strategies for repeated games
Now that we are repeating games we could adopt a Mixed strategy over meta-strategies, i.e. decide a probability distribution over the strategies and pick between them using a random variable.
Minmax profile
Link to originalMinmax profile
For a game, the minmax profile is the expected score each player can achieve assuming their opponents only want to harm them. The opponents are allowed to take any strategy, whether a Pure strategy or a Mixed strategy.
Battle of the sexes
Link to originalBattle of the sexes
Suppose two friends $A$ and $B$ are going to town to see a concert. There are two concerts on, $a$ and $b$, but unfortunately they forgot to agree which one to go to and can’t communicate. They will both be unhappy if they go to different concerts, both getting a score of $0$. $A$ slightly prefers concert $a$: if they both go there, $A$ gets a score of $2$ whereas $B$ gets a score of $1$, and it is vice versa for concert $b$. In summary this can be represented in a payoff table as follows.
To calculate the Minmax profile of the Battle of the sexes we have to think about how one player could punish the other the most. (This game is simpler as it is symmetric.) Using a Pure strategy, the worst $B$ can do is always attend concert $b$, as $A$ can still guarantee a score of $1$ by going along.
However, this is not the worst player $B$ can do: by mixing, attending $A$'s preferred concert with probability $\frac{1}{3}$, every reply of $A$'s scores at most $\frac{2}{3}$.
This gives the following Minmax profile.
$$\begin{array}{c|c} \mbox{A} & \mbox{B} \\ \hline \frac{2}{3} & \frac{2}{3} \end{array}$$
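The profile can be recovered numerically, assuming the payoffs above (2 for your preferred concert together, 1 for the other concert together, 0 apart); a sketch from $A$'s point of view:

```python
# Computing the minmax profile of the battle of the sexes, a sketch.
# If a punishing B attends concert a with probability q, A's best
# reply earns max(2q, 1*(1-q)); B minimises that over q.

def best_reply_value(q):
    return max(2 * q, 1 - q)

# crude grid search over B's mixing probability
qs = [i / 1000 for i in range(1001)]
q_star = min(qs, key=best_reply_value)
print(q_star, best_reply_value(q_star))  # about 1/3 and 2/3
```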
We see from this that no matter what happened last round, the Pavlov strategy (cooperate after both players made the same move last round, defect otherwise) rights itself into the cooperate-cooperate state.
Computational Folk theorem
Statement
Lemma
For any 2-player discounted-reward repeated game it is possible to build a Pavlov-like machine which is Subgame perfect and achieves a Nash equilibrium in polynomial time. This is done in one of three ways:
- Pavlov, if it is possible,
- make it zero-sum (like) and solve using a linear programme, or
- at most one player can improve.
Proof
Link to original
Stochastic games
Stochastic games are the multi-agent analogue of a Markov decision process.
Stochastic games
Link to originalStochastic games
A Stochastic game is defined by the following data:
- A number of players $n$,
- A set $S$ of states the game can be in (normally thought of as tuples),
- A set of actions $A_i$ for each player; this may depend on the state, and we refer to the joint set of actions as $A = A_1 \times \cdots \times A_n$,
- A model which determines for each pair of states $s, s' \in S$ and joint action $a \in A$ the transition probability from $s$ to $s'$ given action $a$, i.e. $T(s, a, s') = P(s' \mid s, a)$,
- A reward function $R_i$ for each player $i$ for being at a given state, and
- A discount factor $\gamma$ for discounted rewards.

When provided with this we are looking for a strategy $\pi_i$ for each player $i$ that determines what action they take in each state.
Semi-wall stochastic game
Link to originalSemi-wall stochastic game
The semi-wall game is a Gridworld game played on a 3x3 grid, with one player starting in the bottom-left corner and the other starting in the bottom-right. The goal of the game is to reach the money in the top centre; whoever does that first gets a point. If both players try to enter the same square, with even odds one player moves in and the other stays where they are. There is a semi-wall above each player's starting square which has a 50% chance of letting them through; otherwise they stay where they are. The game looks as follows:
Generalisations for zero-sum Stochastic games
In the zero-sum setting we assume only two players, so $R_1 = -R_2$.
Minimax-Q
Link to originalMinimax-Q
This is a generalisation of Q-learning to Stochastic games and is defined for each player. Writing $a$ for our action and $o$ for the opponent's,
$$Q_i(s, a, o) = R_i(s) + \gamma \sum_{s'} T(s, (a, o), s') \, V_i(s'), \qquad V_i(s) = \max_{\pi} \min_{o} \sum_{a} \pi(a) \, Q_i(s, a, o).$$
We incrementally learn this value with Q-learning-style updates
$$Q_i(s, a, o) \leftarrow (1 - \alpha) \, Q_i(s, a, o) + \alpha \bigl( r + \gamma V_i(s') \bigr).$$
This has some nice properties:
- Value iteration works,
- It converges,
- $Q$ has a unique solution for each game,
- Policies can be computed independently from one another,
- We can update in polynomial time, and
- Q-functions are sufficient to specify the policy.
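A sketch of a tabular minimax-Q update for two actions per player; I use a grid search for the minimax state value where a real implementation would solve a small linear programme, and the matching-pennies payoffs are illustrative:

```python
# A tabular minimax-Q update for a two-action zero-sum stage, a sketch.
# Q[s][(a, o)] is our value for taking a while the opponent takes o.
# The minimax state value max_pi min_o sum_a pi(a) Q(s, a, o) is found
# by a crude grid search over our mixing probability.

def minimax_value(Q_s):
    best = float("-inf")
    for i in range(1001):
        p = i / 1000                       # probability of action 0
        worst = min(p * Q_s[(0, o)] + (1 - p) * Q_s[(1, o)]
                    for o in (0, 1))
        best = max(best, worst)
    return best

def minimax_q_update(Q, s, a, o, reward, s_next, alpha=0.1, gamma=0.9):
    target = reward + gamma * minimax_value(Q[s_next])
    Q[s][(a, o)] += alpha * (target - Q[s][(a, o)])

# matching pennies as the stage game in state 1, zero elsewhere
Q = {0: {(a, o): 0.0 for a in (0, 1) for o in (0, 1)},
     1: {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}}
minimax_q_update(Q, 0, 0, 1, reward=0.5, s_next=1)
# matching pennies is worth 0, so the target is 0.5 and we move one
# step of size alpha towards it:
print(round(Q[0][(0, 1)], 3))  # 0.05
```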
Generalisation to Stochastic games
You can do the same for non-zero-sum games using a Nash equilibrium instead of minmax, but it doesn’t have the nice properties.
Instead, the analogous properties fail:
- Value iteration doesn’t work,
- It doesn’t converge,
- $Q$ has multiple solutions for each game (as there is no unique Nash equilibrium),
- Policies cannot be computed independently from one another,
- Updating requires computing a Nash equilibrium, which is PPAD-hard, and
- Q-functions are not sufficient to specify the policy.
Though there are ideas to correct for this:
- We can use repeated Stochastic games,
- You can let the players talk to each other,
- You can limit the players’ computational power, and
- You can allow players to pay each other some of their reward.