Week 15 - Markov Chains

To talk about Markov chains it is good to recap what a probability matrix is:

Stochastic matrix

A matrix $P \in \mathbb{R}^{n \times n}$ is called stochastic if for all $i, j$ with $1 \le i, j \le n$ we have $P_{i,j} \ge 0$, and for each $1 \le i \le n$ we have $\sum_{j=1}^{n} P_{i,j} = 1$. Therefore every row forms a probability distribution.

Now we can define a Markov chain.

Markov chain

A Markov chain is specified by a number of states $n$ and a transition probability matrix $P \in \mathbb{R}^{n \times n}$ (a stochastic matrix). Intuitively think of this as a state machine which at state $i$ has probability $P_{i,j}$ of transitioning to state $j$.

Suppose we have 4 states with the following probabilities to go between them. This is commonly represented as an edge-weighted directed graph.

*[Figure: simple_markov_chain - an edge-weighted directed graph on the 4 states]*
In terms of a probability matrix it would be represented as

$$P = \left(\begin{array}{cccc} P_{1,1} & P_{1,2} & P_{1,3} & P_{1,4}\\ P_{2,1} & P_{2,2} & P_{2,3} & P_{2,4}\\ P_{3,1} & P_{3,2} & P_{3,3} & P_{3,4}\\ P_{4,1} & P_{4,2} & P_{4,3} & P_{4,4} \end{array}\right)$$

where $P_{i,j}$ represents the edge weight going from $i$ to $j$ in the graph above.

Steps through the graph

In a Markov chain $P$'s term $P_{i,j}$ represents the probability of going from $i$ to $j$ in one step. However $P^k$'s terms $(P^k)_{i,j}$ represent the probability of going from $i$ to $j$ in $k$ steps - so $P^k$ is still a probability matrix. In the example above some entries of $P^2$ are zero; as you can notice, $(P^2)_{i,j} = 0$ exactly when there is no path from state $i$ to state $j$ in 2 steps.
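To see this concretely, here is a minimal sketch with a made-up 4-state stochastic matrix - the numbers are hypothetical, not the ones from the figure above:

```python
import numpy as np

# A hypothetical 4-state transition matrix; each row sums to 1.
# (Illustrative values, not the ones from the figure above.)
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.5],
])

P2 = P @ P              # (P^2)[i, j] = probability of going from i to j in 2 steps
print(P2)
print(P2.sum(axis=1))   # each row still sums to 1, so P^2 is stochastic
# A zero entry (P^2)[i, j] == 0 means there is no 2-step path from i to j.
```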

Stationary distribution

In the example above, as $k$ gets large, $P^k$ converges in some sense to a single matrix. For very large $k$ we may have

$$P^k \approx \left(\begin{array}{ccc} \cdots & \pi & \cdots\\ \cdots & \pi & \cdots\\ \vdots & \vdots & \vdots\\ \cdots & \pi & \cdots \end{array}\right)$$

for some stationary distribution $\pi$ - that is, every row of $P^k$ is approximately $\pi$.

Formalising what we mean here we have.

Stationary distribution (Markov Chains)

For a Markov chain given by $P$, a stationary distribution is a probability distribution given by a row vector $\pi$ (this is also a probability matrix with a single row) such that
$$\pi P = \pi.$$

The connection follows as to calculate $P^{k+1} = P^k P$, for each row $r$ of $P^k$ we calculate $r P$ - which for large $k$ is something similar to $\pi P$.

Algebraic view

For some Markov chain given by $P$, if you start at state $i$ then the probability distribution of where you are after $k$ steps is given by the $i$'th row of $P^k$.

Equivalently, if $e_i$ is the row vector with $0$'s in all entries other than a $1$ in the $i$'th column then its probability distribution after one step is given by
$$e_i P,$$
with the probability distribution of it after $k$ steps being
$$e_i P^k.$$

However, we can pick any probability distribution $\pi$ over the states for the starting vector. Therefore a stationary distribution is just one in which this doesn't change over time.

From the definition any stationary distribution is simply a left eigenvector for $P$ with eigenvalue $1$.
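As a sketch of this view, a stationary distribution can be computed as such an eigenvector (here on a made-up 2-state chain), and the rows of $P^k$ can be seen approaching it:

```python
import numpy as np

# A made-up 2-state chain.
P = np.array([[0.1, 0.9],
              [0.5, 0.5]])

# Left eigenvectors of P are right eigenvectors of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue (numerically) equal to 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                     # normalise into a probability distribution
print(pi)                              # stationary distribution: pi @ P == pi

# For an ergodic chain the rows of P^k approach pi as k grows.
print(np.linalg.matrix_power(P, 50))
```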

When does a Markov chain not have a stationary distribution?

If the state directed graph is a bipartite graph then notice that if we start on one side, at every even step we are on that side - however at every odd step we are on the other side. So $P^k$ cannot converge, and this is a classic example of when no limiting distribution of the above form will exist. This can be summarised by the following definitions.

Periodic state (Markov chain)

For a Markov chain given by $P$, a state $i$ with $1 \le i \le n$ is said to be aperiodic if $\gcd\{k \ge 1 : (P^k)_{i,i} > 0\} = 1$, and is periodic otherwise.

Periodic Markov chain

A Markov chain is said to be periodic if it has a periodic state and aperiodic otherwise.

We can show that for periodic Markov chains $P^k$ does not converge, so no limiting distribution of the above form exists.
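To make the period concrete, here is a minimal sketch that estimates the period of a state as the gcd of its possible return times - note it only checks a finite horizon of matrix powers, so it is a heuristic, and `period_of_state` is a hypothetical helper, not a standard library function:

```python
import numpy as np
from math import gcd
from functools import reduce

def period_of_state(P, i, max_k=200):
    """Gcd of all step counts k <= max_k with (P^k)[i, i] > 0.
    Only a finite-horizon sketch of the true period."""
    returns = []
    Q = np.eye(len(P))
    for k in range(1, max_k + 1):
        Q = Q @ P
        if Q[i, i] > 0:
            returns.append(k)
    return reduce(gcd, returns) if returns else 0

# A 2-state chain that deterministically flips state: every return
# to state 0 takes an even number of steps, so the period is 2.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(period_of_state(P, 0))  # 2 -> the state (and hence the chain) is periodic
```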

When does a Markov chain not have a unique stationary distribution?

Suppose we have a Markov chain with multiple sink strongly connected components in the strongly connected component graph. Then each of these gives us a stationary distribution on the vertices involved. Any weighted sum of these stationary distributions, such that they sum to a probability distribution, would then also be a stationary distribution on the whole Markov chain - so it wouldn't be unique.

Irreducible Markov chain

A Markov chain given by $P$ is irreducible if the directed graph on the states given by the non-zero values of $P$ has a single strongly connected component.
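A quick way to check this in practice - a minimal sketch using scipy's strongly connected components, assuming the chain is given as a dense numpy matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def is_irreducible(P):
    """Irreducible iff the directed graph of non-zero transitions
    has a single strongly connected component."""
    graph = csr_matrix((P > 0).astype(int))
    n_components, _ = connected_components(graph, directed=True,
                                           connection='strong')
    return n_components == 1

# Made-up 3-state chain: 1 -> 2 -> 3 -> 1 is a cycle through every state.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
print(is_irreducible(P))  # True
```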

Markov chain with a unique stationary distribution

Now that we know what to avoid, we define the following.

Ergodic Markov chain

A Markov chain is said to be ergodic if it is both aperiodic and irreducible.

These Markov chains have the desired property.

Statement

Lemma

An ergodic Markov chain has a unique stationary distribution $\pi$.

They also have the observed limiting form.

Statement

Lemma

For an ergodic Markov chain given by $P$ with stationary distribution $\pi$,

$$\lim_{k \to \infty} P^k = \left(\begin{array}{ccc} \cdots & \pi & \cdots\\ \cdots & \pi & \cdots\\ \vdots & \vdots & \vdots\\ \cdots & \pi & \cdots \end{array}\right).$$

What is the stationary distribution?

There is one class of Markov chain that has an easy stationary distribution to compute.

Symmetric Markov chain

A Markov chain given by $P$ is called symmetric if $P_{i,j} = P_{j,i}$ for all $i, j$.

Statement

Lemma

A symmetric Markov chain on $n$ states has the uniform distribution $\pi = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)$ as a stationary distribution.
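A tiny numerical sanity check of this lemma, with a made-up symmetric stochastic matrix:

```python
import numpy as np

# A hypothetical symmetric stochastic matrix: P[i, j] == P[j, i].
P = np.array([[0.2, 0.3, 0.5],
              [0.3, 0.4, 0.3],
              [0.5, 0.3, 0.2]])

uniform = np.full(3, 1 / 3)
print(np.allclose(uniform @ P, uniform))  # True: uniform is stationary
```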

There is another form of Markov chain that relaxes this condition.

Reversible Markov chain

A Markov chain given by $P$ is reversible if $P_{i,j} \neq 0$ if and only if $P_{j,i} \neq 0$.

Though in general its stationary distribution does not have an explicit form.

Page Rank algorithm

There is an algorithm invented in 1998 that was used to rate the importance of webpages.

The way it determines importance has a nice connection to Markov chains. It was used in search engines at the beginning of the popularisation of the internet.

For this, let's start by defining the object we are working with.

Webgraph

We define a directed graph $G = (V, E)$ to represent a web network. Define its vertices $V$ to be the set of webpages and the directed edges to be the hyperlinks, so $(u, v) \in E$ if webpage $u$ has a hyperlink to $v$.

We think of this graph in an extended adjacency list form,

  • $\text{out}(v) = \{u \in V : (v, u) \in E\}$, the pages $v$ links to, and
  • $\text{in}(v) = \{u \in V : (u, v) \in E\}$, the pages linking to $v$.

The problem is to define $\pi(v)$ to be the “rank” of a page $v$ - which will be a measure of importance.

How to construct the page rank

We want the page rank to obey some simple ideas.

  • the value of a page should be linked to the pages linking to it,
  • the value of a link should be inversely proportional to how many outbound links its page has, and
  • the value of a link should be proportional to the value of the page it sits on.

The simplest formula obeying these rules would be
$$\pi(v) = \sum_{u \in \text{in}(v)} \frac{\pi(u)}{|\text{out}(u)|}.$$
This is recursive however, so we don't know if there will be a solution for $\pi$. We define this as the page rank.

Though the suggestive notation of $\pi$ and the idea of a limit should give you an idea that we will construct this via a stationary distribution of a Markov chain. In this world we already know when stationary distributions exist.

Random walk

Suppose we play a game on the Webgraph where we start at one page, then uniformly at random we follow one of the outbound hyperlinks on that webpage. This defines us a Markov chain on the Webgraph where
$$P_{u,v} = \begin{cases} \frac{1}{|\text{out}(u)|} & \text{if } (u,v) \in E,\\ 0 & \text{otherwise.} \end{cases}$$
Then consider its stationary distribution $\pi P = \pi$ from the perspective of a single state: for a particular position $v$ on the right-hand side we have
$$\pi(v) = \sum_{u \in V} \pi(u) P_{u,v} = \sum_{u \in \text{in}(v)} \frac{\pi(u)}{|\text{out}(u)|},$$
giving us the page rank we defined above.

Technical tweaks

The web we might be on might have horrible irregularities within it, so we get periodic states or multiple strongly connected components. Therefore we add the possibility to randomly jump from the page you are on to one picked uniformly at random from all the pages on the web. We will do this with probability $\varepsilon$ for some $\varepsilon > 0$.

Therefore we get the following Markov chain on $n = |V|$ states
$$P_{u,v} = \frac{\varepsilon}{n} + (1 - \varepsilon) \cdot \begin{cases} \frac{1}{|\text{out}(u)|} & \text{if } (u,v) \in E,\\ 0 & \text{otherwise.} \end{cases}$$
There is one last problem with this definition. What do we do when a page has no outgoing links? There are a couple of approaches taken:

  1. Self-loop with all its weight,
  2. Remove such nodes (recursively), and
  3. Set $P_{u,v} = \frac{1}{n}$ for all $v$ for these nodes.

The first solution makes these pages have a much higher page rank. The second solution leaves us with pages with no page rank. The third solution is practically what happens within the Page Rank algorithm.

This Markov chain is strongly connected as every vertex connects to every other - so it is irreducible. Every vertex has a self-loop so each state is aperiodic - making the Markov chain aperiodic.

This tweaked Markov chain is an Ergodic Markov chain so it has a unique stationary distribution. We use this to find the page rank.

How to compute $\pi$

Computing this stationary distribution is the Page rank algorithm.

Computing $\pi$ exactly is hard as $n$ is huge and therefore $P$ is huge. You will need to compute $\pi^{(0)} P^k$ for some starting distribution $\pi^{(0)}$ and some large $k$. There are some tricks that will help (a sketch combining them follows the list):

  • if $1 - \varepsilon$ is sufficiently small, then $k$ doesn't have to be too large,
  • if we use an old approximation of $\pi$ for $\pi^{(0)}$ it will mix faster, and
  • the matrix multiplication doesn't need to take $O(n^2)$ per step - exploiting the sparsity of the link structure we can get it on the order of $n + |E|$.
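Putting the pieces together, here is a minimal sketch of the computation, assuming the webgraph is given as adjacency lists (`out_links[u]` listing the pages $u$ links to - a hypothetical representation). Dangling pages are handled by jumping uniformly (approach 3 above), and each iteration costs on the order of $n + |E|$ rather than a dense matrix product:

```python
def pagerank(out_links, eps=0.15, tol=1e-10, max_iter=1000):
    """Power iteration on the tweaked chain: jump uniformly with
    probability eps, follow a random out-link with probability
    1 - eps.  A sketch, not production code."""
    n = len(out_links)
    pi = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(max_iter):
        new = [eps / n] * n                 # mass from the random jump
        for u, links in enumerate(out_links):
            if links:                       # spread u's mass over its out-links
                share = (1 - eps) * pi[u] / len(links)
                for v in links:
                    new[v] += share
            else:                           # dangling page: jump uniformly
                share = (1 - eps) * pi[u] / n
                new = [x + share for x in new]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            break
        pi = new
    return new

# Tiny hypothetical webgraph: page 3 is dangling (no out-links).
print(pagerank([[1, 2], [2], [0], []]))
```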

What value to set $\varepsilon$

If $\varepsilon = 1$ then we have a Symmetric Markov chain with just the uniform stationary distribution. So in some sense $1 - \varepsilon$ is reflecting how much the tweaked Markov chain is reflecting the original idea of the page rank.

The trade-off for increasing $1 - \varepsilon$ is that it is slower to converge to a stationary distribution. Google are believed to use $\varepsilon = 0.15$ (equivalently, a damping factor of $0.85$).