A Markov chain is specified by a number of states and a transition probability matrix. Intuitively think of this as a state machine which at state has probability of transitioning to state .
where represents the edge weight going from to in the graph above.
Steps through the graph
In a Markov chain‘s term represents the probability of going from to in one step. However ‘s terms represent the probability of going from to in steps, so this is still a probability matrix. In the example above
as you can notice there is no path from state to in 2 steps.
Stationary distribution
In the example above as gets large converges in some sense to a single matrix. For very large we may have
If the state directed graph is a bipartite graph then notice if we start on one side at every even step we are on that side - however for every odd step we are on the other side. This is a classic example of when no stationary distribution will exist. This can be summarised by the following definitions.
Periodic state (markov chain)
Periodic state (markov chain)
For a Markov chain given by a state with is said to be aperidoic if
and is periodic otherwise.
Though in general it does not have anything explicit.
Page Rank algorithm
There is an algorithm invented in 1998 that was used to rate the importance of webpages.
The way it determines importance has nice applications for Markov chains. It was used in search engines at the beginning of the popularisation of the internet.
For this lets start by defining the object we are working with.
Webgraph
Webgraph
We define a directed graph to represent a web network. Define its vertices to be the set of webpages and the directed edges to be the hyperlinks so if webpage has a hyperlink to .
We think of this graph in an extended Adjacency list form,
, and
.
The problem is to define to be the “rank” of a page - which will be a measure of importance.
How to construct the page rank
We want the page rank that obays some simple ideas.
the value of a page should be linked to the pages linking to it,
the value of a link should be inversely proportional to how many outbound links it has, and
the value of a link should be proportional to the value of the page.
The simplest formula obeying these rules would be
This is recursive however, so we don’t know if there will be a solution for . We define this as the page rank.
Suppose we play a game on the Webgraph where we start at one page, then uniformly at random we hit one of the outbound hyperlinks on that webpage. This defines us a Markov chain on the Webgraph where
Then consider its stationary distribution from the perspective of
then for a particular position on the righthand side we have
Giving us the page rank we defined above.
Technical tweaks
The web we might be on might have horrible irregularities within it so we get periodic states or strongly connected components. Therefore we add the possibility to randomly jump from the page you are on to one picked uniformly at random from all the pages on the web. We will do this with probability for some .
Therefore we get the following Markov chain
There is one last problem with this definition. What do we do when a page has no outgoing links? There are a couple approaches taken:
Self-loop with all its weight,
Remove such nodes (recursively), and
Set for these nodes.
The first solution makes these pages have a much higher page rank. The second solution leaves us with pages with no page rank. The third solution is practically what happens within the Page Rank algorithm.