Statement

Lemma

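Given a Markov decision process $M$, let $V_t(s)$ be the value estimate for a state $s$ after the $t$'th update. Suppose we update this using the following update rule:

$$V_t(s) = V_{t-1}(s) + \alpha_t \left( R_t(s) - V_{t-1}(s) \right)$$

where $R_t(s)$ is a noisy sample of the true value $V(s)$ with noise of mean 0, and $\alpha_t$ is a learning rate. Then the incrementally learned $V_t(s)$ will converge to $V(s)$ in the limit, provided that every state is visited infinitely often and:

The sum of the learning rates diverges: $\sum_{t=1}^{\infty} \alpha_t = \infty$, and

The sum of the squared learning rates converges: $\sum_{t=1}^{\infty} \alpha_t^2 < \infty$.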
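
To get a feel for the two conditions, here is a minimal Python sketch of the update rule for a single state. Everything in it is an assumption made for illustration: the function names (`incremental_estimate`, `harmonic`, `constant`, `fast_decay`), the Gaussian noise with unit variance, the starting estimate of 0 and the true value of 5. It compares a schedule that meets both conditions ($\alpha_t = 1/t$) with one whose squared sum diverges (a constant rate) and one whose plain sum converges ($\alpha_t = 1/(t+1)^2$).

```python
import random


def incremental_estimate(true_value, learning_rate, steps, seed=0):
    """Apply V_t = V_{t-1} + alpha_t * (R_t - V_{t-1}) for one state.

    `learning_rate` is a function mapping the step index t (from 1) to alpha_t.
    The noise model (Gaussian, variance 1) is an illustrative assumption.
    """
    rng = random.Random(seed)
    value = 0.0  # arbitrary initial estimate, deliberately far from the target
    for t in range(1, steps + 1):
        sample = true_value + rng.gauss(0.0, 1.0)  # noisy sample, zero-mean noise
        alpha = learning_rate(t)
        value += alpha * (sample - value)
    return value


def harmonic(t):
    # sum of 1/t diverges, sum of 1/t^2 converges: both conditions hold
    return 1.0 / t


def constant(t):
    # sum diverges, but so does the sum of squares: second condition fails
    return 0.5


def fast_decay(t):
    # sum of 1/(t+1)^2 converges: first condition fails
    return 1.0 / (t + 1) ** 2


def spread(learning_rate, steps=20000, seeds=range(20)):
    """Range of the final estimates across several independent runs."""
    finals = [incremental_estimate(5.0, learning_rate, steps, seed=s) for s in seeds]
    return max(finals) - min(finals)


if __name__ == "__main__":
    for schedule in (harmonic, constant, fast_decay):
        final = incremental_estimate(5.0, schedule, steps=20000, seed=1)
        print(f"{schedule.__name__:>10}: final estimate {final:.3f}, "
              f"spread across seeds {spread(schedule):.3f}")
```

With $\alpha_t = 1/t$ the update is exactly the running mean of the samples, so the estimate settles near the true value. The constant rate keeps reacting to the latest noise, so runs with different seeds stay scattered around the true value rather than settling. The fast-decaying rate shrinks so quickly that the estimate cannot travel all the way from its initial value to the target.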
