Q-function (RL)

Suppose we are in a Markov decision process with discounted rewards. We define the Q-function as the solution to the following equations: