# Uniqueness of the optimal value function for an MDP

by jakab922   Last Updated November 17, 2017 17:19 PM

Suppose we have a Markov decision process with a finite state set and a finite action set, and we compute the expected return with a discount factor $\gamma \in [0,1]$. Sutton & Barto state in section 3.8 of their book that there always exists at least one optimal policy, but they don't prove it. Presumably all optimal policies yield the same optimal value function; at least that is what would make sense, and the book assumes it as well.

Can someone give me a proof for the above statement or a link to a proof?
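
To make the claim concrete, here is a small numerical sanity check (not a proof). It is only a sketch under assumptions that are mine, not the book's: the discount is strictly below 1 ($\gamma = 0.9$), the MDP is a randomly generated one, and the array layout `P[a, s, s']` / `R[a, s]` and the function names `value_iteration` and `policy_value` are made up for illustration. It runs value iteration from two different starting guesses and checks that both end up at the same value function, and that the greedy policy attains it.

```python
import numpy as np

def value_iteration(P, R, gamma, v0, tol=1e-10, max_iter=100_000):
    """Iterate v <- max_a (R[a] + gamma * P[a] v) until the update is below tol.

    P[a, s, s2] : probability of moving from s to s2 under action a (assumed layout)
    R[a, s]     : expected immediate reward for taking a in s (assumed layout)
    """
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        q = R + gamma * (P @ v)        # q[a, s], action values under current v
        v_new = q.max(axis=0)          # Bellman optimality backup
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)
        v = v_new
    return v, q.argmax(axis=0)

def policy_value(P, R, gamma, policy):
    """Exact value of a deterministic policy: solve (I - gamma P_pi) v = r_pi."""
    n_states = P.shape[1]
    idx = np.arange(n_states)
    P_pi = P[policy, idx]              # row s is the transition row of action policy[s]
    r_pi = R[policy, idx]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# A small random MDP (hypothetical example data).
rng = np.random.default_rng(0)
n_actions, n_states, gamma = 3, 4, 0.9  # gamma < 1 is an assumption of this sketch
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)       # normalise rows into probability distributions
R = rng.random((n_actions, n_states))

# Two very different starting guesses.
v_a, pi_a = value_iteration(P, R, gamma, np.zeros(n_states))
v_b, pi_b = value_iteration(P, R, gamma, 100.0 * rng.random(n_states))

same_limit = np.allclose(v_a, v_b, atol=1e-6)
greedy_attains_it = np.allclose(policy_value(P, R, gamma, pi_a), v_a, atol=1e-6)
```

On examples like this, `same_limit` and `greedy_attains_it` should both come out `True`, which is what one would expect if the optimal value function is unique. What I am missing is the argument for why this holds in general.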
