When is a deterministic optimal policy in MDPs attainable?


In non-theory research papers, I often see that the optimal (deterministic Markov) policy for MDPs is defined as

\[\pi^* \in \arg\max_\pi v_\pi(s), \quad \forall s \in S,\]

where, for a real-valued function $g$ on a set $X$, the $\arg\max$ is defined as

\[\arg\max_{x \in X} g(x) := \{x' \in X: g(x') \geq g(x), \: \forall x \in X \}.\]
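
Note that, with this set-valued definition, $\arg\max$ can be empty when the supremum is not attained. For instance, for $X = (0, 1)$ and $g(x) = x$,

\[\sup_{x \in (0, 1)} g(x) = 1 \quad \mathrm{but} \quad \arg\max_{x \in (0, 1)} g(x) = \varnothing,\]

since no $x' \in (0, 1)$ satisfies $g(x') \geq g(x)$ for all $x \in (0, 1)$.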

But if the $\max$ is not attained, this definition of the optimal policy is ill-posed: the $\arg\max$ set is empty. To rectify that, we can define an optimal policy $\pi^*$ as a policy such that

\[v^*: = v_{\pi^*}(s) = \sup_\pi v_\pi(s), \quad \forall s \in S.\]
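
To see the difference, here is a minimal sketch (a made-up single-state, single-step problem, not an example from the book) with countably many actions $a \in \{1, 2, \dots\}$ and reward $r(a) = 1 - 1/a$: the supremum of the value is $1$, but no action attains it.

```python
# A made-up single-state, single-step problem with countably many actions.
# The reward r(a) = 1 - 1/a gets arbitrarily close to 1, so sup_a r(a) = 1,
# but r(a) < 1 for every action a: the supremum is not a maximum, and no
# optimal deterministic policy exists under the argmax-based definition.

def reward(a: int) -> float:
    return 1.0 - 1.0 / a

for n in (10, 1_000, 100_000):
    best = max(reward(a) for a in range(1, n + 1))
    print(f"best reward among the first {n} actions: {best:.6f}")
# Output approaches 1.0 from below; the sup-based definition of v* still
# makes sense (v* = 1), while the argmax-based one does not.
```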

In this post, I want to dig into Proposition 4.4.3 in Martin L. Puterman’s book. So, here is the question: when is the supremum below attained, i.e., when does $\sup = \max$?

\[u^*_t(h_t) = \sup_{a \in A_{s_t}} \left\{r_t(s_t, a) + \sum_{j \in S}p_t(j \mid s_t, a)\,u^*_{t + 1}(h_t, a, j)\right\}.\]

In the above display, $u_t^*$ is the optimal history-dependent value function at time step $t$, $h_t = (s_0, a_0, \dots, s_t)$ is the history up to time step $t$, $s_t$ is the state at time step $t$, $r_t$ is the deterministic reward function at time step $t$, $p_t$ is the transition dynamics at time step $t$, and $A_s$ is the set of actions available at state $s \in S$.

Let’s first revisit the background we need.

Background

We will take limits, hence we need to make sure we can. For this, we need to define the concepts of completeness and separability. I’ve already defined completeness in this post about Sobolev spaces, so I only revisit separability here.

A separable metric space

First we need to know what a dense set is.


A dense set

Let $(X, d)$ be a metric space. Let $E \subset X$ be a subset and $\bar{E}$ be its closure. $E$ is dense if $\bar{E} = X$.


A topological space $X$ (which, for our purposes, can be thought of as having its topology generated by a metric: the open sets are exactly those that are open with respect to the metric) is separable if it has a countable dense subset, e.g., the rationals in the reals. Or, said differently: you can “approximate” any point in the space by points from a countable subset.
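
A tiny numerical sketch of this intuition for $X = \mathbb{R}$, whose countable dense subset can be taken to be the rationals: any real number (here $\sqrt{2}$, just as an example) is approximated arbitrarily well by rationals.

```python
from fractions import Fraction
import math

# Separability of the reals: the rationals form a countable dense subset,
# so every real number is a limit of rationals. Truncating the decimal
# expansion of sqrt(2) gives rationals within 10^{-k} of it.
target = math.sqrt(2)
for k in range(1, 7):
    q = Fraction(int(target * 10**k), 10**k)   # a rational approximation
    print(q, "error:", abs(float(q) - target))
```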

Weak convergence

Let $\{\mu_n\}_{n \in \mathbb{N}}$ be a sequence of probability measures on $(X, \mathcal{F})$. We say that $\mu_n$ converges weakly to a probability measure $\mu$ on $(X, \mathcal{F})$ if $\lim_{n \to \infty}\int f \, d\mu_n = \int f \, d\mu$ for all bounded continuous functions $f$ on $X$.
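
A standard example of why the test functions are taken to be continuous: let $\mu_n = \delta_{1/n}$ (the point mass at $1/n$) and $\mu = \delta_0$ on $\mathbb{R}$. For every bounded continuous $f$,

\[\int f \, d\mu_n = f(1/n) \to f(0) = \int f \, d\mu,\]

so $\mu_n$ converges weakly to $\mu$; but for the bounded measurable (and discontinuous) $f = \mathbb{1}_{\{0\}}$ we get $\int f \, d\mu_n = 0$ for every $n$ while $\int f \, d\mu = 1$, so the convergence cannot be required for all bounded measurable functions.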

Semicontinuous functions (informally: functions whose jumps go only in one direction)

First we need to know what an open set is.


An open set

Let $(X, d)$ be a metric space. Given any $x \in X$ and $r > 0$, define the open ball $B(x, r)$ centred at $x$ with radius $r$ to be the set of all $y \in X$ such that $d(x, y) < r$. Given a set $E$, we say $x$ is an interior point of $E$ if there is some open ball centred at $x$ which is contained in $E$. A set is open if every point is an interior point.


Let $X$ be a complete separable metric space and $f$ a real-valued function on $X$. We say that $f$ is upper semicontinuous (u.s.c.) if, for any sequence $\{x_n\}$ of $X$’s elements which converges to $x^*$,

\[\limsup_{n \to \infty} f(x_n) \leq f(x^*).\]

For example, the function

\[f(x) = \begin{cases}x + 1 & \mathrm{if}\; x \geq 1 \\ x & \mathrm{if}\; x < 1 \end{cases}\]

is u.s.c. at $x^* = 1$: for any $x_n \to 1$, $\limsup_{n \to \infty} f(x_n) \leq 2 = f(1)$. Similarly, $f$ is said to be lower semicontinuous (l.s.c.) whenever $-f$ is u.s.c., or equivalently, $\liminf_{n \to \infty} f(x_n) \geq f(x^*)$. A continuous function is both u.s.c. and l.s.c.

The function

\[f(x) = \begin{cases}x + 1 & \mathrm{if}\; x > 1\\ x & \mathrm{if}\; x \leq 1\end{cases}\]

is l.s.c. at $x^* = 1$: for any $x_n \to 1$, $\liminf_{n \to \infty} f(x_n) \geq 1 = f(1)$.
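
A small numerical sketch of both step functions, evaluating them along sequences that approach $x^* = 1$ from below and from above:

```python
# Numerical look at the two step-function examples near x* = 1.

def f_usc(x):   # x + 1 if x >= 1 else x  -> u.s.c. at 1, f_usc(1) = 2
    return x + 1 if x >= 1 else x

def f_lsc(x):   # x + 1 if x > 1  else x  -> l.s.c. at 1, f_lsc(1) = 1
    return x + 1 if x > 1 else x

below = [1 - 10**(-k) for k in range(1, 8)]   # x_n -> 1 from below
above = [1 + 10**(-k) for k in range(1, 8)]   # x_n -> 1 from above

# u.s.c.: along the two sequences the values tend to 1 and 2, both <= f_usc(1) = 2.
print([round(f_usc(x), 6) for x in below])
print([round(f_usc(x), 6) for x in above])

# l.s.c.: along the two sequences the values tend to 1 and 2, both >= f_lsc(1) = 1.
print([round(f_lsc(x), 6) for x in below])
print([round(f_lsc(x), 6) for x in above])
```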


Lemma 1. Let $X$ be a complete separable metric space. Then,

  1. If $f \geq 0$ and $g \geq 0$ are u.s.c. on $X$, then $fg$ is u.s.c. on $X$.
  2. If $f, g$ are u.s.c. on $X$, then $f + g$ is u.s.c. on $X$.
  3. If $\{f_n\}$ is a decreasing sequence of nonpositive u.s.c. functions on $X$, then $\lim_{n \to \infty} f_n$ is u.s.c. on $X$.

Proof. [ChatGPT]

Part (1). One convenient characterization of upper semicontinuity is the following: $h$ is u.s.c. if and only if, for every $\alpha \in \mathbb{R}$, the sublevel set $\{x: h(x) < \alpha\}$ is open.
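
As a quick check of this characterization on the u.s.c. step function from the earlier example ($f(x) = x + 1$ for $x \geq 1$, $f(x) = x$ for $x < 1$), every sublevel set is indeed open:

\[\{x \in \mathbb{R}: f(x) < \alpha\} = \begin{cases}(-\infty, \alpha) & \mathrm{if}\; \alpha \leq 1\\ (-\infty, 1) & \mathrm{if}\; 1 < \alpha \leq 2 \\ (-\infty, \alpha - 1) & \mathrm{if}\; \alpha > 2.\end{cases}\]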

We will show that for every $c \in \mathbb{R}$ the set $U_c := \{x \in X: f(x)g(x) < c\}$ is open. If $c \leq 0$, then $U_c = \varnothing$ because $f, g$ are nonnegative, and $\varnothing$ is open. So, fix $c > 0$. We have

\[U_c = \bigcup_{r > 0} \left\{x \in X: f(x) < r\; \mathrm{and} \;g(x) < \frac{c}{r}\right\}.\]

Since $f, g$ are u.s.c., the sets $\{x \in X: f(x) < r\}$ and $\{x \in X: g(x) < \frac{c}{r}\}$ are open for every $r > 0$, hence their intersection is open, and an arbitrary union of open sets is open. $\square$

Part (2). A proof can be given similarly to Part (1); the result also follows directly from the definition, since $\limsup_{n \to \infty}\left[f(x_n) + g(x_n)\right] \leq \limsup_{n \to \infty} f(x_n) + \limsup_{n \to \infty} g(x_n) \leq f(x^*) + g(x^*)$.

Part (3). Note that $f(x) := \lim_{n \to \infty}f_n(x) = \inf_{n}f_n(x)$, since $f_1(x) \geq f_2(x) \geq \dots \geq f_n(x) \geq \dots \geq f(x)$. Fix $c \in \mathbb{R}$. Because $f$ is the pointwise infimum, $f(x) < c$ holds if and only if $f_n(x) < c$ for some $n$ (and then, since the sequence is decreasing, for all $m \geq n$ as well). Hence,

\[\{x \in X: f(x) < c\} = \bigcup_{n =1}^\infty \{x \in X: f_n(x) < c\}.\]

The proof is completed by noting that, since the $f_n$ are u.s.c., each $\{x \in X: f_n(x) < c\}$ is open, and a union of open sets is open. $\square$
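
A concrete instance of Part (3): on $X = \mathbb{R}$, the nonpositive continuous (hence u.s.c.) functions $f_n(x) = -\min(n\lvert x \rvert, 1)$ decrease in $n$, and their pointwise limit

\[f(x) = \lim_{n \to \infty} f_n(x) = \begin{cases}0 & \mathrm{if}\; x = 0\\ -1 & \mathrm{if}\; x \neq 0\end{cases}\]

is no longer continuous at $0$, but it is still u.s.c. there: $\limsup_{n \to \infty} f(x_n) = -1 \leq 0 = f(0)$ for any sequence $x_n \to 0$ with $x_n \neq 0$.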

Proposition 1. Suppose $C$ is a compact subset of a complete separable metric space $X$, and $f$ is u.s.c. on $X$. Then there exists an $x^*$ in $C$ such that $f(x^*) \geq f(x)$ for all $x \in C.$

Proof. Let $y^* = \sup_{x \in C}f(x)$ and let $\{x_n\}$ be a sequence in $C$ for which $\lim_{n \to \infty} f(x_n) = y^*$. Since $C$ is a compact subset of a (complete separable) metric space, there exists a subsequence $\{x_{n_k}\}$ converging to some $x^* \in C$. Since $f$ is u.s.c., $f(x^*) \geq \limsup_{k \to \infty}f(x_{n_k}) = y^*$, and trivially $f(x^*) \leq y^*$. Hence, $f(x^*) = y^*$. $\square$
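
Upper semicontinuity cannot be dropped here. For example, on the compact set $C = [0, 1]$, the function

\[f(x) = \begin{cases}x & \mathrm{if}\; x < 1\\ 0 & \mathrm{if}\; x = 1\end{cases}\]

is l.s.c. but not u.s.c. at $x = 1$, and $\sup_{x \in C} f(x) = 1$ is not attained by any $x^* \in C$.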

Proposition 2. Let $X$ be a countable set, $Y$ a complete separable metric space, and $q(x, y)$ a bounded nonnegative real-valued function that is l.s.c. in $y$ for each $x \in X$. Let $f(x)$ be a bounded nonpositive real-valued function on $X$ for which $\sum_{x \in X}f(x)$ is finite. Then,

\[h(y) = \sum_{x \in X}f(x)q(x, y)\]

is u.s.c. on $Y$.

Proof. For each $x \in X$, the map $y \mapsto f(x)q(x, y) \leq 0$ is u.s.c.: since $q(x, \cdot)$ is l.s.c. and $f(x) \leq 0$, for any $y_n \to y$ we have $\limsup_n f(x)q(x, y_n) = f(x)\liminf_n q(x, y_n) \leq f(x)q(x, y)$ (this is the l.s.c. counterpart of Part (1) of Lemma 1). Let $\{X_n\}$ be an increasing sequence of finite subsets of $X$ such that $\bigcup_{n =1}^\infty X_n = X$. Then, by Part (2) of Lemma 1, $h_n(y) := \sum_{x \in X_n} f(x)q(x, y)$ is u.s.c. in $y$ for each $n$. Since every term is nonpositive, $h_n(y)$ is decreasing in $n$, so by Part (3) of Lemma 1, $h(y) = \lim_{n \to \infty} h_n(y)$ is u.s.c. $\square$

The following corollary generalizes the previous proposition to non-discrete sets. Note that a kernel $q(\cdot \mid y)$ on the Borel subsets of $X$ is called continuous if $q(\cdot \mid y_n)$ converges weakly to $q(\cdot \mid y)$ whenever $\{y_n\}$ converges to $y$. This means that, for every bounded continuous real-valued function $f$ on $X$,

\[\lim_{n \to \infty} \int_X f(x) \,q(dx \mid y_n) = \int_X f(x)\,q(dx \mid y).\]

Corollary. Let $X, Y$ be complete separable metric spaces, $f(x)$ a bounded u.s.c. function on $X$, and $q(\cdot \mid y)$ a continuous kernel on the Borel sets of $X$. Then, $h(y) := \int_X f(x) q(dx \mid y)$ is u.s.c. on $Y$.
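
As a quick sanity check of the corollary, consider the deterministic kernel $q(\cdot \mid y) = \delta_{g(y)}$ for some continuous map $g: Y \to X$; it is continuous in the weak sense above, since $y_n \to y$ implies $f(g(y_n)) \to f(g(y))$ for every bounded continuous $f$. Then

\[h(y) = \int_X f(x)\, q(dx \mid y) = f(g(y)),\]

and indeed the composition of a bounded u.s.c. $f$ with a continuous $g$ is u.s.c.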

Main body

Proposition. Assume $S$ is finite or countable, and that

  1. $A_s$ is finite for each $s \in S$, or
  2. $A_s$ is a compact subset of a complete separable metric space, $r_t(s, a)$ is continuous in $a$ for each $s \in S$, there exists an $M < \infty$ for which $|r_t(s, a)| < M$ for all $a \in A_s$ and $s \in S$, and $p_t(j \mid s, a)$ is continuous in $a$ for each $j \in S$, $s \in S$, and $t = 1, 2, \dots, N$, or
  3. $A_s$ is a compact subset of a complete separable metric space, $r_t(s, a)$ is u.s.c. in $a$ for each $s \in S$, there exists an $M < \infty$ for which $|r_t(s, a)| < M$ for all $a \in A_s$ and $s \in S$, and $p_t(j \mid s, a)$ is l.s.c. in $a$ for each $j \in S$, $s \in S$, and $t = 1, 2, \dots, N$.

Then there exists a deterministic Markov policy which is optimal.

Proof.

We need to show that, for each state $s$ and each $t$, there exists an action $a' \in A_s$ for which

\[r_t(s, a') + \sum_{j \in S}p_t(j \mid s, a') u^*_{t + 1}(j) = \sup_{a \in A_s} \left\{r_t(s, a) + \sum_{j \in S}p_t(j \mid s, a) u^*_{t + 1}(j)\right\}.\]

If $A_s$ is finite, the result is immediate, so Part (1) follows. Now consider the setting of Part (3). Since $|r_t(s, a)| < M$ for all $a \in A_s$, $s \in S$, and $t \in [N]$, we have $u^*_t(s) \leq NM$ for each $t$ and $s$, i.e., $u^*_t(s) - NM \leq 0$. Now we apply Proposition 2 with $X = S$, $Y = A_s$, $f(j) = u^*_{t + 1}(j) - NM$, and $q(j, a) = p_t(j \mid s, a)$ for a fixed state $s$. Then,

\[\sum_{j \in S}p_t(j \mid s, a)\left[u_{t + 1}^*(j) - NM\right]\]

is u.s.c. in $a$, from which we can conclude that $\sum_{j \in S}p_t(j \mid s, a)u_{t + 1}^*(j)$ is u.s.c. in $a$: since $\sum_{j \in S}p_t(j \mid s, a) = 1$, the two expressions differ by the constant $NM$, and shifting by a constant does not affect semicontinuity. By Part (2) of Lemma 1 we can conclude that $r_t(s, a) + \sum_{j \in S}p_t(j \mid s, a)u_{t + 1}^*(j)$ is u.s.c. in $a$ for each state $s$. Hence, by Proposition 1, the supremum over the compact set $A_s$ is attained. Part (2) is a special case of Part (3), since continuous functions are both upper and lower semicontinuous. $\square$
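
To make the statement tangible, here is a minimal numerical sketch under condition (2): a made-up two-state, horizon-$N$ problem with $A_s = [0, 1]$ and rewards and transition probabilities continuous in $a$ (none of the specific numbers come from the book). Backward induction computes $u^*_t$ and, since the supremum is attained, reads off a maximizing action at each $(t, s)$, i.e., a deterministic Markov policy. The compact interval is replaced by a fine grid purely for computation.

```python
import numpy as np

# Hypothetical two-state, finite-horizon MDP with compact action set A = [0, 1].
# r(s, a) and p(j | s, a) are continuous in a, as in condition (2), so the
# supremum in the backward recursion is attained and a deterministic Markov
# policy can be read off. The interval is discretized only for the numerics.

S = [0, 1]                       # states
N = 5                            # horizon
A = np.linspace(0.0, 1.0, 1001)  # fine grid approximating the compact set [0, 1]

def r(s, a):                     # continuous, bounded reward
    return np.cos(a) if s == 0 else a * (1.0 - a)

def p(j, s, a):                  # continuous transition probabilities, sum to 1 over j
    stay = 0.5 + 0.4 * np.sin(a)
    return stay if j == s else 1.0 - stay

u = {s: 0.0 for s in S}          # u*_{N+1} = 0 (no terminal reward)
policy = {}                      # deterministic Markov policy: (t, s) -> action

for t in range(N, 0, -1):        # backward induction
    u_new = {}
    for s in S:
        values = np.array([r(s, a) + sum(p(j, s, a) * u[j] for j in S) for a in A])
        k = int(np.argmax(values))          # maximum attained on the (compact) grid
        policy[(t, s)] = float(A[k])
        u_new[s] = float(values[k])
    u = u_new

print("optimal values at t = 1:", u)
print("deterministic Markov policy:", policy)
```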

References

Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994.