What I don’t know

Published: January 01, 2999

Bounded rationality
Non-parametric bandits: an arm’s distribution can be multimodal, not conforming to a single-parameter exponential family.
Thompson Sampling only works for parametric bandits.
Statistical bootstrap (ET94)
Pure exploration
Ambient dimension
Optimal Design
Round robin
Compact MDPs
KWIK learners
Tower rule
Mansour, Littman, et al. (Lunch 2023)
IMPACT (Trimbach 2019)
Mirror Descent
Paper: Recursive Reward Aggregation
Rich’s Dyna with function approximation.
Banach Spaces
Blackwell optimality
discrete vs countable
E-values and e-processes
Why $n-1$ for the empirical variance?
ergodic, i.e., aperiodic, recurrent, and irreducible MDPs
Use of the law of total variance in RL
$\sigma$-algebra vs. a topology
Why $\mathrm{rank}(A \otimes B) = \mathrm{rank}(A) \cdot \mathrm{rank}(B)$
Bayesian vs. Frequentist analysis of TS
Gradient vs. derivative
Student’s t-distribution
Exchanging derivative and expectation?
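
For the question "Why $n-1$ for the empirical variance?" in the list above, a small exact check may help build intuition. The sketch below is not a proof. It enumerates every equally likely sample of size $n = 3$ from a fair coin (true variance $1/4$) and averages two estimators: dividing by $n$ and dividing by $n-1$. The coin, the sample size, and the helper names `sample_variance` and `expected_estimate` are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def expected_estimate(values, n, ddof):
    """Average the estimator over every equally likely i.i.d. sample of size n."""
    samples = list(product(values, repeat=n))
    return sum(sample_variance(s, ddof) for s in samples) / len(samples)


values = (0, 1)            # fair coin (illustrative assumption), true variance 1/4
n = 3                      # sample size (illustrative assumption)
true_var = Fraction(1, 4)

biased = expected_estimate(values, n, ddof=0)    # divide by n
unbiased = expected_estimate(values, n, ddof=1)  # divide by n - 1

print(biased, unbiased, true_var)
```

In this example the divide-by-$n$ estimator averages to $\frac{n-1}{n}$ times the true variance, while dividing by $n-1$ recovers the true variance exactly. The shrinkage comes from measuring deviations around the sample mean, which is itself fitted to the data, rather than around the true mean.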