12 What is the Bellman operator in reinforcement learning? 2019-03-06T14:07:16.067

7 Why does the state-action value function, as an expected value of the return and state value function, not need to follow the policy? 2020-06-06T08:55:32.493

6 Why do Bellman equations indirectly create a policy? 2017-12-18T13:27:20.397

5 Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction? 2020-07-23T17:32:14.873

4 Why is $G_{t+1}$ replaced with $v_*(S_{t+1})$ in the Bellman optimality equation? 2020-06-04T19:27:43.360

4 Why are the Bellman operators contractions? 2020-07-31T02:48:34.320

3 What is the proof that policy evaluation converges to the optimal solution? 2020-04-16T06:44:00.997

2 Why is there an expectation sign in the Bellman equation? 2020-04-03T18:43:07.627

2 Why can the Bellman equation be turned into an update rule? 2020-04-10T22:07:35.663

2 Are these two definitions of the state-action value function equivalent? 2020-05-07T09:58:45.690

2 Equation not satisfied in the policy iteration algorithm 2020-06-06T07:34:06.170

2 Why don't we use importance sampling in tabular Q-learning? 2020-06-13T19:18:49.340

2 Why doesn't value iteration use $\pi(a \mid s)$ while policy evaluation does? 2020-08-25T12:35:26.587

1 How are the Bellman optimality equations and minimax related? 2020-04-22T15:17:19.507

1 If the transition model is available, why would we use sample-based algorithms? 2020-07-09T15:05:03.133