Study Log (2020.01)

2020-01-30

  • Reinforcement Learning (Sutton & Barto, 2nd ed.)
    • Chapter 12. Eligibility Traces
      • 12.5 True Online TD($\lambda$)
      • 12.6 Dutch Traces in Monte Carlo Learning
      • 12.7 Sarsa($\lambda$)
        • Sarsa($\lambda$) with binary features and linear function approximation (sketch after this entry)
        • True online Sarsa($\lambda$)
      • 12.8 Variable $\lambda$ and $\gamma$
      • 12.9 Off-policy Traces with Control Variates
      • 12.10 Watkins’s Q($\lambda$) to Tree-Backup($\lambda$)
      • 12.11 Stable Off-policy Methods with Traces
    • Page #316
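
The sketch promised above: Sarsa($\lambda$) with binary features and linear function approximation, following the 12.7 pseudocode (the conventional version, not the true online one). A minimal Python sketch under my own assumptions: `env` exposes `reset() -> state` and `step(a) -> (state, reward, done)`, and `features(s, a)` is a hypothetical helper (e.g. tile coding) returning the indices of the active binary features.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def sarsa_lambda(env, features, d, n_actions, alpha=0.5, gamma=1.0,
                 lam=0.9, epsilon=0.1, episodes=500, replacing=True):
    """Sarsa(lambda) with binary features and linear function approximation
    (Sutton & Barto, Sec. 12.7). q(s, a) = sum of w over active features."""
    rng = np.random.default_rng(0)
    w = np.zeros(d)

    def q(s):  # action values for all actions in state s
        return [w[features(s, a)].sum() for a in range(n_actions)]

    for _ in range(episodes):
        z = np.zeros(d)                        # eligibility-trace vector
        s, done = env.reset(), False
        a = epsilon_greedy(q(s), epsilon, rng)
        while not done:
            s2, r, done = env.step(a)
            idx = features(s, a)               # active features of (s, a)
            delta = r - w[idx].sum()           # TD error, part 1
            if replacing:
                z[idx] = 1.0                   # replacing traces
            else:
                z[idx] += 1.0                  # accumulating traces
            if not done:
                a2 = epsilon_greedy(q(s2), epsilon, rng)
                delta += gamma * w[features(s2, a2)].sum()  # TD error, part 2
            w += alpha * delta * z
            z *= gamma * lam                   # decay all traces
            if not done:
                s, a = s2, a2
    return w
```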

2020-01-27

  • Reinforcement Learning
    • Chapter 12. Eligibility Traces
      • 12.3 n-step Truncated $\lambda$-return Methods (see the note after this entry)
      • 12.4 Redoing Updates: Online $\lambda$-return Algorithm
    • Page #299
  • 팡요랩
    • Lecture #4
    • Lecture #5
      • DP vs. TD (1)
      • DP vs. TD (2)
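
A note to self on 12.3 above: the truncated $\lambda$-return stops the weighting at horizon $h$ and bootstraps there instead of waiting for the end of the episode,

$$G_{t:h}^{\lambda} \doteq (1-\lambda)\sum_{n=1}^{h-t-1}\lambda^{n-1}G_{t:t+n} + \lambda^{h-t-1}G_{t:h}, \qquad 0 \le t < h \le T.$$

The online $\lambda$-return algorithm of 12.4 then redoes every update of the current episode with this target each time the horizon advances by one step.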

2020-01-25


2020-01-24


2020-01-23


2020-01-22

  • Reinforcement Learning
    • Chapter 11. Off-policy Methods with Approximation
      • 11.7 Gradient-TD Methods (TDC sketch after this entry)
      • 11.8 Emphatic-TD Methods
      • 11.9 Reducing Variance
      • 11.10 Summary
    • Page #287
  • 팡요랩
    • Lecture #1
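
The TDC sketch promised above (the gradient-TD method with gradient correction from 11.7), reduced to a single linear-feature update; the function name and defaults are my own, and `rho` is the importance-sampling ratio $\pi(A_t \mid S_t)/b(A_t \mid S_t)$.

```python
import numpy as np

def tdc_step(w, v, x, x_next, reward, rho, alpha=0.005, beta=0.05, gamma=0.99):
    """One TDC update (Sutton & Barto, Sec. 11.7) with linear features.
    w: primary weights; v: secondary weights used for the gradient correction."""
    delta = reward + gamma * (w @ x_next) - (w @ x)              # TD error
    w = w + alpha * rho * (delta * x - gamma * x_next * (x @ v))
    v = v + beta * rho * (delta - (v @ x)) * x
    return w, v
```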

2020-01-21

  • Reinforcement Learning
    • Chapter 11. Off-policy Methods with Approximation
      • 11.6 The Bellman Error is Not Learnable
    • Page #278

2020-01-20

  • Reinforcement Learning
    • Chapter 11. Off-policy Methods with Approximation
      • 11.2 Examples of Off-policy Divergence
      • 11.3 The Deadly Triad
      • 11.4 Linear Value-function Geometry
      • 11.5 Gradient Descent in the Bellman Error
    • Page #272
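
For the record, the objective from 11.4 that 11.5 attacks with gradient descent: the mean squared Bellman error, the $\mu$-weighted norm of the expected TD error,

$$\overline{BE}(\mathbf{w}) \doteq \sum_{s}\mu(s)\,\bar\delta_{\mathbf{w}}(s)^2, \qquad \bar\delta_{\mathbf{w}}(s) \doteq \mathbb{E}_\pi\big[R_{t+1} + \gamma\hat v(S_{t+1},\mathbf{w}) - \hat v(S_t,\mathbf{w}) \mid S_t = s\big].$$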

2020-01-19

  • Reinforcement Learning
    • Chapter 11. Off-policy Methods with Approximation
      • 11.2 Examples of Off-policy Divergence
    • Page #263

2020-01-18

  • Reinforcement Learning
    • Chapter 11. Off-policy Methods with Approximation
      • 11.2 Examples of Off-policy Divergence
    • Page #262

2020-01-17

  • Reinforcement Learning
    • Chapter 11. Off-policy Methods with Approximation
      • 11.1 Semi-gradient Methods
      • 11.2 Examples of Off-policy Divergence
    • Page #260
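
The update at issue across these three days of 11.2 (e.g. on Baird's counterexample) is semi-gradient off-policy TD(0) from 11.1, with importance-sampling ratio $\rho_t \doteq \pi(A_t \mid S_t)/b(A_t \mid S_t)$:

$$\mathbf{w}_{t+1} \doteq \mathbf{w}_t + \alpha\rho_t\delta_t\nabla\hat v(S_t,\mathbf{w}_t), \qquad \delta_t \doteq R_{t+1} + \gamma\hat v(S_{t+1},\mathbf{w}_t) - \hat v(S_t,\mathbf{w}_t).$$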

2020-01-16

  • Reinforcement Learning
    • Chapter 10. On-policy Control with Approximation
      • 10.5 Differential Semi-gradient n-step Sarsa
      • 10.6 Summary
    • Page #257
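
The update behind 10.5, reduced to the 1-step case with linear features for brevity (the n-step version only swaps in a differential n-step return as the target); `r_bar` is the running average-reward estimate defined in 10.3, noted in the next entry. A minimal sketch with my own names and defaults:

```python
import numpy as np

def differential_sarsa_step(w, r_bar, x, x_next, reward, alpha=0.1, beta=0.01):
    """One step of differential semi-gradient Sarsa (Sutton & Barto, Ch. 10)
    with linear action-value features x = x(s, a)."""
    delta = reward - r_bar + (w @ x_next) - (w @ x)   # differential TD error
    r_bar = r_bar + beta * delta                      # track the average reward
    w = w + alpha * delta * x                         # semi-gradient update
    return w, r_bar
```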

2020-01-15

  • Reinforcement Learning
    • Chapter 10. On-policy Control with Approximation
      • 10.3 Average Reward: A New Problem Setting for Continuing Tasks
      • 10.4 Deprecating the Discounted Setting
    • Page #255
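
The key definitions from 10.3: the average reward rate of a policy, and the differential return built from differences against it,

$$r(\pi) \doteq \lim_{h\to\infty}\frac{1}{h}\sum_{t=1}^{h}\mathbb{E}\big[R_t \mid S_0, A_{0:t-1}\sim\pi\big], \qquad G_t \doteq \sum_{k=1}^{\infty}\big(R_{t+k} - r(\pi)\big).$$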

2020-01-14


2020-01-13

  • Reinforcement Learning
    • Chapter 9. On-policy Prediction with Approximation
      • 9.9 Memory-based Function Approximation
      • 9.10 Kernel-based Function Approximation
      • 9.11 Looking Deeper at On-policy Learning: Interest and Emphasis
      • 9.12 Summary
    • Chapter 10. On-policy Control with Approximation
      • 10.1 Episodic Semi-gradient Control
      • 10.2 Semi-gradient n-step Sarsa
    • Page #247
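
For 10.2 above, the episodic semi-gradient n-step Sarsa update, with the n-step return bootstrapping from the current action-value estimate:

$$G_{t:t+n} \doteq R_{t+1} + \cdots + \gamma^{n-1}R_{t+n} + \gamma^{n}\hat q(S_{t+n},A_{t+n},\mathbf{w}_{t+n-1}),$$

$$\mathbf{w}_{t+n} \doteq \mathbf{w}_{t+n-1} + \alpha\big[G_{t:t+n} - \hat q(S_t,A_t,\mathbf{w}_{t+n-1})\big]\nabla\hat q(S_t,A_t,\mathbf{w}_{t+n-1}).$$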

2020-01-11

  • Reinforcement Learning
    • Chapter 9. On-policy Prediction with Approximation
      • 9.7 Nonlinear Function Approximation: Artificial Neural Networks
      • 9.8 Least-Squares TD
    • Page #228
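
A batch sketch of LSTD from 9.8, under my own assumed data format of `(x, reward, x_next)` feature-vector transitions; the $\varepsilon I$ term keeps $\hat{A}$ invertible:

```python
import numpy as np

def lstd(transitions, d, gamma=0.99, eps=1e-3):
    """LSTD (Sutton & Barto, Sec. 9.8): accumulate A-hat and b-hat over the
    data, then solve for the weights directly instead of iterating."""
    A = eps * np.eye(d)                       # epsilon*I for invertibility
    b = np.zeros(d)
    for x, r, x_next in transitions:
        A += np.outer(x, x - gamma * x_next)  # A-hat accumulator
        b += r * x                            # b-hat accumulator
    return np.linalg.solve(A, b)              # w = A^-1 b
```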

2020-01-10

  • Reinforcement Learning
    • Chapter 9. On-policy Prediction with Approximation
      • 9.5 Feature Construction for Linear Methods
        • 9.5.5 Radial Basis Functions
      • 9.6 Selecting Step-Size Parameters Manually
    • Page #223
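
Two things worth writing down from today: a radial basis feature is a Gaussian bump centered at a prototype state $c_i$, and 9.6 gives a step-size rule of thumb in which $\tau$ is roughly the number of experiences with similar feature vectors needed to learn:

$$x_i(s) \doteq \exp\left(-\frac{\lVert s - c_i\rVert^2}{2\sigma_i^2}\right), \qquad \alpha \doteq \big(\tau\,\mathbb{E}[\mathbf{x}^\top\mathbf{x}]\big)^{-1}.$$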

2020-01-09


2020-01-08

  • Reinforcement Learning
    • Chapter 9. On-policy Prediction with Approximation
      • 9.4 Linear Methods
      • 9.5 Feature Construction for Linear Methods
        • 9.5.1 Polynomials
        • 9.5.2 Fourier Basis
        • 9.5.3 Coarse Coding
    • Page #217
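
A small sketch of the Fourier cosine basis from 9.5.2, assuming the state has already been normalized into $[0, 1]^k$ (the helper name is my own):

```python
import numpy as np

def fourier_features(s, order=3):
    """Fourier cosine basis (Sutton & Barto, Sec. 9.5.2): one feature
    x_i(s) = cos(pi * c_i . s) per integer coefficient vector c_i with
    entries in {0, ..., order}; s must lie in [0, 1]^k."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    k = s.shape[0]
    grids = np.meshgrid(*([np.arange(order + 1)] * k), indexing="ij")
    coeffs = np.stack(grids, axis=-1).reshape(-1, k)   # all (order+1)^k vectors
    return np.cos(np.pi * coeffs @ s)
```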

2020-01-07


2020-01-06


2020-01-05

  • Reinforcement Learning
    • Chapter 9. On-policy Prediction with Approximation
      • 9.1 Value-function Approximation
    • Page #199

2020-01-04

  • Reinforcement Learning
    • Chapter 8. Planning and Learning with Tabular Methods
      • 8.7 Real-time Dynamic Programming
      • 8.8 Planning at Decision Time
      • 8.9 Heuristic Search
      • 8.10 Rollout Algorithms (sketch after this entry)
      • 8.11 Monte Carlo Tree Search
      • 8.12 Summary of the Chapter
      • 8.13 Summary of Part I: Dimensions
    • Page #195
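
The rollout sketch promised above (8.10): estimate $q(s, a)$ by averaging the returns of simulated trajectories that take $a$ first and then follow a fixed rollout policy. The simulator interface (`set_state`, and `step` returning `(state, reward, done)`) is my own assumption:

```python
def rollout_value(sim, state, action, rollout_policy, n_sims=100, gamma=1.0):
    """Rollout algorithm (Sutton & Barto, Sec. 8.10): Monte Carlo estimate of
    q(state, action) under the rollout policy, averaged over n_sims runs."""
    total = 0.0
    for _ in range(n_sims):
        sim.set_state(state)              # reset the simulator to the root state
        s, r, done = sim.step(action)     # take the candidate action first
        g, discount = r, gamma
        while not done:                   # then follow the rollout policy
            s, r, done = sim.step(rollout_policy(s))
            g += discount * r
            discount *= gamma
        total += g
    return total / n_sims                 # averaged return = q(state, action)
```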

2020-01-03


2020-01-02


2020-01-01

