Study Log (2019.12)

4 minute read

2019-12-31

  • Reinforcement Learning
    • Chapter 8. Planning and Learning with Tabular Methods
      • 8.1 Models and Planning
      • 8.2 Dyna: Integrated Planning, Acting, and Learning
        • Tabular Dyna-Q (a sketch follows this entry)
    • Page #166
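
A minimal sketch of the Tabular Dyna-Q loop from Section 8.2. The environment interface here (`reset()` returning a state, `step(action)` returning `(next_state, reward, done)`) and all parameter names are illustrative assumptions, not the book's code.

```python
import random
from collections import defaultdict

import numpy as np

# Minimal Tabular Dyna-Q sketch; the env interface is an assumed convention.
def dyna_q(env, n_actions, episodes=50, planning_steps=5,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(lambda: np.zeros(n_actions))   # action-value table
    model = {}                                     # (s, a) -> (r, s') for observed pairs
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # direct RL: one-step Q-learning update from real experience
            Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
            # model learning (deterministic environment assumed)
            model[(s, a)] = (r, s2)
            # planning: replay random transitions from the learned model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[ps][pa] += alpha * (pr + gamma * np.max(Q[ps2]) - Q[ps][pa])
            s = s2
    return Q
```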

2019-12-30

  • Reinforcement Learning
    • Chapter 7. n-step Bootstrapping
      • 7.4 Per-decision Methods with Control Variates
      • 7.5 Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm
      • 7.6 A Unifying Algorithm: n-step Q($\sigma$)
        • Off-policy n-step Q($\sigma$)
      • 7.7 Summary
    • Chapter 8. Planning and Learning with Tabular Methods
      • 8.1 Models and Planning
    • Page #161
  • endtoendAI

2019-12-29


2019-12-28

  • Reinforcement Learning
    • Chapter 7. n-step Bootstrapping
      • 7.1 n-step TD Prediction
        • 1) Compute G, the sum of discounted rewards up to step n : $G \leftarrow \sum\nolimits_{i=\tau+1}^{\min(\tau+n,T)} \gamma^{i-\tau-1} R_i$
        • 2) Add the bootstrapped value at step n (summarizing rewards beyond n steps) : $G \leftarrow G + \gamma^n V(\color{red}{ S_{\tau+n} })$
        • 3) Update V : $V(S_\tau) \leftarrow V(S_\tau) + \alpha [G - V(\color{red}{ S_\tau })]$
        • RandomWalk.py
          n-step TD for estimating V (a sketch follows this entry)
      • 7.2 n-step Sarsa
      • 7.3 n-step Off-policy Learning
      • 7.4 Per-decision Methods with Control Variates
    • Page #150
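
A minimal sketch of the three n-step TD prediction steps above, applied to the 19-state random walk used in the book; the function name and parameters are illustrative, not the actual RandomWalk.py.

```python
import numpy as np

N_STATES = 19   # non-terminal states 1..19; 0 and 20 are terminal
GAMMA = 1.0

def n_step_td(n, alpha, episodes=10):
    V = np.zeros(N_STATES + 2)            # V[0] and V[N_STATES + 1] stay 0 (terminal)
    for _ in range(episodes):
        state = (N_STATES + 1) // 2       # start in the middle state
        states, rewards = [state], [0.0]  # R_{t+1} stored at index t + 1
        T, t = float('inf'), 0
        while True:
            if t < T:
                # random-walk dynamics: step left or right with equal probability
                next_state = state + np.random.choice([-1, 1])
                reward = 1.0 if next_state == N_STATES + 1 else (-1.0 if next_state == 0 else 0.0)
                states.append(next_state)
                rewards.append(reward)
                if next_state == 0 or next_state == N_STATES + 1:
                    T = t + 1
                state = next_state
            tau = t - n + 1               # time whose state value is updated
            if tau >= 0:
                # 1) G = discounted rewards up to min(tau + n, T)
                G = sum(GAMMA ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                # 2) add the bootstrapped value at step tau + n
                if tau + n < T:
                    G += GAMMA ** n * V[states[tau + n]]
                # 3) update V(S_tau)
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V
```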

2019-12-27


2019-12-25

  • Reinforcement Learning
    • Chapter 6. Temporal-Difference Learning
      • 6.1 TD Prediction
      • 6.2 Advantages of TD Prediction Methods
      • 6.3 Optimality of TD(0)
      • 6.4 Sarsa: On-policy TD Control
      • 6.5 Q-learning: Off-policy TD Control
    • Page #133

2019-12-24


2019-12-23


2019-12-22


2019-12-21


2019-12-20


2019-12-19


2019-12-17


2019-12-16


2019-12-15


2019-12-14


2019-12-10

  • Reinforcement Learning
    • Chapter 4. Dynamic Programming
      • 4.2 Policy Improvement
      • 4.3 Policy Iteration
      • 4.4 Value Iteration
      • 4.5 Asynchronous Dynamic Programming
      • 4.6 Generalized Policy Iteration
      • 4.7 Efficiency of Dynamic Programming
      • 4.8 Summary
    • Chapter 5. Monte Carlo Methods
      • 5.1 Monte Carlo Prediction
      • 5.2 Monte Carlo Estimation of Action Values
      • 5.3 Monte Carlo Control
      • 5.4 Monte Carlo Control without Exploring Starts
      • 5.5 Off-policy Prediction via Importance Sampling
    • Page #104

2019-12-04

  • Reinforcement Learning
    • Chapter 3. Finite Markov Decision Processes
      • 3.1 The Agent–Environment Interface
      • 3.2 Goals and Rewards
      • 3.3 Returns and Episodes
      • 3.4 Unified Notation for Episodic and Continuing Tasks
      • 3.5 Policies and Value Functions
      • 3.6 Optimal Policies and Optimal Value Functions
      • 3.7 Optimality and Approximation
      • 3.8 Summary
    • Chapter 4. Dynamic Programming
    • Page #79

2019-12-03

  • Reinforcement Learning
    • Chapter 2. Multi-armed Bandits
      • 2.7 Upper-Confidence-Bound Action Selection
      • 2.8 Gradient Bandit Algorithms
      • 2.9 Associative Search (Contextual Bandits)
      • 2.10 Summary
    • Page #47

2019-12-02


2019-12-01


2019-11-30


2019-11-29

