Study Log (2021.02)

2021-02-28

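바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 10. AlphaGo and MCTS
    - 10.1 AlphaGo
    - 10.2 AlphaGo Zero
- 11. Building a Blade & Soul Arena Battle AI
    - 11.1 Blade & Soul Arena Battle
    - 11.2 Applying Reinforcement Learning to Arena Battle
    - 11.3 A New Self-Play Training Method Guided by Fighting Styles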

2021-02-27

바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 9. Policy-Based Agents
    - 9.3 Actor-Critic

2021-02-26

바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 9. Policy-Based Agents
    - 9.2 The REINFORCE Algorithm

2021-02-25

바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 9. Policy-Based Agents
    - 9.1 Policy Gradient

2021-02-24

바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 8. Value-Based Agents
    - 8.2 Deep Q-Learning

2021-02-23

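바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 7. First Steps in Deep RL
    - 7.1 Approximation with Functions
    - 7.2 Introducing Artificial Neural Networks
- 8. Value-Based Agents
    - 8.1 Training the Value Network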

2021-02-22

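수학으로 풀어보는 강화학습 원리와 알고리즘 (Reinforcement Learning Principles and Algorithms Explained with Mathematics)
- Chapter 1. Mathematics for Reinforcement Learning
  - 1.4 Gaussian Distribution
  - 1.5 Random Sequences
    - 1.5.1 Definition
    - 1.5.2 Mean Function and Autocorrelation Function
    - 1.5.3 Markov Sequences
  - 1.6 Linear Stochastic Difference Equations
  - 1.7 Notation
  - 1.8 Importance Sampling
  - 1.9 Entropy
  - 1.10 KL Divergence
  - 1.11 Estimators
    - 1.11.1 Maximum A Posteriori Estimator
    - 1.11.2 Maximum Likelihood Estimator
  - 1.12 Derivatives with Respect to Vectors and Matrices
    - 1.12.1 Differentiation with Respect to a Vector
    - 1.12.2 Differentiation with Respect to a Matrix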
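
As a small numerical companion to sections 1.9 and 1.10 above, the sketch below computes the Shannon entropy and the KL divergence of discrete distributions. The function names `entropy` and `kl_divergence` and the example distributions are illustrative assumptions, not taken from the book.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i).

    Assumes q_i > 0 wherever p_i > 0 (hypothetical helper, no input checks).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # fair coin
q = [0.9, 0.1]  # biased coin

print(entropy(p))           # equals log(2) for a fair coin
print(kl_divergence(p, p))  # zero when both distributions match
print(kl_divergence(p, q))  # positive when they differ
```

KL divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.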

2021-02-21

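바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 6. Finding the Best Policy When the MDP Is Unknown
    - 6.3 TD Control 2 - Q-Learning
수학으로 풀어보는 강화학습 원리와 알고리즘 (Reinforcement Learning Principles and Algorithms Explained with Mathematics)
- Chapter 1. Mathematics for Reinforcement Learning
  - 1.1 Probability and Random Variables
    - 1.1.1 Probability
    - 1.1.2 Random Variables
    - 1.1.3 Cumulative Distribution Function and Probability Density Function
    - 1.1.4 Joint Probability Functions
    - 1.1.5 Conditional Probability Functions
    - 1.1.6 Independent Random Variables
    - 1.1.7 Functions of Random Variables
    - 1.1.8 Bayes' Theorem
    - 1.1.9 Sampling
  - 1.2 Expectation and Variance
    - 1.2.1 Expectation
    - 1.2.2 Variance
    - 1.2.3 Conditional Expectation and Variance
  - 1.3 Random Vectors
    - 1.3.1 Definition
    - 1.3.2 Expectation and Covariance Matrix
    - 1.3.3 Sample Mean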

2021-02-20

바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 6. Finding the Best Policy When the MDP Is Unknown
    - 6.1 Monte Carlo Control
    - 6.2 TD Control 1 - SARSA
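
For reference, the SARSA update from section 6.2 can be sketched as a single tabular step. This is a minimal illustration under assumed names (`sarsa_update`, the state and action labels, and the default `alpha` and `gamma` values are all hypothetical), not the book's implementation.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """One SARSA step: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[(s_next, a_next)]   # uses the action actually taken next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])  # move Q(s,a) toward the TD target
    return Q[(s, a)]

# Q-table keyed by (state, action); unseen entries default to 0.0.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0

new_value = sarsa_update(Q, "s0", "right", r=-1.0, s_next="s1", a_next="right")
print(new_value)
```

Because the target uses the next action chosen by the current policy, SARSA is on-policy; Q-learning would instead take the maximum over next actions.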

2021-02-18

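바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 5. Evaluating Values When the MDP Is Unknown
    - 5.1 Monte Carlo Learning
    - 5.2 Temporal Difference Learning
    - 5.3 Monte Carlo vs. TD
    - 5.4 Something Between Monte Carlo and TD?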

2021-02-17

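바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 4. Planning When the MDP Is Known
    - 4.1 Evaluating Values - Iterative Policy Evaluation
    - 4.2 Finding the Best Policy - Policy Iteration
    - 4.3 Finding the Best Policy - Value Iteration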

2021-02-16

바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 3. Bellman Equations
    - 3.1 Bellman Expectation Equation
    - 3.2 Bellman Optimality Equation
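
As a reminder of what sections 3.1 and 3.2 cover, the two equations for the state-value function can be written as follows. The notation ($r_s^a$ for expected reward, $P_{ss'}^a$ for transition probability, $\gamma$ for the discount factor) is a common convention assumed here and may differ from the book's.

```latex
% Bellman expectation equation for a fixed policy \pi
v_\pi(s) = \sum_{a} \pi(a \mid s) \left( r_s^a + \gamma \sum_{s'} P_{ss'}^a \, v_\pi(s') \right)

% Bellman optimality equation
v_*(s) = \max_{a} \left( r_s^a + \gamma \sum_{s'} P_{ss'}^a \, v_*(s') \right)
```

The only structural difference is that the policy-weighted sum over actions is replaced by a maximum over actions.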

2021-02-15

바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- 1. What Is Reinforcement Learning
    - 1.1 Supervised Learning and Reinforcement Learning
    - 1.2 Sequential Decision-Making Problems
    - 1.3 Reward
    - 1.4 Agent and Environment
    - 1.5 The Power of Reinforcement Learning
- 2. Markov Decision Process
    - 2.1 Markov Process
    - 2.2 Markov Reward Process
    - 2.3 Markov Decision Process
    - 2.4 Prediction and Control

2021-02-09

S-K RL
- train_FT10_ppo_node_only.py
  - do_simulate_on_aggregated_state()
  - value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
  - eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
  - val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
  - def get_swapping_ops(blocking_op, machine_dict)
  - class blMachine(Machine)
  - class blMachineManager(MachineManager)
  - class blSimulator(Simulator)
  - def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
  - def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
  - def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
  - def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
  - def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
  - def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
  - def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
  - def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
  - def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
팡요랩 (Pang-Yo Lab)
- Reinforcement Learning Lecture 10 - Classic Games

2021-02-08

S-K RL
- train_FT10_ppo_node_only.py
  - do_simulate_on_aggregated_state()
  - value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
  - eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
  - val_performance = validation(agent, path, mode='node_mode')
팡요랩 (Pang-Yo Lab)
- Reinforcement Learning Lecture 9 - Exploration and Exploitation

2021-02-07

S-K RL
- train_FT10_ppo_node_only.py
  - do_simulate_on_aggregated_state()
  - value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
  - eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
  - val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
  - def get_swapping_ops(blocking_op, machine_dict)
  - class blMachine(Machine)
  - class blMachineManager(MachineManager)
  - class blSimulator(Simulator)
  - def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
  - def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
  - def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
  - def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
  - def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
  - def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
  - def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
  - def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
  - def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
팡요랩 (Pang-Yo Lab)
- Reinforcement Learning Lecture 8 - Integrating Learning and Planning
Cross Entropy
- The Probabilistic Meaning of Logistic and Cross-Entropy Loss
- The Precise Probabilistic Meaning of Cross Entropy

2021-02-06

S-K RL
- SBJSSP_report_results.ipynb
  - def get_swapping_ops(blocking_op, machine_dict)
  - class blMachine(Machine)
  - class blMachineManager(MachineManager)
  - class blSimulator(Simulator)
  - def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
  - def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
  - def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
  - def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
  - def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
  - def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
  - def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
  - def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
  - def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
팡요랩 (Pang-Yo Lab)
- Reinforcement Learning Lecture 7 - Policy Gradient

2021-02-05

S-K RL
- SBJSSP_report_results.ipynb
  - def get_swapping_ops(blocking_op, machine_dict)
  - class blMachine(Machine)
  - class blMachineManager(MachineManager)
  - class blSimulator(Simulator)
  - def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
  - def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
  - def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
  - def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
  - def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
  - def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
  - def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
  - def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
  - def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)

2021-02-04

S-K RL
- train_FT10_ppo_node_only.py
  - do_simulate_on_aggregated_state()
  - value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
  - eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
  - val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
  - def get_swapping_ops(blocking_op, machine_dict)
  - class blMachine(Machine)
  - class blMachineManager(MachineManager)
  - class blSimulator(Simulator)
  - def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
  - def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
  - def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
  - def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
  - def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
  - def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
  - def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
  - def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
  - def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
팡요랩 (Pang-Yo Lab)
- Reinforcement Learning Lecture 6 - Value Function Approximation

2021-02-03

S-K RL
- train_FT10_ppo_node_only.py
  - do_simulate_on_aggregated_state()
  - value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
  - eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
  - val_performance = validation(agent, path, mode='node_mode')
팡요랩 (Pang-Yo Lab)
- Reinforcement Learning Lecture 5 - Model Free Control

2021-02-02

S-K RL
- train_FT10_ppo_node_only.py
  - do_simulate_on_aggregated_state()
  - value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
  - eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
  - val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
  - def get_swapping_ops(blocking_op, machine_dict)
  - class blMachine(Machine)
  - class blMachineManager(MachineManager)
  - class blSimulator(Simulator)
  - def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
  - def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
  - def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
  - def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
  - def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
  - def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
  - def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
  - def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
  - def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)

2021-02-01

S-K RL
- train_FT10_ppo_node_only.py
  - do_simulate_on_aggregated_state()
  - value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
  - eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
  - val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
  - def get_swapping_ops(blocking_op, machine_dict)
  - class blMachine(Machine)
  - class blMachineManager(MachineManager)
  - class blSimulator(Simulator)
  - def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
  - def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
  - def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
  - def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
  - def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
  - def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
  - def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
  - def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
  - def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
팡요랩 (Pang-Yo Lab)
- Reinforcement Learning Lecture 4 - Model Free Prediction

Template

Sang Hun Kim
