Study Log (2021.02)
2021-02-28
- 바닥부터 배우는 강화학습 (Reinforcement Learning from the Ground Up)
- AlphaGo and MCTS (see the MCTS sketch below)
- Building an AI for Blade & Soul arena duels (비무)
- 11.1 Blade & Soul arena duels
- 11.2 Applying reinforcement learning to arena duels
- 11.3 A new style of self-play training by inducing combat styles
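To pin down the MCTS chapter, a minimal UCT select/backup sketch of my own (not the book's code); expansion and the rollout policy are omitted, and the Node layout and exploration constant are assumptions:

```python
import math

class Node:
    """One state in the search tree (toy layout, my own choice)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def ucb_score(self, c=1.4):
        # UCT: average value plus an exploration bonus that shrinks with visits
        if self.visits == 0:
            return float('inf')
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select(node):
    """Walk down the tree, always taking the child with the best UCB score."""
    while node.children:
        node = max(node.children.values(), key=lambda n: n.ucb_score())
    return node

def backup(node, value):
    """Propagate a rollout result back up to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```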
2021-02-27
2021-02-26
2021-02-25
2021-02-24
2021-02-23
- 바닥부터 배우는 강화학습
- First steps into deep RL
- 7.1 Function approximation
- 7.2 Introducing artificial neural networks
- Value-based agents (see the Q-network sketch below)
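A tiny value network in the spirit of chapters 7 and 8: an MLP mapping a state vector to one Q-value per action, plus epsilon-greedy action selection. Layer sizes and dimensions are placeholders of my own, not code from the book:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP: state vector in, one Q-value per action out."""
    def __init__(self, state_dim=4, n_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def epsilon_greedy(qnet, state, n_actions=2, eps=0.1):
    """Value-based action selection: random with prob eps, else argmax Q."""
    if torch.rand(1).item() < eps:
        return torch.randint(n_actions, (1,)).item()
    with torch.no_grad():
        return qnet(state).argmax().item()

# usage: qnet = QNet(); action = epsilon_greedy(qnet, torch.zeros(4))
```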
2021-02-22
- 수학으로 풀어보는 강화학습 원리와 알고리즘 (RL Principles and Algorithms Worked Out with Mathematics)
- Chapter 1. Mathematics for reinforcement learning
- 1.4 Gaussian distributions
- 1.5 Random sequences
- 1.5.1 Definition
- 1.5.2 Mean function and autocorrelation function
- 1.5.3 Markov sequences
- 1.6 Linear stochastic difference equations
- 1.7 Notation
- 1.8 Importance sampling (see the sketch after this list)
- 1.9 Entropy
- 1.10 KL divergence
- 1.11 Estimators
- 1.11.1 Maximum a posteriori estimator
- 1.11.2 Maximum likelihood estimator
- 1.12 Differentiating with vectors and matrices
- 1.12.1 Differentiation with respect to a vector
- 1.12.2 Differentiation with respect to a matrix
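Section 1.8 in code: a numeric check of importance sampling with distributions I picked myself (target p = N(1, 1), proposal q = N(0, 4)). Samples come from q, and weighting by p/q recovers an expectation under p:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_pdf(x):  # target density N(1, 1)
    return np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)

def q_pdf(x):  # proposal density N(0, 2^2), heavier tails than p
    return np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))

f = lambda x: x ** 2
xs = rng.normal(0.0, 2.0, size=100_000)   # samples drawn from q, not p
weights = p_pdf(xs) / q_pdf(xs)           # importance weights p(x)/q(x)
estimate = np.mean(weights * f(xs))
print(estimate)  # approaches E_p[X^2] = Var + mean^2 = 1 + 1 = 2
```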
2021-02-21
- 바닥부터 배우는 강화학습
- 수학으로 풀어보는 강화학습 원리와 알고리즘
- Chapter 1. Mathematics for reinforcement learning
- 1.1 Probability and random variables
- 1.1.1 Probability
- 1.1.2 Random variables
- 1.1.3 Cumulative distribution and probability density functions
- 1.1.4 Joint probability functions
- 1.1.5 Conditional probability functions
- 1.1.6 Independent random variables
- 1.1.7 Functions of random variables
- 1.1.8 Bayes' theorem (see the worked example after this list)
- 1.1.9 Sampling
- 1.2 Expectation and variance
- 1.2.1 Expectation
- 1.2.2 Variance
- 1.2.3 Conditional expectation and variance
- 1.3 Random vectors
- 1.3.1 Definition
- 1.3.2 Expectation and covariance matrix
- 1.3.3 Sample mean
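Section 1.1.8 as a worked example, using the classic rare-condition test numbers (my numbers, not the book's): P(A|B) = P(B|A)P(A) / P(B), with P(B) expanded by total probability.

```python
prior = 0.01          # P(disease)
sensitivity = 0.95    # P(positive | disease)
false_pos = 0.05      # P(positive | no disease)

# total probability: P(positive) over both hypotheses
evidence = sensitivity * prior + false_pos * (1 - prior)
# Bayes' theorem: P(disease | positive)
posterior = sensitivity * prior / evidence
print(posterior)  # ~0.161: a positive test still leaves only ~16% probability
```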
2021-02-20
- 바닥부터 배우는 강화학습
- Finding the best policy when the MDP is unknown
- 6.1 Monte Carlo control
- 6.2 TD control 1: SARSA (see the sketch below)
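Section 6.2 distilled: tabular SARSA against a generic environment with reset()/step(a) returning (next_state, reward, done). The env interface and hyperparameters are my assumptions, not the book's code:

```python
import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)]
alpha, gamma, eps, n_actions = 0.1, 0.99, 0.1, 4

def policy(s):
    """Epsilon-greedy over the current Q table."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(s, a)])

def run_episode(env):
    s = env.reset()
    a = policy(s)
    done = False
    while not done:
        s2, r, done = env.step(a)
        a2 = policy(s2)
        # on-policy target: uses the action actually taken next (that's SARSA)
        Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] * (not done) - Q[(s, a)])
        s, a = s2, a2
```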
2021-02-18
- 바닥부터 배우는 강화학습
- Evaluating values when the MDP is unknown
- 5.1 Monte Carlo learning
- 5.2 Temporal-difference learning
- 5.3 Monte Carlo vs. TD (compared in the sketch below)
- 5.4 Something in between Monte Carlo and TD?
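Sections 5.1 to 5.3 side by side in a sketch of my own: both methods nudge V toward a target, and only the target differs (full return vs. one-step bootstrap):

```python
alpha, gamma = 0.1, 0.99

def mc_update(V, episode):
    """Every-visit Monte Carlo: episode is a list of (state, reward),
    and the target is the full return G_t computed backwards."""
    G = 0.0
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] = V.get(s, 0.0) + alpha * (G - V.get(s, 0.0))

def td0_update(V, s, r, s_next, done):
    """TD(0): the target bootstraps from the current estimate V(s')."""
    target = r + (0.0 if done else gamma * V.get(s_next, 0.0))
    V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
```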
2021-02-17
- 바닥부터 배우는 강화학습
- Planning when the MDP is known
- 4.1 Evaluating values: iterative policy evaluation
- 4.2 Finding the best policy: policy iteration
- 4.3 Finding the best policy: value iteration (see the sketch below)
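Section 4.3 in code: value iteration on a generic finite MDP. The representation (P[a] an S x S transition matrix, R[a] an S-vector of expected rewards per action) is my own choice:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Repeated Bellman optimality backups until V stops changing:
    V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]."""
    n_states = len(R[0])
    V = np.zeros(n_states)
    while True:
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values, greedy policy
        V = V_new
```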
2021-02-16
2021-02-15
- 바닥부터 배우는 강화학습
- What is reinforcement learning?
- 1.1 Supervised learning and reinforcement learning
- 1.2 Sequential decision-making problems
- 1.3 Reward
- 1.4 Agent and environment
- 1.5 The power of reinforcement learning
- Markov decision processes
- 2.1 Markov process
- 2.2 Markov reward process (values worked out in the sketch below)
- 2.3 Markov decision process
- 2.4 Prediction and control
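For section 2.2, an MRP small enough to solve exactly: the state values satisfy the linear Bellman equation v = R + gamma P v, so v = (I - gamma P)^(-1) R. The two-state numbers are my own example:

```python
import numpy as np

P = np.array([[0.9, 0.1],     # transition matrix
              [0.2, 0.8]])
R = np.array([1.0, -1.0])     # expected reward per state
gamma = 0.9

# solve (I - gamma P) v = R instead of inverting explicitly
v = np.linalg.solve(np.eye(2) - gamma * P, R)
print(v)  # exact state values; no iteration needed for an MRP this small
```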
2021-02-09
- S-K RL
- train_FT10_ppo_node_only.py (loop shape sketched after this entry)
- do_simulate_on_aggregated_state()
- value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
- eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
- val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
- def get_swapping_ops(blocking_op, machine_dict)
- class blMachine(Machine)
- class blMachineManager(MachineManager)
- class blSimulator(Simulator)
- def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
- def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
- def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
- def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
- def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
- def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
- def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
- def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
- def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
- 팡요랩 (Pang-Yo Lab)
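Putting the logged calls together, my guess at the overall shape of the training loop in train_FT10_ppo_node_only.py. Only the three named calls come from the log (they would be imported from the S-K RL codebase); the argument list of do_simulate_on_aggregated_state is extrapolated from its _interrupted sibling, and all the glue is assumed:

```python
def train(sim, agent, path, device, num_episodes=10_000, eval_every=50):
    for episode in range(num_episodes):
        # roll out one scheduling episode to collect trajectories (assumed role)
        do_simulate_on_aggregated_state(sim, agent, episode_index=episode,
                                        device=device, reward='utilization',
                                        scaled=False, mode='node_mode')
        # PPO update; this signature is copied verbatim from the log
        value_loss, action_loss, dist_entropy = agent.fit(
            eval=0, reward_setting='utilization', device=device,
            return_scaled=False)
        # periodic greedy evaluation plus validation on held-out instances
        if episode % eval_every == 0:
            eval_performance = evaluate_agent_on_aggregated_state(
                simulator=sim, agent=agent, device='cpu', mode='node_mode')
            val_performance = validation(agent, path, mode='node_mode')
```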
2021-02-08
- S-K RL
- train_FT10_ppo_node_only.py
- do_simulate_on_aggregated_state()
- value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
- eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
- val_performance = validation(agent, path, mode='node_mode')
- 팡요랩
2021-02-07
- S-K RL
- train_FT10_ppo_node_only.py
- do_simulate_on_aggregated_state()
- value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
- eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
- val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
- def get_swapping_ops(blocking_op, machine_dict)
- class blMachine(Machine)
- class blMachineManager(MachineManager)
- class blSimulator(Simulator)
- def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
- def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
- def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
- def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
- def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
- def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
- def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
- def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
- def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
- 팡요랩
- Notes on cross-entropy (see the sketch below)
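The cross-entropy identity checked numerically with toy distributions of my own: for discrete p (true) and q (model), H(p, q) = -sum_i p_i log q_i = H(p) + KL(p || q), which is why minimizing cross-entropy in q is the same as minimizing the KL term.

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "true" distribution
q = np.array([0.5, 0.3, 0.2])   # model distribution

entropy = -np.sum(p * np.log(p))
cross_entropy = -np.sum(p * np.log(q))
kl = np.sum(p * np.log(p / q))
print(cross_entropy, entropy + kl)  # identical up to float error
```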
2021-02-06
- S-K RL
- SBJSSP_report_results.ipynb
- def get_swapping_ops(blocking_op, machine_dict)
- class blMachine(Machine)
- class blMachineManager(MachineManager)
- class blSimulator(Simulator)
- def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
- def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
- def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
- def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
- def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
- def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
- def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
- def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
- def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
- 팡요랩
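My reading of the *_interrupted evaluation variants, judging only from their signatures: during a rollout each machine is knocked offline with probability shutdown_prob and the scheduler has to cope. Everything in this sketch (the machine list, step loop, repair handling) is an assumption of mine, not S-K RL code:

```python
import random

def rollout_with_breakdowns(machines, n_steps, shutdown_prob=0.2, repair_time=5):
    """Toy version of the interruption model: each step, every healthy
    machine fails with probability shutdown_prob and stays unavailable
    while it is repaired. The dispatching itself is left out."""
    downtime = {m: 0 for m in machines}
    for _ in range(n_steps):
        for m in machines:
            if downtime[m] > 0:
                downtime[m] -= 1                  # still under repair
            elif random.random() < shutdown_prob:
                downtime[m] = repair_time         # random shutdown event
        available = [m for m in machines if downtime[m] == 0]
        # ...dispatch the next operations on `available` machines only...

rollout_with_breakdowns(machines=['M1', 'M2', 'M3'], n_steps=20)
```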
2021-02-05
- S-K RL
- SBJSSP_report_results.ipynb
- def get_swapping_ops(blocking_op, machine_dict)
- class blMachine(Machine)
- class blMachineManager(MachineManager)
- class blSimulator(Simulator)
- def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
- def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
- def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
- def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
- def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
- def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
- def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
- def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
- def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
2021-02-04
- S-K RL
- train_FT10_ppo_node_only.py
- do_simulate_on_aggregated_state()
- value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
- eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
- val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
- def get_swapping_ops(blocking_op, machine_dict)
- class blMachine(Machine)
- class blMachineManager(MachineManager)
- class blSimulator(Simulator)
- def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
- def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
- def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
- def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
- def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
- def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
- def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
- def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
- def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
- 팡요랩
2021-02-03
- S-K RL
- train_FT10_ppo_node_only.py
- do_simulate_on_aggregated_state()
- value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
- eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
- val_performance = validation(agent, path, mode='node_mode')
- 팡요랩
2021-02-02
- S-K RL
- train_FT10_ppo_node_only.py
- do_simulate_on_aggregated_state()
- value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
- eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
- val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
- def get_swapping_ops(blocking_op, machine_dict)
- class blMachine(Machine)
- class blMachineManager(MachineManager)
- class blSimulator(Simulator)
- def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
- def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
- def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
- def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
- def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
- def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
- def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
- def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
- def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
2021-02-01
- S-K RL
- train_FT10_ppo_node_only.py
- do_simulate_on_aggregated_state()
- value_loss, action_loss, dist_entropy = agent.fit(eval=0, reward_setting='utilization', device=device, return_scaled=False)
- eval_performance = evaluate_agent_on_aggregated_state(simulator=sim, agent=agent, device='cpu', mode='node_mode')
- val_performance = validation(agent, path, mode='node_mode')
- SBJSSP_report_results.ipynb
- def get_swapping_ops(blocking_op, machine_dict)
- class blMachine(Machine)
- class blMachineManager(MachineManager)
- class blSimulator(Simulator)
- def evaluate_agent_on_aggregated_state(simulator, agent, device, mode='edge_mode')
- def evaluate_agent_on_aggregated_state_DR(simulator, mode='MTWR')
- def SBJSSP_validation(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None)
- def compare_with_optimum(makespans, files, plot=False, scheduler_name=None)
- def evaluate_agent_on_aggregated_state_DR_interrupted(simulator, mode='MTWR', shutdown_prob=0.2)
- def evaluate_agent_on_aggregated_state_interrupted(simulator, agent, device, mode='edge_mode', shutdown_prob=0.2)
- def SBJSSP_validation_interrupted(agent, path, device='cpu', optimums=None, num_val=100, new_attr=False, mode='edge_mode', special=None, DR=None, shutdown_prob=0.2)
- def random_simulator(min_m=5, max_m=10, max_job=10, new_attr=False, special='SBJSSP')
- def do_simulate_on_aggregated_state_interrupted(simulator, agent, episode_index, device, reward='utilization', scaled=False, mode='edge_mode', shutdown_prob=0.2)
- 팡요랩