Handmade RL - 2
Handmade RL - Second Story
Environment Setup
Wonseok Jung
Running the 3D Balance Ball Environment
- Download ml-agents from the following link: https://github.com/Unity-Technologies/ml-agents
- Open the 3DBall Scene file at \ml-agents-master\unity-environment\Assets\ML-Agents\Examples\3DBall.

- Search for Ball3DBrain in the Hierarchy
- Double-click Ball3DBrain
- In Brain (Script), set the Brain Type to External

- Set up the scene to play correctly when the training process launches the environment executable: open the Player settings via Edit -> Project Settings -> Player
- Under Resolution and Presentation:
- Check Run in Background
- Set Display Resolution Dialog to Disabled
 
- Open File -> Build Settings
- Check only the 3DBall scene
- Click Build
- Select the python folder in the ml-agents directory and save
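Before moving on to training, it can be worth checking that the built executable loads from Python at all. The sketch below assumes the 0.3-era unityagents package that ships in the ml-agents python folder, an executable saved as 3Dball/3dball (as in the training example further down), and 12 agents with 2 continuous actions each; attribute names differ between releases, so compare with python/Basics.ipynb in your checkout.

import numpy as np
from unityagents import UnityEnvironment

# Load the executable built above (path is an assumption; the file
# extension can usually be omitted).
env = UnityEnvironment(file_name="3Dball/3dball")
brain_name = env.brain_names[0]  # the External brain, e.g. "Ball3DBrain"

env.reset(train_mode=False)
for _ in range(100):
    # Random continuous actions: 12 agents x 2 actions (assumed sizes).
    actions = np.random.randn(12, 2)
    info = env.step(actions)[brain_name]
    if all(info.local_done):  # every platform dropped its ball
        env.reset(train_mode=False)
env.close()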

Training the Brain with Reinforcement Learning
- Use Proximal Policy Optimization (PPO) to train the agent
- learn.py: the provided wrapper script
- From the command line, run: python3 python/learn.py <env_file_path> --run-id=<run-identifier> --train
- The latest TensorFlow version raised an error; pip install tensorflow==1.5.0 resolved it
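To confirm which TensorFlow version is actually active in your environment, you can print it from the command line:

python -c "import tensorflow as tf; print(tf.__version__)"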
Training with PPO
- Open a command prompt in the ml-agents directory and enter the following command:
python3 python/learn.py <env_file_path> --run-id=<run-identifier> --train
or
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
- env_file_path: the path to the 3DBall executable built above. Example:
python python/learn.py C:\Users\wonseok\Desktop\ml-agents-master\3Dball/3dball --run-id=test --train
 
Observing Training Progress
- After you start training with learn.py, the ml-agents folder will contain a summaries directory.
- You can use TensorBoard to observe the training process.
- Run this command:
tensorboard --logdir=summaries
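If the default port 6006 is already in use on your machine, TensorBoard's --port flag picks another one, for example:

tensorboard --logdir=summaries --port 6007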
Using TensorBoard
- Running the `tensorboard --logdir=summaries` command above prints a localhost address.
- Example: http://DESKTOP-5NM5TIB:6006
- Enter that address into a web browser to open TensorBoard and view the training process as graphs.
- Current training progress:
 
Graph Descriptions
- Lesson - only interesting when performing curriculum training. This is not used in the 3D Balance Ball environment.
- Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
- Entropy - How random the decisions of the model are. Should slowly decrease during a successful training process. If it decreases too quickly, the beta hyperparameter should be increased.
- Episode Length - The mean length of each episode in the environment for all agents.
- Learning Rate - How large a step the training algorithm takes as it searches for the optimal policy. Should decrease over time.
- Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a successful training session.
- Value Estimate - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
- Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a successful training session.
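These scalars can also be read out of the summaries directory programmatically, e.g. to export a reward curve. Below is a minimal sketch using TensorBoard's EventAccumulator, assuming the run-id test from the training example; the tag name "Info/Cumulative Reward" is an assumption, so print the available tags first:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("summaries/test")  # summaries/<run-id>
acc.Reload()                              # parse the event files on disk
print(acc.Tags()["scalars"])              # scalar tags actually logged

# Assumed tag name - substitute one of the tags printed above.
for event in acc.Scalars("Info/Cumulative Reward"):
    print(event.step, event.value)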
Embedding the Trained Brain into the Unity Environment (Experimental)
- When training completes, the model is saved.
- The saved model can then be used as the agent's Internal brain type.
Setting up TensorFlowSharp Support
- First, install TensorFlowSharp. It can be downloaded from the following link:
https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage
- Run the downloaded file
- Import the package
- Edit -> Project Settings -> Player
- For each of the platforms you target (PC, Mac and Linux Standalone, iOS or Android): Go into Other Settings.

- Set Scripting Runtime Version to Experimental (.NET 4.6 Equivalent)

- In Scripting Defined Symbols, add the flag ENABLE_TENSORFLOW. After typing it in, press Enter.

- Go to File -> Save Project
- Restart the Unity Editor.
- The trained model is stored in models/ in the ml-agents folder. Once training is complete, there will be a <env_name>.bytes file in that location, where <env_name> is the name of the executable used during training.
- Move the <env_name>.bytes file from python/models/ppo/ into unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/ (see the example command after this list).
- Open the Unity Editor, and select the 3DBall scene as described above.
- Select the Ball3DBrain object from the Scene hierarchy.
- Change the Brain Type to Internal

- In Graph Model, select the .bytes model you trained for 3DBall
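For the move step above, a Windows command prompt example; the file name 3dball.bytes is an assumption (the file is named after the executable used during training):

move python\models\ppo\3dball.bytes unity-environment\Assets\ML-Agents\Examples\3DBall\TFModels\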

Success!!!
References
https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md
https://github.com/wonseokjung
