Handmade RL - 2
Handmade RL - Second Story
Environment Setup
Wonseok Jung
Running the 3D Balance Ball Environment
- Download ml-agents from the following link: https://github.com/Unity-Technologies/ml-agents
- Open the 3DBall scene file at \ml-agents-master\unity-environment\Assets\ML-Agents\Examples\3DBall.
- Search for Ball3DBrain in the Hierarchy.
- Double-click Ball3DBrain.
- In the Brain (Script) component, set Brain Type to External so the training process can control the agents (see the Python sketch at the end of this section).
- Set up the scene to play correctly when the training process launches our environment executable.
- Open the Player settings: Edit -> Project Settings -> Player.
- Under Resolution and Presentation:
- Check Run in Background.
- Set Display Resolution Dialog to Disabled.
- Open File -> Build Settings.
- Check only the 3DBall scene.
- Click Build.
- Select the python folder in the ml-agents directory, then save.
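Once the executable is built with an External brain, Python can drive it directly. Below is a minimal sketch assuming the unityagents package that ships in the repo's python folder (v0.3 API; attribute names such as vector_action_space_size may differ in other releases), stepping the environment with random continuous actions:

    import numpy as np
    from unityagents import UnityEnvironment

    # Path to the executable built above (assumption: built into the python folder,
    # matching the example path used later in this document)
    env = UnityEnvironment(file_name="3Dball/3dball")
    brain_name = env.brain_names[0]
    brain = env.brains[brain_name]

    info = env.reset(train_mode=True)[brain_name]
    for _ in range(100):
        # One random continuous action per agent controlled by this brain
        action = np.random.randn(len(info.agents), brain.vector_action_space_size)
        info = env.step(action)[brain_name]
        if all(info.local_done):
            info = env.reset(train_mode=True)[brain_name]
    env.close()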
Training the Brain with Reinforcement Learning
- Use Proximal Policy Optimization (PPO) to train the agent (its clipped objective is sketched after this list).
- learn.py: a wrapper script is provided.
- Move to the command line and run:
python3 python/learn.py <env_file_path> --run-id=<run-identifier> --train
- The latest TensorFlow version raised an error; resolved with pip install tensorflow==1.5.0.
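learn.py's PPO trainer maximizes the clipped surrogate objective from the PPO paper. A minimal NumPy sketch of that objective for intuition (an illustration only, not ml-agents' actual implementation):

    import numpy as np

    def ppo_clipped_objective(logp_new, logp_old, advantages, epsilon=0.2):
        # Probability ratio pi_new(a|s) / pi_old(a|s)
        ratio = np.exp(logp_new - logp_old)
        unclipped = ratio * advantages
        # Clipping keeps the new policy close to the old one
        clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
        # Objective to be maximized (trainers typically minimize its negative)
        return np.mean(np.minimum(unclipped, clipped))

The clipping is why Policy Loss should stay small and shrink during a healthy run (see the graph descriptions below).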
Training with PPO
- Open a command window in the ml-agents directory and enter the following command:
python3 python/learn.py <env_file_path> --run-id=<run-identifier> --train
or
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
- env_file_path: the path to the 3DBall executable built above.
- Example:
python python/learn.py C:\Users\wonseok\Desktop\ml-agents-master\3Dball/3dball --run-id=test --train
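If learn.py crashes with the latest TensorFlow (as noted above), check the installed version before downgrading:

    import tensorflow as tf
    print(tf.__version__)  # this tutorial was verified with 1.5.0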
Observing Training Progress
- After you start training with learn.py, the ml-agents folder will contain a summaries directory.
- You can use TensorBoard to observe the training process.
- Run this command:
tensorboard --logdir=summaries
Using TensorBoard
- Running the tensorboard --logdir=summaries command above prints a localhost address.
- Example:
http://DESKTOP-5NM5TIB:6006
- Enter that address in a web browser to open TensorBoard and view the training progress as graphs.
- Current training progress
Graph Descriptions
- Lesson - Only interesting when performing curriculum training. This is not used in the 3D Balance Ball environment.
- Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
- Entropy - How random the decisions of the model are. Should slowly decrease during a successful training process. If it decreases too quickly, the beta hyperparameter should be increased.
- Episode Length - The mean length of each episode in the environment for all agents.
- Learning Rate - How large a step the training algorithm takes as it searches for the optimal policy. Should decrease over time.
- Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a successful training session.
- Value Estimate - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
- Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a successful training session.
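The same scalars TensorBoard plots can also be read programmatically from the event files under summaries/. A minimal sketch assuming TF 1.x's tf.train.summary_iterator; the glob pattern is hypothetical, and the exact tag names (e.g. a cumulative-reward tag) depend on your --run-id and the ml-agents version:

    import glob
    import tensorflow as tf

    # Event files written by learn.py for run-id "test" (assumed file pattern)
    for path in glob.glob("summaries/test*"):
        for event in tf.train.summary_iterator(path):
            for value in event.summary.value:
                # Print every logged scalar: step, tag name, value
                print(event.step, value.tag, value.simple_value)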
Embedding the Trained Brain into the Unity Environment (Experimental)
- When training is complete, the model is saved.
- The saved model can be used as the agent's Internal brain type.
Setting up TensorFlowSharp Support
- First, install TensorFlowSharp; it can be downloaded from the following link:
https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage
- Run the downloaded file.
- Import the package.
- Edit -> Project Settings -> Player
- For each of the platforms you target (PC, Mac and Linux Standalone, iOS or Android): Go into Other Settings.
- Set Scripting Runtime Version to Experimental (.NET 4.6 Equivalent).
- In Scripting Define Symbols, add the flag ENABLE_TENSORFLOW. After typing it in, press Enter.
- Go to File -> Save Project
- Restart the Unity Editor.
- The trained model is stored in models/<run-identifier> in the ml-agents folder. Once the training is complete, there will be a <env_name>.bytes file in that location, where <env_name> is the name of the executable used during training.
- Move <env_name>.bytes from python/models/ppo/ into unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/.
- Open the Unity Editor, and select the 3DBall scene as described above.
- Select the Ball3DBrain object from the Scene hierarchy.
- Change the Brain Type to Internal.
- In Graph Model, select the .bytes model trained on 3DBall above.
Success!!!
References
https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md
https://github.com/wonseokjung