OpenAI Gym (IV)

Although the Raspberry Pi 3B+ can run Denny Britz's

reinforcement-learning/TD/

Model-Free Prediction & Control with Temporal Difference (TD) and Q-Learning

Learning Goals

  • Understand TD(0) for prediction
  • Understand SARSA for on-policy control
  • Understand Q-Learning for off-policy control (see the sketch after this list)
  • Understand the benefits of TD algorithms over MC and DP approaches
  • Understand how n-step methods unify MC and TD approaches
  • Understand the backward and forward view of TD-Lambda
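
To make the first three goals concrete, here is a minimal tabular Q-Learning sketch (off-policy TD control). It assumes the classic gym API (env.reset() returning a state, env.step() returning a 4-tuple) and the FrozenLake-v0 toy environment; the environment name and hyperparameters are illustrative, and newer gym/gymnasium releases use different names and signatures.

```python
import numpy as np
import gym

env = gym.make("FrozenLake-v0")          # any small discrete env works
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # step size, discount, exploration

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy behaviour policy
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # off-policy TD(0) target: bootstrap from the greedy action in s';
        # SARSA would instead use the action the policy actually takes next
        td_target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state
```

The same loop with the TD target built from the next action actually taken gives SARSA (on-policy), and tracking V(s) under a fixed policy instead of Q(s, a) gives TD(0) prediction.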

 

Notebooks:

 

it cannot run

reinforcement-learning/DQN/

Deep Q-Learning

Learning Goals

  • Understand the Deep Q-Learning (DQN) algorithm
  • Understand why Experience Replay and a Target Network are necessary to make Deep Q-Learning work in practice (see the sketch after this list)
  • (Optional) Understand Double Deep Q-Learning
  • (Optional) Understand Prioritized Experience Replay
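
Why those two ingredients matter can be shown without a deep network at all. Below is a minimal sketch of an experience-replay buffer plus a periodically synced target network; a tiny linear model stands in for the Q-network, and the sizes, sync period, and randomly generated transitions are illustrative assumptions, not the notebook's actual hyperparameters.

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall off

    def push(self, transition):                # transition = (s, a, r, s', done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform sampling breaks the correlation between consecutive steps
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

n_features, n_actions = 4, 2
online_w = np.zeros((n_features, n_actions))   # updated every step
target_w = online_w.copy()                     # frozen copy used for TD targets

buffer = ReplayBuffer(capacity=10_000)
gamma, alpha, sync_every, batch = 0.99, 0.01, 500, 32

for step in range(1, 2001):
    # placeholder for real environment interaction: push a random transition
    s, s2 = np.random.randn(n_features), np.random.randn(n_features)
    buffer.push((s, random.randrange(n_actions), random.random(), s2, False))

    if len(buffer) >= batch:
        for s, a, r, s2, done in buffer.sample(batch):
            # the target uses the *frozen* weights, so it does not chase itself
            td_target = r + gamma * np.max(s2 @ target_w) * (not done)
            td_error = td_target - (s @ online_w)[a]
            online_w[:, a] += alpha * td_error * s
    if step % sync_every == 0:
        target_w = online_w.copy()             # periodic hard sync
```

Without the buffer, successive updates come from highly correlated samples; without the frozen copy, the TD target moves with every weight update and learning can oscillate or diverge.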

 

Oh my!

………

On reflection, DQN easily needs several GB of memory; that is probably asking too much of the little Raspberry Pi ☺
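
A quick back-of-envelope check supports that. Assuming the standard Atari preprocessing used in the DQN notebook (84×84 grayscale uint8 frames stacked 4 deep, with each stored transition holding both a state stack and a next-state stack), the replay memory alone dwarfs the Pi 3B+'s 1 GB of RAM; the buffer sizes below are illustrative.

```python
frame = 84 * 84             # bytes per 84x84 grayscale uint8 frame
state = 4 * frame           # one 4-frame state stack
per_transition = 2 * state  # state + next_state (action/reward are negligible)

for buffer_size in (50_000, 500_000):
    print(f"{buffer_size:>7,} transitions ≈ "
          f"{buffer_size * per_transition / 1024**3:.1f} GB")
# ->  50,000 transitions ≈ 2.6 GB
# -> 500,000 transitions ≈ 26.3 GB
```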