Lecture 12 | Actor-Critic Method
Asynchronous n-Step Q-Learning
• Accumulating the n-step trajectory reward
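A minimal sketch of the accumulation step, assuming a rollout of n rewards r_t, ..., r_{t+n-1} and a bootstrap value such as max_a Q(s_{t+n}, a) from a target network (0 if the episode ended inside the rollout). The targets are built backwards so each step reuses the sum already computed for the step after it. The function name n_step_targets and its parameters are illustrative and not taken from the lecture.

```python
from typing import List


def n_step_targets(rewards: List[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted n-step target for every step of a short rollout.

    rewards         -- r_t, ..., r_{t+n-1} collected by one worker
    bootstrap_value -- assumed estimate of the value after the last step
                       (pass 0.0 when the rollout ended in a terminal state)
    gamma           -- discount factor
    """
    targets = []
    running = bootstrap_value
    # Walk the rollout from the last step to the first: R <- r + gamma * R.
    for reward in reversed(rewards):
        running = reward + gamma * running
        targets.append(running)
    targets.reverse()  # restore chronological order
    return targets


if __name__ == "__main__":
    # Three steps of reward 1.0, terminal rollout, gamma = 0.5.
    print(n_step_targets([1.0, 1.0, 1.0], bootstrap_value=0.0, gamma=0.5))
```

The first entry is the full n-step target for s_t; later entries are shorter-horizon targets for the later states in the same rollout, so one pass yields a training target for every visited state.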
Deep Deterministic Policy Gradient
•