
Deep reinforcement learning (RL) algorithms typically parameterise the policy as a deep network that outputs either a deterministic action or a stochastic one, modeled as a Gaussian distribution, hence restricting learning to a single behavioural mode. This project presents a novel actor-critic algorithm that learns multimodal policies as diffusion models from scratch while maintaining versatile behaviours.
~ DDiffPG
Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient, by Zechu Li, Rickmer Krohn, Tao Chen, Anurag Ajay, Pulkit Agrawal, Georgia Chalvatzaki
See you all enjoying the snow at Vancouver, 2024 ❄️.
