New paper accepted at NeurIPS 2024!


Deep reinforcement learning (RL) algorithms typically parameterise the policy as a deep network that outputs either a deterministic action or a stochastic one modelled as a Gaussian distribution, which restricts learning to a single behavioural mode. This project presents a novel actor-critic algorithm that learns multimodal policies, parameterised as diffusion models, from scratch while maintaining versatile behaviours.

~ DDiffPG

Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient, by Zechu Li, Rickmer Krohn, Tao Chen, Anurag Ajay, Pulkit Agrawal, Georgia Chalvatzaki
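
To make the "single behavioural mode" point concrete, here is a toy sketch in NumPy. It is not the DDiffPG algorithm from the paper. It only assumes a hypothetical one-dimensional task with two equally good actions (`MODES`, `SIGMA`, and all function names are made up for illustration), fits a single Gaussian to them, and compares that with a sampler that starts from noise and refines samples step by step, here plain Langevin dynamics with a known score function standing in for a learned diffusion model.

```python
import numpy as np

# Hypothetical toy setup: two equally good actions in a 1-D action space.
MODES = np.array([-1.0, 1.0])
SIGMA = 0.1  # assumed spread of good actions around each mode


def fit_gaussian(actions):
    """Maximum-likelihood fit of a single Gaussian: returns (mean, std)."""
    return float(np.mean(actions)), float(np.std(actions))


def mixture_score(a):
    """Gradient of the log-density of an equal-weight two-Gaussian mixture."""
    # Soft assignment of every sample to each mode, computed stably.
    logits = -0.5 * ((a[:, None] - MODES[None, :]) / SIGMA) ** 2
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * (MODES[None, :] - a[:, None])).sum(axis=1) / SIGMA**2


def langevin_sample(n, steps=200, step_size=1e-3, seed=0):
    """Start from pure noise and iteratively refine samples using the score."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, size=n)
    for _ in range(steps):
        noise = rng.normal(0.0, 1.0, size=n)
        a = a + step_size * mixture_score(a) + np.sqrt(2.0 * step_size) * noise
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    good_actions = np.concatenate(
        [rng.normal(m, SIGMA, size=1000) for m in MODES]
    )
    mean, std = fit_gaussian(good_actions)
    print(f"single Gaussian fit: mean={mean:.2f}, std={std:.2f}")

    samples = langevin_sample(2000)
    near_left = np.mean(np.abs(samples + 1.0) < 0.5)
    near_right = np.mean(np.abs(samples - 1.0) < 0.5)
    print(f"iterative sampler: {near_left:.0%} near -1, {near_right:.0%} near +1")
```

In this toy, the fitted Gaussian places its mean between the two good actions, where neither behaviour lives, while the iterative sampler keeps both modes. How DDiffPG trains such a policy with an actor-critic objective is described in the paper itself.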

See you all in Vancouver in 2024, enjoying the snow ❄️.
