Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning
Rickmer Krohn, Vignesh Prasad, Gabriele Tiboni, Georgia Chalvatzaki
TLDR: Multisensory pretraining enhances RL for contact-rich tasks by learning expressive representations through masked autoencoding.

Contact-rich robot manipulation demands tight integration of vision, force, and proprioception. Our new work, Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning, introduces MSDP — a framework that uses masked autoencoding and cross-modal sensor fusion to learn expressive multisensory representations, paired with a novel asymmetric actor-critic architecture for efficient real-robot RL.
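To make the pretraining idea concrete, here is a minimal toy sketch of masked autoencoding over multisensory tokens. All names, dimensions, and the linear "decoder" are illustrative assumptions for exposition, not MSDP's actual architecture: the real method learns a cross-modal fusion encoder, whereas this sketch only shows the mask-and-reconstruct objective on concatenated vision, force, and proprioception tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality tokens in a shared embedding space
# (dimensions chosen for illustration, not taken from the paper).
D = 16  # shared embedding dimension
tokens = {
    "vision": rng.normal(size=(8, D)),    # e.g. 8 image-patch tokens
    "force": rng.normal(size=(1, D)),     # force-torque reading token
    "proprio": rng.normal(size=(1, D)),   # joint-state token
}

def masked_autoencode_loss(tokens, mask_ratio=0.5, rng=rng):
    """One masked-autoencoding step: hide a random subset of the
    concatenated multisensory tokens, predict them with a toy linear
    decoder from the visible tokens, and return the MSE on the
    masked tokens (the self-supervised training signal)."""
    x = np.concatenate(list(tokens.values()), axis=0)  # (N, D)
    n = x.shape[0]
    n_masked = max(1, int(mask_ratio * n))
    masked_idx = rng.choice(n, size=n_masked, replace=False)

    # Stand-in "decoder": a fixed random linear map applied to the
    # mean of the visible tokens. A real model would learn this.
    visible = np.delete(x, masked_idx, axis=0)
    W = rng.normal(scale=0.1, size=(D, D))
    pred = visible.mean(axis=0) @ W                    # (D,) per masked slot
    recon_err = ((x[masked_idx] - pred) ** 2).mean()
    return recon_err, masked_idx

loss, idx = masked_autoencode_loss(tokens)
print(f"masked tokens: {len(idx)}, reconstruction MSE: {loss:.3f}")
```

In a trained model, minimizing this reconstruction loss forces the encoder to infer missing modalities from the observed ones, which is what yields fused, expressive multisensory representations for the downstream RL policy.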
MSDP achieves ~90% success on challenging manipulation tasks using only 6,000 real-robot interactions, with the full pipeline completing in under 55 minutes. Adding a force-torque sensor alone improves performance by 14%. The method is robust to sensor noise, variable stiffness, external disturbances, and varying lighting conditions.
Check out the website for robot videos: https://msdp-pearl.github.io
