Policy Decorator (Ours) vs. RL

Our refined policy, learned through Policy Decorator, achieves high success rates while preserving the base policy's strengths. We compare it to an RL-trained policy and show that the refined policy behaves more smoothly and naturally. Since SAC with sparse rewards failed on most tasks, we use RLPD, a state-of-the-art RL method that incorporates demonstrations, to train the RL comparison policies shown in these visualizations.

Key Observations

RL policies produce noisy, jerky motions because they are trained with sparse rewards and no motion constraints, which makes them hard to transfer to the real world. In contrast, our refined policies, guided by the base policy and the bounded residual action strategy, exhibit smooth, natural behavior: they inherit the favorable attributes of base policies trained on smooth human teleoperation or motion-planning demonstrations.
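As a rough illustration of how a bounded residual can keep the refined policy close to the base policy, here is a minimal sketch. The function names (`base_policy`, `residual_policy`), the action limits, and the bound value `H` are illustrative assumptions, not the exact interface or hyperparameters from the paper.

```python
import numpy as np

# Illustrative residual bound: the learned residual may perturb the base
# action by at most H in each dimension (value chosen for illustration only).
H = 0.05

def refined_action(obs, base_policy, residual_policy,
                   action_low=-1.0, action_high=1.0):
    """Combine a frozen base policy with a bounded learned residual.

    The clipping is what preserves the smooth, natural motion of the base
    policy: the RL-trained residual can correct small errors but cannot
    override the base action wholesale.
    """
    a_base = base_policy(obs)                     # smooth action from the imitation-learned base policy
    a_res = np.clip(residual_policy(obs), -H, H)  # bounded correction from the RL-trained refiner
    return np.clip(a_base + a_res, action_low, action_high)
```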

Peg Insertion Task

The Peg Insertion task requires precise manipulation with only 3 mm of clearance, making it more challenging than comparable insertion tasks. The RL policy produces jerky motions that would risk damaging the hardware if transferred to the real world. In contrast, our refined Policy Decorator policy produces smooth motions, eliminating the risk of damaging the peg or box.

Turn Faucet Task

In the Turn Faucet task, the RL policy exhibits erratic behavior, causing collisions with the ground and faucet, which complicates sim-to-real transfer. In contrast, our refined Policy Decorator policy ensures smooth motions, avoiding these collisions.
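As a rough way to quantify the smooth-versus-jerky contrast described above, one common choice is the mean squared jerk of the end-effector trajectory (lower is smoother). This is an illustrative metric of our own, not necessarily the one reported in the paper.

```python
import numpy as np

def mean_squared_jerk(positions, dt):
    """Mean squared jerk of an end-effector trajectory.

    positions: array of shape (T, 3) with end-effector positions sampled at
               a fixed control interval dt (seconds).
    Lower values indicate smoother motion.
    """
    vel = np.gradient(positions, dt, axis=0)   # first derivative: velocity
    acc = np.gradient(vel, dt, axis=0)         # second derivative: acceleration
    jerk = np.gradient(acc, dt, axis=0)        # third derivative: jerk
    return float(np.mean(np.sum(jerk ** 2, axis=-1)))
```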