Policy Decorator (Ours) vs. Jump-Start RL

Jump-Start RL (JSRL) performs well on a few tasks (5 out of 18) but significantly worse on the rest. Rather than improving the base policy, it learns a new policy from scratch, so it fails to preserve desired traits such as smooth, natural motion. In contrast, our refined policies stay close to the base policy via the bounded residual action strategy, and therefore retain its smooth, natural behavior.
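To make the contrast concrete, here is a minimal sketch of a bounded residual action computation in the spirit described above. The function and parameter names (`base_policy`, `residual_policy`, `alpha`) are illustrative assumptions, not our exact implementation; the key idea is that the learned correction is squashed so the refined action can deviate from the base action by at most `alpha` per dimension.

```python
import numpy as np

def refined_action(base_policy, residual_policy, obs, alpha=0.1):
    """Sketch of a bounded residual action: base action plus a bounded correction."""
    a_base = base_policy(obs)             # frozen base policy (e.g., learned from demos)
    delta = residual_policy(obs, a_base)  # learned residual correction (unbounded output)
    # Bound the residual so |correction| <= alpha in each dimension,
    # keeping the refined policy close to the base policy and
    # preserving its smooth, natural motion.
    return a_base + alpha * np.tanh(delta)
```

Because the correction magnitude is capped, the refined policy can never drift far from the base policy's behavior, which is what preserves the smoothness inherited from the demonstrations.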

Key Observations

JSRL policies are noisy and jerky because JSRL uses the base policy only to construct its learning curriculum (the base policy acts for the first portion of each episode before the learner takes over), not as an imitation target. As a result, the learned policy does not inherit the smooth, natural motion of the base policy. In contrast, our refined policies stay close to the base policy, inheriting the smooth behavior of its human teleoperation or motion planning demonstrations.
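The sketch below illustrates a JSRL-style rollout under our reading of the algorithm, assuming a classic Gym step API and a single switch point `h` that is annealed toward zero as training progresses. Note that the base (guide) policy only determines which states the learner starts from; the learner's actions are never regularized toward the guide's, so nothing encourages it to stay smooth.

```python
def jsrl_rollout(env, guide_policy, learner_policy, h, max_steps=200):
    """Sketch of a JSRL-style episode: guide acts for the first h steps,
    then the learner takes over. Assumes env.step returns (obs, reward, done, info)."""
    obs = env.reset()
    trajectory = []
    for t in range(max_steps):
        # The guide policy shapes the curriculum by handling the first h steps;
        # the learner policy controls the remainder of the episode.
        policy = guide_policy if t < h else learner_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    # The learner is trained on this data with standard RL; h is annealed
    # toward 0, so the guide's influence disappears over training.
    return trajectory
```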

Peg Insertion Task

The Peg Insertion task demands highly precise manipulation: the hole provides only 3 mm of clearance. The JSRL policy exhibits noticeably shaky motions, particularly as the robot arm picks up the peg and inserts it into the hole. In contrast, our refined policy moves smoothly throughout the entire trajectory.

Turn Faucet Task

On the Turn Faucet task, the JSRL policy behaves erratically and collides with the faucet, posing risks for sim-to-real transfer. In contrast, our refined policy demonstrates remarkably smooth motions, avoiding collisions and potential damage.