Overview. While state-of-the-art base policies such as Diffusion Policy struggle with finer task details, for example failing to precisely insert a peg into a hole, Policy Decorator refines their performance to near 100%.
Click the "cc" button at the lower right corner to show captions.
The offline-trained base policies can reproduce the natural and smooth motions recorded in demonstrations but may have suboptimal performance.
Policy Decorator (ours) not only achieves remarkably high success rates but also preserves the favorable attributes of the base policy.
Policies learned solely by RL, though achieving good success rates, often exhibit jerky actions, rendering them unsuitable for real-world applications.
Click the blue text for visualizations.
What are the pros of Policy Decorator compared to the base policy?
What are the pros of Policy Decorator compared to the RL policy?
The JSRL policy also performs well on some tasks; why not use it?
What happens if we fine-tune the base policy with a randomly initialized critic?