Policy Decorator

This page illustrates why vanilla Residual RL fails. In the early stages of training, the residual actions are essentially random, which causes the agent to deviate significantly from the base policy. Because of this deviation, the agent receives no success signals to guide learning.
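The effect can be reproduced in a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the project: a 2D point environment, a hand-written `base_policy` that steps toward a fixed goal, and uniform noise standing in for an untrained residual policy. The final action is the sum of the base action and the residual action, and success is only checked at the end of the episode, mimicking a sparse success signal.

```python
import numpy as np

# Hypothetical toy setup (not from the project): a 2D point must end
# the episode within SUCCESS_RADIUS of GOAL to count as a success.
GOAL = np.array([1.0, 0.0])
HORIZON = 50
SUCCESS_RADIUS = 0.05


def base_policy(pos):
    """Stand-in base policy: move toward the goal, at most 0.1 per axis."""
    return np.clip(GOAL - pos, -0.1, 0.1)


def rollout(rng, residual_scale):
    """Run one episode with action = base action + random residual."""
    pos = np.zeros(2)
    for _ in range(HORIZON):
        # An untrained residual policy is modeled as uniform noise.
        residual = residual_scale * rng.uniform(-1.0, 1.0, size=2)
        pos = pos + base_policy(pos) + residual
    # Sparse success signal, evaluated only at the final step.
    return bool(np.linalg.norm(pos - GOAL) < SUCCESS_RADIUS)


def success_rate(residual_scale, episodes=200, seed=0):
    """Fraction of successful episodes for a given residual magnitude."""
    rng = np.random.default_rng(seed)
    return float(np.mean([rollout(rng, residual_scale) for _ in range(episodes)]))


base_only = success_rate(residual_scale=0.0)
with_random_residual = success_rate(residual_scale=0.5)
print(f"base policy only:           {base_only:.2f}")
print(f"base + random residual:     {with_random_residual:.2f}")
```

In this sketch the base policy alone succeeds in every episode, while adding the unbounded random residual pushes the success rate far below that, so a learner relying on the success signal would see almost nothing to learn from. The residual magnitude of 0.5 is an arbitrary choice; smaller values reduce the deviation.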

Random Residual Actions

Base Policy

Base Policy + Random Residual Actions