This page illustrates how vanilla Residual RL fails. In the early stages of training, the residual actions are effectively random, so the agent deviates significantly from the base policy. Because of this deviation, the agent receives no success signals to guide learning.
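The failure mode can be reproduced in a toy setting. The sketch below is not the setup used on this page: the 2-D point-reaching task, the constants, and every function name (`base_policy`, `random_residual`, `success_rate`) are assumptions made purely for illustration. It composes the executed action as the base action plus a residual, and models an untrained residual policy as uniform noise.

```python
import math
import random

# All constants below are hypothetical, chosen only for illustration.
GOAL = (1.0, 0.0)
HORIZON = 20            # steps per episode
MAX_STEP = 0.1          # per-dimension action limit of the base policy
SUCCESS_RADIUS = 0.05   # episode succeeds if the final position is this close


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def base_policy(pos):
    """Hypothetical base policy: step toward the goal, clipped per dimension."""
    return tuple(clip(g - p, -MAX_STEP, MAX_STEP) for g, p in zip(GOAL, pos))


def random_residual(rng, scale):
    """Stand-in for an untrained residual policy: uniform noise in [-scale, scale]."""
    return (rng.uniform(-scale, scale), rng.uniform(-scale, scale))


def run_episode(rng, residual_scale):
    """Roll out one episode with action = base action + residual action."""
    pos = (0.0, 0.0)
    for _ in range(HORIZON):
        base = base_policy(pos)
        res = random_residual(rng, residual_scale)
        pos = tuple(p + b + r for p, b, r in zip(pos, base, res))
    return math.dist(pos, GOAL) < SUCCESS_RADIUS


def success_rate(residual_scale, episodes=200, seed=0):
    """Fraction of episodes that end inside the success radius."""
    rng = random.Random(seed)
    wins = sum(run_episode(rng, residual_scale) for _ in range(episodes))
    return wins / episodes


if __name__ == "__main__":
    print("base policy only:     ", success_rate(0.0))
    print("base + random residual:", success_rate(0.5))
```

In this toy task the base policy alone reaches the goal in every episode. Once the residual noise is several times larger than the base step limit, the final position is dominated by the residual, so successful episodes become rare and a learner that relies on success signals has little to learn from.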