Hyperparameter tuning for online reinforcement learning

June 29, 2024

Anna Hakhverdyan, a Master’s student at the University of Alberta, gave a seminar on hyperparameter tuning for online reinforcement learning on June 28th at the Aerial Robotics Research and Education Center of NPUA.

Reinforcement learning (RL) studies how an agent can learn to make decisions by interacting with an environment. The agent learns by trial and error, with the goal of maximizing cumulative reward. Online reinforcement learning concerns agents that learn while they interact with the environment. Unfortunately, the performance of most RL agents depends on their hyperparameters, which are numerous and hard to tune. Most work in online reinforcement learning tunes hyperparameters in an offline phase without accounting for the interaction that tuning requires. This empirical methodology is reasonable for assessing how well algorithms can perform, but it is limited when evaluating algorithms for practical deployment in the real world. In many applications, an exhaustive hyperparameter search in the environment is not possible, and typical evaluations do not characterize how much data such a search requires.

The talk explores online tuning, where the agent must select its hyperparameters during online interaction. Hyperparameter tuning becomes part of the agent rather than a separate (hidden) tuning phase. We layer sequential optimization techniques on top of standard RL algorithms and assess how the resulting agents behave when tuning hyperparameters online.
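To make the idea of layering a sequential optimizer on a standard RL algorithm concrete, here is a minimal sketch. It is not the method presented in the seminar. It assumes an epsilon-greedy bandit as the sequential optimizer, tabular Q-learning as the base algorithm, and a toy five-state chain as the environment. The names `ChainEnv`, `StepSizeBandit`, and `run`, the candidate step sizes, and all constants are hypothetical choices made for illustration.

```python
import random


class ChainEnv:
    """Hypothetical toy task: walk right along a chain; reward -1 per step."""

    def __init__(self, n_states=5, max_steps=50):
        self.n_states = n_states
        self.max_steps = max_steps

    def reset(self):
        self.state, self.t = 0, 0
        return self.state

    def step(self, action):  # action 0 = left, 1 = right
        self.t += 1
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        at_goal = self.state == self.n_states - 1
        done = at_goal or self.t >= self.max_steps
        return self.state, -1.0, done, at_goal


class StepSizeBandit:
    """Epsilon-greedy bandit whose arms are candidate step sizes (assumed optimizer)."""

    def __init__(self, candidates, rng, epsilon=0.1, tracking_rate=0.2):
        self.candidates = list(candidates)
        self.rng = rng
        self.epsilon = epsilon
        self.tracking_rate = tracking_rate  # constant rate: arm values drift as Q improves
        self.counts = [0] * len(self.candidates)
        self.values = [0.0] * len(self.candidates)

    def select(self):
        for arm, count in enumerate(self.counts):
            if count == 0:  # try every arm once before exploiting
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.candidates))
        return max(range(len(self.candidates)), key=lambda a: self.values[a])

    def update(self, arm, episode_return):
        if self.counts[arm] == 0:
            self.values[arm] = episode_return
        else:
            self.values[arm] += self.tracking_rate * (episode_return - self.values[arm])
        self.counts[arm] += 1


def run(num_episodes=200, seed=0):
    """Q-learning whose step size is chosen by the bandit before every episode."""
    rng = random.Random(seed)
    env = ChainEnv()
    bandit = StepSizeBandit([0.01, 0.1, 0.5], rng)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    gamma, explore = 0.95, 0.1
    returns = []
    for _ in range(num_episodes):
        arm = bandit.select()
        alpha = bandit.candidates[arm]  # hyperparameter picked during interaction
        state, done, total = env.reset(), False, 0.0
        while not done:
            if rng.random() < explore or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done, at_goal = env.step(action)
            bootstrap = 0.0 if at_goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + bootstrap - q[state][action])
            state, total = next_state, total + reward
        bandit.update(arm, total)  # the episode return is the bandit's feedback
        returns.append(total)
    return bandit, returns
```

Calling `bandit, returns = run()` and inspecting `bandit.counts` shows how often each candidate step size was used. Every episode spent trying a poor step size is paid for in the same data stream the agent learns from, which is the cost that a hidden offline tuning phase leaves out. Because all arms share one Q-table, the feedback each arm receives is non-stationary; the sketch handles this only crudely, through a constant tracking rate.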
