June 29, 2024
Hyperparameter tuning for online reinforcement learning
Anna Hakhverdyan, a Master’s student at the University of Alberta, gave a seminar on hyperparameter tuning for online reinforcement learning on June 28th at the Aerial Robotics Research and Education Center of NPUA.
Reinforcement learning (RL) studies how an agent can learn to make decisions by interacting with an environment. The agent learns by trial and error, with the goal of maximizing cumulative reward. Online reinforcement learning concerns agents that learn while interacting with the environment. Unfortunately, the performance of most RL agents depends on their hyperparameters, which are numerous and hard to tune. Most work in online reinforcement learning tunes hyperparameters in an offline phase without accounting for the interaction that tuning itself requires. This empirical methodology is reasonable for assessing how well algorithms can perform, but it is limited when evaluating algorithms for practical deployment in the real world. In many applications, one cannot run exhaustive hyperparameter searches in the environment, and typical evaluations do not characterize how much data such searches require. In this talk, we explore online tuning, where the agent must select hyperparameters during online interaction. Hyperparameter tuning becomes part of the agent rather than being done in a separate (hidden) tuning phase. We layer sequential optimization techniques on standard RL algorithms and assess their behavior when tuning hyperparameters online.
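As a rough illustration of what "layering" a sequential optimizer on an RL algorithm can look like, the sketch below wraps tabular Q-learning in an epsilon-greedy bandit that picks a step size at the start of every episode and scores it by that episode's return. This is not the method presented in the talk: the corridor environment, the candidate step sizes, and the names `run_episode` and `tune_online` are all assumptions made up for this example. Because the Q-table is shared across step sizes, the bandit faces a non-stationary problem, which is one of the difficulties online tuning has to deal with.

```python
import random

STEP_SIZES = [0.001, 0.1, 0.5]  # hypothetical candidate step sizes
N_STATES = 6                    # toy corridor: states 0..5, goal at state 5
ACTIONS = [-1, +1]              # move left or right


def run_episode(q, alpha, rng, gamma=0.95, epsilon=0.1, max_steps=50):
    """Run one Q-learning episode with step size alpha; return the episode return."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        # Epsilon-greedy action selection over the two actions.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: q[state][i])
        nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        done = nxt == N_STATES - 1
        reward = 1.0 if done else -0.01
        # Standard Q-learning update; no bootstrapping from the terminal state.
        target = reward + (0.0 if done else gamma * max(q[nxt]))
        q[state][a] += alpha * (target - q[state][a])
        state, total = nxt, total + reward
        if done:
            break
    return total


def tune_online(n_episodes=300, meta_epsilon=0.2, seed=0):
    """Learn and tune at the same time: every episode counts as real interaction."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    value = [0.0] * len(STEP_SIZES)  # running mean return per step size
    count = [0] * len(STEP_SIZES)    # how often each step size was used
    returns = []
    for _ in range(n_episodes):
        # Bandit layer: explore at random until every arm was tried, then epsilon-greedy.
        if rng.random() < meta_epsilon or 0 in count:
            k = rng.randrange(len(STEP_SIZES))
        else:
            k = max(range(len(STEP_SIZES)), key=lambda i: value[i])
        g = run_episode(q, STEP_SIZES[k], rng)
        count[k] += 1
        value[k] += (g - value[k]) / count[k]
        returns.append(g)
    return returns, count
```

Calling `tune_online()` returns the per-episode returns and the number of times each candidate step size was selected. There is no separate tuning phase here: the cost of trying poor step sizes shows up directly in the returns the agent collects.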
Anna Hakhverdyan is a Master’s student at the University of Alberta under the supervision of Martha White. Her work focuses on making reinforcement learning agents more computationally efficient by making hyperparameter tuning part of the agent itself, opening an avenue for never-ending agents. She previously earned her bachelor's degree from the National Polytechnic University of Armenia.