Gerenciamento de um cruzamento semaforizado utilizando reinforcement learning e options framework [Management of a signalized intersection using reinforcement learning and the options framework]

BORGES, Dimitrius Guilherme Ferreira; http://lattes.cnpq.br/2322507982634726

URI: https://repositorio.unifei.edu.br/jspui/handle/123456789/2275

Date: 2020-12-18

Abstract:

The number of vehicles on the streets worldwide has grown quickly in the last decade, directly affecting how urban traffic is managed. Control of signalized junctions is a well-known and widely studied problem. Although a growing number of technologies are explored and used to solve it, challenges and opportunities remain, especially given the inefficiency of the widely known fixed-time traffic controllers, which cannot deal with dynamic events. This study applies Hierarchical Reinforcement Learning (HRL) to the control of a signalized vehicular junction and compares its performance with that of a fixed-time traffic controller configured using the Webster Method. HRL is a variation of Reinforcement Learning (RL) in which secondary objectives, represented by sub-policies, are organized in a hierarchical model and managed by a macro-policy, which is responsible for selecting each sub-policy when it is able to achieve its best results; Q-learning governs both the sub-policies and the macro-policy. HRL was chosen because it combines the ability to learn and make decisions from real-time observations of the environment, a typical ability of Reinforcement Learning, with a divide-and-conquer approach, in which the problem is divided into sub-problems. These capabilities bring greater adaptability to a highly dynamic problem, something that deterministic models such as the Webster Method cannot provide. The test scenarios, composed of several vehicle flows applied to an intersection of two lanes, were built using the SUMO simulation tool. HRL, its sub-policies, and the Webster Method are applied and assessed in these scenarios. According to the results obtained, HRL outperforms both the Webster Method and its own isolated sub-policies, indicating a simple and efficient alternative.
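The hierarchy described in the abstract, a macro-policy choosing among sub-policies with Q-learning used at both levels, can be illustrated with a minimal tabular sketch. This is not the thesis implementation: the class `QLearner`, the function `choose`, the state encoding, the hyperparameters, and the number of sub-policies and actions are all hypothetical, since the abstract does not specify them.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning agent (illustrative names, not from the thesis)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha        # learning rate (assumed value)
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # Q-table: each unseen state starts with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def greedy(self, state):
        """Return the action with the highest Q-value (first one on ties)."""
        values = self.q[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update toward r + gamma * max_a' Q(s', a')."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def choose(macro, sub_policies, state):
    """The macro-policy picks a sub-policy; that sub-policy picks the action."""
    option = macro.act(state)
    action = sub_policies[option].act(state)
    return option, action


# Hypothetical setup: two sub-policies, each choosing between two signal actions.
sub_policies = [QLearner(n_actions=2, epsilon=0.0) for _ in range(2)]
macro = QLearner(n_actions=len(sub_policies), epsilon=0.0)
option, action = choose(macro, sub_policies, state="low_queue")
```

In this sketch the macro-policy's "actions" are indices of sub-policies, so the same update rule serves both levels; each level would be trained with its own reward signal, which the abstract does not detail.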
