TY - GEN
T1 - Generative Reward Machine for Reinforcement Learning for Physical Internet Distribution Centre
AU - Rezaei, Saeid
AU - Brown, Kenneth N.
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Reinforcement learning (RL) has demonstrated significant potential in addressing challenges within logistics and the Physical Internet domain. Nevertheless, the applicability of existing research to the Physical Internet remains limited due to unrealistic assumptions that may not hold in practical scenarios. This paper outlines the characteristics expected in real-world applications and introduces Gym-DC, an RL environment designed for OpenAI-Gym that simulates a Distribution Centre within the Physical Internet context. We assess the complexity of implementing RL pipeline solutions for these characteristics and categorize them by difficulty. For each category, we detail specific simulator configurations and test the efficacy of an adjusted RL pipeline alongside selected heuristics. Our findings reveal that while RL outperforms traditional heuristics in simpler settings, it struggles to achieve good performance in more complex scenarios. To address these limitations, we propose integrating a generative reward machine into the RL pipeline, demonstrating its superior performance compared to conventional RL approaches.
AB - Reinforcement learning (RL) has demonstrated significant potential in addressing challenges within logistics and the Physical Internet domain. Nevertheless, the applicability of existing research to the Physical Internet remains limited due to unrealistic assumptions that may not hold in practical scenarios. This paper outlines the characteristics expected in real-world applications and introduces Gym-DC, an RL environment designed for OpenAI-Gym that simulates a Distribution Centre within the Physical Internet context. We assess the complexity of implementing RL pipeline solutions for these characteristics and categorize them by difficulty. For each category, we detail specific simulator configurations and test the efficacy of an adjusted RL pipeline alongside selected heuristics. Our findings reveal that while RL outperforms traditional heuristics in simpler settings, it struggles to achieve good performance in more complex scenarios. To address these limitations, we propose integrating a generative reward machine into the RL pipeline, demonstrating its superior performance compared to conventional RL approaches.
KW - Generative Reward Machine
KW - Reinforcement Learning
KW - Simulation
UR - https://www.scopus.com/pages/publications/105000694989
U2 - 10.1007/978-3-031-82481-4_22
DO - 10.1007/978-3-031-82481-4_22
M3 - Conference proceeding
AN - SCOPUS:105000694989
SN - 9783031824807
T3 - Lecture Notes in Computer Science
SP - 317
EP - 332
BT - Machine Learning, Optimization, and Data Science - 10th International Conference, LOD 2024, Revised Selected Papers
A2 - Nicosia, Giuseppe
A2 - Ojha, Varun
A2 - Giesselbach, Sven
A2 - Pardalos, Panos M.
A2 - Umeton, Renato
PB - Springer Science and Business Media Deutschland GmbH
T2 - 10th International Conference on Machine Learning, Optimization, and Data Science, LOD 2024
Y2 - 22 September 2024 through 25 September 2024
ER -