Adaptive routing in agricultural supply chains: Harnessing Q-learning for optimal decision-making in dynamic environments

Authors

Chow MS, Prahadeeswaran M, Karthick V, Sumathi C, Patil S

DOI:

https://doi.org/10.14719/pst.5426

Keywords:

Markov Decision Process (MDP), logistics, Q-learning, routing optimization

Abstract

This study examines how Q-learning, a model-free reinforcement learning (RL) technique, can be used to optimize routing in a grid-based environment. The aims are to assess the efficacy of Q-learning in enhancing routing for agricultural supply chains, to investigate its flexibility in dynamic environments, and to compare its performance across several real-world scenarios. In the specific case of a banana supply chain, an agent moves through the entities of the system, from local growers to small traders and warehouses. The routing problem is modelled as a Markov Decision Process (MDP), and the goal is to maximize cumulative reward. Several scenarios are simulated: finding an optimal route for a given visit sequence, accounting for charging time, re-routing around non-drivable paths when unexpected blockages occur, avoiding energy and wear penalties, and minimizing costs. The results demonstrate the adaptability and robustness of Q-learning in dynamic environments, yielding near-optimal solutions across diverse settings. The study adds to a growing body of research on the application of RL in logistics and supply chain management, highlighting its potential to enhance decision-making in complex and variable environments. The findings suggest that Q-learning can effectively balance multiple objectives, such as minimizing distance, reducing costs, and avoiding high-wear areas, making it a valuable tool for optimizing routing in real-world supply chains. Future work will explore broader applications and other RL algorithms in similar contexts.
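
To make the setup described in the abstract concrete, the following is a minimal sketch of tabular Q-learning on a small grid world, written in Python. The grid size, start and goal cells, blocked cells, reward values, and learning parameters (alpha, gamma, epsilon) are hypothetical placeholders chosen for illustration only; they are not taken from the published model, and the paper's fuller formulation (visit sequences, charging time, wear penalties) is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

SIZE = 5                              # grid is SIZE x SIZE
START, GOAL = (0, 0), (4, 4)          # hypothetical grower and warehouse cells
BLOCKED = {(1, 1), (2, 3), (3, 1)}    # hypothetical non-drivable cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action; blocked or off-grid moves leave the state unchanged."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in BLOCKED:
        nxt = state
    reward = -1.0                      # per-step cost (proxy for distance/energy)
    done = nxt == GOAL
    if done:
        reward += 10.0                 # bonus for reaching the destination
    return nxt, reward, done

# Q-table indexed by (row, col, action)
Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(2000):
    state, done = START, False
    while not done:
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[state[0], state[1]]))
        nxt, r, done = step(state, ACTIONS[a])
        # standard Q-learning update toward the bootstrapped target
        target = r + gamma * np.max(Q[nxt[0], nxt[1]]) * (not done)
        Q[state[0], state[1], a] += alpha * (target - Q[state[0], state[1], a])
        state = nxt

# Greedy rollout of the learned policy from the start cell
state, path = START, [START]
while state != GOAL and len(path) < 50:
    a = int(np.argmax(Q[state[0], state[1]]))
    state, _, _ = step(state, ACTIONS[a])
    path.append(state)
print(path)

In this sketch the negative step reward plays the role of the distance and energy costs in the abstract, and the blocked cells stand in for unexpected blockages; extending the state or reward terms (e.g. charging time or wear penalties) follows the same update rule.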

Published

09-12-2024

How to Cite

1. Chow MS, Prahadeeswaran M, Karthick V, Sumathi C, Patil S. Adaptive routing in agricultural supply chains: Harnessing Q-learning for optimal decision-making in dynamic environments. Plant Sci. Today [Internet]. 2024 Dec. 9 [cited 2024 Dec. 22];11(sp4). Available from: https://horizonepublishing.com/journals/index.php/PST/article/view/5426
