Search results

Number of results: 2

Abstract

This paper presents how the Q-learning algorithm can be applied as a general-purpose self-improving controller for use in industrial automation, as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit practical control loops, including a new definition of the goal state based on the closed-loop reference trajectory and a discretization of the state space and of the accessible actions (manipulated variables). Properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings, so that switching between the existing controller and the replacing Q-learning algorithm is bumpless. A general approach for the design of the Q-matrix and the learning policy is suggested, and the concept is systematically validated by simulation on two example processes, one exhibiting first-order dynamics and one exhibiting oscillatory second-order dynamics. Results show that online learning through interaction with the controlled process is feasible and yields a significant improvement in control performance compared to an arbitrarily tuned PI controller.
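As an illustration of how such a self-improving controller could be structured, the sketch below implements plain tabular Q-learning over a discretized tracking-error state and discretized control increments, with the Q-matrix pre-shaped by a PI-like rule so that switching away from the existing PI controller is approximately bumpless. The grids, gains, reward shape, and the `pi_like_increment` helper are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# Minimal sketch of tabular Q-learning used as a loop controller. The
# discretization grids, gains, reward shape and PI-like prior below are
# illustrative assumptions, not the formulation from the paper.
error_bins = np.linspace(-1.0, 1.0, 21)      # discretized state space (tracking error)
action_set = np.linspace(-0.1, 0.1, 11)      # discretized control increments
alpha, gamma, epsilon = 0.1, 0.9, 0.05       # learning rate, discount, exploration

def pi_like_increment(err, ki=0.5, dt=1.0):
    """PI-like prior (integral action in velocity form) used only to pre-shape Q."""
    return ki * dt * err

# Initialize the Q-matrix so that, in every discretized state, the action
# closest to the PI-like increment is slightly preferred; this is one way to
# approximate bumpless switching from the existing PI controller.
Q = np.zeros((len(error_bins), len(action_set)))
for s, err in enumerate(error_bins):
    Q[s, np.argmin(np.abs(action_set - pi_like_increment(err)))] = 1.0

def choose_action(s):
    """Epsilon-greedy policy over the discretized control increments."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(action_set))
    return int(np.argmax(Q[s]))

def reward(y, y_ref):
    """Assumed reward: negative absolute deviation from the reference trajectory."""
    return -abs(y_ref - y)

def q_update(s, a, r, s_next):
    """Standard tabular Q-learning update."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

In a closed-loop run, the current error would be mapped to the nearest bin in `error_bins`, the chosen increment added to the controller output, and `q_update` called with the reward observed at the next sampling instant.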


Authors and Affiliations

Jakub Musial 1, Krzysztof Stebel 1, Jacek Czeczot 1

  1. Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science, Department of Automatic Control and Robotics, 44-100 Gliwice, ul. Akademicka 16, Poland

Abstract

Multimedia networks utilize low-power scalar nodes to modify the wakeup cycles of high-performance multimedia nodes, which helps optimize power-to-performance ratios. A wide variety of machine learning models have been proposed for this task, but most of them are either highly complex or show low efficiency when applied to large-scale networks. To overcome these issues, this text proposes a Q-learning based iterative sleep-scheduling design and fuses the resulting schedules with an efficient hybrid bioinspired multipath routing model for large-scale multimedia networks. The proposed model initially uses an iterative Q-learning technique that analyzes the energy consumption patterns of nodes and incrementally modifies their sleep schedules. These sleep schedules are used by scalar nodes to efficiently wake up multimedia nodes during ad hoc communication requests. The communication requests are processed by a combination of Grey Wolf Optimizer (GWO) and Genetic Algorithm (GA) models, which assist in identifying optimal paths. These paths are estimated via a combined analysis of temporal throughput and packet delivery performance, together with node-to-node distance and residual energy metrics. The GWO model uses instantaneous node and network parameters, while the GA model analyzes temporal metrics in order to identify optimal routing paths. Both path sets are fused via the Q-learning mechanism, which supports Iterative Adhoc Path Correction (IAPC), thereby improving energy efficiency while reducing communication delay through multipath analysis. Owing to the fusion of these models, the proposed Q-learning based Iterative sleep-scheduling and hybrid Bioinspired Multipath Routing model for Multimedia Networks (QIBMRMN) reduces communication delay by 2.6% and the energy consumed during these communications by 14.0%, while improving throughput by 19.6% and packet delivery performance by 8.3% when compared with standard multimedia routing techniques.
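To make the sleep-scheduling part of the description more concrete, the sketch below shows one possible tabular Q-learning loop in which each multimedia node learns a wakeup interval from its discretized residual-energy level. The state and action sets, the reward trade-off, and all constants are assumptions chosen for illustration; they do not reproduce the QIBMRMN design, and the GWO/GA path fusion is omitted.

```python
import random

# Illustrative sketch of iterative Q-learning sleep scheduling only: the
# states, actions and reward trade-off below are assumptions, not the
# QIBMRMN formulation from the paper.
ENERGY_BINS    = 5                      # discretized residual-energy levels of a node
WAKE_INTERVALS = [1, 2, 4, 8]           # candidate wakeup periods (in duty cycles)
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1       # learning rate, discount, exploration

def make_q_table(num_nodes):
    """One Q-table per multimedia node: q[node][energy_bin][interval_index]."""
    return [[[0.0] * len(WAKE_INTERVALS) for _ in range(ENERGY_BINS)]
            for _ in range(num_nodes)]

def choose_interval(q, node, energy_bin):
    """Epsilon-greedy choice of the next wakeup interval for a node."""
    if random.random() < EPS:
        return random.randrange(len(WAKE_INTERVALS))
    row = q[node][energy_bin]
    return max(range(len(row)), key=row.__getitem__)

def reward(energy_spent, delay):
    """Assumed reward: penalize energy use and communication delay."""
    return -(energy_spent + 0.5 * delay)

def update(q, node, energy_bin, a, r, next_bin):
    """Standard Q-learning update applied to the node's sleep schedule."""
    best_next = max(q[node][next_bin])
    q[node][energy_bin][a] += ALPHA * (r + GAMMA * best_next - q[node][energy_bin][a])
```

After each communication round, a scalar node would observe the energy spent and the delay incurred by the multimedia node it woke up, call `update`, and use `choose_interval` to set that node's next sleep schedule.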

Authors and Affiliations

Minaxi Doorwar 1, P. Malathi 1

  1. SPPU, E&TC Department, India
