Issues of organizing computations in multicomputer systems with software-controlled failure- and fault-tolerance. Part 1
Authors: Asharina I.V.
Published in issue: #6(114)/2021
DOI: 10.18698/2308-6033-2021-6-2088
Category: Aviation and Rocket-Space Engineering | Chapter: Innovation Technologies of Aerospace Engineering
This three-part paper analyzes existing approaches to and methods of organizing failure- and fault-tolerant computing in distributed multicomputer systems (DMCS), and identifies and justifies a list of issues to be solved. We present the concept of fault tolerance proposed by A. Avizienis, explain how it differs from the modern concept, and show why it is inapplicable to modern distributed multicomputer systems. We justify the need to refine the definition of fault tolerance approved by the State Standards, as well as the need to specify three input parameters that DMCS design methods must take into account: permitted fault models, permitted multiplicity of faults, and permitted fault sequence capabilities. We formulate the questions that must be answered in order to design a truly reliable, fault-tolerant system and consider the application areas of failure- and fault-tolerant control systems for complex network and distributed objects. System, functional, and test diagnostics serve as the basis for building unattended failure- and fault-tolerant systems. The concept of self-managed degradation (with the DMCS eventually proceeding to a safe shutdown at a critical level of degradation) is a means of extending the active life of the DMCS. We consider the issues related to the diagnosis of multiple faults and present the main differences in ensuring fault tolerance between systems with broadcast communication channels and systems with point-to-point communication channels.
The first part of the work deals mainly with the analysis of existing approaches to and methods of organizing failure- and fault-tolerant computing in DMCS and with the definition of the concept of fault tolerance.