Issues of organizing computations in multicomputer systems with the software-controlled failure- and fault-tolerance. Part II
Authors: Asharina I.V.
Published in issue: #7(115)/2021
DOI: 10.18698/2308-6033-2021-7-2097
Category: Aviation and Rocket-Space Engineering | Chapter: Innovation Technologies of Aerospace Engineering
This three-part paper analyzes existing approaches and methods for organizing failure- and fault-tolerant computing in distributed multicomputer systems (DMCS), and identifies and justifies a list of issues that remain to be solved. We review the application areas of failure- and fault-tolerant control systems for complex network and distributed objects. This second part further investigates the issues of organizing failure- and fault-tolerance in the DMCS. System-level, functional, and test diagnostics are considered as the basis for building unattended failure- and fault-tolerant systems. We introduce the concept of self-managed degradation, in which the DMCS eventually proceeds to a safe shutdown once it reaches a critical level of degradation, as a means of extending the active life of the DMCS.