SMT-based Approaches to Model Checking of Distributed Broadcast Algorithms: some case studies


Valid XHTML 1.0 Strict
Valid CSS!

  Abstract

The automated, formal verification of distributed algorithms is a crucial, although challenging, task. This kind of algorithms may support coordinated actions of distributed systems in critical applications such as, e.g., industrial plant monitoring through wireless sensors and actuators networks, fleet coordination, intelligent transport applications, or aerospace applications. Hence, guaranteeing certain safety properties of the algorithms is compulsory. The processes executing these algorithms communicate to one another, their actions depend on the messages received, and their number is arbitrary. These characteristics are captured by so called reactive parameterized systems. The task of validating or refuting properties of these systems is daunting, due to the difficulty of limiting the possible evolutions, thus having to deal with genuinely infinite-state systems.

In this work, we study the properties of distributed algorithms solving the reliable broadcast problem in various failure models. In particular, our goal is to analyze the extent to which a direct encoding into Satisfiability Modulo Theories (SMT) problems can be adopted for automated formal verification of these algorithms. To this aim, we first study the properties of these algorithms using Model Checker Modulo Theories (MCMT) [1]. We make our formalizations according to two paradigms: array-based systems and counter specification. The former uses unbounded arrays to represent the states of the processes. The latter uses integer variables as counters to exploit a characteristic of the considered algorithms, which consists in determining the behavior of the processes depending on the number of messages they receive from other processes (threshold-guarded algorithms). The algorithms are modeled according to both paradigms. We report a performance evaluation of the computation with the different models, and (for the counter specification paradigm) compare the results with equivalent modelization in both μZ and nuXmv, obtained by translating our models.

The comparison is interesting because it shows on one hand that state-of-the-art SMT-based technology can handle problems arising in fault-tolerant environments and on the other hand that different heuristics and search strategies (e.g. acceleration versus abstraction/refinement) can still have considerable practical impact.

Follow this link to get the pdf with more details about this work.

  Index

In the next sections (Code section and Results section) there are respectively the files used with MCMT and some experimental statistics. In the Translator section, the tools used to translate the MCMT counter specification models into equivalent models for μZ are available.

schema.png

In [2], three algorithms are proposed to solve the reliable broadcast problem in the presence of crash, send-omission, and general-omission failures. We modeled them with counter specification, in order to validate the following safety property:

Agreement property: If a correct process decides to deliver a message m, then all correct processes decide to deliver m.

We started by formalizing the first algorithm presented in [2], which works correctly with the Stopping-failure model. This algorithm is referenced in the next sections as "1stalgorithm, crash failures" and it is represented as Pr1 in the figure.

Following the iterative strategy used in [2], we have changed the failure model of the environment - from the Stopping-failure model to the Send-Omission failure model - to show that the previous algorithm is not safe in this new failure model. This step is referenced in the next sections as "1stalgorithm, send-omission failures".

In order to overcome the failure of the previous algorithm in this new failure model, the authors of [2] propose an alternative algorithm derived by the previous one by adding the use of negative acknowledgments (nacks) and the choice of the most recently diffused estimate by a successive coordinator. This step is referenced in the next sections as "2nd algorithm, send-omission failures" and it is represented as Pr2 in the figure.

In [2], the 2nd algorithm is shown to not tolerate general-omission failures. We produced a model where the algorithm works in the presence of general-omission failures; in the following, it is indicated as "2nd algorithm, general-omission failures".

The algorithm proposed in [2] for general-omission failures substitutes negative with positive acknowledgments (acks), and uses decided processes as well to help the current coordinator in determining the most recently diffused estimate. We modeled this algorithm, which is indicated in the following with "3rd algorithm, general-omission failures". It is represented as Pr3 in the figure.

The last algorithm we studied is the broadcast primitive proposed in [3], aiming at substituting message authentication obtained via unforgeable signatures. This primitive must guarantee three properties, namely:

For the purposes of modelization, we broke the Relay property into two parts, and expressed all the properties as safety properties. They are represented as the four Pr4 circles in the figure, and are indicated as "byzantine broadcast primitive" in the following.

  Code

In this section, the models developed with the counter specification are supplied, in either a (higher level and more readable) intermediate representation*, or in MCMT language.

Algorithm Intermediate representation MCMT ready
*Read the LICENSE and follow this link to get the compiler for these files, follow this other to get the parser.
1st algorithm, crash failures - [Download]
1st algorithm, send-omission failures - [Download]
2nd algorithm, send-omission failures [Download] [Download]
2nd algorithm, general-omission failures [Download] [Download]
3rd algorithm, general-omission failures [Download] [Download]
byzantine broadcast primitive, unforgeability property [Download] [Download]
byzantine broadcast primitive, correctness property [Download] [Download]
byzantine broadcast primitive, relay property (part I) [Download] [Download]
byzantine broadcast primitive, relay property (part II) [Download] [Download]

  Translators

In order to guarantee a fair comparison amongst various approaches, we have developed translators that take as input a MCMT model, and produce in output an equivalent model for the following tools (the LICENSE is here):

All the translators are coded in Python 3 using this parser and the pyparsing module.

  Results

The tool MCMT comes with a lot of options. The most important options for these experiments are those dealing with acceleration and simplification.
More array-based models can be downloaded here.

Algorithm Result Time (seconds)* Max Depth Nodes File
*Timings have been obtained on an Intel Core i5-2500 @ 3.30 GHz with 8 Gb Sdram running Debian linux. MCMT (v. 2.5.2) has been executed without options; Z3 (v. 4.4.2) with "fixedpoint.pdr.simplify_formulas_pre=true" option; nuXmv (v 1.0.1) with this file used with -source option.
1st algorithm, crash failures, array-based SAFE 0.44 13 113 [Download]
1st algorithm, crash failures, counter specification SAFE 0.38 8 39 [Download]
1st algorithm, crash failures, μZ SAFE 0.66 - - [Download]
1st algorithm, crash failures, nuXmv SAFE 0.41 - - [Download]
1st algorithm, send-omission failures, array-based UNSAFE 17.66 12 464 [Download]
1st algorithm, send-omission failures, counter specification UNSAFE 0.14 6 33 [Download]
1st algorithm, send-omission failures, μZ UNSAFE 0.14 - - [Download]
1st algorithm, send-omission failures, nuXmv UNSAFE 0.16 - - [Download]
2nd algorithm, send-omission failures, array-based SAFE 2317 39 11206 [Download]
2nd algorithm, send-omission failures, counter specification SAFE 77 21 1772 [Download]
2nd algorithm, send-omission failures, μZ SAFE 628 - - [Download]
2nd algorithm, send-omission failures, nuXmv SAFE 19 - - [Download]
2nd algorithm, general-omission failures, counter specification UNSAFE 0.28 8 76 [Download]
2nd algorithm, general-omission failures, μZ UNSAFE 0.26 - - [Download]
2nd algorithm, general-omission failures, nuXmv UNSAFE 0.48 - - [Download]
3rd algorithm, general-omission failures, counter specification SAFE 6308 42 10102 [Download]
3rd algorithm, general-omission failures, μZ SAFE >24h - - [Download]
3rd algorithm, general-omission failures, nuXmv SAFE >24h - - [Download]
3rd algorithm, general-omission failures, wrong treshold N≥2T, counter specification UNSAFE 2523 22 6953 [Download]
3rd algorithm, general-omission failures, wrong treshold N≥2T, μZ UNSAFE 953 - - [Download]
3rd algorithm, general-omission failures, wrong treshold N≥2T, nuXmv UNSAFE 25 - - [Download]
broadcast primitive, byzantine failures, unforgeability property, array specification SAFE 2.07 17 368 [Download]
broadcast primitive, byzantine failures, unforgeability property, counter specification SAFE 0.48 7 51 [Download]
broadcast primitive, byzantine failures, unforgeability property, μZ SAFE 0.20 - - [Download]
broadcast primitive, byzantine failures, unforgeability property, nuXmv SAFE 0.03 - - [Download]
broadcast primitive, byzantine failures, unforgeability property, wrong treshold F≤T+1, counter specification UNSAFE 0.10 4 24 [Download]
broadcast primitive, byzantine failures, unforgeability property, wrong treshold F≤T+1, μZ UNSAFE 0.05 - - [Download]
broadcast primitive, byzantine failures, unforgeability property, wrong treshold F≤T+1, nuXmv UNSAFE 0.14 - - [Download]
broadcast primitive, byzantine failures, correctness property, array specification SAFE 0.03 3 16 [Download]
broadcast primitive, byzantine failures, correctness property, counter specification SAFE 2.55 7 131 [Download]
broadcast primitive, byzantine failures, correctness property, μZ SAFE 2.54 - - [Download]
broadcast primitive, byzantine failures, correctness property, nuXmv SAFE 0.21 - - [Download]
broadcast primitive, byzantine failures, correctness property, wrong treshold F≤T+1, counter specification UNSAFE 0.24 4 37 [Download]
broadcast primitive, byzantine failures, correctness property, wrong treshold F≤T+1, μZ UNSAFE 0.10 - - [Download]
broadcast primitive, byzantine failures, correctness property, wrong treshold F≤T+1, nuXmv UNSAFE 0.30 - - [Download]
broadcast primitive, byzantine failures, relay property (part I), array specification SAFE 0.04 2 26 [Download]
broadcast primitive, byzantine failures, relay property (part I), counter specification SAFE 0.22 6 38 [Download]
broadcast primitive, byzantine failures, relay property (part I), μZ SAFE 0.04 - - [Download]
broadcast primitive, byzantine failures, relay property (part I), nuXmv SAFE 0.08 - - [Download]
broadcast primitive, byzantine failures, relay property (part I), wrong treshold N≥3T, array specification UNSAFE 5.39 8 494 [Download]
broadcast primitive, byzantine failures, relay property (part I), wrong treshold N≥3T, counter specification UNSAFE 0.01 1 2 [Download]
broadcast primitive, byzantine failures, relay property (part I), wrong treshold N≥3T, μZ UNSAFE 0.03 - - [Download]
broadcast primitive, byzantine failures, relay property (part I), wrong treshold N≥3T, nuXmv UNSAFE 0.12 - - [Download]
broadcast primitive, byzantine failures, relay property (part II), array specification SAFE 0.03 4 21 [Download]
broadcast primitive, byzantine failures, relay property (part II), counter specification SAFE 4.97 8 185 [Download]
broadcast primitive, byzantine failures, relay property (part II), μZ SAFE 0.08 - - [Download]
broadcast primitive, byzantine failures, relay property (part II), nuXmv SAFE 0.17 - - [Download]
broadcast primitive, byzantine failures, relay property (part II), wrong treshold F≤T+1, counter specification UNSAFE 0.02 1 2 [Download]
broadcast primitive, byzantine failures, relay property (part II), wrong treshold F≤T+1, μZ UNSAFE 0.05 - - [Download]
broadcast primitive, byzantine failures, relay property (part II), wrong treshold F≤T+1, nuXmv UNSAFE 0.11 - - [Download]

  Bibliography

[1]MCMT web page
[2]T. D. Chandra and S. Toueg.
Time and message efficient reliable broadcasts.
In Proceedings of the 4th international workshop on Distributed algorithms, 289-303, 1991.
[3]T.K. Srikanth, S. Toueg.
Simulating authenticated broadcasts to derive simple fault-tolerant algorithms.
In Distributed Computing,2(2):80-94, 1987.
[4]A. Orsini.
Verifica parametrica tramite tecniche SMT.
Master Thesis, Università degli Studi di Milano, April 2015. (In Italian.)

  Authors