Thinking About Consensus II: Types of Consensus Algorithms

Now that we've gone over a brief history of blockchain and cryptocurrency, and how that relates to consensus, let's dive in deeper.

The Byzantine Generals' problem is a good place to start thinking about how to solve consensus problems in environments where trust can break down between nodes. Thinking about these problems leads us to consider not only the nature of consensus, but also the number of nodes needed to reach consensus.

We'll then go over the limits on asynchronous consensus, Practical Byzantine Fault Tolerance, the economics of the 51% attack, and Nakamoto's convergence calculation, finishing with a rundown of the types of consensus algorithms in use today, before moving to the third post in this series (which covers consensus for smart contracts and EVMs).

Byzantine Generals' Problem

The Byzantine Generals' problem is a good place to start thinking about how to solve consensus problems in environments where trust can break down between nodes. Thinking about these problems leads us to consider not only the nature of consensus, but also the number of nodes needed to reach consensus.

In their 1982 paper, Lamport, Shostak, and Pease (then at SRI) ask us to imagine a group of generals planning an attack on a city. In the scenario, generals may communicate only by messenger, and Lamport et al. distinguish between oral messages and signed, written messages. Traitors are generals, messengers, or lower-level officers who alter messages. There may also be forgetful messengers, but they can simply be treated as traitors.

Lamport et al. argue that if the messages are oral (meaning they cannot be signed, so a recipient cannot verify who originated them) and there are m traitors among the generals, then at least 3m + 1 generals are needed to reach consensus. Their proof proceeds by contradiction: suppose only 3m generals could reach agreement despite m traitors. For the case m = 1, they show that any solution with just three generals must violate the interactive consistency conditions they define. They then extend the result to m > 1 by grouping the generals into three clusters, each of which can be treated as a single higher-order general, reducing the problem back to the three-general case.
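To make the three-general case concrete, here is a toy sketch (my own illustration, not code from the paper; the scenario values are made up) of why a single traitor defeats oral messages:

```javascript
// Why one traitor breaks three-general agreement with oral (unsigned) messages:
// a loyal lieutenant cannot distinguish "the commander is lying to us" from
// "my peer is lying about the commander."

function viewFor(fromCommander, relayedByPeer) {
  // Everything a lieutenant knows: the commander's order to it,
  // plus what its peer claims the commander said.
  return { own: fromCommander, peer: relayedByPeer };
}

// Scenario A: the commander is the traitor and sends conflicting orders;
// lieutenant L2 honestly relays the "retreat" it received.
const A_L1 = viewFor('attack', 'retreat');

// Scenario B: the commander is loyal and orders "attack",
// but lieutenant L2 is the traitor and lies to L1 about the order.
const B_L1 = viewFor('attack', 'retreat');

// L1's evidence is identical in both scenarios, yet agreement demands
// different decisions, so no rule can be correct in both cases.
console.log(JSON.stringify(A_L1) === JSON.stringify(B_L1)); // true
```

In scenario B, obeying the loyal commander forces L1 to choose "attack"; in scenario A, agreeing with L2 forces a different outcome. Indistinguishable evidence with different required answers is the core of the 3m + 1 bound.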

So, we have our first requirement for the number of nodes in a consensus scheme: 3m + 1 nodes to tolerate m traitors. But we can do better than that.

Lamport et al. then go further and introduce a signed-message algorithm in which the general may or may not be a traitor, while the lieutenants faithfully record the messages they receive from all the other lieutenants in the network. In this algorithm, each message starts with the general, who signs it and passes it to the lieutenants. Each lieutenant then copies the message, appends his own signature, and sends it on to all the other lieutenants.

Sounds a little like a blockchain, right? The algorithm lets each lieutenant collect the set of distinct values he receives across all incoming messages. If a lieutenant collects just one value, he knows the general was consistent and that the value is safe to use; if he sees more than one, the general is a proven traitor and a default order is applied.

The signed-message algorithm allows m lieutenants to make a safe and sane decision even if all m generals are traitors. So, this is a network with m message creators and m message validators: 2m nodes in total, with slightly different functions.
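The signed-message idea can be sketched in a few lines (a toy of my own, not the paper's algorithm verbatim; "signing" is simulated by appending names, where a real system would use unforgeable cryptographic signatures):

```javascript
function sign(message, signer) {
  // Simulated signature: append the signer's name to the signature chain.
  return { value: message.value, chain: message.chain.concat(signer) };
}

function relayAll(ordersByLieutenant) {
  // Each lieutenant receives its order from the general, countersigns it,
  // and forwards the countersigned copy to every other lieutenant.
  const names = Object.keys(ordersByLieutenant);
  const inboxes = {};
  names.forEach(l => { inboxes[l] = [ordersByLieutenant[l]]; });
  names.forEach(sender => {
    const countersigned = sign(ordersByLieutenant[sender], sender);
    names.filter(l => l !== sender).forEach(l => inboxes[l].push(countersigned));
  });
  return inboxes;
}

function decide(inbox) {
  // One distinct value means the general was consistent: safe to use it.
  // More than one proves the general a traitor: fall back to a default order.
  const values = new Set(inbox.map(m => m.value));
  return values.size === 1 ? inbox[0].value : 'RETREAT_DEFAULT';
}

const general = v => sign({ value: v, chain: [] }, 'general');

// A consistent general: every lieutenant sees the single value "attack".
const honestInboxes = relayAll({ L1: general('attack'), L2: general('attack') });
console.log(decide(honestInboxes.L1)); // "attack"

// A traitorous general sends conflicting orders; relaying exposes the lie.
const traitorInboxes = relayAll({ L1: general('attack'), L2: general('retreat') });
console.log(decide(traitorInboxes.L1)); // "RETREAT_DEFAULT"
```

Because signatures cannot be forged, a traitor cannot alter a relayed order without detection; the worst he can do is withhold messages, and the collected value sets still expose any inconsistency.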

In some sense, then, we have progressed from requiring 67% trustworthy nodes to requiring only 50%. The paper goes on to discuss more configurations of the same Byzantine Generals' setup.

For this article, the point is that progress can be made. So, can we do better still?

So far, we have the idea of messages being passed around. But what if we add requirements about timing: messages all delivered within known bounds (synchronously), or at arbitrary times (asynchronously)? Does that affect how many processors are required to be honest?

The Fischer-Lynch-Paterson (FLP) impossibility result tells us that no deterministic protocol can guarantee agreement in a completely asynchronous network if even a single node may crash. At the end of their article, they do provide a theorem showing that a partially correct consensus protocol can be achieved provided no nodes crash. However, they use a very strict sense of asynchrony: a node may take an unbounded amount of time to respond to a message.

But viable systems can be created if the conditions on synchrony are relaxed. One adjustment is to enforce partial synchrony; see Dolev, Dwork, and Stockmeyer 1987. Partial synchrony simply puts an upper bound on how long a node may take to respond. But in order for a partially synchronous network to be resilient, the consensus algorithm must broadcast messages to at least as many processors as the number of faulty processors it is trying to tolerate.
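The key move in partial synchrony can be pictured with a minimal sketch (the bound and the message delays below are made-up numbers): a receiver waits at most the known upper bound and then declares silent senders faulty, instead of waiting forever as strict asynchrony would force it to.

```javascript
const DELTA = 100; // assumed upper bound on message delay (illustrative units)

function collectResponses(messages) {
  // Deliver every message that respects the bound; any sender whose message
  // takes longer than DELTA is declared faulty rather than waited on forever.
  const responses = {};
  const faulty = [];
  messages.forEach(m => {
    if (m.delay <= DELTA) {
      responses[m.from] = m.value;
    } else {
      faulty.push(m.from);
    }
  });
  return { responses, faulty };
}

const round = collectResponses([
  { from: 'A', delay: 20, value: 'yes' },
  { from: 'B', delay: 80, value: 'yes' },
  { from: 'C', delay: 1e9, value: 'yes' }, // effectively crashed: never answers in time
]);
console.log(round.faulty); // [ 'C' ]
```

The bound is what makes failure detectable; in the strict FLP model there is no DELTA, so a slow node and a crashed node are indistinguishable.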

Practical Byzantine Fault Tolerance

In 1999, Castro and Liskov presented the first practical algorithm shown to reach consensus in an asynchronous network: Practical Byzantine Fault Tolerance (PBFT). PBFT assures both safety and liveness as long as fewer than one third of the nodes are faulty (that is, at least 67% of nodes are trustworthy), and it works in an asynchronous context.
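The 67% figure comes from PBFT's replica-count arithmetic: tolerating f Byzantine replicas takes n = 3f + 1 replicas, with quorums of 2f + 1 matching replies. A trivial sketch of that arithmetic:

```javascript
// PBFT replica arithmetic: to tolerate f Byzantine replicas you need
// n = 3f + 1 replicas and a quorum of 2f + 1 matching replies.
function pbftSizes(f) {
  return { replicas: 3 * f + 1, quorum: 2 * f + 1 };
}

console.log(pbftSizes(1)); // { replicas: 4, quorum: 3 }
console.log(pbftSizes(3)); // { replicas: 10, quorum: 7 }
```

Any two quorums of 2f + 1 replicas out of 3f + 1 overlap in at least f + 1 replicas, so they share at least one honest replica, which is what keeps the replicated state consistent.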

So, there is a way to work within asynchronous networks. But can the number of processors that must be trusted be fewer? The only schemes that seem to get down to fifty percent of processors use wormholes: trusted components that cannot fail in a Byzantine way. Here are two versions of wormhole use,
Wormholes 2010 and Wormholes 2007. The first makes use of a trusted timely computing base; the second uses append-only memory.

A good place to find out more about limits on consensus algorithms is this 2011 survey.

What can we say about Bitcoin, which only needs just over 50% of its nodes to be trustworthy? Can we call it asynchronous? Perhaps it is employing a wormhole of sorts. From the paper by Chun et al. on the A2M wormhole, we can take this from the abstract:

"We propose Attested Append-Only Memory (A2M), a trusted system facility that is small, easy to implement and easy to verify formally. A2M provides the programming abstraction of a trusted log, which leads to protocol designs immune to equivocation — the ability of a faulty host to lie in different ways to different clients or servers — which is a common source of Byzantine headaches."

Well, that sounds to me like a blockchain ledger — the kind Bitcoin uses.

The Cost of Bad Behavior: 51% Attack

We now know how many computers (as a percentage of nodes) are needed to launch an attack on a blockchain, and that the choice of consensus algorithm determines that percentage. If you want to get around Bitcoin's consensus, you will need 51% of the network's mining power working on your behalf in order to start rewriting history.

But just having 51% of the nodes is not enough. You need the computing power to recreate the blockchain starting from the point in time where you placed your double-spending target transaction. In many blockchain implementations you have to outpace the well-behaved nodes by some significant factor, say two to one. In the case of Bitcoin, the only way to rewrite history is to have your many attack nodes solve the puzzle for every block you recreate. There is no shortcut. That means that besides buying the processing power, you have to pay for electricity as well.

You can guess that this will cost a lot of money; this is not for the casual attacker. Not only that, you are not guaranteed to succeed. You only have a chance, one you wouldn't have without 51% of the network.

You might wonder what an attempt would cost. There is a website, Crypto51, that claims to provide the current cost of such an attack for a number of blockchain offerings. Take a look. At the time of this writing, it says the cost of attacking Bitcoin is approaching $0.5M USD per hour. So, I hope your attack doesn't take several days, which might be required in order to avoid getting caught.

Convergence to Security

In Section 11 of Satoshi's paper, there is a small program in the C programming language that shows how the probabilities for honest nodes vs. attackers determine a convergence to zero for the probability of creating a dominant side chain in Bitcoin.

Satoshi Nakamoto defines the following probabilities in the context of a binomial random walk:

p = probability an honest node finds the next block
q = probability the attacker finds the next block

The probability of the attacker ever catching up from z blocks behind is 1.0 if p ≤ q (the house always wins with a bigger bank). Otherwise, it falls off as a power of the ratio of the two probabilities: (q/p)^z.

Nakamoto then weights these catch-up probabilities by the Poisson distribution of the attacker's potential progress; that derivation is also in Section 11.
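Written out, the quantity that Section 11's program evaluates is the probability that the attacker ever catches up, with his progress weighted by a Poisson distribution:

```latex
P = 1 - \sum_{k=0}^{z} \frac{\lambda^{k} e^{-\lambda}}{k!}
    \left( 1 - \left( \frac{q}{p} \right)^{z-k} \right),
\qquad \lambda = z \, \frac{q}{p}
```

Each term is the chance the attacker has mined exactly k blocks so far, times the chance he fails to close the remaining gap of z - k blocks.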

Convergence with JavaScript

Below is a little Node.js application that you can use to try different probabilities and see this convergence. It is basically the same program as the one given by Satoshi Nakamoto, except that it is in JavaScript, so you can run it directly with Node.js.

In the code below, if you try gQ = 0.49, you will see the value of P getting smaller; but the loop only goes up to 100, so you can tell it would take a very high z value to reach convergence. If you set gQ over 0.5, say at 0.51, you will see that P starts growing, which tells us the attacker wins the game. The variable gQ specifies the probability that the attacker finds the next block.

// This is the Nakamoto calculation rendered in JavaScript.
// It is called below.

function attackerSuccessProbability(q, z) {
    var p = 1.0 - q;           // probability an honest node finds the next block
    var lambda = z * (q / p);  // expected attacker progress while z blocks confirm
    var sum = 1.0;
    var i, k;
    for (k = 0; k <= z; k++) {
        // Poisson density: probability the attacker has mined exactly k blocks
        var poisson = Math.exp(-lambda);
        for (i = 1; i <= k; i++) {
            poisson *= lambda / i;
        }
        // Subtract the cases where the attacker can still catch up from z - k behind
        sum -= poisson * (1 - Math.pow(q / p, z - k));
    }
    return sum;
}


// Format the data and try different values.

var gQs = [0.1, 0.3]; // try other values here, e.g. 0.49 or 0.51

gQs.forEach(function (gQ) {
    console.log("\nQ = " + gQ);
    for (var z_hat = 0; z_hat < 100; z_hat++) {
        var P = attackerSuccessProbability(gQ, z_hat);
        console.log("z: " + z_hat + "  Q: " + gQ + " -> P: " + P);
        if (P < 0.00005) break;   // stop once the attacker's chance is negligible
    }
});

Types of Consensus Algorithms

Now that we have gone over the Bitcoin consensus process, you might be curious about what other types of consensus algorithms there are.

In order to help you appreciate the "growing zoo" of consensus algorithms, we'll give a brief rundown of algorithms. Please note that the list that follows is very likely not exhaustive. We have about twenty consensus algorithms listed, including Proof of Work.

For the sake of brevity, only one or two sentences will be given as a description. Some, but not all, will have external links to more information about the particular algorithm.

PoW: Proof of Work

Mining rewards go to the first node to spend computing time solving a cryptographic puzzle. The most common scheme is for the miner to search for a block hash that falls below a required target value (often described as a hash with enough leading zeros). All nodes receive all transactions, and all miners participate in the consensus process to extend the ledger. Bitcoin introduced this method and uses it; see the white paper for more info.

PoS: Proof of Stake

Mining priority is given to nodes that have staked the most currency in the blockchain system. See a definition here of PoS.
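A minimal sketch of the stake-weighted idea (the names and stakes are made up, and a real chain would derive the random draw from a secure source rather than pass it in):

```javascript
// Toy stake-weighted selection: the chance of being picked to forge the next
// block is proportional to stake. `r` stands in for a random number in [0, 1).
function pickValidator(stakes, r) {
  const total = Object.values(stakes).reduce((a, b) => a + b, 0);
  let threshold = r * total;
  for (const [name, stake] of Object.entries(stakes)) {
    threshold -= stake;
    if (threshold < 0) return name;
  }
}

const stakes = { alice: 60, bob: 30, carol: 10 };
console.log(pickValidator(stakes, 0.50)); // "alice" falls in alice's 60% share
console.log(pickValidator(stakes, 0.95)); // "carol"
```

The energy cost of PoW is replaced by capital at risk: a validator with 60% of the stake is picked about 60% of the time, and misbehavior can forfeit the stake.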

DPoS: Delegated Proof of Stake

This form of consensus mirrors the election of members to governing bodies. Witnesses, those who validate transactions, are elected: the candidates with the most votes win the mining privilege. In this scheme, those in the pool of candidates risk reputation and income for bad behavior. Here is a simple explanation of DPoS. Here are some users of this form of consensus: Lisk, EOS, Steem, and BitShares. Read this for a more detailed description: Delegated Proof of Stake Consensus

PoET: Proof of Elapsed Time

Whichever node reaches the end of a random wait period first gets mining priority, without expending CPU energy. In this scheme, used by Hyperledger's Sawtooth, a trusted execution environment on the same chip as the miner's CPU produces the wait period. These trusted cores are supposed to be tamper-proof; however, they were hacked by a research team at Austria's Graz University of Technology, and more attacks have been reported since, such as the speculative execution attack. Of course, the vendors developed countermeasures; here is Sawtooth's documentation on PoET.
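The lottery at the heart of PoET can be sketched in a few lines (the wait times here are plain numbers standing in for what the trusted hardware would produce):

```javascript
// Toy PoET lottery: each node asks its trusted hardware for a random wait
// time; whoever's timer expires first wins the right to publish the block.
function poetWinner(waits) {
  return Object.entries(waits)
               .reduce((best, cur) => (cur[1] < best[1] ? cur : best))[0];
}

const waits = { node1: 7.2, node2: 3.1, node3: 9.8 }; // seconds
console.log(poetWinner(waits)); // "node2" has the shortest wait
```

The whole scheme hinges on the wait times being honestly random, which is exactly why compromising the trusted hardware breaks it.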

PBFT: Practical Byzantine Fault Tolerance

PBFT is often referred to as the traditional fault tolerance algorithm. It operates by state machine replication, with the state machines providing the examination and evaluation of data points. Each node runs the state machine and broadcasts its result to the network, and a voting round determines the acceptance of data points. Some refer to Ripple and Stellar as modern variants of PBFT.

DBFT: Democratic Byzantine Fault Tolerance

Some would say that Tendermint is a democratic version of PBFT. Tendermint introduces voting at more stages than PBFT and is more geared toward blockchain management. It utilizes a gossip protocol. Find out more about Tendermint here.

PoAct: Proof of Activity

A hybrid of PoW and PoS. In this scheme, a PoW winner publishes his block, but then a randomly selected set of validators, prioritized by their stake, sign the block. Once all the validators have signed, the block is added to the blockchain. Here's a further description of PoAct. This is different from PoA (Proof of Authority, see below), in which consensus is merely about voting.

Casper FFG and CBC

Ethereum is transitioning from PoW to PoS. As part of the transition, both schemes will be used for a while. Here is a useful discussion of these flavors: Casper FFG and CBC.

MARPLE

This is a synchronous, leaderless protocol. The following is a quote from the Gorbyte introduction.

The kernel of these mechanisms is the MARPLE protocol: a Majority Agreement Recursive Protocol based on Logical Environs. For a deeper look into MARPLE, here is a technical discussion.

Raft

Raft is derived from Paxos but touts itself as simpler to understand. It uses state machine replication and leader election. Here is the technical description, and here is more information about the Raft Consensus Algorithm (it has a nice animation to help visualize Raft).

JP Morgan's Quorum-Chain Voting

This is a simple majority voting scheme in which those who may vote are granted the right to do so; those privileged nodes may also grant rights to others. Here is a discussion of this scheme.

PoM: Proof of Mass (Proof-of-Weight)

I decided to call this Proof of Mass, since PoW is already taken, but you will find it described as Proof-of-Weight in more than one place. Basically, this is much like PoS, except that instead of buying in, a user only needs a higher account balance to be preferred in mining; it is almost like a reward for savings. Filecoin falls into this category by using the amount of storage a user provides as the weight, rather than an incentivized stake. More information about this form of consensus can be found here.

PoAI: Proof of Artificial Intelligence or Proof of Intelligence

In this consensus scheme, a node must train a neural network and use it to solve puzzles provided by the network in order to gain the right to propose the validity of transactions. Here is a detailed use case which includes PoAI from Aion.

PoB: Proof of Burn

This is the consensus mechanism most like a slot machine. Whoever throws away the most money wins. Find out more about Proof of Burn.

PoA: Proof of Authority

Taking a quote from this Medium article, PoA may be briefly explained as such: Proof of Authority (PoA) is a modified form of Proof of Stake (PoS) where instead of stake with the monetary value, a validator’s identity performs the role of stake.

Here are a few examples where the basic PoA stance is taken further:

  1. Reputation Mining Colony
  2. Proof of Ranking is a kind of Proof of Authority. Rublix uses a set of features to rank a trader. You can read more about trader ranking.

PoSV: Proof-of-Stake Voting

Like PoS, this requires a buy-in. In the Tomo system, the currency is known as "TOMO" and 50K of them are required to become a MasterNode candidate. Members of the community vote for the MasterNodes, who then do the mining. Refer to the Tomo white paper for more information.

PoC: Proof of Capacity (Proof of Space)

Here is a description on Wikipedia about PoC. Burst uses PoC by constructing plot files, which retain a number of slots for holding information about transactions. Burst has a wiki with more explanation.

PoI: Proof of Importance

This is about the amount of transaction processing that a node does. More can be found here: XEM

PoD: Proof of Devotion

PoD contrasts itself with PoI. It selects validators based on their influence in the ecology of the blockchain. To gain a deeper understanding of this, read the Nebulas white paper.

IOTA: Directed Acyclic Graphs (DAGs)

The Tangle is a DAG that can be extended on many fronts and then pulled together to reach fairly reliable consensus; the more tips of the graph confirming a transaction, the more security. The design is for high-speed, small transactions. Check it out here: IOTA. If you are interested in swirlds vs. tangles, check out Hashgraph, another DAG type of consensus for private distributed ledgers: Hashgraph.

Now you should have a better idea of what consensus is, from the Byzantine Generals' problem up through PBFT, the first practical algorithm to reach consensus in an async network. From there, we went through an example of how convergence works with Node.js, based on a program that Satoshi Nakamoto wrote. Lastly, we saw how different protocols craft their consensus algorithms according to their individual needs.

Next up is our last section on consensus, and how it relates to smart contracts and EVMs. Stay tuned!

Last updated on Nov 22, 2018