In the last article, Hyperscale Alpha: Part II, we talked about how to design a consensus protocol, rooted in the original Nakamoto consensus, which brings together weak liveness guarantees and deterministic safety.
I also explained in detail how this protocol utilizes a hybrid approach combining proof-of-work (PoW) and proof-of-stake (PoS) sybil resistance mechanisms to solve a number of issues which independently are difficult to find simple solutions for.
But there was still an issue.
As the network grows, there was a risk that block producers would centralize to just a few powerful entities, and the propagation of information around the network, specifically proposed blocks and the votes for them, could cause communication issues and congestion.
It's always a balancing act between trade offs - the more participants you have, the safer and more resilient the system is, but also the slower and computationally intensive the whole process might becomes.
The challenge is finding solutions which allow a large, highly decentralized, uncapped set of validators that can quickly finalize transactions, but efficiently communicate the required information between all required parties without saturating one or more resources.
This is where Cryptographic Sortition offers a perfect solution which can help us to manage and balance all the tradeoffs that come with this hybrid model to achieve our goal.
What is Sortition?
Imagine you have a large group of people who all want to participate in a specific task or role within a system. Depending on the tasks, having everyone involved at the same time can lead to slowness and inefficiency.
For the particular problem we are facing, cryptographic sortition is our solution.
Sortition is a technique used in large systems, physical as well as digital, to select a group of participants for a specific job or role. In our case, the sortition set needs to be selected at random and be difficult to game. We must rely on cryptographic primitives and verifiable random functions (VRFs) to ensure a fair, verifiable, and unpredictable selection process.
The randomness and unpredictability matters because it ensures it is very difficult, or even impossible, for one person to decide who will be chosen or to control the process. At the same time, each participant in the system must have an equal chance of being selected, based on certain criteria which promotes fairness. The selection process must be random yet deterministic, meaning that given the same inputs, it will always produce the same outputs, allowing verification.
In distributed systems like blockchains, the integrity of the system depends heavily on participants being unable to influence or predict who will be selected. Sortition when implemented correctly, allows such a system to maintain a very large pool of potential participants while still maintaining operational efficiency by activating a smaller subset at any given time.
In many cases, sortition can help to enhance the security of the system as it becomes more challenging for potential attackers to target specific nodes or gain de facto control over the network. A constantly changing selection of active participants for particular tasks adds an extra layer of protection against numerous attack vectors..
In our case, sortition is also great for reducing communication overhead in large networks. Instead of everyone being involved in every round of consensus or block production, a subset are responsible for it at any given time. This way, the system can stay fast, responsive and scalable even as ever larger numbers of participants join.
With the basic concepts of sortition defined, let's look at how it can be applied in Hyperscale Alpha.
Sortition in the Context of Hyperscale Alpha
As Hyperscale Alpha scales to support a larger network with potentially thousands of miners and validators, it will run into a problem. At these levels, communicating the proposed blocks from everyone who is mining along with all the votes for all of those competing proposed blocks, message complexity increases quadratically.
It's a challenge that comes with scaling up the network overall. It takes more time for everyone to witness the proposed blocks along with the votes cast for them and reach an agreement of which should be accepted and which are now final. Ultimately this slows down how quickly new blocks are created and agreed, reducing how many transactions can be processed, and in the worst cases bring the network to a halt.
Solving Voting Congestion With Gossip and Math
Thankfully, there are a number of well researched and implemented techniques we can use to mitigate the voting congestion issue.
Firstly, rather than every validator explicitly delivering their vote to all other validators, we can adopt a gossip based delivery system. Gossip delivery systems are present in many distributed systems and are well battle tested and efficient.
Simply, instead of delivering the message to everyone myself, I send the message to a subset of validators. Generally this set of validators are those whom I have open connections with in a peer to peer connected network topology. When those peers receive my message, if they haven’t already seen that information from elsewhere, they process it, then send it to their set of peers whom they are connected to. Those peers then do the same thing, and it repeats until everyone in the network has seen the message.
A neat trick is that I can batch my vote with other votes I may have just witnessed that others have sent to me, into the same message to my connected peers. Doing this batching doesn’t meaningfully reduce the amount of data I send overall, but it does reduce the number of messages I send in total which results in a much higher overall efficiency.
If you want to read more about Gossip delivery systems, this is a good place to start
Secondly, we can reduce the amount of data we have to send over the Gossip system by using cryptographic primitives which allow us to aggregate votes. Imagine I have just witnessed votes from 100 different validators and need to gossip them on to my connected peers.
If each vote consists of a 32 byte hash, which is the block being voted for, a 64 byte signature and a 32 byte public key, which informs who the vote is from, then each vote consumes at least 128 bytes. The message I have to send in the worst case (without other optimizations) is 12800 bytes, to each of my peers.
With vote aggregation it is possible, assuming everyone is voting on the same hash, to reduce that to a single 32 byte hash, 64 byte signature and 32 byte public key, plus a compact representation of who voted. For most purposes the compact representation can be provided via a Bloom Filter which for the case above could reliably carry the information of who has voted in approximately 400 bytes, for a total payload size of approximately 500 bytes versus 12800 bytes.
This approach consumes more compute to process the votes than sending them individually, and there is some additional overhead required to deal with aggregated vote payloads which might contain votes I have already seen in others. Generally though bandwidth is much more scarce than compute, therefore the net effect is positive.
This optimization fails to produce high efficiency gains and bandwidth savings if everyone is voting for something different. In that case you also need to include information of which hash each of the voters is voting for. In our case, without sortition of block producers, everyone could indeed be voting on a different block most of the time until the votes finally converge over an arbitrary period.
With that in mind, lets now move onto how we are going to use sortition of block producers to solve an array of problems and issues.
Here's What You Should Consider Before Implementing Sortition
When looking to implement sortition in a hybrid consensus model like Hyperscale Alpha, there are several important factors to keep in mind.
The goal is to strike a balance between
- Reducing message complexity
- Maintaining decentralization
- Ensuring the network liveness in adverse conditions
Keeping these goals in mind, the first factor to consider is:
- How to select the participants who will propose blocks in each round?
In Hyperscale Alpha, we define two types of proposers, primary and secondary. The primary proposers are determined through the sortition process, and it must be difficult and expensive for any attacker to subvert or corrupt the selection process.
The number of primary proposers per selection is defined as the square root of the total number of possible participants, with a minimum bound of 4 which is needed to tolerate at least 1 dishonest or faulty node and satisfy Byzantine fault tolerance. This selection quantity criteria maintains a good level of overall decentralization, and keeps message complexity manageable.
The second factor to consider is:
- What happens if a large part of the network goes offline?
In a situation where a significant portion (or all) of the primary proposers are unavailable, the network must have a fallback mechanism to maintain liveness.
This is where the secondary proposers take over block proposing, ensuring the network keeps running smoothly, even if a large percentage of it is offline or unavailable.
The third factor is
- How to protect against potential attacks, such as grinding attacks?
Malicious actors with enough resources could try to manipulate the selection process to take control of block production. To prevent this, the sortition mechanism needs to be designed in a way that makes it extremely hard for attackers to predict or influence future validator selections.
This is mitigated by using sortition seeds which are generated from the hashes of previously committed blocks to choose the primary proposers. This approach makes it extremely difficult, challenging and costly for an attacker to manipulate the selection process successfully as we will see.
Now there is a clear overview of the basic considerations let’s take a closer look at how these are addressed and implemented in Hyperscale Alpha.
Sortition in Hyperscale Alpha
Step 1: Generating a Random Seed for Sortition
The first requirement for sortition is to create a random seed that will be used to choose the primary proposers for the next round.
This seed is generated from the hashes of previous proposals, ideally those that have already been committed with a quorum. The quantity of previous proposal hashes used can be adjusted depending on the desired level of security or if proposals are being generated too quickly as part of a difficult adjustment of the proof-of-work.
The larger the number of previous proposal hashes that can be used, the higher the level of security.
To generate the seed, the hashes of the selected previous proposals are concatenated to form a new hash value. The concatenation process must be ordered from the most recent to the oldest to prevent caching of a partial result. This concatenated hash then serves as the seed for the sortition process.
It's also important to ensure that at least one of the proposals used to create the seed has reached a quorum. This means that enough validators have agreed on and committed these proposals to the blockchain.
In cases where the network is in a state where safety is probabilistic and new quorums cant be formed, there may be many proposals generated during this period which are at risk of being undone. Using at least one committed proposal hash in this case still ensures that the seed is derived from a deterministically safe point in the blockchain's history.
This process of deriving a seed keeps the sortition process both random and secure.
Step 2: Selecting Primary Proposers
Once the random seed is generated from the hashes of previously committed proposals, the next step is to use this seed to select the primary proposers for the upcoming round.
As stated, we want to select a small group of proposers from the total set who will be responsible for creating and proposing blocks and is defined as the square root of the total number of participants.
For example, if there are 1,000 possible block proposers, about 33 will be chosen as primary proposers.
To choose the primary proposers, the algorithm calculates the numerical distance between the generated seed and the public key addresses of all possible proposers in the set. This distance is determined using XOR sorting. The XOR sorting gives a measure of how similar or different the seed and the public key address are at the bit level.
After calculating these distances, the proposers are sorted based on how close they are to the seed. Those with the smallest distances, meaning they are closest to the seed, are selected as the primary proposers for the next round.
In our example with 1,000 total possible proposers, the top 33 validators closest to the seed would be selected.
This process doesn’t require any extra communication among those in the network to decide who will be the primary proposers, as all the information required should be already known by all participants unless they are faulty. Furthermore, because this process is deterministic, everyone will arrive at the same result given they know all of the required input information.
Once selected, these primary proposers are responsible for creating and proposing blocks for the next round. They will perform the necessary proof-of-work to generate valid blocks and then broadcast them to the network.
Once a supermajority of the primary proposers, 23 out of 33 or more, propose blocks within a set time window, the round proceeds and the process repeats.
Step 3: Defining Secondary Proposers
With this method, any participants not chosen as primary proposers automatically become secondary proposers.
For example, if there are 1,000 total participants and 33 are selected as primary proposers, the remaining 967 will be secondary proposers. Everyone has a critical role to play in the consensus process, either as a primary or secondary proposer.
The timeout window for the primary proposers to reach a supermajority of proposals generated is based on the average time it takes for proposals to be generated and spread across the network. For instance, if it usually takes about 5 seconds for 23 of the 33 primary proposers to propose their blocks, the secondary proposers will wait for that 5-second window to elapse before doing anything.
If the secondary proposers don’t see a supermajority of proposals from the primary proposers within that time, they will start producing their own proposals and broadcasting them to the network on the assumption that a critical quantity of the primary proposers are not available. This keeps the consensus process active and responsive, maintaining liveness, even if there are network issues and failures.
This assumption can be carried further as the primary proposers are selected at random via the random seed. If consistently less than 23 of the 33 primaries produce proposals over a given period, it can be assumed that a significant quantity of total proposers, and subsequently validators, are currently unavailable or facing issues. As time progresses this assumption becomes stronger, serving as a signal of overall network health.
Secondary proposers are a failsafe to liveness preservation allowing the efficiency gains of a smaller set of proposers to be taken advantage of, yet balancing that with the resilience of allowing the entire set to participate when necessary.
Let’s now consider a scenario where an adversary tries to attack the system by manipulating the primary proposer set.
Primary Manipulation Attack
Assume there is an adversary who wants to manipulate the sortition process such that they are always in the primary proposer set. If they are successful, they can potentially control the block production process, censor transactions or mount an attack on the network with a higher chance of success.
Also Imagine the adversary has enough computing power to consistently create the strongest block proposals for any primary set they are in. Their challenge then is to use the compute they have to grind the seed used for selecting the primary proposers in such a way that they are always part of the selection.
As I explained in the earlier section, in Hyperscale Alpha, the seed for the next round’s sortition comes from the hashes of previous proposals, many of which have already reached a quorum. Let's say the seed is taken from the last committed proposals N minus 100.
The adversary starts by grinding the hash for the N+1 proposal, trying to ensure it will put them in a favorable position for selection in the sortition set. But they also have to think ahead to the sortition sets for rounds N+2, N+3, and so on, up to N+100.
As they work, they find a N+1 proposal that selects them in the primary sortition set for N+2. Unfortunately, after generating N+2, it doesn’t select them for N+3. So, they discard the N+1 proposal and start again.
The adversary finally finds a sequence of proposals from N+1 to N+100 that keeps them in the sortition set for each round. The number of attempts required to produce this sequence is insane, I was going to include it, but it might be a fun task for someone else to do and I’ll edit the result next week!
Because the required number of attempts is so high, it takes time, and by that point the rest of the network has already moved from round N to round N+3’. Round N+3’ is of course a proposal the adversary didn’t produce.
The adversary’s proposal sequence is no longer relevant because the network has progressed. They need to start over from the current round N+3’.
This exponentially increases the compute requirements, making it extremely difficult and expensive for the adversary to produce any useful sequence of proposals so they can influence the current state of the network.
The honest participants simply don’t care if they are included in the next primary proposer set or not. Eventually thanks to the randomness and fairness they will be included in another. For them, the compute requirements are minimal because they only need to produce a proof-of-work if they are in the primary proposer set, or if the timeout window for the primary proposers elapses.
For adversaries, all means of completely controlling the network requires them to be included in the primary proposers as often as possible, which can only be achieved by generating a sequence of proposals in the above manner and performing a proof-of-work for an endless number of iterations.
Just One Last Thing
The eagle eye’d amongst you might have noticed a slight problem with the above. You’d be right, there is an angle of attack where the adversary can increase their chances of being included in the primary proposer set.
The adversary can simply create many public keys, generate a few proposals with them so they are known within the network as participants, then test not one against the primary set derived from the proposals they are generating in a sequence, but many of them!
If of the 1000 total participants, the adversary can masquerade as 500 of them cheaply, then their chances of producing a sequence of proposals which includes just one of them in the primary set increases dramatically. This reduces the total amount of compute required to produce the sequence to manageable levels and reduces the time required to do it.
As you’ve probably realized by now, the use of both proof-of-work and proof-of-stake has allowed us to solve many problems with simple solutions and elegance. This one is no different, and proof-of-stake is to our rescue.
Until now, the proof-of-work each block producer is required to generate has been of the same difficulty. That is, the amount of compute required to produce a valid one has been the same for everyone, but we can incorporate proof-of-stake into this difficulty requirement so that it is verifiably different for everyone.
Lets assume that the current baseline proof-of-work difficulty equates to 20 leading zero bits. This means that there is approximately a 1 in 1,000,000 chance that any proof-of-work you generate will meet that requirement. Crudely, if you generate about a million proofs-of-work, you’ll discover one that meets that requirement.
Now let us assume that you have 1000 tokens staked and proof of it is associated with an identity. This identity is known to the network as participating as a block producer or validator. A quantity of 1000 can be represented in binary in approximately 10 bits, so a reduction of 10 is applied to the proof-of-work difficulty requirement for that particular producer.
The chance of any proof-of-work you generate now has a 1 in 1000 chance of meeting the reduced difficulty requirement of 10 leading zero bits. The compute required for generation is now 1/1000th of what it would have been if just the baseline was applied.
The example is very simple and skirts around a bunch of detail, but the fundamental principle is that stake can be used to reduce compute requirements when producing proposals.
How then does this help with our problem?
If participants have some value in tokens they do not intend to use for other things, they can stake them to the network. This allows them to act as validators, voting on proposed blocks, receiving rewards and securing the network via economic security and deterministic safety.
The stake also can be used to reduce compute requirements to generate proposals, improving the efficiency of proposal generation and rewarding those who have significant amounts staked with further reduced compute requirements.
Imagine then that proposing participants who may be in the primary set have on average 1000 tokens staked. That means that on average, most of the primary set will have a difficulty requirement of approximately 10 leading bits.
If the adversary simply generates a lot of identities for proposal generation but doesn’t stake anything with those identities, then they will be proposing with the difficulty requirement of 20 leading bits. 1000 times more difficult than the average.
There are two options available to them, either purchase more compute to make up the deficit, or stake tokens across all of the identities they are using. Both options incur additional costs which are not insignificant.
Furthermore, if they decide to stake to their many identities, there manifests a stake management requirement they need to maintain to extract maximum efficiency from the distribution of their stake.
Too many identities with insufficient stake, others win the proposal generation because they have a stronger proof-of-work and a greater difficulty reduction. Too few identities with too much stake, none of their identities are included in a primary proposer set and they have to recompute part of the sequence.
Other strategies can be applied to the primary selection process to further restrict the efforts of an adversary with many identities, such as omitting identities which have been part of the primary proposer set within a number of past proposals.
Closing Thoughts
And there we have it, a somewhat digestible, not too technical dive into some of the inner workings of Hyperscale Alpha, specifically regarding the Cassandra consensus.
There are many elements of Hyperscale Alpha that we have yet to cover which will feature in future series of content.
For now, I hope you enjoyed the read, gained some insight into what we work on over here at Radix Labs every day and the interesting challenges and problems we look to solve.
If you have any questions, make sure to join the Radix telegram channel - and even if you don’t make sure to follow on X to not miss any updates!
Follow Dan Hughes (Radix Labs Founder) on X