This article is the first in a series that will lay out the core design philosophies and a high-level technical overview of the Radix Labs Hyperscale Alpha research network's consensus protocol.
Hyperscale Alpha is part of the Radix Hyperscale Roadmap and will become the backbone of a system that is not only linearly scalable, capable of processing millions of transactions per second with fast finality, but also able to maintain network progress during adverse conditions while supporting a highly decentralized, uncapped validator set. And, like everything in this industry, Hyperscale Alpha starts with Bitcoin at its core.
Before we dive in:
This is just the beginning of a series of deep dives I’m excited to share.
To ensure you don’t miss any part of the journey, get involved with the Radix and Radix Labs community in the following places:
Follow Dan Hughes (Radix Labs Founder) on X
Join the Radix Telegram Channel
The Hyperscale Alpha (formerly known as Cassandra) consensus protocol combines principles from Nakamoto consensus, used in Bitcoin, with classical Byzantine Fault Tolerant (BFT) protocols. The mandate is to achieve what is arguably the holy grail of consensus: the ability to keep making progress even when network conditions are bad, combined with strong safety guarantees when the network is working well.
For broader context, in the upcoming Xian phase, Radix will transform into a pre-sharded architecture with a massive state space of 2^256, providing the foundation for Radix's linear scalability approach.
For this approach to work, two consensus protocols are required, operating together seamlessly:
- Intra-shard Consensus Protocol: This will ensure that consensus is achieved within a single shard.
- Inter-shard (Cross-shard) Consensus Protocol: This will ensure atomic cross-shard transactions, allowing consensus to be reached across multiple shards simultaneously.
Cassandra will manage the consensus process within a shard (intra-shard) to bring nodes participating in that shard to agreement.
It makes sure that the shard can keep making progress even if the network conditions aren't ideal. At the same time, it provides strong safety guarantees when the network is operating below its failure bound (fewer than one-third of participants faulty or dishonest).
On the other hand, Cerberus will manage the consensus between different shards (inter-shard).
When a transaction needs to happen across multiple shards, Cerberus orchestrates the execution and commitment of transactions across the involved shards to ensure atomicity and consistency across the network.
Put simply, Cassandra ensures agreement within each shard, while Cerberus acts as the coordinator and messenger, ensuring all the shards work together properly.
With that context in place, let's dig into the design principles and technical details of how we are building Hyperscale Alpha.
And, as said above, “like everything in this industry, Hyperscale Alpha starts with Bitcoin at its core.”
Why Is Bitcoin Special?
When I talk about Bitcoin in the context of Hyperscale Alpha, I am focusing on its backbone: the Nakamoto consensus protocol.
Nakamoto consensus is a revolutionary piece of technology because it transformed the field of distributed systems by solving the long-standing problem of achieving consensus in a decentralized network without relying on a trusted third party.
Before Nakamoto consensus, distributed systems relied on voting-based mechanisms, which were vulnerable to Sybil attacks and so required a central authority to manage who was entitled to participate in the voting process. These types of networks are referred to as permissioned or centralized networks.
But in Bitcoin, the Nakamoto Consensus protocol uses proof-of-work (PoW) as both a Sybil resistance and a voting mechanism to reach agreement among participants. This works because it is difficult to continually produce the proofs of work carrying the most cumulative “work” over long durations of time.
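To make that concrete, here is a minimal, illustrative sketch of the proof-of-work mechanic (a toy example of my own, not Bitcoin's actual block format or difficulty encoding): a miner searches for a nonce such that the block hash falls below a target, and a lower target means more expected work per valid proof.

```python
import hashlib

def mine(header: bytes, target: int) -> int:
    """Search for a nonce whose double-SHA256 block hash falls below `target`.

    Expensive to find, but anyone can verify the result with a single
    hash comparison -- that asymmetry is what makes PoW usable as a
    Sybil-resistant "vote"."""
    nonce = 0
    while True:
        data = header + nonce.to_bytes(8, "big")
        digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

# A toy target: roughly 1 in 2**16 hashes qualifies, so ~65,000 attempts
# are expected. Halving the target doubles the expected work.
nonce = mine(b"toy-block-header", target=2**240)
print(f"found nonce {nonce}")
```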
The way in which Bitcoin participants reach agreement isn't guaranteed to be 100% certain immediately; instead, the agreement becomes stronger over time, because the probability that the majority of network participants haven’t seen all valid proofs of work for a particular period of time decreases.
Eventually this probability becomes so low, and the effort required to create a sufficiently strong sequence of proofs-of-work so high, that the historical record of events becomes trustworthy, enabling the creation of a secure, tamper-resistant, and decentralized ledger.
What makes the Nakamoto consensus protocol particularly well-suited for building resilient and highly decentralized distributed ledger platforms is its ability to maintain two critical properties of a blockchain system: weak liveness and probabilistic safety guarantees.
Weak liveness ensures that the network can continue to make progress even in the face of network asynchrony or node failures.
This means that even if a large portion of the network goes offline, the protocol can still operate, though perhaps at reduced efficiency. Ultimately, as long as at least one honest node remains active, the protocol can continue to make progress, albeit more slowly.
On the other hand, probabilistic safety provides increasing assurance over time that confirmed transactions will not be reversed.
These properties, thanks to the ingenuity of proof-of-work and how Satoshi used it, allow Nakamoto-based consensus protocols to support a high degree of decentralization, as they do not rely on a fixed set of validators or a leader to coordinate consensus.
Instead, any node can participate in the consensus process by contributing computational power to solve PoW puzzles.
As a result, the Nakamoto Consensus protocol enables the creation of highly decentralized networks that can scale to thousands of nodes without compromising security.
From my perspective, any distributed ledger platform that aims to scale at a planetary level should be as resilient and inclusive as the Nakamoto Consensus protocol, supporting thousands to hundreds of thousands of validators, without compromising on user-oriented features like high throughput and fast finality, which are typically the strengths of leader-based consensus protocols such as Practical Byzantine Fault Tolerance (PBFT, more on that later).
Now before we go into details of Hyperscale Alpha, it’s important to understand some basic consensus properties.
Understanding Consensus Properties
Consensus properties are fundamental characteristics that define how a consensus protocol behaves and ensures agreement among participants in a distributed system.
The two primary consensus properties are liveness and safety, which are essential for achieving fault tolerance—i.e., the capacity of a system to operate correctly and reliably even under adverse conditions or in the presence of faults—and consistency in distributed systems.
Let’s have a look at each of them in detail.
A. Liveness
Liveness is a property that guarantees the system will eventually make progress and produce an output. In other words, a consensus protocol with liveness ensures that the network will not become stuck or deadlocked, even if many participants leave; the remaining participants will eventually reach an agreement.
There are two types of liveness: weak liveness and strong liveness.
1. Weak liveness
Weak liveness is a guarantee that a system will make progress, but it doesn't promise when that progress will happen. In other words, the system will eventually move forward, but the time between events can be unpredictable and unbounded.
It's like saying, "We'll get there eventually, but it might take a while."
Consider Bitcoin's Nakamoto Consensus algorithm. If all miners except one were to suddenly disappear, the remaining miner would still be able to compute proofs-of-work and produce blocks. However, the time between blocks would be much longer than the usual 10 minutes, because where there were previously thousands of miners attempting to produce valid proofs-of-work, there is now only one.
As more miners rejoin the network, and as the difficulty adjusts, the time it takes to produce blocks will gradually decrease. Eventually, the block production time, and thus progress, will return to the targeted interval of 10 minutes.
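As a rough illustration of why progress continues, here is a back-of-the-envelope sketch with toy numbers of my own (not Bitcoin's real retargeting code, which adjusts every 2016 blocks): even if 99% of the hashrate vanishes, blocks still arrive, just slowly, and repeated difficulty retargets pull the interval back toward the 10-minute target.

```python
TARGET_SECONDS = 600.0  # Bitcoin's 10-minute block target

def expected_block_time(difficulty: float, hashrate: float) -> float:
    """Expected seconds per block: work required divided by work per second."""
    return difficulty / hashrate

def retarget(difficulty: float, observed: float) -> float:
    """Scale difficulty toward the target interval, clamped to a 4x swing
    per adjustment (as Bitcoin's retargeting rule is)."""
    factor = max(0.25, min(4.0, TARGET_SECONDS / observed))
    return difficulty * factor

difficulty, hashrate = 600.0, 1.0   # calibrated: one block every ~600 s
hashrate *= 0.01                    # 99% of miners disappear overnight

t = expected_block_time(difficulty, hashrate)
print(f"block time right after the crash: {t:.0f} s")  # 60000 s -- slow, but alive
for _ in range(4):                  # a few retarget periods later...
    difficulty = retarget(difficulty, t)
    t = expected_block_time(difficulty, hashrate)
print(f"block time after retargeting: {t:.0f} s")      # back to ~600 s
```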
So, even though there might be some slowdowns along the way, the system keeps moving along. It might not be as responsive as we'd like, but it doesn't completely stop and is very resilient.
2. Strong liveness
Strong liveness, as the term indicates, provides a stronger guarantee than weak liveness by ensuring that the system will make progress within a bounded time interval. This means the system will neither get stuck nor require hours to pass before progress happens.
It's like saying, "We'll definitely get this done by next Tuesday."
But here's the catch: in partially synchronous or asynchronous networks, it's actually impossible to achieve strong liveness. This is because of something called the FLP impossibility result.
The FLP impossibility result, named after its authors Fischer, Lynch, and Paterson, is a fundamental theorem in distributed systems.
It is a heavy topic, though the paper is worth a read if you're that way inclined. Essentially, it proves that it is impossible for a deterministic protocol to guarantee consensus in a fully asynchronous system if even one process can fail.
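For reference, the result can be stated informally like this (my paraphrase, not the paper's exact wording):

```latex
\textbf{Theorem (Fischer--Lynch--Paterson, 1985, informal).}
In a fully asynchronous message-passing system, no deterministic
consensus protocol can simultaneously guarantee \emph{agreement},
\emph{validity}, and \emph{termination} if even a single process
may crash: some admissible execution exists in which the protocol
never decides.
```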
B. Safety
Safety is a property that ensures the system will never agree to an incorrect output or reach an invalid state.
When we're talking about consensus protocols specifically, safety means two things:
- All the honest participants (the ones following the rules) will come to an agreement on the same value.
- Once a value is committed (meaning it's final), it can't be changed or undone.
Now, there are two kinds of safety: probabilistic safety and deterministic safety.
1. Probabilistic safety
Probabilistic safety provides a high probability that the system will maintain consistency, but there is a non-zero chance of a safety violation.
When we talk about probabilistic safety in Nakamoto Consensus, it means that there's a high chance the system will stay consistent and work as intended, but there's still a tiny possibility that something could go wrong. Double-spend attacks and rewrites of historic transactions are the canonical examples of such safety violations.
The way Nakamoto Consensus achieves this probabilistic safety is through the longest chain rule. In simple terms, the chain with the most proof-of-work put into it is considered the valid chain. Proof-of-work is a way for computers to prove they've done a certain amount of computational work.
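Here is a sketch of that selection rule, with a hypothetical Block type purely for illustration: each block carries the work implied by its difficulty, and a node always adopts the fork with the greatest cumulative work, which is not necessarily the one with the most blocks.

```python
from dataclasses import dataclass

@dataclass
class Block:
    work: float  # expected hashes implied by this block's difficulty

def chain_work(chain: list[Block]) -> float:
    """Cumulative proof-of-work embedded in a fork."""
    return sum(block.work for block in chain)

def best_chain(forks: list[list[Block]]) -> list[Block]:
    """The 'longest chain' rule, properly: heaviest cumulative work wins."""
    return max(forks, key=chain_work)

# A short fork mined at high difficulty beats a longer low-difficulty one.
heavy = [Block(100.0), Block(100.0)]
long_but_light = [Block(30.0), Block(30.0), Block(30.0)]
assert best_chain([heavy, long_but_light]) is heavy
```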
As more blocks are added to a chain, it becomes harder and harder for an attacker to create a different chain with more proof-of-work. This is because they would need to redo all the work that's already been done on the valid chain, plus add even more blocks to make their chain longer.
So, with each new block added, the probability of a successful attack at some arbitrary point of time in the past gets smaller and smaller. A transaction performed yesterday is many orders of magnitude harder to double spend, or wipe from the historical record, than a transaction performed an hour ago.
This decreasing probability of a safety violation is what gives growing confidence in the validity of the chain over time. The more blocks we see added to the chain, the more certain we can be that it's the true, correct chain.
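This intuition can be made quantitative. Section 11 of the Bitcoin whitepaper models the race between the honest chain and an attacker; here is that calculation in a few lines of Python, computing the probability that an attacker with hashrate fraction q ever catches up from z blocks behind.

```python
import math

def attacker_success(q: float, z: int) -> float:
    """Probability that an attacker with hashrate share q ever overtakes the
    honest chain from z confirmations behind (Bitcoin whitepaper, section 11)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually catches up with certainty
    lam = z * q / p
    total = 0.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total += poisson * (1.0 - (q / p) ** (z - k))
    return 1.0 - total

# With 10% of the hashrate, each confirmation slashes the attacker's odds:
for z in (1, 6, 10):
    print(f"z={z:2d}: {attacker_success(0.10, z):.7f}")
# z= 1: ~0.2045873, z= 6: ~0.0002428, z=10: ~0.0000012
```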
2. Deterministic safety
When we talk about deterministic safety in consensus protocols, we're referring to a guarantee that the system will always remain consistent and never produce conflicting results. This is an important property for ensuring the reliability and trustworthiness of the system.
One way to achieve deterministic safety is through leader-based consensus protocols, such as PBFT. In these protocols, a supermajority of participants, typically two-thirds or more, must agree on a value before it can be committed.
This high threshold ensures that there is strong consensus among the participants and that the network has to be severely compromised with dishonest participants for a safety violation to occur. Even in these cases, once an honest participant has committed a result, it will never revert it, ensuring the system's consistency at least up to the point at which the network was compromised.
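The arithmetic behind that threshold is worth spelling out. With n = 3f + 1 validators, of which at most f may be faulty, a commit requires 2f + 1 matching votes; any two such quorums overlap in at least f + 1 validators, so at least one honest validator sits in both, which is what makes two conflicting commits impossible. A minimal sketch (illustrative, not any specific PBFT implementation):

```python
def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3

def quorum_size(n: int) -> int:
    """Votes required to commit: a supermajority of 2f + 1."""
    return 2 * max_faults(n) + 1

def can_commit(matching_votes: int, n: int) -> bool:
    return matching_votes >= quorum_size(n)

assert quorum_size(4) == 3     # f = 1: tolerate one faulty validator
assert quorum_size(100) == 67  # f = 33
# Two quorums of 67 out of 100 overlap in >= 34 validators; with at most
# 33 faulty, at least one honest validator would have to vote for both
# conflicting values -- which honest validators never do.
```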
The Trade-Offs Between Liveness and Safety
Liveness and safety are often in conflict with each other, and consensus protocols must make trade-offs between these properties based on the specific requirements and assumptions of the system.
In networks where messages can take an unknown amount of time to be delivered (partially synchronous or asynchronous networks), it is not possible to guarantee both strong liveness and deterministic safety simultaneously, as stated by the FLP impossibility result.
Even though Nakamoto Consensus offers both liveness and safety, it prioritizes liveness over safety by allowing the system to make progress even in the presence of network partitions or node failures, albeit at the cost of only providing probabilistic safety.
On the other hand, leader-based protocols like PBFT prioritize safety over liveness, ensuring deterministic safety but sacrificing liveness in the face of network partitions or node failures.
In fact, leader-based protocols only really offer safety guarantees, as they rely on a fixed set of validators and a leader to coordinate consensus. If the leader fails and a new one cannot be elected with agreement, especially if the network becomes partitioned, the protocol cannot make progress, resulting in a full loss of liveness and causing the network to halt.
This trade-off is inherent to the design of leader-based protocols, which prioritize the consistency and integrity of the system state over the ability to continuously process transactions in adverse network conditions.
In contrast, Nakamoto Consensus continues to operate and make progress even under adverse network conditions, though with weak/probabilistic safety guarantees.
Why Nakamoto Consensus Was Chosen as a Starting Point for Hyperscale Alpha
As you have seen above, Nakamoto consensus is the only option that continues to offer both safety and liveness guarantees under adverse network conditions.
This unique combination makes it an ideal starting point for Cassandra - the INTRA-SHARD consensus protocol of Radix Labs Hyperscale - to address the challenges of a decentralized network that can scale to a planetary level with hundreds of thousands of validator nodes.
Nakamoto consensus provides a solid foundation that can be built upon to achieve the specific properties required by Hyperscale Alpha, while maintaining an ability to operate under challenging and adversarial network conditions.
Allow me to outline the specific properties Hyperscale Alpha aims to achieve.
The Core Design Principles of Hyperscale Alpha
The motivation behind Hyperscale Alpha is to combine the best properties of both Nakamoto Consensus and classical BFT consensus mechanisms to create a more robust, more decentralized, and more scalable network.
The specific properties that Hyperscale Alpha requires to achieve this goal are as follows:
- Weak Liveness Guarantee: Hyperscale Alpha needs a weak liveness guarantee to ensure that the network can continue to make progress even in the presence of network asynchrony or node failures. This property allows the protocol to recover from situations where a large portion of the network becomes unavailable or disconnected.
- Deterministic Safety: Hyperscale Alpha requires deterministic safety to ensure that the network will not produce conflicting outputs or reach an invalid state. This property is crucial for maintaining consistency across shards and ensuring the overall system's reliability.
- High Decentralization: Hyperscale Alpha must support a large number of validators (potentially in the thousands to hundreds of thousands) across multiple shards. The protocol must be able to maintain a high degree of decentralization to ensure the network's resilience and security.
- Leaderless Design: A leaderless design is essential for Hyperscale Alpha to avoid various points of failure that can occur in leader-based systems, especially when scaling to a large number of validators.
- Fast Finality: Hyperscale Alpha must exhibit fast finality, ensuring that transactions are confirmed quickly, ideally within a few seconds. This is particularly important for the user-friendliness of applications that will be built on Radix.
Building upon Nakamoto Consensus is the path of least resistance to achieve these properties.
To begin with, Nakamoto Consensus inherently provides a weak liveness guarantee, which is essential for Hyperscale Alpha. Enhancing this existing feature is more straightforward than introducing such a liveness guarantee into protocols like PBFT or HotStuff, which lack it.
Moreover, by starting with Nakamoto consensus, Hyperscale Alpha can maintain the original protocol's probabilistic safety as a fallback mechanism. This means that even if the additional layers that provide deterministic safety fail due to extreme network conditions, the system can still operate with the probabilistic safety guarantees of Nakamoto consensus which are now well known, understood and accepted.
Furthermore, Nakamoto consensus naturally supports a high degree of decentralization, as it allows for an unlimited number of participants to join the network as miners. This property aligns well with Hyperscale Alpha's goal of maintaining a large number of validators across multiple shards.
Lastly, the absence of a designated leader in Nakamoto consensus makes it an ideal starting point for Hyperscale Alpha, as it avoids the need to develop complex mechanisms to ensure a leaderless design.
By using Nakamoto consensus as its foundation, Hyperscale Alpha can build on its strengths while improving safety features to develop a hybrid consensus mechanism. This approach combines high decentralization and adaptability with the deterministic safety and quick finality needed for a large-scale, sharded network.
Now that we have covered the core design principles of Hyperscale Alpha, the next question is how to put these ideas into practice.
The first step is enhancing Nakamoto consensus by introducing deterministic safety.
This involves an implicit voting mechanism and a hybrid proof-of-work-esque/proof-of-stake model.
But that's just the beginning.
In the next article, Hyperscale Alpha - Part II, we'll go deeper into Hyperscale Alpha's unique hybrid consensus protocol, exploring how it achieves fast finality while maintaining both liveness and safety, even under extreme network conditions.