Every two weeks, Radix DLT’s Founder Dan Hughes hosts a technical ask-me-anything (AMA) session on the main Radix Telegram Channel. There are always great discussions around Cerberus, Radix’s next-generation consensus mechanism, general design approaches to the network, and of course broader industry questions.
Thank you to everyone who submitted questions and joined our AMA. You can find the full transcript below from the AMA session on February 16th, 2021.
Is Cassandra planned to be open source? Or is it already? Will "trusted" developers be able to access it to suggest some improvements to its feature set?
Yes, and it already "kinda" is.
The source code for the Flexathon is up on Github, but when it became apparent there was an appetite for the Flexathon to be more (from community, internal Radix, and personal perspectives) I stopped maintaining that code base.
The reason being that for Cassandra to be really worth the time, the code base from the Flexathon just wasn't good enough to continue with to produce a long-term, outward-facing research programme. So I basically started from scratch around mid-January.
A lot of what was the flex has been copy-pasted or redone better, and of course there's lots of new stuff too. I'm doing all that on a private repo at the moment, which I'll push to Github once I have a "stable" build that does all the things we need for a solid first Twitter demo.
The codebase that is Cassandra is primarily for me, to test things, and to showcase some stuff. I'm not going to support anything external for it outside of a priority need. So if people want to build against it to mess around, great. But I won't be spending my days doing support tickets for things folks build on it.
That said, eyes on the code is always valuable, so if anyone spots anything that's an obvious mistake, I will review pull requests etc.
Cross-shard transaction complexity (Cassandra). In one of your tweets you made the following statement:
"At its simplest a transaction sending funds from a to b will touch 3 shards. A tweet with no mentions, hashtag etc touches 6... Each mention or hashtag potentially adds another shard. A retweet is 7 min as is a reply. A tweet is at LEAST twice as complex."
Is it possible to break these (twitter) transactions down more?
Why does a "naked" tweet involve 6 shards? Each mention/hashtag adds another shard, ...
This tx complexity (measured by the number of shards involved) is presumably a function of how fine-grained the sharding of the ledger is.
Are there any downsides to transactions touching a lot of shards a lot of the time (which happens more frequently in finely sharded ledgers)?
If you think about what makes up a tweet, even the most basic does the following:
1. Check the tweet hash doesn't already exist x 1 shard
2. Check the user handle posting the tweet exists x 1 shard
3. Initialize metric state variables (replies, retweets, likes) x 3 shards
4. Check the tweet ID doesn't already exist x 1 shard
The last one is kinda superfluous, but I need it here to be able to correlate replies and retweets to tweets, as an ID is what Twitter uses. Even if they gave me a hash, my hash for the tweet would be different anyway.
In a real twitter clone on Cassandra where I wasn't mirroring historical twitter data, I could just use the tweet hash Cassandra generated. Anyway ....
Each of those things is a hash, and hashes map to shards, which in turn map to validator sets (or shard groups).
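To make that mapping concrete, here's a rough Java sketch. The names, the use of SHA-256, and the even power-of-two split of the shard space are illustrative assumptions rather than Cassandra's actual code:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Rough sketch of the hash -> shard -> shard group mapping. The names, the
// use of SHA-256, and the even power-of-two partitioning of the shard space
// are illustrative assumptions, not Cassandra's actual code.
public class ShardMapping {
    static final BigInteger SHARD_SPACE = BigInteger.ONE.shiftLeft(256); // 2^256 shards

    // Shard ID = the hash of the state, read as an unsigned 256-bit integer.
    static BigInteger shardOf(String state) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(state.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, hash);
    }

    // Each of numGroups shard groups covers an equal slice of the shard space
    // (assumes numGroups is a power of two so the division is exact).
    static int groupOf(BigInteger shard, int numGroups) {
        return shard.divide(SHARD_SPACE.divide(BigInteger.valueOf(numGroups))).intValue();
    }

    public static void main(String[] args) throws Exception {
        BigInteger shard = shardOf("user:@example_handle"); // hypothetical state key
        System.out.println("shard = " + shard.toString(16));
        System.out.println("group = " + groupOf(shard, 4) + " of 4");
    }
}
```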
A reply or retweet references an ID, which touches another shard as a tweet needs to exist to be replied to or retweeted.
Each mention touches a shard to check the mentioned user exists.
#Tags invoke a shard so I can do "trending" style stuff later, which means that hashtags need some state variables (I can pack those into a single shard reference in this case though).
So I can't really break them down more, as these are the fundamental things a tweet needs to do in order to be deemed valid and achieve agreement.
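As a back-of-the-envelope reference, the tally above can be encoded directly (a sketch for intuition only; the real validity rules live in Cassandra itself):

```java
// Back-of-the-envelope tally of shard touches per tweet, encoding the
// breakdown above. A sketch for intuition only; the real validity rules
// live in Cassandra itself.
public class TweetShardCount {
    static int shardsTouched(int mentions, int hashtags, boolean isReplyOrRetweet) {
        int shards = 1  // tweet hash doesn't already exist
                   + 1  // posting user's handle exists
                   + 3  // metric state variables: replies, retweets, likes
                   + 1; // tweet ID doesn't already exist
        shards += mentions;                 // each mentioned user must exist
        shards += hashtags;                 // one packed reference per hashtag
        if (isReplyOrRetweet) shards += 1;  // the referenced tweet must exist
        return shards;
    }

    public static void main(String[] args) {
        System.out.println("naked tweet:    " + shardsTouched(0, 0, false)); // 6
        System.out.println("reply/retweet:  " + shardsTouched(0, 0, true));  // 7 minimum
        System.out.println("1 mention, 1 #: " + shardsTouched(1, 1, false)); // 8
    }
}
```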
A lightweight client needs a unique API interface to call for everything it needs.
How can it be implemented in a state sharded network such as Radix?
Who will be responsible for running this interface?
This is something I'm having to deal with atm with Cassandra.
The website is behind Cloudflare. I've got some load balancing going on.
Those load-balanced servers have a couple of connections to validators in the network, but they won't hold connections to all validators ... that would be a bit insane.
Therefore they won't hold a connection to all shards either, so a mechanism is required so that the validators they are connected to can do any searches for them and just return the results.
At the moment I've implemented a simple "scatter gather" method, where a validator can gossip a search request across the network and receive a response. It's not very efficient, but it works.
Later on, in a properly deployed RPN-3, this needs either a different method or to be massively improved, as there will be many clients requesting things, not just a couple of load-balanced web servers serving a simple demo.
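For intuition, a minimal scatter-gather might look like the sketch below; the interface and thread-pool plumbing are illustrative assumptions, not Cassandra's actual gossip code:

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal scatter-gather sketch: fan a search request out to one validator
// per shard group, then gather whatever comes back. All names here are
// illustrative; this is not Cassandra's actual networking or gossip code.
public class ScatterGather {
    interface Validator { Optional<String> search(String key); }

    static List<String> scatterGather(String key, List<Validator> peers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(peers.size());
        List<Future<Optional<String>>> futures = new ArrayList<>();
        for (Validator peer : peers) {
            Callable<Optional<String>> task = () -> peer.search(key);
            futures.add(pool.submit(task));                    // scatter
        }
        List<String> results = new ArrayList<>();
        for (Future<Optional<String>> future : futures)
            future.get().ifPresent(results::add);              // gather
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<Validator> peers = List.of(
                key -> Optional.empty(),                  // shard group without the key
                key -> Optional.of("tweet{id=42, ...}")); // shard group holding the key
        System.out.println(scatterGather("tweet:42", peers));
    }
}
```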
How does synchronization scale with the number of nodes?
I'm going to answer this in the context of what I'm doing with Cassandra as it's a little simpler than what RPN-3 will ultimately do but the principles will be largely the same.
Ideally I don't want nodes to have to "sync" in RPN-3 either, hence looking at this differently from the obvious in Cassandra.
In Cassandra, nodes don't "sync" ... if I'm processing something and need some state information I don't have from another shard group to process the atom, I can request that from some validators in that shard group. Likewise, if I'm processing something I know some validators in the other shard-group will need, I can send that to them ahead of time if I want to be nice.
Really what's going on is that validators are sending each other state that they will need, so that they ALL can execute the atom against the state they have and the state they have collected. If all nodes are executing the atom in its entirety (because they have fetched all state), then the output will be the same on ALL validators, in ALL shard groups relating to that atom.
That fact makes voting a shit ton more efficient, because I can make the assumption that if 2f+1 validators are non-faulty, then 2f+1 will all arrive at the same output hash. That output hash can be represented in a number of efficient ways, and means that all the votes from all the validators involved don't have to be sent around the network.
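In sketch form, that shortcut reduces agreement to tallying identical output hashes until a 2f+1 quorum appears. The encoding here is an assumption for illustration; the real compact vote representations may differ entirely:

```java
import java.util.*;

// Sketch of vote counting under deterministic execution: because every
// non-faulty validator executes the atom against the same collected state,
// honest votes carry identical output hashes, so agreement reduces to a
// 2f+1 tally. The encoding here is an assumption; the real compact vote
// representations in Cassandra / RPN-3 may differ entirely.
public class OutputHashVoting {
    static Optional<String> agreedOutput(List<String> outputHashes, int f) {
        Map<String, Integer> tally = new HashMap<>();
        for (String hash : outputHashes)
            if (tally.merge(hash, 1, Integer::sum) >= 2 * f + 1)
                return Optional.of(hash); // quorum of matching output hashes
        return Optional.empty();          // no quorum (yet)
    }

    public static void main(String[] args) {
        // f = 1: tolerate one faulty validator among four, so need 3 matches
        System.out.println(agreedOutput(List.of("abc", "abc", "xyz", "abc"), 1));
    }
}
```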
Doing things this way ventures a little into the stateless stuff I've been going on about ... I can receive some state, which has a proof with it, and I can look at a compressed representation of votes from those responsible for that state to make sure that they agree on the proof I received.
I don't need to sync anything, I just fetch discrete state information. I don't even need to fetch it from a multitude of validators in the other shard group; I can just get it from one, wait for the compact output representations to be gossiped over, then check that it was valid.
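A rough sketch of that stateless fetch, with the record shape and proof check invented purely for illustration:

```java
import java.util.Set;

// Sketch of the stateless fetch described above: pull a piece of state from a
// single validator in the other shard group, then check its proof against the
// compact vote representation gossiped by that group. The record shape and
// the containment check are assumptions for illustration only.
public class StatelessFetch {
    record StateWithProof(String state, String proofHash) {}

    // agreedProofHashes stands in for the compressed representation of votes
    // from the validators responsible for that state.
    static boolean verify(StateWithProof fetched, Set<String> agreedProofHashes) {
        return agreedProofHashes.contains(fetched.proofHash());
    }

    public static void main(String[] args) {
        StateWithProof fromOnePeer = new StateWithProof("balance:100", "abc123");
        System.out.println(verify(fromOnePeer, Set.of("abc123"))); // true: safe to use
    }
}
```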
I will also need some help understanding the data structure. Obviously we're not a blockchain, but we're also not a DAG. So what does the data structure look like within a shard? We say that each transaction is its own atomic unit, but what does the data structure look like? i.e. is it some form of, say, NoSQL database type thing full of JSON, or a linear set of tx like a blockchain, or some relational database?
I will come back to this in 6 months ... atm I'm using block trees, which flatten to a chain, in Cassandra ... moving forward there are options for RPN-3 to use semi-disjoint DAGs within shard groups, and also possibly "soups" of loosely correlated data, among others.
Over the course of the year I'm wanting to retrofit all of the different architectures into some Cassandra hybrids and review them.
Dan, in regards to the validator sets, is a node in only one set, and the other 99 nodes in that set then all verify the same set of transactions? And that's it?
Or, is a node also in multiple sets. So that there's a very complex tapestry of nodes being in multiple validator sets. So if a node does collude and get something wrong, it doesn't just lose its stake in that validator set, but all other validator sets. Therefore this is a much more resilient system, as there's a lot more stake at risk vs what could be won through colluding in a single validation set. The complex tapestry of each node being in multiple, random validator sets, means that it's basically impossible to collude.
Wouldn't I just create multiple validator identities and stake them independently? That's less risk. I can misbehave everywhere and maybe only one gets caught and slashed.
I was awake way past my bedtime thinking about Radix and it's got me wondering a few questions, sorry if asked before: how is the ledger history stored? Does Node A have the full transaction history of its current shard, or does it have the nth epoch of shard 1, n-1 epoch of shard 2, n-3 of shard 3, etc. going back to the point where it came online?
Where/how are more historic low activity addresses stored?
How long does an epoch last, how do the nodes know how much time has passed and does it end simultaneously for all nodes?
Follow up, may not be relevant if I'm misunderstanding - will there be a massive burst of network activity when an epoch ends and information of the addresses within a shard is transferred?
A validator stores all the historic information for the shards it's responsible for. If it gets relocated on an epoch, then it would discard the information it has and sync, from the validators in its new validator set, all the shards it's now responsible for.
This is not an ideal solution, hence my forward thinking into stateless, but in a growing network sync time should be fairly constant. More validators joining = more validator sets = fewer shards overall per validator set to sync.
The system has no concept (nor cares) about low frequency addresses etc, it just has state, and that state is either live (UP) or used (DOWN).
When an epoch ends, some validators from a validator set may be reassigned, either as a preventative measure against attacks, or because of network growth. Only a small portion of validators will move per set on each epoch so that liveness remains.
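As a sketch, partial rotation might look like the following; the 10% fraction and the random replacement policy are assumptions for illustration, not RPN-3's actual mechanism:

```java
import java.util.*;

// Sketch of partial rotation at an epoch boundary: only a small fraction of a
// validator set is reassigned so the remaining majority preserves liveness.
// The 10% figure and the random replacement policy are assumptions for
// illustration, not RPN-3's actual mechanism.
public class EpochRotation {
    static List<String> rotate(List<String> set, List<String> pool,
                               double fraction, Random rng) {
        List<String> next = new ArrayList<>(set);
        Collections.shuffle(next, rng);                       // pick movers at random
        int moved = (int) Math.ceil(set.size() * fraction);   // small portion only
        for (int i = 0; i < moved; i++)
            next.set(i, pool.get(rng.nextInt(pool.size()))); // swap in a replacement
        return next;
    }

    public static void main(String[] args) {
        List<String> set = List.of("v1", "v2", "v3", "v4", "v5",
                                   "v6", "v7", "v8", "v9", "v10");
        System.out.println(rotate(set, List.of("w1", "w2", "w3"), 0.10, new Random(7)));
    }
}
```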
Epoch duration is something that we still need to tune, but the sweet spot seems to be in the order of a few days.
How epochs are tracked will depend on the final implementation of RPN-3 and how it looks exactly. For RPN-1 it's pretty simple as it's unsharded, so it's just the number of view changes / phases completed.
Hi Dan! I don't have a tech background and was hoping you could help explain to me - what exactly prevents a single shard from getting "congested" if it eventually gets very popular/complex hypothetically?
A single shard can't get congested; it's one of 2^256 shards, and the thing that lives in there wasn't put there by a "user" but was the result of a hash function. For the sake of simplicity, passing something through a hash function is basically like picking a random number between 0 and 2^256.
The chances of 2 things ending up in the same shard, over a reasonably measurable period of time, are practically zero ... like literally, ZERO. You've more chance of winning the lottery every week for 1000 years.
So shards themselves won't get congested.
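For anyone who wants the arithmetic behind "practically zero", here's a rough birthday-bound approximation (an estimate for intuition, not a formal analysis):

```java
// Birthday-bound arithmetic behind "practically zero": the chance that any
// two of n items land in the same one of 2^256 shards is roughly
// n * (n - 1) / 2^257. A rough approximation for intuition, not a proof.
public class ShardCollision {
    static double collisionProbability(double n) {
        return n * (n - 1) / Math.pow(2, 257);
    }

    public static void main(String[] args) {
        // Even with a billion billion (10^18) pieces of state:
        System.out.printf("p ~ %.1e%n", collisionProbability(1e18)); // ~4.3e-42
    }
}
```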
Validators don't serve shards, they serve shard GROUPS, which are batches of shards they are responsible for within their set. This is needed because we would like to keep the amount of load on validator sets as small as possible, while maintaining security, responsiveness, and decentralization.
Packing shards into groups allows the group size to shrink as the number of validators available in the network grows.
Initially the first 100 validators will serve everything ... then another 100 validators join, so 2 validator sets are possible, each serving a shard group which is half the entire shard space; then 200 more join and each set does 1/4 ... and so on and so forth.
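That halving schedule can be sketched numerically; the set size of 100 is taken from the example above, and the even split is an assumption:

```java
// Numerical sketch of the halving schedule above: with a fixed set size of
// 100 (taken from the example), each doubling of the validator count halves
// the fraction of the shard space any one set has to serve. The even split
// is an assumption for illustration.
public class ShardGroupGrowth {
    static double fractionPerSet(int validators, int setSize) {
        int sets = Math.max(1, validators / setSize); // how many full sets fit
        return 1.0 / sets;                            // equal slice per set
    }

    public static void main(String[] args) {
        for (int v : new int[] {100, 200, 400, 800})
            System.out.printf("%4d validators -> each set serves 1/%d of the shard space%n",
                    v, (int) (1 / fractionPerSet(v, 100)));
    }
}
```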
Think about validators as maintaining groups of shards, not individual shards, and things in general may become a lot clearer.
That covers all the questions Dan had time to answer this session. If you are keen to get more insights from Dan’s AMAs, the full history of questions & answers can easily be found by searching the #AMA tag in the Radix Telegram Channel. If you would like to submit a question for the next session, just post a message with your questions in the main Telegram channel, using the #AMA tag!