Updated: 2024-05-08.
A lot of people use the term “layer 2” or “L2” to describe protocols that claim to increase blockchain scalability. These are big words that I intend to go into detail on in this article. This article makes many references to how these terms relate to Bitcoin, since a number of groups have been working on various projects they call Bitcoin L2s, but only some of them really are. The Ethereum ecosystem has more projects and the discourse there is better understood, so I make several references to projects in that space and how they relate to the work on Bitcoin, but the conclusions here apply to any network.
This article isn’t meant to be a statement of my personal opinion, although my own opinions do color how I write about the ideas here. The goal of this article is to lay out a framework that’s consistent with how these terms are dominantly used in practice and identify weaknesses and contradictions in some opposing definitions.
The key property we’re looking for in an L2 is that our confidence in the funds we hold in the L2 is comparable to those in the L1. That is to say, we can transact with them with a similar degree of security in the L2 as in the L1.
To define this in more formal terms we can use a definition like the following:
A construction that improves the overall throughput of a blockchain ledger without requiring additional majority or strong economic trust assumptions.
There’s a few parts of this to unpack. Let’s start by defining what I mean by a “strong economic trust assumption”. Broadly speaking, the way we talk about the security of protocols is by defining an environment in which they’ll operate and then making proofs that the protocol maintains some property. When we define the environment we give a set of assumptions about it that we expect to hold true in practice. For a consensus protocol like proof-of-work, we assume that >50% of the hashrate is “honest” and will continue to mine on top of the longest chain. When this assumption holds, the consensus moves on as normal. If this assumption is violated, like in a 51% attack, the property that the chain keeps progressing can be violated, since an attacker could reorg blocks or control the exact contents of future blocks. But it doesn’t allow them to steal anyone’s funds, since funds are protected by cryptographic signatures.
The goal is that we want to compare protocols based on the assumptions they require users to accept when they use them. Then we can discuss in more philosophical terms about if those assumptions and expectations are acceptable rather than getting bogged down in the technical mechanics.
In the context of blockchain protocols, an economic assumption is usually about the parties that are in control of some resource. In proof-of-work consensus systems, that’s hashrate. In proof-of-stake consensus systems, that’s the asset that’s being staked (usually it’s the native asset of the ledger). We trust that some subset of parties in control of the resource are “honest”, whatever honest means in the context of the protocol. These kinds of assumptions rely on the resource holding its value over time. If the resource suddenly becomes worthless, then there’s no reason parties wouldn’t try to misbehave within the protocol. This is related to the “nothing at stake” problem.
A strong economic assumption is when the subset we trust to be honest is some threshold proportion of a larger set of parties with the resource, or as a shorthand we can call this “t/n”. In practice, this threshold tends to be either 1/3, 1/2, or 2/3 depending on the systems and properties in question.
Strong economic assumptions require a high degree of trust in the parties, but this can be acceptable in some scenarios. We have to look at what would go wrong when these assumptions are violated. If the Ethereum mainnet validators decide to become malicious they could rewrite history, but doing so would burn a ton of ETH and weaken the value of the network. This plays into the game theory of this trust assumption since they would be damaging the value of the thing they’d be seeking to gain tighter control over. Even then, they would not be able to steal funds. So at the large scale that ETH operates at, a legitimate argument can be made that it’s a safe assumption to make, even if you reject the macroeconomic tendencies that proof-of-stake might have.
I contrast this with weak economic assumptions. These still assume there’s an asset of value, but merely require that only one (or a few) parties out of the larger set are honest or at least economically self-interested. This is a pretty safe assumption to make because in even a moderately large group, it’s always likely there’s going to be at least one honest person. And in many situations, you can participate in a protocol knowing that you’re probably going to be honest, which gives you a very strong belief that this assumption will hold.
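To put a rough number on the gap between the two kinds of assumption, here is a minimal Python sketch. It assumes, purely for illustration, that each of n parties is independently honest with the same probability p. Neither the independence nor the specific numbers come from any real network, and the function name is hypothetical.

```python
from math import comb

def p_at_least_k_honest(n: int, k: int, p: float) -> float:
    """Probability that at least k of n parties are honest.

    Toy model (an assumption): each party is independently honest
    with probability p, so the number of honest parties is binomial.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n, p = 20, 0.5                          # illustrative numbers only
weak = p_at_least_k_honest(n, 1, p)     # weak assumption: 1-of-n honest
strong = p_at_least_k_honest(n, 14, p)  # strong assumption: at least 14 of 20 (about 2/3) honest

print(f"at least 1 of {n} honest:  {weak:.6f}")
print(f"at least 14 of {n} honest: {strong:.6f}")
```

Even when every party is no better than a coin flip, the 1-of-n assumption in this toy model holds with probability above 0.999999, while the 2/3 threshold holds less than 6% of the time. Real participants are not independent coin flips, so treat this as intuition rather than a security estimate.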
I also want to share this graphic that I used in the risk article again since it’s so good at describing the security model here.
(thanks Vitalik!)
The other part is what I mean by “improving overall throughput”. This is a looser concept, but the goal of any L2 system is to provide a viable, cheaper substitute for some subset of the transaction activity happening on the corresponding L1. The cost of a single transaction can be reduced, meaning that the wider network can bear more economic activity for the same cost.
Additionally, we tend to think about L2 systems as existing on top of rather than beside the L1 they derive their security from.
So what kinds of things meet the above definition? The most commonly used kinds of protocols that meet this definition are payment channels (principally Lightning, but others exist) and rollups of either variety (zk or optimistic).
This section describes just two kinds of L2s that have been deployed into production use, but the theoretical design space is much wider than just these.
Payment channel networks were the first concept that could reasonably be called an L2. If you use the software correctly there is no known way for another party to steal your funds. I won’t go into elaborate detail on their function because that’s not what this article is about. An important aspect of Lightning and other PCN designs is that they are not distributed consensus systems, which has a lot of benefits for rolling out upgrades, and the usual logic about how consensus is enforced does not apply.
Lightning does have some footguns. There are theoretical attacks on it that can interfere with a user’s access to their funds. But even in the worst-case scenario, channels with misbehaving peers can be closed and funds can be moved into a channel with a different peer (presumably with a better reputation).
The main vulnerability Lightning has today that compromises its security is replacement cycling. This is a real issue that can and should be alleviated with newer technology, but it has rather moderate impact and can’t attack funds at rest. It’s also fairly hard to pull off at scale with the defenses that have been implemented. More on this towards the end.
A practical weakness of Lightning is that it isn’t “richly stateful”. PCNs in general only do payments. They can be hyper-optimized for that, which is why the fee floor can be so low, but they’re not suitable for all applications.
There’s two broad categories of rollups, optimistic and zero-knowledge. Optimistic rollups use a fraud game (with a weak economic assumption) to ensure only valid state transitions are accepted, while zk-rollups use zero-knowledge proofs which rely on bleeding edge math. We don’t really care much about the distinction in this context since we want to talk about what they share in common. The rollup chain is advanced by sequencers submitting batches of L2 transactions to a “host” contract system on the L1 in a single L1 transaction. The contracts on the L1 use this to establish a trustworthy view of the rollup chain state instead of having to blindly trust the sequencers.
100% of the sequencers could decide to become malicious and your funds would still be safe, as invalid states will not be committed to and acted on by the host contracts. Users will always be able to withdraw their funds to the L1 by forcing a transaction through the host contract, even when all sequencers are malicious. Guarantees on the ability to exit are one of the distinguishing features that real L2s have that sidechains do not, and this plays into the game theory when we think about the level we can rely on them.
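The two guarantees described above (invalid states are never accepted, and users can always exit) can be sketched with a toy model. This is not how any deployed rollup is implemented: re-executing the batch stands in for the fraud proof or validity proof a real host contract would check, transaction signatures are omitted, and every name here (`HostContract`, `submit_batch`, `force_withdraw`) is hypothetical.

```python
def apply_txs(balances, txs):
    """Apply (sender, receiver, amount) transfers; return None if any overspends."""
    state = dict(balances)
    for sender, receiver, amount in txs:
        if amount <= 0 or state.get(sender, 0) < amount:
            return None
        state[sender] -= amount
        state[receiver] = state.get(receiver, 0) + amount
    return state

class HostContract:
    """Toy stand-in for a rollup's L1 host contract (hypothetical design)."""

    def __init__(self, balances):
        self.balances = dict(balances)  # last accepted rollup state

    def submit_batch(self, txs, claimed_state):
        # Real rollups check a fraud proof or validity proof here;
        # re-executing the batch is a simplification for this sketch.
        expected = apply_txs(self.balances, txs)
        if expected is None or expected != claimed_state:
            return False                # invalid transition: state unchanged
        self.balances = expected
        return True

    def force_withdraw(self, user):
        # Exit straight from the last accepted state, no sequencer involved.
        return self.balances.pop(user, 0)

host = HostContract({"alice": 5, "mallory": 1})
# A malicious sequencer claims a state where Alice's funds moved to Mallory.
stolen = host.submit_batch([], {"alice": 0, "mallory": 6})
# An honest batch with a matching state claim is accepted.
honest = host.submit_batch([("alice", "mallory", 2)], {"alice": 3, "mallory": 3})
# Alice exits via the host contract without any sequencer cooperating.
exit_amount = host.force_withdraw("alice")
print(stolen, honest, exit_amount)
```

The point of the sketch is only that the sequencer's claim is checked by the L1 rather than trusted, so a dishonest claim changes nothing and the exit pays out from the last state the L1 itself accepted.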
Since the host contracts will not accept incorrect state transitions, users running an L1 full node can be confident about the state of all of the rollups that are based on it without any additional effort. They can cheaply query a state commitment they already have and verify statements about any arbitrary piece of rollup state that’s committed to in it, without having to track the state of the rollup beyond what’s already in scope of the host chain’s smart contracts. This is extremely powerful and is an important qualitative distinction!
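As a rough illustration of verifying a statement against a state commitment you already hold, here is a minimal Merkle proof sketch in Python. It assumes the commitment is a plain SHA-256 binary Merkle root over account records. Real rollups use their own tree formats and encodings, and a production tree would also separate leaf and inner-node hashing, so the layout and all names here are illustrative assumptions.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level):
    if len(level) % 2:                  # duplicate last node on odd levels (toy padding)
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each tagged with whether it sits on the left."""
    proof, level = [], [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof

def verify(root, leaf, proof):
    """Recompute the root from one leaf and its sibling path."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

accounts = [b"alice:5", b"bob:7", b"carol:2", b"dave:9"]  # made-up rollup state
root = merkle_root(accounts)          # the commitment the L1 already holds
proof = merkle_proof(accounts, 2)     # supplied by anyone, trusted by no one
print(verify(root, b"carol:2", proof))    # a true statement about the state
print(verify(root, b"carol:200", proof))  # a false statement about the state
```

The verifier only needs the root and a logarithmic number of hashes, which is why checking a claim about rollup state costs so little for someone who already follows the L1.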
A practical drawback of many currently deployed rollups is they also have a governance mechanism attached to them that can activate a pause function, like in the event that a vulnerability is identified. This is also the path through which upgrades can be deployed. More mature rollups have spun out their governance processes to the community. The state of the art is still evolving. There is no reason rollups have to have these systems. What we care about are the properties of the core cryptographic protocol and what the limits are of its trustworthiness.
I wrote all about these particularities in Governance vs Protocol Risks, so I defer to what I’ve already written there. Conflating what governance is capable of with the risks that the core cryptographic protocol has muddies the waters when discussing design and confuses newcomers. The ways we relate to the two risks are different: one is of a cryptographic nature and arises from how that constrains us, where the other is of a social and somewhat arbitrary nature that arises from wanting to plan for future eventualities and can change through discourse and deliberation.
Plasma is an extension of state channels (which are themselves an extension of payment channels) that people were excited about from ~2017 to ~2019. Plasma channels involve putting state commitments from the channel states on the parent chain, which permits some level of asynchrony between participants. But it’s a bit of a technological dead end because it’s complicated to express state transitions robustly and ensure necessary messages are disseminated. Once rollups were invented most of the R&D on Plasma moved to rollups, since rollups can achieve everything people wanted from Plasma and directly solve the issues Plasma faced.
In some configurations you could describe plasma chains as L2s, but in some other configurations they’re really a sidechain with a bridge mechanism that can still work to withdraw if the custodians drop off. I mention Plasma here since it’s relevant later.
The term “sidechain” refers to a blockchain with its own consensus system and a two-way peg between it and a “main” blockchain. The two-way peg is a kind of bridge that almost always takes the form of a multisig contract. Users deposit funds to be held by the custodians, who then issue a bearer asset on the sidechain ledger. When a user wants to withdraw from the sidechain back to the main chain, they redeem the bearer assets with the custodians who then release the funds back to them. There are variations on this design, but the common theme between them is that the custodians are responsible for the funds and, crucially for this discussion, have the capability to steal them. The structure of the group of custodians varies. It might be some fixed set of named parties, or it might be a bunch of coin/token holders dictated by a smart contract system on the main chain.
Typically what distinguishes a sidechain vs a fully independent blockchain is that a recent block is periodically committed back to the main chain in some way, although not all sidechains have this feature. This allows the sidechain consensus to follow the commitments on the main chain in the event a short range fork occurs. This does credibly allow you to claim that sidechains that do this inherit some level of consensus security from their main chain (ie. that blocks won’t be rolled back once committed to on the main chain), but following main chain consensus is the easy part! This security does not apply to any bridged funds from the main chain when viewed from the perspective of the main chain. We don’t think about this as being an “on top of” relationship as described in the first section, we think about it like a “beside” relationship. Sometimes this functionality isn’t even supported in the sidechain design and it only has the bridge, yet they’re still called sidechains.
We cannot say that a sidechain is a way to scale the throughput of its main chain, since using bridged funds demands a very different trust relationship than using main chain funds. We introduce an additional set of parties that we have to trust even more than the set of parties we normally trust when using the main chain. The main chain has no view of, and so has no way to reason about, activity on the sidechain, so the custodians’ actions are fully trusted by the main chain (they’re usually regular multisig transactions, after all). Additionally, sidechain users generally have to run a full node for the side chain in order to be able to trustlessly interact with it even for assets issued natively in the sidechain, just as they do for any other L1 blockchain. If they did not, then sidechain validators(/custodians) would have an easier time enforcing a hard fork that goes against the best interests of the sidechain users as a whole, just as in a typical blockchain. In this aspect, it does not expand the overall throughput of the main chain’s ledger because it’s a completely separate and isolated settlement domain. To make use of the throughput it has with the minimal possible trust you must increase the amount of resources you devote to fully verifying it and the main chain, just as you would if you just increased the block size (or gas limit) of the main chain.
If a user has funds on a sidechain they’re completely at the mercy of the custodians and have to trust that their bearer assets will be redeemed at face value. Even if fewer than t/n of the signers turn malicious, if more than (n-t)/n become uncooperative, they can freeze funds in the multisig and hold them for ransom or otherwise coerce the honest parties to act in a way they normally would not, which is still a failure mode. This very strong relationship also probably has a bunch of regulatory significance that hasn’t even been figured out in court yet! It’s very plausible that there’s a legal significance to the t/n trust with a multisig bridge like this (since it’s an active role) that distinguishes it from what a governance mechanism might have the capacity to do (since it’s a passive role).
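The two failure thresholds for a t-of-n multisig bridge can be written down directly. The sketch below is a simplification that treats signers as either colluding to steal, passively uncooperative, or honest, and assumes colluding signers also refuse to sign honest withdrawals. The function name and the 11-of-15 numbers are made up for illustration.

```python
def bridge_outcome(n: int, t: int, colluding: int, uncooperative: int = 0) -> str:
    """Classify what can happen to funds held by a t-of-n multisig bridge.

    colluding:     signers actively trying to steal
    uncooperative: signers who simply refuse to sign anything
    Assumes colluding signers also refuse to sign honest withdrawals.
    """
    if not (0 < t <= n) or colluding < 0 or uncooperative < 0:
        raise ValueError("invalid parameters")
    if colluding + uncooperative > n:
        raise ValueError("more signers than exist")
    if colluding >= t:
        return "theft possible"    # enough signatures to send funds anywhere
    if colluding + uncooperative > n - t:
        return "funds frozen"      # fewer than t signers left to approve withdrawals
    return "withdrawals work"

n, t = 15, 11                      # illustrative 11-of-15 bridge
for colluding, uncooperative in [(11, 0), (5, 0), (2, 3), (4, 0)]:
    print(colluding, uncooperative, bridge_outcome(n, t, colluding, uncooperative))
```

Note how raising t to make theft harder makes freezing easier: with 11-of-15, stealing needs 11 signers but freezing only needs 5.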
This was a common kind of graphic that was shared around when sidechains were first being developed in the early-mid 2010s.
This puts forward a nice idea of a network of chains all deriving their security from a main blockchain and different kinds of economic activity being split off onto the sidechains. But it turns out making trustworthy bridges is much more complicated than they envisioned back in the day and sidechains don’t inherit the more important kind of security that we want from their main chain.
In any case, this is distinctly different from the mechanism that rollups use. Using the term “L2” to describe sidechains conflates them with other kinds of protocols with very different security models. This weakens its usefulness as a term. That also leaves us without an obvious term to distinguish protocols that require accepting the very strong kinds of trust assumptions that sidechains require, from protocols that don’t require those assumptions and do give levels of security comparable to the base chain.
The primary technical reason many historical sidechains were built as opposed to rollups is just because they’re easier. The cryptography is far simpler and basically no new technology had to be invented to make them work. Rollups are newer technology that wasn’t proposed until ~2018 and wasn’t widely understood until around 2020. While the state of the art is much farther ahead now than it was 6 years ago, there’s still a lot of engineering that has to be done. But if all you’re trying to do is to get to market really quick, it still seems like a way better idea to build a sidechain.
Some links:
// TODO more
Polygon (the company) runs a sidechain called Polygon PoS (formerly “Matic”, which is the name I will continue to use here for clarity) that exists alongside Ethereum. From its founding until not too long ago, Polygon regularly marketed Matic using terms like “layer 2 plasma sidechain”. This is a bit of a buzzwordy branding, but it was really a sidechain with a plasma-like bridge, as described earlier. It carries nearly all of the risks that sidechains do, but (if I remember correctly) the plasma-ish bridge allows exits in the event the chain halts. But Polygon’s marketing team is incredibly effective and successfully misled thousands of newcomers into believing that they’ve solved EVM scaling and decentralization. Their argument was that because they had the plasma anchor, it’s an L2. But when we talk about security we have to talk about trust relationships, and there the anchor doesn’t matter, since the main chain still trusts the Polygon validators to commit correct states.
And of course the broader Ethereum community eventually rallied against this, because they were obviously trying to be misleading. It took a while but their marketing has walked back from its prior narrative about it. They weren’t the only ones to use misleading marketing like this, but fortunately none are as large as Matic and there’s other teams building viable L2s now to point newcomers to.
One weakness the Matic sidechain still has is its vulnerability to rapid changes in price. If the total value of the bridged funds increases, it’s plausible that it could cross over the threshold where it’s economically advantageous for Matic validators to just run off with all of the bridged funds and accept that they’ll lose their stake because they still make a net profit. This isn’t particularly likely to happen, but it’s entirely possible and we shouldn’t base our security on the altruism of large groups of economically motivated parties.
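The threshold described here is a simple inequality: the attack pays off once the value of the bridged funds exceeds the value of the stake the colluding validators would forfeit. A minimal sketch follows, with made-up numbers and an assumed 2/3 colluding fraction. None of these figures describe the real Matic validator set, and the function name is hypothetical.

```python
def attack_is_profitable(bridged_value: float, staked_tokens: float,
                         token_price: float, colluding_fraction: float) -> bool:
    """True when stealing the bridged funds outweighs the stake that would be lost.

    colluding_fraction is the share of total stake held by the validators
    who would need to collude (an assumed parameter, not a protocol constant).
    """
    stake_at_risk = staked_tokens * token_price * colluding_fraction
    return bridged_value > stake_at_risk

staked = 1_000_000        # hypothetical number of staked tokens
fraction = 2 / 3          # assumed share of stake needed to take the bridge

print(attack_is_profitable(500_000, staked, 1.00, fraction))  # baseline
print(attack_is_profitable(800_000, staked, 1.00, fraction))  # bridged value grows
print(attack_is_profitable(500_000, staked, 0.50, fraction))  # staked token price halves
```

Either side of the inequality can move: the bridged funds can grow or the staked token can fall in price, and the second can happen quickly without anyone touching the bridge.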
However, Polygon (again, the company) bought a bunch of zk companies and amalgamated them into a larger umbrella of “Polygon zkEVM”. It’s kinda a mess of different teams, but they are working on a real zk-rollup. The details are mostly out of scope, but there’s plenty of marketing they’ve released.
The EF also has an article on sidechains for further discussion.
// TODO dig up more links here for concrete historical evidence when I find them
I would also like to do an argument by contradiction. Let’s suppose for this section that it is reasonable to designate a multisig-bridged sidechain as an L2. This leads us to some weird conclusions that are hard to accept.
We can consider a multisig-bridged sidechain as two parts:
So let’s look for patterns here. Multisig bridges aren’t that uncommon! There are quite a few of them because they’re fairly easy to implement.
But this also suggests something strange. If a Bitcoin sidechain is a Bitcoin L2, then any blockchain with a bridge to it from Bitcoin is already almost a Bitcoin L2. If someone started putting chain tip commitments on Bitcoin for any of those protocols then you could retroactively turn it into an L2. Ethereum’s PoS consensus even already has a mechanism to do this using its weak subjectivity checkpoints. If that’s really the distinction then most blockchains with DeFi are already almost a Bitcoin L2. Solana, Polkadot, Aptos, and countless others.
That also means that a pair of chains with multisig bridges in both directions are both almost-L2s of each other! This is obviously nonsensical: the “on top of” relationship is at odds with itself. Does it really make sense to say that Ethereum is almost a Bitcoin L2 because tBTC exists? (Actually, it would be an almost-L2 multiple times over, once for each of the several different bridges that have been built.)
Then if we weaken the requirement for the commitments to include more sidechains that don’t post commitments as also being “real” L2s, then all of the “almost-L2-making” bridges also become real L2 bridges. You can’t reasonably say they somehow derive security from each other when it’s cyclical like that.
It’s really hard to argue that sidechains should be L2s but that these cross-chain bridges shouldn’t make their target chains L2s of their sources. There is not much technical difference between sidechains (as supposed L2s) and chains with bridges to them. The only grounds to assert a distinction would be based on vibes. If this is how we’re willing to accept how our terms work, then it would be stupid for every altcoin to not checkpoint itself on Bitcoin and then market itself as a “Bitcoin L2 sidechain”. (Fwiw, ICP does do this and makes an argument a lot like this. That should tell you more about ICP’s scrupulousness than anything else.)
One possible reason for this line of reasoning being so popular in the Bitcoin ecosystem might be because Bitcoin’s scripting is really restrictive. This is a big limitation on what kinds of scaling solutions you can build. It’s really unfortunate. It’s a trillion-dollar asset, so for some people that are really strongly committed to Bitcoin’s vision (however they want to describe it), they might flip their logic around and decide “since this is the best we can get we should embrace it”, saying that “L2” for Bitcoin just has a broader definition.
Why should Bitcoin be special? Ie. why should it be okay that “L2” has a weaker definition when talking about Bitcoin than it does others? If we say this is fine, regardless of the reasoning, this should be seen as an admission of defeat and acceptance that Bitcoin really is technologically inferior to all the altcoins that have come after it that can support real L2s and isn’t likely to be improved substantially.
No, we can do better than that. And Bitcoin devs are trying, but as the oldest and most ingrained distributed ledger, it’s a slow process that requires a lot of community/ecosystem support.
It’s possible to make an argument that “none of this is likely to happen because even if an attacker got away with stealing the funds from a sidechain bridge, they’d never be able to cash them out without going through KYC”. That might be true, but this is all very new technology and legal precedents are still uncertain. It’s possible we might end up in a world where the judicial precedent changes such that the “Code is Law” mantra ends up being real law. If that ever becomes the case, then there’s no reason that quant/MEV firms won’t eventually start actively attacking blockchain protocols instead of merely subverting them. That’s just one plausible scenario, but there’s other black swan type events that could jeopardize the security of lots of funds if we accept a suboptimal regime.
I also think it’s just wrong not to be very open about the security model and trust assumptions that the protocols we rely on have. When a nontechnical user downloads a new wallet and wants to try a new project, they expect it to be reasonably safe.
It really annoys me when people play with definitions to market a product as something that it isn’t, and it’s especially problematic when real money is on the line. Yes, don’t invest what you can’t afford to lose. But it’s doing your users a disservice when they think they’re doing everything right but they’re really opening themselves up to risks they were misled into thinking didn’t exist because they used a product whose marketing uses terms contrary to how most of the rest of the ecosystem uses them.
Of course which concepts words refer to drifts over time, that’s the reason any two related languages can have differences from each other. But in technical discussions it’s extremely important to have a common lexicon, otherwise we end up talking past and misunderstanding each other. But when terms are intentionally used contrary to their generally accepted definition, that’s not something we should accept as a natural drift in terminology.
This is an especially important line when trying to introduce people from outside the Bitcoin ecosystem to using Bitcoin. Because it’s really hard to argue that calling sidechains L2s isn’t being actively misleading. In the legacy finance system, you might call that fraud.
Currently the only possible L2s for Bitcoin are constructions like Lightning. This is unfortunate, but it should be seen as a call to action to experiment with new features that Bitcoin could be upgraded with in order to permit more sophisticated constructions.
There’s a few that are right on the edge of being possible with new features that have working prototypes (CTV, APO, etc.):
And of course it’s entirely plausible for rollups to be implemented on Bitcoin, though that still requires more experimentation to work. These designs get better, cheaper, and more useful if we have rollups and can experiment with stacking layers and using the L1 features instrumental in implementing a rollup.
Sidechains, like Plasma, are somewhat of a technical dead end. It’s a bit like all of the alternatives to trains that have been thought up. Almost all sensible changes you’d make to improve sidechains end up making them function more like rollups. In specific scenarios, choosing to build and use a sidechain might make sense if that’s a trust model that the specific parties involved can accept. But that’s not the case for general purpose applications meant to be used by a wide audience.
There’s a certain level of collective willpower that needs to be assembled to advance the status quo, and I can’t predict if that’s really going to get together or if we’ll just keep dreaming about OP_CAT for the next 10 years. There are teams that are experimenting with newer constructions like BitVM that may unlock potential new designs, but it’s not quite ready for production use, yet.
Also, proof-of-work is ugly and looks bad on TV. So if we’re going to argue that it’s worth it, then we need real scalability that makes the best use of it that can reasonably be achieved.
In case my tone in this article comes off as harsh, I want to clarify that none of the criticism here is directed at the very talented engineers who work on many of the projects around this space. Most of the blame for this confusion lies with historical happenstance and with a few people (who tend to have an interest in a broader definition being accepted) being opportunistic or shortsighted. I hope that we can be more rigorous about what kinds of protocols we aim to build and rely on, and focus our efforts on actually making that a reality.
Bitcoin magazine released a statement on their editorial policy on Bitcoin Layer 2s, which might be interesting if you prefer to defer to an authority on the matter.
To be eligible for coverage, a Bitcoin L2 must:
Use bitcoin as its native asset: The L2 must be foundationally designed to use bitcoin as its primary token or unit of account, and the mechanism for paying fees for the system. If it has a token, it must be backed by bitcoin.
Use Bitcoin as a settlement mechanism to enforce transactions: Users of the L2 must be capable of exiting the system through a mechanism (trusted or trustless) that returns control of their funds on Layer 1.
Demonstrate a functional dependence on Bitcoin: If Bitcoin were to experience a total failure, and the system in question were to remain operational, then it is our position that that system is not a Layer 2 of Bitcoin.
The broad strokes of this definition aren’t actually bad, but there’s somewhat of a conundrum. The phrasing of “use Bitcoin as a settlement mechanism to enforce transactions” along with “[…] exit the system through a mechanism (trusted or trustless)” is partly oxymoronic. If the mechanism demands trust on a (set of) third party(s) then Bitcoin isn’t enforcing the exit. The consensus of the sidechain is also not enforced by Bitcoin in general, for reasons along the lines of how you have to run your own full node.
You can remark that rollups are a lot like a blockchain with its own consensus running separately from their L1, which makes them “like” sidechains. This is roughly true, but that observation describes what’s more of an aesthetic similarity that ignores the mechanisms by which they operate.
You could rearrange the terminology like this:
and arguments throughout the rest of this article would still apply. But I would still object to this framing because it again uses the term “sidechain” for multiple things with very different security properties, distracts from the “on top of” vs “beside” framing that’s useful when educating, and generally makes the language more cumbersome when trying to be precise.
People who use this phrasing tend to argue that it should be this way because Lightning isn’t like a blockchain at all and sidechains/rollups are more like blockchains and that’s where the difference is. But that’s still kinda vibes based since we can think about Lightning like a blockchain too, just one where the only state is the balance of the channel and the set of outstanding HTLCs and where we require both parties to sign to produce a block.
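To make the “Lightning as a two-party blockchain” reading concrete, here is a toy sketch in which each channel update is a block holding only balances and outstanding HTLCs, and a block is valid only if both parties sign it. It is a deliberate simplification: HMACs stand in for real signatures to keep the example stdlib-only, and the asymmetric commitment transactions and revocation that real Lightning relies on are left out. All names are hypothetical.

```python
import hashlib
import hmac
import json

def sign(key: bytes, state: dict) -> bytes:
    """HMAC stand-in for a real signature (an assumption, not how Lightning signs)."""
    message = json.dumps(state, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).digest()

def total(state: dict) -> int:
    """Funds in the channel: settled balances plus amounts locked in HTLCs."""
    return sum(state["balances"].values()) + sum(x["amount"] for x in state["htlcs"])

class ChannelChain:
    def __init__(self, keys: dict, balances: dict):
        self.keys = keys
        self.blocks = [{"height": 0, "balances": balances, "htlcs": []}]

    def append(self, state: dict, signatures: dict) -> bool:
        """Accept the next state only if it conserves funds and every party signed it."""
        tip = self.blocks[-1]
        signed_by_all = all(
            hmac.compare_digest(signatures.get(name, b""), sign(key, state))
            for name, key in self.keys.items()
        )
        if state["height"] != tip["height"] + 1 or total(state) != total(tip) or not signed_by_all:
            return False
        self.blocks.append(state)
        return True

keys = {"alice": b"alice-secret", "bob": b"bob-secret"}
chain = ChannelChain(keys, {"alice": 10, "bob": 0})
update = {"height": 1, "balances": {"alice": 7, "bob": 0},
          "htlcs": [{"amount": 3, "payment_hash": "ab" * 32}]}

one_sig = chain.append(update, {"alice": sign(keys["alice"], update)})
both_sigs = chain.append(update, {name: sign(key, update) for name, key in keys.items()})
print(one_sig, both_sigs, len(chain.blocks))
```

Seen this way, the “consensus rule” of the channel is just unanimity between two parties, which is why the blockchain-likeness argument alone does not separate Lightning from rollups or sidechains.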
I think the term “federated” as it’s used in the term “federated sidechain” is kinda weird. It suggests that members of the federation have the ability to disagree with the majority and still maintain some level of individual and internal independence. But that’s not how it works: the parties always act to form a quorum amongst themselves and act externally as a single actor with a unified decision making capacity.
This isn’t that consequential, since it’s not a very widely used term in this context and its meaning is understood, even if I disagree with how it’s used.