On-chain wallets achieve a roughly 1-to-1 mapping of economic transactions to blockchain transactions: for every economic transaction that a user performs, roughly one blockchain transaction is needed. Aggregation, coinjoin, cut-through payments, etc. change this statement a bit. But it’s roughly correct.

Lightning achieved a many-to-one mapping of transactions to transactions: the magic of Lightning is that an effectively infinite number of economic transactions can happen in a single Lightning channel, which itself is tied to a single unspent transaction output (UTXO). Essentially we’ve taken the “time” dimension — transactions — and achieved a significant scaling by collapsing that dimension.

But creating even a single UTXO per user is, arguably, not good enough. So there are many proposals out there to achieve even greater scaling by allowing multiple users to share a single UTXO in a self-sovereign way. Again, collapsing another “space” dimension of scaling — users — into one UTXO.

Our goal here is to give an overview of all these proposals, figure out what technical patterns they share, figure out what kinds of new opcodes and other soft fork upgrades they need to function, and create a comparison table of how all the parts fit together. Along the way we’ll also define what an L2 protocol actually is, what kind of scaling Lightning is already capable of, and get an understanding of what improvements we need to make to mempools to achieve all this.

Thanks goes to Fulgur Ventures for sponsoring this research. They had no editorial control over the contents of this post and did not review it prior to publication.

Thanks also goes to Daniela Brozzoni, Sarah Cox, and others for pre-publication review.

Contents

  1. Definitions
    1. What is Layer 2?
    2. What are Covenants?
      1. Recursive Covenants
  2. Goals
    1. Lightning’s Scaling Limits
  3. L2 Overview
    1. Lightning
      1. Non-Interactive Channels
    2. Channel Factories
    3. Eltoo/LN-Symmetry
    4. Ark
      1. Ark Economics
      2. Bootstrapping Ark
      3. Interactivity
      4. Advanced Ark
      5. On-Chain Fee Payment In Unilateral Withdraw
    5. Validity Rollups
    6. BitVM
    7. Hierarchical Channels
    8. CoinPool
    9. Enigma Network
  4. Mempool Considerations
    1. Transaction Pinning
    2. Fee Payment: RBF, CPFP, SIGHASH_ANYONECANPAY, Anchors, and Sponsorship
    3. Replacement Cycling
  5. Feature Patterns and Soft Forks
    1. OP_Expire
    2. SIGHASH_ANYPREVOUT
    3. OP_CheckTemplateVerify
      1. LNHANCE
    4. OP_TXHASH
    5. OP_CAT
      1. Is OP_CAT Too Powerful?
    6. Incremental Hashing
    7. Script Revival
      1. Simplicity
    8. OP_FancyTreeManipulationStuff
  6. Fund Pools
    1. Individual Pre-Signed Transactions
    2. Txout Trees
    3. Balance Based Schemes
  7. Failure Data Ratio
  8. Consensus Cleanup
  9. Testing Soft-Fork Dependent L2’s
  10. Potential Soft-Forks
  11. Footnotes

Definitions

What is Layer 2?

Often the term “Layer 2” is defined broadly, to the point where even a bank-like entity (e.g. Liquid) could be defined as a Layer 2. For the purposes of this article we will adopt a strict definition: a Layer 2 (L2) is a Bitcoin-denominated system whose purpose is to allow BTC to be transacted with other parties more often than the number of on-chain transactions would otherwise allow, such that either:

  1. No-one is able to profitably steal funds in the system, taking into account in-system punishments and costs. Out-of-system costs and punishments like reputation loss, legal consequences, etc. are not considered in our definition.
  2. (Preferred) The true owners of the funds are able to unilaterally withdraw their funds, minus transaction fees, without the cooperation of any third parties.

The first option is required because we want our L2 systems to be able to represent amounts and transactions of such small value that they are unable to be represented on-chain. For example, in Lightning, HTLCs can have a value too small to represent on-chain. In that circumstance the HTLC value is added to the transaction fee of the commitment transaction. While a Lightning node can “steal” a dust HTLC by closing a channel at the right moment, doing so is more expensive1 than the HTLC is worth, making the theft unprofitable.

That said, unilateral withdrawal is always our primary design goal.2

With this definition things like Lightning are considered L2 systems. However systems such as Liquid, Cashu, and Fedimint are not L2’s, because another party or parties has control of your funds. Client-side validation schemes like RGB are also not L2’s under this definition, because they are unable to trustlessly transact BTC itself. Finally, Statechains fail to make the cut, because the Statechain entity can steal funds if it does not follow the protocol.

What are Covenants?

…and why do more scalable L2 systems need them?

In Bitcoin scripting, covenants are mechanisms by which the way a txout can be spent is restricted in advance, such that the form of the transactions used to spend that txout is pre-defined or otherwise restricted in a way that is not purely limited to signatures. L2 systems that share UTXOs between multiple parties need covenants because they need ways of constraining how the UTXO can be spent to implement the rules and incentives of the L2 protocol.

Recursive Covenants

A recursive covenant is a covenant with the property that the rules constraining how a UTXO can be spent can be applied recursively, to child UTXOs of the spending transaction indefinitely. Recursive covenants have long been considered to be undesirable by some because they can encumber coins indefinitely. Or at least, indefinitely without the permission of a third party such as a government.

Goals

Lightning is the current “best in class” Layer 2 system out there. However it has limitations. Namely:

  1. Scaling - Lightning currently requires at least one UTXO per end user.3
  2. Liquidity - Lightning requires that funds be tied up in channels.
  3. Interactivity - Lightning requires the recipients of payments to be online in order to receive them trustlessly.

In evaluating Layer 2 systems, our goal will be to improve on these key limitations, ideally without adding new limitations.

Lightning’s Scaling Limits

What does “one UTXO per end user” mean in practice? Since Lightning channels can operate indefinitely, one way of looking at this is to ask how many new channels can be created per year4. Creating a taproot output has a marginal cost of \(43\mathrm{vB}\); if channel creation is amortized, with many channels created in a single transaction, the other transaction overhead can be made negligible and fairly large numbers of channels can be opened per year to on-board new users. For example, suppose that 90% of block capacity went to opening new taproot Lightning channels:

\[52{\small,}560\frac{\mathrm{blocks}}{\mathrm{year}} \times 1{\small,}000{\small,}000\frac{\mathrm{vB}}{\mathrm{block}} \times 90\% \times 1\frac{\mathrm{channel}}{43\mathrm{vB}} = 1.1\,\mathrm{billion}\frac{\mathrm{channels}}{\mathrm{year}}\]

It’s estimated that about half of the global population, 4.3 billion people, own a smartphone. So each year we can in fact on-board a significant percentage of the population that is likely to be able to make use of a Lightning channel.

However, channels do not last forever. On occasion users will want to switch wallets, increase or decrease channel capacity, etc. The most efficient method to change the capacity of a channel is via splicing, notably implemented in Phoenix Wallet.

Like channel opens, splicing could also be done in an amortized fashion to improve efficiency, with multiple splice operations sharing a single transaction to reduce the number of inputs and outputs necessary to add and remove funds5. Thus the delta blockspace required per user’s splice, assuming the use of musig, is the \(43\mathrm{vB}\) taproot output plus the \(57.5\mathrm{vB}\) taproot keypath spend, for a total of \(100.5\mathrm{vB}\). If we again assume a 90% blockspace usage, we get:

\[52{\small,}560\frac{\mathrm{blocks}}{\mathrm{year}} \times 1{\small,}000{\small,}000\frac{\mathrm{vB}}{\mathrm{block}} \times 90\% \times 1\frac{\mathrm{splice}}{100.5\mathrm{vB}} = 470\,\mathrm{million}\frac{\mathrm{splices}}{\mathrm{year}}\]
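
As a sanity check on the arithmetic above, here is a minimal Python sketch reproducing both estimates; the 43 vB output and 57.5 vB keypath-spend sizes are the figures used above, and the 90% blockspace share is of course just an assumption.

BLOCKS_PER_YEAR = 52_560
VBYTES_PER_BLOCK = 1_000_000
BLOCKSPACE_SHARE = 0.9   # assumed fraction of blockspace used for channel management

def ops_per_year(vbytes_per_op):
    """How many operations fit in a year at the assumed blockspace share."""
    return BLOCKS_PER_YEAR * VBYTES_PER_BLOCK * BLOCKSPACE_SHARE / vbytes_per_op

print(ops_per_year(43))          # channel opens/year: ~1.1 billion
print(ops_per_year(43 + 57.5))   # splices/year:       ~470 million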

Finally, note how switching Lightning channels between wallets can be done in a single transaction by either trusting the next wallet to sign a commitment transaction after the funds have been sent to the commitment address, or with co-operative close-to-new-channel support in both wallet implementations.

Of course, there are competing use-cases for Bitcoin beyond Lightning channels, and how that will translate into fee-rates is difficult to know. But these numbers give us a rough ball-park that suggests that with current technology, it is at least technically possible to support hundreds of millions of self-sovereign Lightning users.

L2 Overview

Under our definition of L2 systems, there are two main design patterns being discussed in the Bitcoin development community:

  1. Channels
  2. Virtual UTXOs

In the channel pattern, of which Lightning is the main example, the protocol progresses by exchanging pre-signed transactions between the parties that could be mined, but are not in the “happy path”. These pre-signed transactions split a UTXO between the parties; transactions happen by repeatedly changing the balance of that split, with new pre-signed transactions. Since there will be many different possible valid transactions spending the same UTXO, some incentive mechanism is needed to make sure the correct transaction is the one actually mined.

In the Virtual UTXO (V-UTXO) design pattern, of which Ark is the most prominent example, V-UTXOs are created via covenants or multi-party agreement, via the creation of transactions that could be mined to unilaterally withdraw the V-UTXO funds by putting them on-chain, but are not in the “happy path”. In that respect, V-UTXO’s are similar to channels. But unlike channels, V-UTXO schemes do transactions by spending the V-UTXOs themselves, in (conceptually) a single6 pre-signed transaction.

The “happy path” design pattern is the use of an “all parties agree” script path, such as an N-of-N multisig; taproot is designed specifically for this concept, allowing the key path to be an N-of-N multisig via musig. Assuming that all parties agree, the happy path allows for the coins to be efficiently (and privately) spent.

Interestingly, since virtual UTXOs are “real” in many senses, it is quite easy to build channels on top of virtual UTXOs by simply creating virtual UTXOs that, if mined, would lead to the creation of the UTXOs required for the channels. In that sense, virtual UTXO schemes are a slightly lower layer than channels.

Lightning

The status quo, implemented in production as the Lightning Network, primarily based on the BOLTs standards. Lightning is a combination of a number of things, including Lightning channels and HTLCs, the P2P routing network, onion routing, invoice standards, etc. Notably, Lightning is not a consensus system, so different elements of the “Lightning system” need not be adopted in the exact same way by all users. For the purpose of this article, when we say “Lightning” we’ll use it in the broad sense, including easily foreseen upgrades to the current (typical) Lightning protocol(s) that are widely used.

As discussed above, the key characteristic of Lightning is its end-user scalability limit, due to the need to have at least one UTXO per user. That said, for the “core” routing element of Lightning — the public Lightning nodes that route the vast majority of transactions — these scalability limits aren’t much of a concern as Lightning functions just fine if there are far more end-users than routing nodes, because each public channel used for payment routing can easily support a large number of transactions per second. This is also why so many future L2 systems are expecting to also participate in the Lightning network. We also see this in how existing not-quite-L2 systems like Cashu rely heavily on the Lightning network to actually be useful: the primary usage of Cashu is probably to send and receive Lightning payments.

Non-Interactive Channels

This construction improves on Lightning channels by using OP_CTV to reduce the interactivity requirements. However, as it doesn’t improve on the 1-UTXO-per-user scaling limit, we won’t discuss it further.

Channel Factories

Here we have multiple parties negotiate a single n-of-n multisig address, along with a transaction spending that multisig address to create n different UTXO’s splitting up the funds. Those UTXOs in turn are used for payment channels. The channels can be used with the same security as if they had been directly opened on-chain, because in the event that the channel state needs to be put on-chain, the split transaction can be mined. This potentially saves on-chain space when the channels are closed, as the \(n\) parties can — in theory — co-operatively close all \(n\) channels at once.

Since channel factories are negotiating UTXO’s that could be mined, but are not expected to actually be mined in the happy path, they are a very primitive example of V-UTXOs.

Channel factories by themselves do not require any soft-forks to be possible. However, the simple channel factories described above are probably impractical beyond small numbers of parties due to the coordination required to actually achieve a scaling benefit. Thus, covenant proposals such as OP_Evict or CTV (via txout trees) aim to allow more fine-grained outcomes where individual parties can be forced on-chain, without forcing everyone on-chain at once.

Eltoo/LN-Symmetry

Since Eltoo is a terribly confusing name, we’ll only use the updated name LN-Symmetry going forward.

While Poon-Dryja channels encourage the correct state to be published on-chain by punishing incorrect states, LN-Symmetry instead allows incorrect states to be updated with an additional transaction. This has the advantage of simplifying Lightning channels by removing the complexity of penalties. However, this is likely to be a disadvantage in untrusted scenarios, as penalties are arguably needed to discourage fraud.

LN-Symmetry needs a soft-fork to enable SIGHASH_ANYPREVOUT, which is required to allow state transactions to re-spend other state transactions during updates.

By itself, LN-Symmetry offers no scaling improvements on conventional Lightning channels. But proponents have argued that it makes things like channel factories easier to implement.

Ark

Ark takes a new approach to transaction scaling: fully transferable virtual UTXOs (V-UTXOs), that can be merged and split in atomic7 off-chain transactions. In Ark a central co-ordinator, the Ark Service Provider (ASP), provides V-UTXOs for users with a defined lifetime, e.g. 4 weeks. These periods are known as rounds. These V-UTXOs are created via pool txouts, one per round, via some kind of mechanism such as CTV to allow a single on-chain txout to commit to a tree of V-UTXOs. The round expiration is how Ark achieves a scaling advantage: at the end of a round, the pool txout unlocks, allowing the ASP to unilaterally spend it with a single signature in a small transaction. Due to the round expiry time, the V-UTXOs themselves expire when the pool txouts creating them expire: users who own a V-UTXO must either spend that V-UTXO prior to the pool txout expiry time being reached, or put it on-chain (unilateral withdrawal).

To transact V-UTXOs between parties, the Ark coordinator co-signs transactions that spend one or more V-UTXOs, such that the transactions are only valid if one or more other V-UTXOs are created in a different round. In combination with some careful timeouts — see the Ark docs for the full details — this dependency is what makes spending V-UTXO’s trustless: the V-UTXO’s can’t be claimed on-chain unless new V-UTXOs are created in a different pool transaction. There’s a few potential ways to actually implement that dependency. But the exact details aren’t relevant to the purposes of this article.

Notice how this means that a given ASP will have many different active rounds happening at once. New rounds are frequently created to allow funds in existing rounds to be transferred. But the existing rounds overlap the new rounds, as they will generally expire sometime after new rounds, and new pool txouts, are created.

Ark Economics

When a V-UTXO is spent, the ASP must provide matching BTC in a new pool txout representing a new round. But they can’t recover the value of the spent V-UTXO until the round expires. Thus the economics of V-UTXO spends has a time-value-of-money cost, due to the liquidity the ASP has to provide.

Specifically, the cost is incurred when the V-UTXO is spent. While the V-UTXO is unspent, it represents a very real potential UTXO that could be put onchain to unilaterally withdraw the funds; the user owns those funds. However, to spend the V-UTXO, the ASP must create a new pool txout, using funds the ASP obtains elsewhere, while the funds in the spent V-UTXO are not available to the ASP until the expiry time is reached.

Thus spending a V-UTXO requires a short term loan, borrowing funds to cover the time interval between now and when the round expires. This means that the liquidity cost to spend a V-UTXO actually declines as the V-UTXO gets older and the expiry time gets closer, eventually — in theory — reaching zero when the round finally expires.

Finally, remember that the cost to spend a V-UTXO is related to the total size of the V-UTXO spent. Not the amount paid to the recipient. This means that wallets intended for transacting V-UTXOs directly (as opposed to managing one V-UTXO for the purposes of, e.g., a V-UTXO-based Lightning channel), have to make trade-offs in how they split up funds into V-UTXOs. A single V-UTXO minimizes the cost of unilateral withdrawal, while maximizing liquidity-based transaction fees; splitting up funds into many V-UTXOs does the opposite. This is entirely unlike the economics of on-chain Bitcoin, or Lightning transactions.

What is this liquidity cost? As of writing, the Lightning wallet Phoenix charges a 1% fee to reserve channel liquidity for 1 year; at worst Phoenix would have to tie up their funds for 1 year. However, that assumes that the liquidity isn’t used. It’s quite possible that the cost-of-capital to Phoenix is in fact higher, and they are assuming that the average customer uses their incoming liquidity in less than one year. Phoenix also earns money off transaction fees, potentially subsidizing channel liquidity. Finally, Phoenix might not be profitable!

The US Treasury Bill Rate gives us another estimate. As of writing the 3 Month Treasury Bill Rate is about 5% per year. Since there is an argument that this rate is inflated due to US dollars being inflationary, we’ll assume the cost of liquidity for BTC denominated funds is 3% per year for our analysis.

If the round interval is 4 weeks, this means that a transaction would start off with a liquidity cost of \(3\% / \frac{52}{4} = 0.23\%\), eventually declining to zero. Assuming the user tries to move their funds to a new round two weeks prior to the round expiring, the user is paying about 1.5% per year in liquidity costs to achieve self-custody of their funds. On the other hand, if the user waits until the last moment8, the cost could be nearly zero, at the risk of missing the expiration time.
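
To make the declining cost concrete, here is a tiny Python sketch of the per-spend liquidity cost under the assumptions above (3%/year cost of capital, 4-week rounds); the numbers are illustrative, not a fee schedule any ASP has published.

COST_OF_CAPITAL = 0.03       # assumed annual cost of BTC-denominated liquidity
WEEKS_PER_YEAR = 52

def liquidity_cost(weeks_until_round_expiry):
    """Fraction of the spent V-UTXO's value the ASP must pay to fund it
    until the current round expires."""
    return COST_OF_CAPITAL * weeks_until_round_expiry / WEEKS_PER_YEAR

print(liquidity_cost(4))    # freshly created round: ~0.23%
print(liquidity_cost(2))    # two weeks remaining:   ~0.12%
print(liquidity_cost(0))    # at expiry:              0.0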

Users may not see this as a trivial cost. And this cost assumes that fixed costs of each round have been made insignificant by amortising transaction fees and other costs over large numbers of participants.

What if fixed costs aren’t so insignificant? Suppose that the ASP has 1000 users, and pool txouts are created once an hour on average. Over a 4-week period, that’s 672 on-chain transactions. Which means that to simply hold their funds, the ASP’s users collectively have to pay for almost as many transactions as there are users! It would probably be cheaper for them to all open their own Lightning channels, even though the ASP is requiring them to wait an entire hour for a confirmation.
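
The fixed-cost arithmetic behind that claim, as a quick sketch (1,000 users and one pool txout per hour are simply the assumed numbers from above):

USERS = 1_000
POOL_TXS_PER_HOUR = 1
ROUND_HOURS = 4 * 7 * 24                 # a 4-week round

pool_txs_per_round = POOL_TXS_PER_HOUR * ROUND_HOURS
print(pool_txs_per_round)                # 672 on-chain transactions per round
print(pool_txs_per_round / USERS)        # ~0.67 on-chain transactions per user, just to hold funds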

Bootstrapping Ark

A new ASP with few users faces a dilemma: either ASP rounds happen infrequently, and users have to wait a long time for the proposed round to gather enough V-UTXOs to achieve a useful scaling and transaction fee reduction. Or ASP pool transactions happen frequently, with high transaction fees paid per user. As we showed in the previous section, it can take a lot of users to amortize frequent rounds, and their underlying pool txouts.

Because rounds expire, this problem is an ongoing one, even worse than that faced by Lightning channels: at least a Lightning channel can continue to be useful indefinitely, allowing a channel to be opened now and amortized gradually over many months. Secondly, because rounds expire, there is less flexibility as to when to create the new txouts backing these rounds: if fees are high for a week or two, users whose pool txouts are expiring have no choice but to (collectively) pay those high fees to maintain their custody over their funds. With Lightning channels, there is much more flexibility as to when to actually open a channel.

While the authors of Ark initially imagined a very optimistic scenario where new rounds happen every few seconds, initial bootstrapping will probably have to happen with use-cases that can afford to wait multiple hours for an Ark transaction to confirm, if transaction fees are not subsidized.

Interactivity

Non-custodial Ark is a highly interactive protocol: since your V-UTXOs expire, you have hard deadlines to interact with your ASP, or else the ASP could choose to take your funds. This interactivity can’t be outsourced either: while Lightning has watchtowers that discourage counterparties from trying to rip you off — even if your channel hasn’t been online — Ark coin owners must use their own private keys to refresh funds without trust. The closest thing possible in Ark to watchtowers would be to sign transactions allowing a watchtower to unilaterally withdraw your funds on-chain towards the expiration time, which has a significant transaction fee cost.

Consider what happens to a V-UTXO if the owner goes offline: after the round expires, the ASP needs to recover the funds to get their liquidity back for further rounds. If a V-UTXO owner goes offline, putting that V-UTXO on-chain has significant transaction costs, as the ASP now needs to recover funds at multiple levels of the V-UTXO tree. The ASP can recreate the unspent V-UTXOs in a new round. But this isn’t trustless from the perspective of the V-UTXO owners, as they can’t spend those V-UTXO’s without obtaining data9 from the ASP. The ASP could also simply record unspent V-UTXOs as a custodial balance. Or maybe even have a policy of seizing the funds!

Personally, I suspect that given the non-trivial cost of self-custody in Ark, many users will instead choose ASPs with a policy of rolling over funds into a new round and simply accept the potential for fraud at the end of each round. This is cheaper than proactively moving their funds early enough to guarantee safety in the event that, e.g., they fail to turn their phone on in time for their wallet to move the funds to a new round.

Advanced Ark

It may be feasible to reduce the liquidity requirements of Ark through more advanced covenants, if it is typical for liquidity to be used up part way through a round. For example, let’s suppose that 50% of the total V-UTXO value in a pool txout has been spent. If the ASP could redeem just that part of the round’s pool txout, they could recover liquidity quicker, reducing overall liquidity costs. While no concrete proposals on how to do this have been published, it certainly seems like it should be possible with Sufficiently Advanced™ covenants. Most likely through some kind of script revival soft-fork that adds many useful opcodes at once.

Similarly, through Sufficiently Advanced™ covenants the entire txout tree structure could be replaced with some kind of rolling withdrawal scheme, potentially offering space savings. We’ll cover this in a further section, as this technique is potentially useful for other schemes.

The end-of-round custody issue is another case where Sufficiently Advanced™ covenants could solve a problem: a covenant, in particular, a ZK-proof covenant, could force the ASP to recreate all unspent V-UTXO’s in the next round, removing the problem of custody reverting to them at the end of a round. While it is probably not possible to make this trustless, as the user will likely need some data from the ASP to spend their V-UTXO on the new round, it could prevent the ASP from financially gaining from fraud against offline users.

On-Chain Fee Payment In Unilateral Withdraw

Similar to Lightning, the economics of on-chain fee payment and the actual value of a V-UTXO after fees determine whether Ark usage meets our definition of an L2 via unilateral withdrawal, or fraud failing to benefit the ASP. We’ll discuss the specifics of this further when we discuss the txout tree design pattern.

Validity Rollups

A large class of sidechain-like constructs, generally proposed to use various forms of zero knowledge (ZK) proof technology to enforce the rules of the chain. The ZK-proof technology is the critical difference between validity rollup technology and other forms of sidechain: if the ZK-proof scheme works, the validity of the transactions can be guaranteed by math rather than trusting a third party. The “zero knowledge” aspect of a ZK proof is not actually a requirement in this use-case: it’s perfectly OK if the proof “leaks” information about what it is proving. It just happens that most of the mathematical schemes for this class of proof are zero-knowledge proofs.

From the point of view of Bitcoin, a validity rollup scheme requires a covenant, as we want to be able to create UTXO’s for the scheme that can only be spent if the rules of the scheme are followed. This is not necessarily a decentralized process. Many validity rollup schemes are in fact entirely centralized; the rollup proof is proving that the centralized transaction sequencer followed the rules for a particular sequence of transactions.

As for what covenant… Zero-Knowledge Proof technology is still a very new field, with advancements still being frequently made. So it is highly unlikely that we will see any opcodes added to Bitcoin that directly validate any specific ZK-proof schemes. Instead it is generally accepted that specific schemes would instead use more general opcodes, in particular OP_CAT, to validate ZK-proofs via scripts. For example, StarkWare is campaigning to have OP_CAT adopted.

Validity rollups are such a large potential topic, with so many low-substance/high-hype projects, that we won’t discuss them further beyond pointing out what opcodes potentially make this design class viable.

BitVM

Very roughly speaking BitVM is a way to construct a Lightning channel between two parties such that the rules of the channel are enforced by a zero-knowledge proof. Since it doesn’t actually need covenants to be implemented on Bitcoin today, and because it can’t directly be used to create an L2 system that scales beyond the 1-UTXO-per-user limit, we won’t discuss it further.

Hierarchical Channels

Hierarchical Channels10 aims to make channel resizing fast and cheap: “Hierarchical channels do for channel capacity what the LN does for bitcoin”. However they still don’t fundamentally exceed the 1 UTXO-per-user limit. They also don’t require any changes to the Bitcoin protocol anyway. So we’re not going to discuss them further. Their proponents should simply implement them! They don’t need our permission.

CoinPool

CoinPool allows multiple users to share a single UTXO, transfer funds between different users, and unilaterally withdraw. The CoinPool paper proposal requires three new softfork features: SIGHASH_ANYPREVOUT; a SIGHASH_GROUP allowing a signature to only apply to specific UTXOs; and an OP_MerkleSub to validate the removal of specific branches from a merkle tree. The latter could also be accomplished with OP_CAT.

At the moment CoinPool development seems to have stagnated, with the last commit to the specification website being two years ago.

Enigma Network

While I was asked to cover the Enigma Network, there seems to be a lack of documentation as to what the proposal really is. Bitfinex’s blog post makes a lot of claims; the MIT page is empty. Since the blog post doesn’t really make it clear what exactly is supposed to be going on, we won’t discuss it further.

Mempool Considerations

Current mempool policy in Bitcoin Core is not ideal for L2 systems. Here we’ll go over some of the main challenges they face, and potential improvements.

Transaction Pinning

Ultimately an economic exploit, transaction pinning attacks refer to a variety of situations where someone can intentionally (or unintentionally) make it difficult to get a desired transaction mined due to the prior broadcast of a conflicting transaction that does not get mined. This is an economic exploit, because in a true transaction pinning situation, there exists a desirable transaction that miners would profit from if they mined it; the conflicting pinning transaction is not mined in a reasonable amount of time, if ever.

The simplest example of pinning comes from the fact that without full-RBF, transaction replacement can be turned off. Thus, we can have a low fee-rate transaction, with replacement turned off, that will not be mined yet can’t be replaced. Essentially 100% of hash power has fixed this issue by enabling full-RBF, and as of writing, full-RBF should be enabled by default in the next release of Bitcoin Core (after 11 years of effort!).

That leaves BIP-125 Rule #3 pinning, the only remaining pinning issue that is relevant to multiparty L2 protocols and unfixed in Bitcoin Core. For reference, BIP-125 Rule #3 states the following:

A replacement transaction is required to pay a higher absolute fee (not
just fee rate) than the sum of fees paid by all transactions being replaced.

This rule can be exploited by broadcasting a large, low fee-rate pinning transaction spending the output(s) relevant to the multiparty protocol (alternatively, a group of transactions). Since the transaction has a low fee-rate, it will not be mined in a timely fashion, if ever. Yet, since it has a high total fee, replacing it with a different transaction is uneconomical.
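
A worked example makes the economics clearer. The sizes and fee-rates below are purely illustrative, but the shape of the problem is general: a large, low fee-rate pin forces any replacement to pay an absurd fee-rate to satisfy Rule #3.

# Illustrative BIP-125 Rule #3 pin (all numbers hypothetical).
pin_size_vb = 100_000                 # a large, low fee-rate pinning transaction
pin_feerate = 1                       # sat/vB: too low to be mined any time soon
pin_fee = pin_size_vb * pin_feerate   # 100,000 sats of absolute fee

honest_size_vb = 200                  # the small transaction we actually want mined
# Rule #3: the replacement must pay more absolute fee than everything it replaces
# (Rule #4 adds an incremental fee on top; ignored here for simplicity).
min_replacement_fee = pin_fee + 1
print(min_replacement_fee / honest_size_vb)   # ~500 sat/vB just to evict a 1 sat/vB pin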

Rule #3 pinning is fairly easily fixed via replace-by-fee-rate, and this fix works in all situations. Unfortunately it’s unclear if RBFR will be adopted by Core in the near future, as they’ve spent a substantial amount of effort on an inferior partial solution, TRUC/V3 Transactions.

Fee Payment: RBF, CPFP, SIGHASH_ANYONECANPAY, Anchors, and Sponsorship

Since fee-rates are unpredictable, reliably and economically paying in situations where transactions are pre-signed is difficult. The gold standard for fee-payment is to use RBF, starting with an initial “low-ball” estimate, and replacing the transaction with higher fee versions as needed until it gets mined. For example, the OpenTimestamps calendar software has used RBF this way for years, and LND added support for deadline aware RBF in v0.18.

RBF is the gold standard for fee-payment because it is the most blockspace efficient in almost all11 situations: the replacement transaction(s) do not need any extra inputs or outputs, relative to what would have been necessary if the correct fee had been guessed the first try.

Efficiency is important, because inefficiencies in fee payment make out-of-band fee payment a profitable source of revenue for large miners; smaller, decentralized, miners can’t profit from out-of-band fee payments due to the impracticality and uselessness of paying a small miner to get a transaction confirmed. Out-of-band fee payment also seems to invite AML/KYC issues: at present, most of the out-of-band fee payment systems actually available right now require some kind of AML/KYC process to make a fee payment, with the notable exception of the mempool.space accelerator, which as of writing (Aug 2024), accepts Lightning without an account.

To make use of RBF directly in situations with pre-signed transactions, you need to pre-sign fee-variants covering the full range of potential fees. While this is quite feasible in many cases as the number of variants necessary is usually small12, so far the production Lightning protocol — and other proposed protocols — have opted instead to use Child-Pays-For-Parent (CPFP), usually via anchor outputs.
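
As a rough illustration of why the number of pre-signed fee variants can stay small, consider variants spaced at 10% increments; the range and spacing here are assumptions for the sake of the example, not part of any specific protocol.

def feerate_variants(min_rate=1.0, max_rate=2_000.0, step=1.10):
    """Geometrically spaced fee-rates, each 10% above the previous one."""
    rates = []
    rate = min_rate
    while rate <= max_rate:
        rates.append(rate)
        rate *= step
    return rates

print(len(feerate_variants()))   # ~80 pre-signed variants cover 1 to 2,000 sat/vB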

The idea behind an anchor output is you add one or more outputs to a transaction with a minimal or zero value, with the intent of paying fees via CPFP by spending those output(s) in secondary transactions. This of course is very inefficient when applied to protocols such as LN that have small on-chain transactions, almost doubling the total size of an ephemeral-anchor-output-using LN commitment transaction. It would be less of a concern when applied to protocols making use of larger transactions, such as anything using OP_CAT to implement covenants.

A less-obvious problem with anchor outputs is the need to keep around additional UTXOs to pay fees with. In a typical “client” application, this can be a significant overall burden, as without the anchor outputs there is often no need at all to maintain more than one UTXO. Indeed, it is likely that some existing consumer-focused Lightning wallets are vulnerable to theft by the remote side of the channel in high-fee environments due to inability to pay fees.

SIGHASH_ANYONECANPAY can be used for fee payment in some cases by allowing additional inputs to be added to signed transactions; SIGHASH_SINGLE allows outputs to also be added. Lightning uses this for HTLC transactions. At the moment this practice can be vulnerable to transaction pinning if not handled carefully13, as an attacker could add a large number of inputs and/or outputs to a transaction to create a high-fee/low-fee-rate pin. RBFR fixes this issue; the approach used in TRUC/V3 transactions is unable to fix this issue. This style of fee-payment isn’t as efficient as RBF. But it can be more efficient than anchor outputs.

Finally there have been a variety of soft-fork proposals to add a fee sponsorship system to the Bitcoin protocol. This would allow transactions to declare dependencies on other transactions, such that the sponsor transaction could only be mined if the sponsored transaction was also mined (most likely in the same block). This could be much more efficient than a traditional CPFP as the sponsor transaction could declare that dependency using significantly less vbytes than the size of a transaction input.

Replacement Cycling

The Replacement Cycling Attack14 takes advantage of transaction replacement to attempt to replace a desired L2 transaction long enough to get an undesired one mined instead. Essentially, replacement cycling attacks are, for the attacker, an alternative to transaction pinning techniques in that they aim to prevent a desired, honest, transaction from being mined long enough to allow an undesired, dishonest, transaction to be mined instead. Unlike transaction pinning attacks, a replacement cycling attack can’t happen by accident.

The canonical example is against a Hashed-Time-Locked-Contract (HTLC). While it’s easy to think of an HTLC as a contract that allows an output to be spent either by revealing a preimage or via a timeout, in reality, due to Bitcoin scripting limitations, an HTLC output can always be spent by revealing the preimage, and then, after the timeout, additionally via the timeout mechanism.
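
In other words, the two spending paths overlap rather than excluding each other. A simplified model of the real behaviour (ordinary code, not actual Bitcoin script):

# Simplified model of an HTLC output's spend conditions (not actual script).
def htlc_spendable(via, have_preimage, past_timeout):
    if via == 'preimage':
        return have_preimage     # valid at ANY time, even after the timeout
    if via == 'timeout':
        return past_timeout      # only valid once the timeout has passed
    return False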

Replacement cycling takes advantage of this by using the preimage after the timeout to replace the transaction trying to redeem the HTLC output via the timeout mechanism, without the victim learning the preimage. A successful replacement cycling attack does this long enough for a different channel’s HTLC to time out.

A main challenge in profitably exploiting replacement cycling is that each replacement round costs money. A deadline aware Lightning implementation will spend higher and higher fees attempting to spend the expired HTLC output before the next HTLC output in turn expires. Secondly, anyone can defeat the attack by simply rebroadcasting the replaced transaction15 once the replacement cycle is finished.

As with transaction pinning, replacement cycling is also an economic exploit on miners. At the end of each replacement cycle, there exists a transaction that has been removed from mempools, yet is fully valid and could be mined if only miners still had it in their mempools.

Feature Patterns and Soft Forks

Now that we’ve given you an overview of the variety of covenant-dependent L2 systems out there, and mempool challenges, we’re going to try to distil that information down to a set of notable soft fork features (mainly new opcodes) and design patterns that these L2 systems share. For soft-fork proposals, we’ll also discuss the proposal-specific technical risks and challenges of getting each proposal deployed.

OP_Expire

We’ll get this out of the way first. OP_Expire was proposed16 as a simple way of eliminating the replacement cycling attack by fixing the problem at the source: the fact that HTLC’s can be spent in two different ways at once. In the context of L2 systems, this is relevant for anything using an HTLC-like mechanism, and possibly other use-cases. OP_Expire would make it possible for a transaction output to be unspendable after a point in time, allowing the HTLC spending conditions to be a true exclusive-OR rather than a “programmers OR”.

An actual OP_Expire soft-fork would most likely consist of two features, similar to how the OP_CheckLockTimeVerify and OP_CheckSequenceVerify opcodes come in two parts:

  1. An expiration height field for transactions, most likely implemented in the taproot annex.
  2. An OP_Expire opcode that checks that the expiration height is set to at least the desired height.

While OP_Expire itself barely qualifies as a covenant, it does appear to be useful for many covenant-dependent L2 systems. However, it may not be useful enough given that replacement cycling can also be mitigated by altruistic rebroadcasting.15

A very notable challenge with deploying and using OP_Expire is reorgs: the Bitcoin technical community, starting with Satoshi17, has tried to ensure that the Bitcoin consensus protocol is designed in such a way that after a deep reorg, previously-mined transactions can be mined into new blocks. This design principle attempts to avoid the nightmare scenario of a large number of confirmed coins becoming permanently invalid — and thus people relying on those coins losing money — if a consensus failure leads to a large reorg.

In the event of a large reorg, transactions using expiration could become unminable due to their expiry height being reached. The OP_Expire proposal mitigates this issue by treating the outputs of expiration-using transactions similarly to coinbase transactions, also making them unspendable for ~100 blocks.

A significant barrier to deploying transaction expiration is coming to consensus on whether or not this trade-off is acceptable, or even needed. The types of transactions where OP_Expire would be useful already involve long-ish timeouts where user funds are frozen. Adding even more time to these timeouts isn’t desirable. Also, double-spends have always been another way to invalidate coins after a reorg: with the increased use of RBF and proposed use of keyless anchor outputs, would transaction expiration make a significant difference?

SIGHASH_ANYPREVOUT

BIP-118 proposes two new signature hashing modes, both of which do not commit to the specific UTXO being spent: SIGHASH_ANYPREVOUT, which (essentially) commits to the scriptPubKey instead, and SIGHASH_ANYPREVOUTANYSCRIPT, which allows any script. As discussed above, this was originally proposed for use by LN-Symmetry to avoid the need to separately sign every single prior channel state that may need to be reacted to.

SIGHASH_ANYPREVOUT is also potentially useful in cases where we want to use pre-signed RBF fee-rate variants in conjunction with pre-signed transactions, as the fact that the signature no longer depends on a specific txid avoids a combinatorial explosion of fee-rate variants. However, the current BIP-118 proposal doesn’t address this usecase, and may be incompatible with it due to the fact that SIGHASH_ANYPREVOUT is proposed to also commit to the value of the UTXO.

An initial objection to SIGHASH_ANYPREVOUT was the idea that wallets would get themselves into trouble by using it in inappropriate ways. The issue is that once a single SIGHASH_ANYPREVOUT signature has been published, it can be used to spend any txout with the specified script. Thus if a second output with the same script is accidentally created, SIGHASH_ANYPREVOUT allows for a trivial replay attack to steal those coins. However, as there are so many other footguns inherent to wallets and L2 implementations, this concern seems to have died out.

At the moment, the general technical community seems reasonably positive about implementing BIP-118. However, as discussed above in our discussion of LN-Symmetry, there is debate about whether its main use-case — LN-Symmetry — is actually a good idea.

OP_CheckTemplateVerify

Our first covenant-specific opcode proposal, OP_CheckTemplateVerify — or “CTV” as it’s commonly referred to — aims to create a very specific, restricted, covenant opcode by doing exactly one thing: hashing the spending transaction in a specified way that does not commit to the input UTXOs, and checking the resulting digest against the top stack element. This allows the spending transaction to be constrained in advance, without making true recursive covenant restrictions possible.

Why aren’t recursive covenants possible in CTV? Because hash functions: the CTV checks the spending transaction against a template hash, and there’s no way18 of creating a template containing a CTV with a hash of itself.

That said, this isn’t necessarily a real limitation: you can easily hash a chain of CTV template hashes to a depth of tens of millions of transactions in just a few seconds on a modern computer. With relative nSequence timelocks and the limited blocksize, actually reaching the end of such a chain could easily be made to take thousands of years.
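
To illustrate, a sketch of building such a chain backwards (last transaction first), with each template committing to the hash of the next; the hashing here is a stand-in, not the actual BIP-119 DefaultCheckTemplateVerifyHash serialization.

import hashlib

def template_hash(next_hash: bytes) -> bytes:
    # Stand-in for the template hash of a transaction whose output's script
    # is a CTV covenant committing to next_hash.
    return hashlib.sha256(b'tx paying to CTV of:' + next_hash).digest()

h = hashlib.sha256(b'final spend').digest()
for _ in range(10_000_000):          # a chain tens of millions deep
    h = template_hash(h)
# A pure-Python loop like this takes on the order of ten seconds; an
# optimized implementation is much faster, hence "a few seconds" above.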

The current CTV proposal in BIP-119 has only one hashing mode, known as the DefaultCheckTemplateVerifyHash, which essentially commits to every aspect of the spending transaction in the template hash. From a practical point of view this means that in many circumstances the only available mechanism for fee payment will be CPFP. As mentioned above, this is a potential problem due to it making out-of-band fee payment a non-trivial cost savings in cases where the CTV-using transactions are small.

It’s fair to say that CTV has the broadest support among the technical community of any covenant opcode proposal because of its relative simplicity and wide range of use-cases.

LNHANCE

One proposal to implement CTV is to combine it with two more opcodes, OP_CheckSigFromStack(Verify) and OP_InternalKey. The problem is, as of writing, the documentation in that pull-req and associated BIPs simply isn’t sufficient to argue for or against this proposal. The BIPs are entirely lacking any rationale for what the opcodes are expected to actually do in real-world examples, let alone in-depth example scripts.

While the authors probably have good reasons for their proposal, the onus is on them to actually explain those reasons and justify them properly. Thus we won’t discuss it further.

OP_TXHASH

Similar to CTV, this proposal achieves a non-recursive covenant functionality by hashing data from the spending transaction. Unlike CTV, the TXHASH proposal provides a “field selector” mechanism, allowing flexibility in exactly how the spending transaction is constrained. This flexibility achieves two main goals:

  1. Enabling the addition of fees to a transaction without breaking a multi-tx protocol.
  2. Multi-user protocols where users only constrain their own inputs and outputs.

The main problem with OP_TXHASH is that the field selector mechanism adds quite a lot of complexity, making review and testing challenging compared to the much simpler CTV proposal. At the moment there simply hasn’t been much design analysis on how beneficial the field selector mechanism would actually be, or how exactly it would be used. Thus we won’t discuss it further.

OP_CAT

The concatenation operator, which concatenates the top two elements of the stack and pushes the concatenated result back on the stack. Bitcoin originally shipped with OP_CAT enabled. But Satoshi quietly removed it in 2010, probably because the initial implementation was vulnerable to DoS attacks due to the lack of restrictions on the size of the resulting script element. Consider the following script:

DUP CAT DUP CAT...

Without an element size restriction, each DUP CAT iteration doubles the size of the top stack element, eventually using up all available memory.
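
A back-of-the-envelope sketch of that growth:

# Each DUP CAT doubles the size of the top stack element.
size = 1                 # start from a single byte
for _ in range(40):
    size *= 2
print(size)              # 2**40 bytes, about 1 TiB, after just 40 iterations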

Concatenation is sufficient to implement many types of covenants, including recursive covenants, by doing the following:

  1. Assemble a partial transaction, without witness data, on the stack with one or more invocations of OP_CAT (and whatever covenant-specific logic is needed).
  2. Validate that the transaction on the stack matches the spending transaction.

As it turns out, by abusing the math of Schnorr signatures, it’s possible to perform the second step with OP_CheckSig via carefully constructed signatures. However it’s more likely that an OP_CAT soft-fork would be combined with OP_CheckSigFromStack, allowing the second step to be performed by validating that a signature on the stack is a valid signature for the transaction19, and then reusing that same signature with OP_CheckSig to validate that the spending transaction matches.20

The fact that we only need to assemble the transaction without witness data is a key point: the covenant only needs to validate what the transaction does — its inputs and outputs — not the witness data (if any) that actually makes it valid.

Modulo script size limits, the combination of OP_CAT and OP_CheckSigFromStack is sufficient to build many types of covenants, including recursive covenants. Compared to more efficient solutions like CTV it is more expensive. But the difference in cost is less than you would expect!

Roughly speaking, using OP_CAT to do this requires all of the non-witness part of the spending transaction to be placed on the stack via the witness. For typical CTV use-cases such as txout trees, the spending transaction will have no witness data at all. Since witness space is discounted 75%, that increases our effective transaction fee for the child transaction by only 25%. Not bad!
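
The arithmetic behind that 25% figure, as a sketch (the 200-byte child transaction is just an example size):

def vsize(non_witness_bytes, witness_bytes):
    """Virtual size: witness bytes are discounted 75%."""
    return non_witness_bytes + witness_bytes / 4

child_tx = 200                       # example non-witness size of the child, in bytes
plain = vsize(child_tx, 0)
# With an OP_CAT covenant, roughly the whole non-witness serialization is
# pushed a second time, via the witness, to be hashed and checked:
with_covenant = vsize(child_tx, child_tx)
print(with_covenant / plain)         # 1.25: a ~25% increase in effective size/fee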

Is OP_CAT Too Powerful?

This is probably the biggest political and technical obstacle to deploying OP_CAT: it’s very hard to predict what use-cases will be made possible by OP_CAT. And once the “cat” is out of the bag, it’s very hard to put it back in.

A great example is how OP_CAT is claimed to be sufficient to allow reasonably efficient and secure STARK verification to be implemented in Bitcoin script. Since STARKs are capable of proving extremely general statements, making it possible to implement STARKs efficiently has significant ramifications that go beyond the scope of L2 systems, as it would allow many different systems to be built on top of Bitcoin. A strong argument against OP_CAT is that these use-cases may not be, on the whole, good for Bitcoin users.

The creation of harmful centralizing Miner Extractable Value is a key potential problem, termed “MEV that is evIL” (MEVil) by Matt Corallo. In short, MEVil is any circumstance where large miners/pools can extract additional value by employing sophisticated transaction mining strategies — beyond simply maximizing total fees — that are impractical for smaller miners/pools to adopt. The sheer complexity of potential financial instruments that could be created with OP_CAT makes ruling out MEVil very difficult. Significant MEVil has already appeared on Bitcoin from token auction protocols; fortunately that specific case was defeated via the adoption of full-RBF.

In addition to the potential of MEVil, there are many other concrete OP_CAT use-cases that are potentially harmful. For example, the Drivechains proposal has been reviewed here, and is widely considered to be harmful to Bitcoin. It is believed to be possible to implement Drivechains with OP_CAT. Another example is token proposals such as Taproot Assets. While it is impossible in general to prevent them from being implemented with client side validation, there are proposals to implement them with OP_CAT in ways that are potentially much more attractive to end users, while also using much more blockspace, which could potentially outbid “legitimate” Bitcoin transactions. These use-cases may also raise legal issues due to how often token protocols are used for financial fraud.

Incremental Hashing

For covenants, OP_CAT would be primarily used to concatenate data, and then hash it. Another way to achieve this same goal is with some kind of incremental hashing opcode that takes a SHA256 midstate of some kind, and hashes more data into it; SHA256 itself operates on 64-byte blocks. There are many possible designs for incremental hashing opcodes.
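
Conceptually this is just the streaming interface that every hashing library already exposes, with the hash object standing in for the midstate; a sketch of the idea, not a concrete opcode design:

import hashlib

h = hashlib.sha256()                  # the hash object plays the role of the midstate
h.update(b'first chunk of covenant data')
h.update(b'second chunk')             # a hypothetical opcode would hash more data in
digest = h.digest()

assert digest == hashlib.sha256(
    b'first chunk of covenant data' + b'second chunk').digest()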

One important design decision is whether or not to expose the actual midstate bytes on the stack in some kind of canonical form, or represent them in some new kind of opaque stack item type whose actual byte value can’t be directly manipulated. SHA256 is specified for a particular, fixed, initialization vector and it appears to be unknown whether or not SHA256’s cryptographic properties hold true if arbitrary midstates/initialization vectors are allowed.

Of course, since incremental hashing can do pretty much what OP_CAT can do, just more efficiently, it shares all the concerns about OP_CAT being too powerful.

Script Revival

OP_CAT was one of 15 opcodes that Satoshi disabled. In addition to restoring OP_CAT, Rusty Russell is proposing21 to essentially restore Bitcoin’s script to “Satoshi’s Original Vision” by re-enabling most of those opcodes, adding DoS limits, and potentially adding a few more in the same soft-fork. In particular, an OP_CheckSigFromStack is likely.

While OP_CAT alone does make (recursive) covenants possible, a full “script revival” would make more sophisticated covenants possible — and much easier to implement — as parts of the spending transaction could be manipulated directly. For example, you could imagine a covenant script that uses arithmetic opcodes to ensure that the total value of the txouts in the transaction adheres to some desired property.

Again, script revival raises all the same concerns, and more, about being overly powerful that OP_CAT alone does.

Simplicity

Similar to Script Revival, Simplicity is relevant to L2’s and covenants by making it possible to do anything. Unlike Script Revival, a Simplicity soft-fork would add an entirely new programming language to Bitcoin’s scripting system based on nine primitive operators known as combinators.

In practice, Simplicity is both too simple, and not simple at all. The primitive combinators are so ridiculously low level that basic operations like addition have to be laboriously implemented from scratch; raw Simplicity would be exceptionally verbose in practice. Thus, any real usage of Simplicity would make use of a system of code substitutions, similar to library function calls, known as jets. This poses a practical/political problem: how do you decide on which jets to implement? Most likely jets would be implemented in C++, like any other opcode, requiring a soft-fork for each new jet.

OP_FancyTreeManipulationStuff

There’s a large variety of relatively specialized opcodes that have been proposed to do tree manipulation in a space efficient manner for covenant-dependent L2 proposals. For example, the CoinPool proposals have put forward both TAPLEAF_UPDATE_VERIFY and OP_MERKLESUB, both of which manipulate taproot trees in ways necessary for CoinPool, and the MATT proposal has proposed an OP_CheckContractVerify opcode that, basically, verifies statements about merkle trees.

For the purposes of this article, we don’t need to go into detail about each one of these many proposals. Rather, we can talk about them as a group: they’re all relatively use-case specific proposals that make one class of L2 possible, ideally without unintended side-effects. They all have the advantage of efficiency: they all use less blockspace than achieving the same goal with more generic opcodes such as OP_CAT manipulation. But they all have the disadvantage of adding complexity to the script system, for a potentially niche use-case.

The same dynamic would happen if Bitcoin adopted the Simplicity scripting system. The equivalent to opcodes in Simplicity is adding a jet for a commonly used pattern. Again, implementing jets for use-case-specific operations like tree manipulation has similar pros and cons as implementing complex opcodes for use-case-specific operations.

Fund Pools

All L2 systems that try to have multiple users share a single UTXO can be thought of as some kind of multi-user fund pool, with users being in possession of some kind of right of withdrawal. Potentially, there will also be a mechanism to add funds to the pool (beyond creating the pool with funds pre-assigned).

For a fund pool to be useful, it must have some kind of “share data state” associated with it: how is the txout value split up? If the fund pool is to evolve over time, that state must also change as funds are added or removed from the pool. Since we’re building on Bitcoin, adding or removing funds from the pool will inevitably involve spending the UTXO the pool controls.

Remember that the Bitcoin consensus system itself is based on validation of state changes: transactions prove via their witnesses that changes to the UTXO set state are valid; proof-of-work lets us come to consensus over which set of transactions is correct. This means that fund pools are themselves also going to be based on validation of state changes: we’re proving to every Bitcoin node that the rules for the fund pool are being followed on every state change.

But there’s another key aspect to trustless L2 fund pools: when the state of the fund pool changes, the system must inherently publish sufficient data for users participating in the fund pool to recover their funds. If we haven’t done that, then our system fails to provide unilateral withdrawal, without the cooperation of third parties. Many rollup-based schemes fail here: they suffer from data availability failures, where the user is unable to recover their funds if third-party coordinators go offline, because they have no way of getting the data necessary for them to construct a valid fund recovery transaction.

With these constraints in mind, what data structures are fund pools going to be based on? Inevitably, they’re all some kind of tree. Specifically, some kind of merkle tree. They have to be a tree, because that’s pretty much the only scalable data structure in computer science; they have to be merkelized, because that’s basically the only reasonable way to cryptographically commit to the state of the tree. Finally, updates to the tree are inevitably going to be published to the Bitcoin blockchain, because that’s the one publication medium all L2 users share, and the only one that we can force users to publish on to move coins. And because any covenant implementation is going to need parts of the tree to validate that the rules of the covenant are being followed.

So, with the high-level theory out of the way, how does this actually translate into Bitcoin scripts and transactions?

Individual Pre-Signed Transactions

The degenerate case of a tree, with exactly one leaf in it. Here the state of our fund pool can change, roughly speaking, once. For example, a standard Lightning channel falls into this category, and once opened, can only be closed. The data that is published when a channel is closed is the transaction itself, which is sufficient information for the counterparty in the channel to learn the txid from blockchain data, and recover their funds by spending them.

The only “covenant” required here is the most basic covenant: the pre-signed transaction.

Txout Trees

The next, more complex, design pattern we see in fund pools is the txout tree. Ark is a notable example. Here the fund pool can be split up by spending the root UTXO in a tree of pre-defined transactions, enforced with simple covenants like pre-signed transactions or CTV, splitting up the value of that UTXO into smaller and smaller amounts until leaf nodes are reached that are spendable by the rightful owners.

It’s important to recognize that the purpose of the txout tree is to give users options as to how to recover their funds, and those options come at a cost: a txout tree will always be a more expensive way to split up a pool of funds, returning them to their owners, than simply splitting up the UTXO in a single transaction. Each layer in the tree adds cost because of the bytes used in the txouts and txins necessary to create that layer.

So, what kind of options might a txout tree provide? Again, Ark is a great example: we don’t want the on-chain redemption of a single V-UTXO to require every single V-UTXO to be put on chain. By using a tree, redemption can instead split up the tree into smaller parts until the desired V-UTXO is put on chain.
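To make the shape concrete, here is a minimal sketch in Python of how a pool’s value could be recursively split into a binary txout tree. The Node type and helpers are purely illustrative stand-ins for the real pre-signed or covenant-enforced transactions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One pre-defined transaction/output in the txout tree (illustrative only)."""
    value: int                                             # sats controlled by this output
    children: List["Node"] = field(default_factory=list)   # empty = leaf, i.e. a V-UTXO

def build_txout_tree(vutxo_values):
    """Recursively split a pool's total value into a binary tree of smaller outputs."""
    if len(vutxo_values) == 1:
        return Node(vutxo_values[0])
    mid = len(vutxo_values) // 2
    left = build_txout_tree(vutxo_values[:mid])
    right = build_txout_tree(vutxo_values[mid:])
    return Node(left.value + right.value, [left, right])

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)

tree = build_txout_tree([10_000] * 8)    # a pool of 8 equal V-UTXOs
print(tree.value)                        # 80000: the root UTXO holds the whole pool
print(count_nodes(tree))                 # 15, i.e. 2*8 - 1 outputs in total
```

In an actual deployment each node would be a real transaction spending its parent’s output, with the leaf outputs spendable by their rightful owners; the sketch only captures the value-splitting structure.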

Similar to the individual pre-signed transaction case, the information being published is the transactions themselves, which informs other users’ wallets how to spend their funds if necessary.

The scalability of txout trees has interesting economies of scale. The cost for the first V-UTXO to be put on chain, in a fund pool with \(n\) V-UTXOs, is roughly \(\log_2(n)\) times more expensive than a single transaction as \(\log_2(n)\) levels of split transactions must be put on chain. However, once the first V-UTXO is put on chain, subsequent V-UTXOs become cheaper to redeem on-chain because someone else has already paid the cost of getting the intermediary transactions mined.

Recall that the total number of elements in a binary tree with \(n\) leaves is roughly \(2n\) (precisely, \(2n - 1\)). This means that to put all V-UTXOs on chain, the total cost to do so via a txout tree would be a small multiple of the total cost to do so in a single transaction. Surprisingly efficient!
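A quick back-of-the-envelope calculation illustrates that amortization, assuming for simplicity that every transaction in the tree costs about the same to get mined:

```python
import math

n = 1024                        # V-UTXOs in the pool
first_exit_txs = math.log2(n)   # 10 split transactions to reach the first leaf
total_txs = 2 * n - 1           # 2047 transactions to put every leaf on chain
print(first_exit_txs, total_txs, total_txs / n)   # 10.0, 2047, ~2 txs per V-UTXO
```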

Or maybe not… If the total size of the fund pool redemptions is sufficiently large, they may represent a non-trivial demand on overall blockspace. Blockspace is a supply and demand system, so at some point fees will go up due to high demand. At the extreme, it’s quite possible to create txout trees so big and so deep that actually redeeming every V-UTXO in the tree is impossible.

An open question with txout trees is who pays the fees, and how? One obvious solution is to use keyless anchor outputs on the leaf transactions, and allow whoever wants the leaf transactions to get mined to pay the fees via CPFP. In some use-cases the V-UTXOs can be spent immediately after creation, without a CSV delay, in which case the V-UTXOs themselves could be spent to add fees via CPFP.
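For intuition, the fee arithmetic of a CPFP spend of a keyless anchor is straightforward: the child transaction has to pay for its own bytes and lift the (typically zero-fee) leaf transaction up to the target fee-rate. A rough sketch, with made-up transaction sizes:

```python
def cpfp_child_fee(parent_vsize, parent_fee, child_vsize, target_feerate):
    """Fee the CPFP child must pay so the parent+child package hits target_feerate (sat/vB)."""
    package_fee_needed = target_feerate * (parent_vsize + child_vsize)
    return max(0, package_fee_needed - parent_fee)

# e.g. a 200 vB leaf transaction paying zero fee, a 150 vB child, 30 sat/vB target
print(cpfp_child_fee(200, 0, 150, 30))   # 10500 sats
```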

RBF is complex to implement due to the question of permission: the obvious place to take RBF fees from is the V-UTXO value, but how do you ensure that only the owner has the ability to sign for a higher fee transaction? In many circumstances it’s not obvious how to do this in a way that is more efficient than a keyless anchor output. However, failing to solve this does pose serious challenges for schemes used by end-user wallets, which may not have any other UTXO to spend for a CPFP if the V-UTXOs themselves can’t be spent immediately.

Finally, careful thought needs to be put into what incentives there are in txout tree systems, taking fee payment into account. For example, in an Ark-like system, if a set of V-UTXOs are individually worth less than the cost of redeeming them as on-chain UTXOs, an uncooperative coordinator could refuse to allow those V-UTXOs to be redeemed off-chain, and then make a profit by stealing their value in a single UTXO spend once a timeout is reached.

If this is the case, arguably such a system would fail our criteria to be an L2 for small V-UTXOs.

Balance Based Schemes

The state machine of a txout tree is still relatively simple: either the fund pool exists, or it is spent to create two or more smaller fund pools. With more advanced covenants we could instead treat the fund pool as an evolving balance, with the ability to add and subtract funds from that balance.

To do this we need to implement a non-trivial state machine. But we also need what is essentially a shared database. Why? Because the goal here is to share one UTXO across many different owners. Finally, if we’re actually going to get a scalability improvement, we must do so in a way that puts as little as possible of that ownership data on chain.

These requirements inherently lead us to some kind of tree-like merkelized data structure, such as a merkle sum tree. Manipulating that data structure is inherently going to require something like OP_CAT, some kind of zero-knowledge proof verification opcode, or a purpose specific tree manipulation opcode.
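As an illustration of the kind of structure involved, here is a minimal merkle sum tree sketch in Python, where each node commits to both a hash and the sum of the balances beneath it. The serialization is made up and does not correspond to any specific proposal:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(owner: bytes, amount: int):
    """A leaf commits to one (owner, balance) entry."""
    return (h(owner + amount.to_bytes(8, "big")), amount)

def branch(left, right):
    """An inner node commits to both children and the sum of their balances."""
    (lh, lsum), (rh, rsum) = left, right
    total = lsum + rsum
    return (h(lh + rh + total.to_bytes(8, "big")), total)

# Four users sharing one UTXO worth 70,000 sats
leaves = [leaf(b"alice", 10_000), leaf(b"bob", 20_000),
          leaf(b"carol", 15_000), leaf(b"dave", 25_000)]
root = branch(branch(leaves[0], leaves[1]), branch(leaves[2], leaves[3]))
print(root[0].hex(), root[1])   # 32-byte commitment to the whole balance sheet, 70000
```

A covenant over such a pool would check that each new root is derived from the old one by valid operations; the sibling hashes and sums needed for that check are exactly the data that other users need to keep their own copy of the state up to date.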

Interestingly, as with txout trees, you can’t do better than order \(\log(n)\) scaling while maintaining similar security properties. Why? Let’s suppose we had a hypothetical OP_ZKP which, through some advanced mathematics, needed a mere 32 bytes to prove any statement. While this zk-proof could prove that the merkelized data structure had been manipulated according to the rules of the layer 2 system, it would fail to provide the data necessary for the next user to also make a state change. This fails our preferred criterion of enabling unconditional withdrawal: at best one user might be able to achieve an unconditional withdrawal, but no further users could do so.

By contrast, if the modified parts of the merkelized data structure are published via the covenant scriptsig — e.g. the sibling digests in a merkle tree — the next user has enough data to update their understanding of the system state and themselves make an unconditional withdrawal.

A potential way around this problem is if the covenant requires proof of publication on a different publication medium than the Bitcoin chain. However, the security guarantees will be weaker than is possible via Bitcoin.

Finally, notice how txout trees and a balance based approach can be combined. If the data structure being manipulated is a txout tree, funds could be added to the txout tree by spending the output and adding new funds, with a covenant script that validates that the funds were in fact added to the txout tree. Equally, funds can be removed by all the mechanisms normally available to a txout tree. Advanced Ark is an example of this class of scheme.

Failure Data Ratio

L2’s achieve scaling by adding an interactivity requirement in adversarial situations. In nearly all cases this means that honest parties in the protocol have deadlines by which they need to get transactions mined; if the deadlines are not met, funds can be stolen.

The maximum block capacity in all decentralized (and centralized) blockchains is limited by technical constraints. In Bitcoin, the maximum blocksize is such that Bitcoin operates essentially at capacity ~100% of the time. Since Bitcoin mining acts as an auction system, auctioning off blockspace to the highest bidder, in practice this means that the minimum fee-rate to get a transaction mined goes up and down as demand increases and decreases.

Fee-rate always factors into L2 economics and failure modes. For example, in Lightning “dust-sized” HTLCs that are too small to be profitably redeemed on-chain use a different security model than larger HTLCs. While the Lightning protocol doesn’t properly implement this yet, in theory this threshold should be dynamic, based on fee-rates as they go up and down, ideally to the point where a party could choose whether or not an HTLC even exists in a given commitment transaction based on fee-rate.
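The underlying arithmetic is simple: an HTLC is only worth materializing as a real output on the commitment transaction if its value exceeds roughly what it would cost to claim it on-chain at prevailing fee-rates. A hedged sketch; real implementations use negotiated dust limits and exact weight calculations, and the transaction size here is a made-up ballpark:

```python
def htlc_worth_redeeming(htlc_value_sat, feerate_sat_per_vb, claim_tx_vsize=160):
    """Rough check: is the HTLC worth more than the fee to claim it on-chain?"""
    return htlc_value_sat > feerate_sat_per_vb * claim_tx_vsize

print(htlc_worth_redeeming(5_000, 10))    # True at 10 sat/vB
print(htlc_worth_redeeming(5_000, 100))   # False at 100 sat/vB
```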

A variety of attacks have been proposed where this situation is intentionally triggered on Lightning, such as Flood & Loot [22] and the mass exit attack [23]. Since Bitcoin blockchain capacity is shared across all use-cases, attacks between different L2 systems are also possible: e.g. triggering a mass exit on Ark to profit from Lightning channels.

L2’s that share UTXOs amongst multiple users inherently make these problems potentially worse, as the worst-case blockspace demand during a failure is proportionally higher. As of writing, we’ve never actually seen large scale failures on Lightning where large numbers of channels had to be closed at once. There is a good argument that we should get additional operational experience with Lightning and its approximately 1-UTXO-per-user scaling before pushing the limits even further with UTXO sharing schemes.

Secondly, before new UTXO sharing schemes are widely adopted, careful research should be done on the potential profitability of attacks during high demand for blockspace. For example, in a system like Ark where the ASP can redeem funds using much less blockspace than other parties, it may be the case that intentionally triggering high fee-rates and then seizing funds that can’t be profitably unilaterally withdrawn is a profitable fraud, violating both our conditions for a true L2 system.

Consensus Cleanup

There are a number of things that Satoshi got wrong in the initial Bitcoin protocol, in particular scripting DoS attacks, the timewarp attack, and issues with the merkle tree. Previously, a number of other consensus bugs have already been fixed with soft-forks, such as the switch to evaluating time-based nLockTimes against the median time past, the (attempted) fix of the duplicate txid issue, etc.

The most recent soft-fork, taproot, had a relatively contentious deployment process, taking quite a long time to actually get deployed. An argument for doing a consensus cleanup soft-fork first, prior to enabling any new opcodes or other features for new types of L2’s, is that we’d learn more about how willing the wider community is to implement what should be a relatively uncontroversial soft-fork that arguably benefits everyone.

Testing Soft-Fork Dependent L2’s

Developers do not need to wait for a soft-fork to actually happen to test out their ideas. One particularly sophisticated approach, used by the Ark developers in covenant-less Ark (clArk), is to simulate the covenants they need with pre-signed transactions. This allows them to test out the ideas of Ark with real BTC, on mainnet, with the same trust characteristics as Ark is expected to achieve with covenants. The trade-off is that covenant-less Ark requires all parties to be online to sign the pre-signed transactions. Since clArk does work with real BTC, it may even prove useful enough for production use in certain transfer use-cases that can tolerate the interactivity trade-off.

A simpler approach is to simply pretend that certain parties can’t do the actions that covenants would prevent. For example, if a proposed protocol wants to use CTV to enforce that funds are spent via a txout tree, each use of CTV could be replaced with a NOP or CheckSig. While in reality the txout tree isn’t actually being enforced, every bit of code interacting with the tree, and each party, can be tested as though it is, and since NOP and CheckSig are allowed in standard scripts, the protocol can be tested on mainnet with real funds.
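As a sketch of what that substitution can look like at the script level, the fragment below assumes the BIP-119 convention of CTV re-using OP_NOP4 (0xb3); the helper names are hypothetical, and this is illustrative rather than a drop-in implementation:

```python
# Opcode byte values from the Bitcoin Script opcode table
OP_NOP = bytes([0x61])
OP_CHECKSIG = bytes([0xac])
OP_NOP4 = bytes([0xb3])   # proposed as OP_CHECKTEMPLATEVERIFY by BIP-119, not active consensus

def push(data: bytes) -> bytes:
    """Minimal direct push for data up to 75 bytes."""
    assert 1 <= len(data) <= 75
    return bytes([len(data)]) + data

def covenant_fragment(template_hash: bytes, mode: str, mock_pubkey: bytes = b"") -> bytes:
    """Build the covenant part of a script.

    mode = "ctv":      real covenant (only enforced once the soft-fork activates)
    mode = "nop":      identical script shape, nothing enforced, standard today
    mode = "checksig": trust a signer to only ever sign the templated transaction
    """
    if mode == "ctv":
        return push(template_hash) + OP_NOP4
    if mode == "nop":
        return push(template_hash) + OP_NOP
    if mode == "checksig":
        return push(mock_pubkey) + OP_CHECKSIG
    raise ValueError(mode)
```

The “nop” variant keeps the script structure identical while enforcing nothing; the “checksig” variant trades the covenant for trust in whoever holds the mock key.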

Potential Soft-Forks

What’s the path forward? Here we’re going to chart out all the main L2 schemes we’ve analyzed, and which soft-forks are useful (U) or required (R) to make these L2 schemes successful. As discussed above, OP_CAT (and by extension Script Revival, which includes OP_CAT) can emulate all of the other soft-forks in this list — with the exception of OP_Expire and Fee Sponsorship — so where a project’s needs are most efficiently met by some other soft-fork directly, we won’t include OP_CAT.

We’re also going to leave off all the proposed merkle tree manipulation opcodes. They’re all too niche, too use-case-specific, to have a significant chance of getting adopted at this time. To the extent that these opcodes are useful, implementing their effects via OP_CAT and/or Script Revival is a much more likely path to adoption.

|                   | OP_Expire | SIGHASH_ANYPREVOUT | CTV | OP_CAT | Script Revival |
|-------------------|-----------|--------------------|-----|--------|----------------|
| Lightning         | U         | U                  | U   |        |                |
| Channel Factories | U         |                    | U   |        |                |
| LN-Symmetry       | U         | R                  |     |        |                |
| Ark               | U         |                    | R   |        |                |
| Advanced Ark      | U         |                    | U   |        | R              |
| Validity Rollups  |           |                    |     | R      | U              |

CTV is the clear winner here, followed by SIGHASH_ANYPREVOUT (OP_Expire is useful to many schemes as a replacement cycling fix, but not essential). CTV wins because so many things fit into the design pattern of “make sure the spending transaction matches this template”; even OP_CAT constructions can efficiently make use of CTV.

Unlike OP_CAT, CTV doesn’t appear to raise much risk of unintended consequences beyond encouraging out-of-band fee payments in certain cases. This isn’t ideal. But no-one has come up with a widely supported alternative.

My personal recommendation: do a consensus cleanup soft-fork, followed by CTV.

Footnotes

  1. At least, it should be! It is likely that Lightning implementations exist that don’t properly limit the total value of dust HTLCs in flight. 

  2. Over a decade ago I proposed a class of L2 systems — fidelity bonded banks — that met the requirement of third parties being unable to profitably steal, but not the preferred requirement of unilateral withdrawal. I’ll be the first to say that Lightning is a much better idea than fidelity bonded banks! 

  3. You might ask why Lightning requires at least one UTXO per end user, when a channel has two users. The reason is that Lightning is the Lightning Network: with just one UTXO per two users, the best you can do is a bunch of isolated pairs of users, with no wider network. One UTXO per user is just enough to form a chain of users, which is at least technically fully connected. Of course, in practice you actually need more than one-UTXO-per-user, as a single chain isn’t useful beyond a handful of nodes: it’s not technically feasible to route payments through hundreds of hops. 

  4. Alex Bosworth independently tweeted an analysis coming up with essentially the same numbers as me in 2022! I either didn’t see — or had forgotten about — his analysis when I wrote mine. So it’s good to see us coming up with the same numbers. 

  5. The fact that splice-ins and splice-outs can be combined in the same transaction also reduces the number of additional inputs necessary, as splice-outs can supply the funds for splice-ins. Also note that, to an external observer, such transactions are indistinguishable from coinjoins. 

  6. For the purpose of RBF, a V-UTXO scheme might actually have a variety of pre-signed transactions. But the purpose of those multiple transactions would be to perform a single economic transaction. Not — as in Lightning — to allow multiple economic transactions to happen. Of course, this is a somewhat fuzzy definition; maybe someone will come up with a scheme that combines the V-UTXO and channel balance ideas! 

  7. That is, an Ark transaction either does or does not happen; it could take hours or even days to set up. But either all intended funds are moved or none are. 

  8. Remember that if the ASP refuses to allow the V-UTXO to be spent into a new round, the only alternative the user has is to unilaterally withdraw on-chain. Not only is this expensive, it takes time: the user’s transactions might not confirm before the ASP is able to spend the pool txout. If the round is big enough, they may even end up competing with other users in that same round for blockspace. 

  9. An example of a proof-of-publication problem, which we will discuss later. 

  10. [Lightning-dev] Resizing Lightning Channels Off-Chain With Hierarchical Channels, jlspc, Mar 18th 2023. Also on GitHub. 

  11. In certain rare cases where the inputs and outputs of a transaction are fixed due to some constraint, RBF can be less efficient than an out-of-band fee payment, sponsor payment, etc, allowing the transaction to be even smaller. 

  12. \(1.1^{75} \approx 1272\), so 75 fee variants differing by 10% each would cover more than the entire range of next-block fee-rates Bitcoin mainnet has had in its entire lifetime. 

  13. SIGHASH_ANYONECANPAY can be made safe with respect to transaction pinning by only using it for some signatures, thus preventing attacking parties from being able to add inputs. Lightning does this with HTLC transactions, by having only one party sign with SIGHASH_ANYONECANPAY. 

  14. While the canonical source is Antoine Riard’s paper, I strongly recommend reading mononautical’s replacement cycling tweet thread for an easier to understand explanation. 

  15. [bitcoin-dev] Altruistic Rebroadcasting - A Partial Replacement Cycling Mitigation, Peter Todd, Dec 9th 2023 

  16. [bitcoin-dev] OP_Expire and Coinbase-Like Behavior: Making HTLCs Safer by Letting Transactions Expire Safely, Peter Todd, Nov 2nd 2023 

  17. Coinbase transactions are inherently tied to the blocks they exist in. Satoshi made these outputs unspendable until the block they are contained in has reached 100 confirmations. Satoshi also wrote about this exact problem on bitcointalk, explaining why an OP_BLOCKNUMBER opcode would be dangerous. 

  18. If you do find a message containing its own SHA256 hash, I strongly recommend selling your Bitcoin for guns, food, and ammo, and finding a nice cave to camp out in. 

  19. Technically we would actually build up the serialization that CheckSig uses for our chosen SIGHASH mode, which is different from the way transactions are serialized to generate the txid. 

  20. If CTV is available, you could also just assemble the CTV template, hash it, and check it with CTV. This is useful if the covenant-specific logic is more complex than what CTV alone can do. 

  21. Rusty first publicly proposed the script revival via a talk at the bitcoin++ Austin 2024 conference. As of writing, he is also working on a draft BIP and implementation. 

  22. “Flood & Loot: A Systemic Attack On The Lightning Network”, Jona Harris and Aviv Zohar, Jun 15th 2020 

  23. “Mass Exit Attacks on the Lightning Network”, Cosimo Sguanci and Anastasios Sidiropoulos, Feb 7th 2024