Bitcoin zk-rollups

Updated 2022-05-19: Fixed a few typos and other notes after it was shared on Twitter :)

Updated 2022-11-04: I noticed an inconsistency with a gadget name I made up.

For the uninitiated, a “rollup” is a scaling solution developed in the Ethereum ecosystem that relies on performing the “execution” of transactions off-chain, posting unexecuted transaction data on-chain, and committing to the state of this system to allow trustlessly entering and leaving the system. It comes in two major variations: the “optimistic” rollup that relies on a fraud game to prove the state commitments are correct with respect to the transaction history, and the “zk” rollup that uses a zero-knowledge proof to verify the state commitments. Rollups give very strong security guarantees and allow for rich kinds of state transition logic.

An advantage of this approach is that much of the complexity of state tracking is shifted into off-chain bookkeeping. This can allow for significant reductions in the size of data posted to the chain, most impressively reducing the size of an account-to-account transfer to as little as 12 bytes when amortized along with a large number of other transactions. More on that here. Another advantage to rollups is that they can be optimized for specific applications without having to shift complexity to the base chain.

While current rollup implementations are very dependent on how Ethereum-style contracts store state and represent settlement, there is no fundamental reason that a UTXO-based ledger cannot support rollups. In this article I aim to explore and sketch out some designs for how a zk rollup might function on a future version of Bitcoin that supports some additional script primitives.

When reading this you may want to have a working understanding of how both Lightning and Plasma work at a low level, as rollups are an extension of those concepts. The link above touches on them at a high level. Most of my understanding of how they work in practice is based on work done by the good folks at Matter Labs on zkSync.

Disclaimer: these code snippets have not actually been tested; they’re here to illustrate that it’s possible and to describe the flow of data, but they most certainly have some bugs.

Prereqs

I expect that practical implementation of a prototype might happen on Liquid, as Elements has many of the opcodes we would use. This zk-rollup design relies heavily on recursive covenants. It may be possible to do some hacks with bounded partially recursive covenants, but that’s unclear at this time. An alternative to using Liquid-style covenants may be to use something like the OP_TXHASH proposed a few times on the bitcoin-dev mailing list, but this has its own challenges.

The other major component is a zk-SNARK verification primitive. It has been put forward that Bitcoin should permit some kind of elliptic curve pairing primitive that would enable this; however, in practice it would still be very expensive to build a script that verifies a zero-knowledge proof (zkp) and actually execute it.

There has also been discussion of introducing the concept of jets to enable more performance without overspecializing. This could be done using Simplicity, but I’m leaving that out of scope as most of the ideas discussed here would transfer over.

Another more minor limitation is the max stack element size of 520 bytes. This is really annoying but shouldn’t fundamentally limit us if we’re clever. Tapscript ups some of the limits, but increasing this one to be on the order of a few kilobytes would be much nicer. This design assumes we don’t have this limitation, but see the Replacements section near the end for how some of these constraints might be circumvented. Generally speaking, though, most of these limits need to be removed.

For ease of circuit design, it would also be beneficial to have SNARK-friendly hash ops, but we leave those out here.

And finally, since we have a lot of different spend paths, support for something like OP_EVAL would help significantly reduce costs in some areas, especially for some more advanced variants discussed later.

Basic design

The working principle is that since we have a zkp that can verify all of the constraints of the rollup’s state transition function, we can treat that as a black box for asserting the structure of the transaction and only use enough opcodes to enforce the recursive covenant and expose the transaction data to the script. The particular design of the circuit is left as an exercise to the reader, because I’m not a cryptographer. :)

Most of this description considers a hypothetical rollup with a relatively simple data model and a single centralized sequencer, but these are mostly not fundamental restrictions of the design. Here I take a very “brute force” approach with a very complex and monolithic script; it may turn out that a more elegant approach could be had using taproot, OP_TLUV, and some other novel primitives. But that’s also left as an exercise for the reader.

One major downside is that we have to include the verification key in the witness on every update. For PlonK, verification keys are ~1 KiB, which is large but not absurdly so; this is considered a minor barrier.

Gadgets

Storing state

There’s a certain amount of state that we need to carry forwards between steps of the protocol. For the purposes of this gadget we can ignore its internal structure and discuss just how we assert that it’s carried forward properly.

The structure of the script should push the current state onto the stack with its first opcode and be written to use it from there; the rest of the script is static. The offset of the “real” start of the script is written as <programstartoff>, and [push_op] is the op that pushes a buffer the size of the state (pushed in its own push op, like PUSHDATA1(4c04deadbeef)).

# (stack: <newstate> <curscriptpubkey>)

# get the script hash and throw away the 0
PUSHCURRENTINPUTINDEX INSPECTINPUTSCRIPTPUBKEY DROP

# verify provided script is correct
DUP TOALTSTACK SHA256 EQUALVERIFY

# compute new output, including pushing the state
FROMALTSTACK <programstartoff> RIGHT
OVER [push_op] SWAP CAT SWAP CAT SHA256

# assert that output matches
<output> INSPECTOUTPUTSCRIPTPUBKEY
0 EQUALVERIFY 2DROP EQUALVERIFY 2DROP

# (stack: <newstate>)

Except this is actually a little clumsy. Usually we want to extract the old state, do some stuff to figure out the new state, and then assert the new state. So actually we’d structure it more like this:

# (stack: <cur_scriptpubkey>)

# compute what the input scriptPubKey *should* be
DUP SHA256 # TODO make sure this is how it gets computed for segwit v0

# get the script hash and throw away the 0, then verify it
PUSHCURRENTINPUTINDEX INSPECTINPUTSCRIPTPUBKEY DROP EQUALVERIFY

# now extract the program and save it, then get just the actual state on the stack
DUP <program_start_off> RIGHT TOALTSTACK
len([push_op]) <state_size> SUBSTR

# (do stuff here: <old state> -> <new state>)

# reassemble what the output should be
# TODO make sure that this correctly computes what the segwit output should be
[push_op] SWAP
FROMALTSTACK # restore the program
CAT CAT # put it all together

# hash it to compute what the segwit v0 output should be
SHA256

# now assert everything matches up
<output> INSPECTOUTPUTSCRIPTPUBKEY
0 EQUALVERIFY EQUALVERIFY

# (stack cleaned up)

So this is sneaky because it hides its state in the altstack; as long as our stack hygiene there is good too, it’ll all be okay.
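
To make the data flow concrete, here’s a minimal Python model of what this gadget asserts, assuming the segwit v0 P2WSH-style sha256 commitment that the TODO comments gesture at (next_witness_script and expected_output_hash are hypothetical helpers, not anything standard):

import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# the covenant script is <push of state> followed by the static program,
# so carrying the state forward just means re-prefixing the new state
def next_witness_script(push_op: bytes, new_state: bytes, program: bytes) -> bytes:
    return push_op + new_state + program

# the final EQUALVERIFY is effectively asserting that the output's
# script hash equals the sha256 of the reassembled witness script
def expected_output_hash(push_op: bytes, new_state: bytes, program: bytes) -> bytes:
    return sha256(next_witness_script(push_op, new_state, program))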

This would be improved if we had a PUSHSCRIPT opcode that pushed the currently executing script, as the script could pull the old script program directly. Since we’ve established that that functionality is already possible, it makes sense to introduce an opcode that removes the necessity of restating the script as part of the scriptsig.

Something nice about this structure is that any signatures for other funds spent along with this do commit to the new state of the recursive covenant, as long as they commit to the outputs. This is useful because it means we don’t have to worry about signatures being reused in different state transitions.

Exposing tx data to ZKP verifier

Since we’re just relying on the circuit to enforce many of the constraints on the flow of funds, we have to expose that data to the verification somehow. It’s unclear what a SNARK primitive in Bitcoin would look like, so we have to take some liberties in imagining it; more on that later.

But somehow we’d have to expose the transaction outputs to the SNARK verifier. The simple way to do this is to do something like a DUP SHA256 CHECKTEMPLATEVERIFY on a copy of the outputs data provided as part of the witness, but that’s super inefficient! Verification keys are already huge so we want to avoid that!

More concretely, we can stitch together a blob of bytes that represents what we care about from the transaction outputs:

# create a buffer of 36 0x01 bytes to use for outputs we don't have; this
# filler doesn't actually matter because we use the first byte of each
# record to signal whether it's present or not
1 DUP CAT DUP CAT DUP CAT DUP DUP CAT DUP CAT CAT TOALTSTACK

# for each of the outputs
0 INSPECTNUMOUTPUTS
{
    # condition on if we have the output we're checking for
    DUP i GREATERTHANOREQUAL
    IF
        # attach the output data on, make sure it's taproot
        1 i INSPECTOUTPUTSCRIPTPUBKEY
        1 EQUALVERIFY 2DROP
        i INSPECTOUTPUTVALUE SCRIPTNUMTOLE64
        CAT
    ELSE
        # push the empty buffer on
        PUSHDATA1(0) FROMALTSTACK
        DUP TOALTSTACK
    ENDIF
    CAT CAT
} for i in 1..n

# get rid of the empty buffer
FROMALTSTACK DROP

And then of course tweak that to simplify the stack manipulation.
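
As a sanity check on the intended result, here’s a Python sketch of the blob this snippet stitches together, assuming the fixed 41-byte record (present flag, taproot key, LE64 value) from the public input structure described later (the snippet above uses a 36-byte filler; I normalize to 41 here):

def outputs_blob(outputs, max_withdrawals):
    # outputs: list of (taproot_pk: 32 bytes, value_le64: 8 bytes) tuples,
    # one per withdrawal output actually present in the transaction
    blob = b""
    for i in range(max_withdrawals):
        if i < len(outputs):
            pk, value_le64 = outputs[i]
            blob += b"\x01" + pk + value_le64  # present flag + key + value
        else:
            blob += b"\x00" * 41               # absent slot, contents ignored
    return blob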

Merkle root update

In a few places we have to take a merkle root and replace a leaf with another leaf. We do this with the rollup state tree to execute withdrawals, but there are other uses. We could do this with a zkp, but that’s overkill; we want to do it with regular script ops. Our life gets easier if we keep a fixed-size tree.

We do this process in two steps since it would be inefficient to shuffle the elements around on the fly. You could probably get away without this by having the proof be provided already in parts, but let’s just put it here for simplicity:

TOALTSTACK TOALTSTACK TOALTSTACK
DUP 32 LEFT SWAP
{ DUP (32 * i) 32 SUBSTR SWAP } for i in 1..n-1
(32 * (n-1)) RIGHT
FROMALTSTACK FROMALTSTACK FROMALTSTACK

You also might be able to avoid this if you play around with the stack structure in this gadget.
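
For reference, here’s the same splitting step as a hypothetical Python helper:

def split_proof(proof: bytes, n: int):
    # split an n-cohash concatenated proof into 32-byte pieces,
    # mirroring the LEFT / SUBSTR / RIGHT sequence above
    assert len(proof) == 32 * n
    return [proof[32 * i : 32 * (i + 1)] for i in range(n)]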

So conceptually, when performing an update to a merkle root the stack ends up looking like this:

<root> <cohash 1> <cohash 2> ... <cohash n> <leaf> <update> <idx>

This requires some stack finagling in order to split the proof apart, but you could tweak this design to pull out the next lowest cohash from the proof progressively.

The core guts of each step look like this:

DUP TOALTSTACK # save the current path bits
1 AND # condition on the lowest bit
IF
    # hash with our accumulators to the right
    TOALTSTACK TOALTSTACK DUP
    FROMALTSTACK CAT SHA256 SWAP
    FROMALTSTACK CAT SHA256
ELSE
    # hash with our accumulators to the left
    TOALTSTACK TOALTSTACK DUP
    FROMALTSTACK SWAP CAT SHA256 SWAP
    FROMALTSTACK SWAP CAT SHA256
ENDIF
FROMALTSTACK 1 RSHIFT # take the rest of the path bits and shave one off

After repeating this n times, our stack looks like this:

<root> <computed root> <new root> 0

So we can just finalize it like this, leaving the new root on the stack:

DROP TOALTSTACK EQUALVERIFY DROP DROP FROMALTSTACK

These can probably be optimized a little bit using other stack manipulation ops, but this is the general idea.

We’ll refer to this with the alias ~MERKLEUPDATE<n> used like this:

<old root> <proof> <leaf> <update> <idx>
->
<new root>

As said before, we may want to use a SNARK-friendly hash function instead of SHA-256.
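
As a reference for what ~MERKLEUPDATE<n> computes, here’s a minimal Python sketch, assuming SHA-256 and bottom-up cohash ordering as in the script above:

import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_update(root: bytes, cohashes: list, leaf: bytes,
                  update: bytes, idx: int) -> bytes:
    # verify `leaf` sits at `idx` under `root`, then return the root
    # recomputed with `leaf` replaced by `update`
    old_acc, new_acc = leaf, update
    for cohash in cohashes:
        if idx & 1:  # our node is the right child, cohash on the left
            old_acc = sha256(cohash + old_acc)
            new_acc = sha256(cohash + new_acc)
        else:        # our node is the left child, cohash on the right
            old_acc = sha256(old_acc + cohash)
            new_acc = sha256(new_acc + cohash)
        idx >>= 1
    assert old_acc == root, "stale or invalid merkle proof"
    return new_acc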

In-place buf replace

Since we do this in a few places, I wanted to define a simple gadget to avoid having to rewrite it many times. This just replaces a subsequence in a buffer with a different sequence of bytes. It seems clunkier than it should be; maybe someone else will figure out something simpler.

DUP
<m> RIGHT TOALTSTACK
<n> LEFT
SWAP CAT FROMALTSTACK CAT

Let’s call it ~BUFUPDATE<n, m>, where n and m are the start and end of the replaced buffer and it’s used like this:

<seq> <buf>
->
<updated buf>
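
In Python terms the gadget computes something like this (hypothetical helper, same naming):

def buf_update(seq: bytes, buf: bytes, n: int, m: int) -> bytes:
    # splice `seq` over buf[n:m]; seq must be m - n bytes long so that
    # offsets later in the buffer stay fixed
    assert len(seq) == m - n
    return buf[:n] + seq + buf[m:]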

High level design approach

Rollups naturally lend themselves to account-like data models, as having static public keys allows referring to accounts by short indexes. The signatures for these transactions can then be omitted from the public part of the ZKP witness. This is not ideal for privacy, but there are privacy-oriented rollup designs like Aztec that can be investigated in another work.

The particular semantics of the state transition function of the rollup can vary wildly depending on the application, but we can describe the common high level external behavior relevant to the function of the rollup that we are responsible for.

Since the goal of our Bitcoin script logic is just to expose transaction data and the rollup state commitments to the SNARK verifier, we use the SNARK circuit itself to enforce some of the more involved requirements of the transaction structure that make the rollup work. Note that we have to implement the recursive covenant logic ourselves in regular script: in order to have the circuit assert it, the circuit would effectively have to know the hash of its own verification key (or something similar), which is impossible. There are actually a few cases here where it seems like we could simplify some logic, but we run into similar issues where a hash would have to be present in its own preimage.

The funds in the rollup are stored in a single utxo that has the recursive covenant structure we’ve been outlining applied to it, spent as the first input and recreated as the first output of each rollup update transaction. The normal update spend path is locked behind a timelock to permit forced withdrawal transactions to be submitted, although some issues persist there that we discuss below.

Strictly speaking, supporting deposits and withdrawals isn’t actually required if all we care about is improving the throughput of the ledger. This might be useful if we layer lightning channels inside the rollup and only use it as a mechanism to cheaply manage channels. But that limits its generality, so we will still look to support them in our design.

Normal rollup-initiated withdrawals in a batch are exposed to the chain as extra outputs on the transaction going to their appropriate destinations. The script uses the Liquid introspection opcodes to extract the scriptpubkeys for these outputs and adds a commitment to that data as part of the public input to the rollup. There is a maximum number of withdrawals that we can support at once due to the limitations of SNARK circuits, but concrete limits are unclear at this time.

// TODO make diagram somewhere?

Handling deposits

Deposits are more involved to accomplish. The way I see it is similar to how the forced withdrawal mechanism works (see below): introduce another spend path that verifies the same basic requirements for continuing the recursive covenant, but instead of applying a state transition, the user supplies a public key for the account they intend to create in the rollup. This does not immediately create the account; instead it requires that the sequencer do the work to insert the accounts themselves.

It should be investigated whether it would actually be cheaper to use a merkle mountain range-like construction instead of always having a fixed-size merkle tree. It also seems like we’re doing a lot of stack manipulation here that might be simplified by rearranging things.

Forced withdrawals

Forced withdrawals are tricky, and there are several different ways to make them work. If the sequencer(s) are misbehaving and censoring withdrawal transactions, then we would like to be able to force a withdrawal in some way.

The obvious way to do this is to just submit a merkle proof for the account in the state tree and use that to update the root with the entry removed. This is a little bit tricky because the proofs could be large. A naive design might treat the state tree as a sparse merkle tree and have the bits of each address be the merkle path (as an earlier iteration of this did), but since we address accounts by a short index, our merkle tree is much shorter. We can get away with 20 bit indexes for a max of 1048576 accounts and a withdrawal proof size of 640 bytes, or even 400 bytes if we decided to use 20 byte hashes. This brings the size of the proof onto the scale of the size of the program itself.
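
The arithmetic, spelled out as a quick sanity check:

levels = 20             # 20-bit account indexes
accounts = 2 ** levels  # 1,048,576 accounts max
print(levels * 32)      # 640-byte proof with 32-byte hashes
print(levels * 20)      # 400-byte proof with 20-byte hashes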

But this might still be a little bit high and if the sequencer is censoring one user they may be censoring others as well, so an alternative approach may be to add an extra spend path to do a batched forced withdrawal. Verifying multiple merkle proofs is expensive, and by the time you have 2 or 3 merkle proofs in the transaction it may actually be cheaper to simply use another ZKP, so that avenue should be investigated eventually.

There’s a potential fee sniping attack here where an attacker can keep a rollup busy by continually submitting forced withdrawals while honest users are trying to withdraw or while the sequencer is trying to submit a state transition. This isn’t great because if they can keep it occupied long enough then they can force it over the sequencer failure deadline. Sloppy ways to solve this would be to also require some PoW or a VDF in order to submit a withdrawal proof, or to simply increase the failure deadline. The cost for the sequencer to defend is nearly zero (whatever the cost of proving an update is) and the cost for the attacker to succeed is relatively high (blockchain fees, and burning an account they can attack with), so I don’t think this attack vector will be very common in practice.

One way to defend against this is to limit the number of withdrawals that can be made between updates. That can still prevent people from exiting the rollup, but it can’t force it to fail.

A cheaper alternative to using merkle trees is to use verkle trees. These rely on pairings, but we’re already using pairings in the ZKP anyways, so that doesn’t add any additional security assumptions.

Sequencer failure

In the event that the sequencer abandons their duties, the rollup will not be able to make progress. A more ideal solution for this is to have some system involving a rotating set of sequencers that can sign over their rights, or a mechanism where sequencer rights for a period can be auctioned off. There’s other discussion of this that’s out of scope here.

Regardless of the choice, if the sequencer fails and the rollup can’t make progress, the simplest thing to do is allow users to reclaim their funds. We can introduce an additional spend path behind a CHECKSEQUENCEVERIFY timelock that extracts a hash from the state commitment and spends all of the rollup’s funds into a CHECKTEMPLATEVERIFY withdrawal tree that allows users to claim their funds on their own time. But this is tricky to update in a forced withdrawal scenario, so really we would want to use another recursive covenant that is aware of our state tree structure instead.

Computing this hash as part of the circuit does not seem very SNARK friendly due to the number of hashes that go into other hashes, but I’d imagine someone could devise a scheme where it’s not too bad. If not we can just build another recursive covenant as described above.

Tying it all together

Now let’s tie it together with a very basic example of what the rollup contract logic would look like.

The structure of the state we carry forward between updates may look like this:

[ state root:       32 ]
[ deposits root:    32 ] # 64
[ deposits count:    8 ] # 72
[ deposits value:    8 ] # 80
[ withdrawals count: 8 ] # 88
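
A sketch of packing and unpacking this 88-byte state in Python, assuming little-endian 64-bit integers as the LE64 conversions in the script suggest:

import struct

STATE_SIZE = 88  # matches <state_size> in the script below

def pack_state(state_root: bytes, deposits_root: bytes,
               deposits_count: int, deposits_value: int,
               withdrawals_count: int) -> bytes:
    assert len(state_root) == 32 and len(deposits_root) == 32
    return (state_root + deposits_root +
            struct.pack("<QQQ", deposits_count, deposits_value,
                        withdrawals_count))

def unpack_state(state: bytes):
    assert len(state) == STATE_SIZE
    return (state[0:32], state[32:64], *struct.unpack("<QQQ", state[64:88]))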

Depending on the spend path, our witness can take a few different forms.

The overall script looks kinda like:

# commitment to the state that we throw away and reextract below
PUSHDATA1(<cur_state>) DROP

DUP 4 AND 0NOTEQUAL IF
    DROP
    # timeout abort path

    # make sure it's been long enough
    <timeout_delay> CHECKSEQUENCEVERIFY DROP

    # extract the state
    SWAP
    DUP SHA256 # TODO make sure this is how it gets computed for segwit v0
    0 INSPECTINPUTSCRIPTPUBKEY DROP EQUALVERIFY
    DUP <program_start_off> RIGHT TOALTSTACK # program_start_off = 90 I think?
    2 <state_size> SUBSTR # state_size = 88

    # (stack: <dcp.> <cur_state>)

    # get the state root, verify the two hashes against it
    DUP
    32 LEFT SWAP TOALTSTACK SWAP
    # (stack: <state_root> <dcp.>)
    # (altstack: <cur_state>)

    # get the withdrawal output that's there so we can verify it matches (and
    # make sure it's segwit v0)
    0 INSPECTOUTPUTSCRIPTPUBKEY 0 EQUALVERIFY

    # actually verify it matches
    2DUP CAT [push_op] SWAP CAT SHA256 EQUALVERIFY

    # now verify the withdrawal output is there and what we expected
    0 INSPECTOUTPUTSCRIPTPUBKEY 0 EQUALVERIFY EQUALVERIFY
    DROP DROP

    # (stack: <dcp.>)
    DUP SHA256 PUSHDATA1(<dcp_hash>) EQUALVERIFY # verify the program is correct
    FROMALTSTACK DUP TOALTSTACK 32 64 SUBSTR # maybe want 32 72?
    # (stack: <dcp.> <deposits_root>)
    [push_op] SWAP ROT CAT CAT SHA256
    1 INSPECTOUTPUTSCRIPTPUBKEY 0 EQUALVERIFY EQUALVERIFY

    # and finally verify that the values match up
    1 INSPECTINPUTVALUE LE64TOSCRIPTNUM
    1 INSPECTOUTPUTVALUE
    FROMALTSTACK 72 80 SUBSTR # maybe we should extract this sooner?
    ROT EQUALVERIFY # make sure the deposits reclaim value matches what's in
                    # the deposits reclaim output
    LE64TOSCRIPTNUM
    SUB # compute the amount we expect to be in the accounts withdrawal tree
    0 INSPECTOUTPUTVALUE LE64TOSCRIPTNUM
    EQUALVERIFY # verify the accounts withdrawal tree matches up

    # don't think there's anything else we have to do here is there?
ELSE
    # normal cases

    # extract the state
    SWAP
    DUP SHA256 # TODO make sure this is how it gets computed for segwit v0
    0 INSPECTINPUTSCRIPTPUBKEY DROP EQUALVERIFY
    DUP <program_start_off> RIGHT TOALTSTACK # program_start_off = 90 I think?
    2 <state_size> SUBSTR # state_size = 88
    SWAP

    # (stack: ... <cur_state> <path_id>)

    # update path
    DUP 1 EQUAL IF
        DROP

        # verify the sequencer's signature, if we have just a single sequencer
        SWAP PUSHDATA1(<seq_pk>) CHECKSIGVERIFY

        # (stack: <zkp_proof> <vk> <new_state_root> <batch_txs> <cur_state>)

        # verify the pushed key is correct
        3 PICK SHA256 PUSHDATA1(<vk_hash>) EQUALVERIFY

        # get out the old state root and deposits_root
        64 LEFT

        # (stack: <zkp_proof> <vk> <new_state_root> <batch_txs> <accumulator>
        ROT DUP TOALTSTACK SWAP CAT
        # (stack: <zkp_proof> <vk> <batch_txs> <accumulator>)
        # (altstack: <program> <new_state_root>)

        # build up the outputs
        1 DUP CAT DUP CAT DUP CAT DUP DUP CAT DUP CAT CAT TOALTSTACK
        0 INSPECTNUMOUTPUTS
        {
            DUP i GREATERTHANOREQUAL
            IF
                1 i INSPECTOUTPUTSCRIPTPUBKEY
                1 EQUALVERIFY 2DROP
                i INSPECTOUTPUTVALUE SCRIPTNUMTOLE64
                CAT
            ELSE
                PUSHDATA1(0) FROMALTSTACK
                DUP TOALTSTACK
            ENDIF
            CAT CAT
        } for i in 1..(max_withdrawals + 1)
        FROMALTSTACK DROP

        # join together the final public input
        CAT SWAP CAT

        # (stack: <zkp_proof> <vk> <public_input>)

        # verify the proof
        # TODO sketch out how this should work
        ZKPVERIFY
        
        # now figure out the new state
        FROMALTSTACK
        
        # put together the new state, with new merkle trees, computing like
        # this is shorter than writing it out verbatim, works out to be around
        # ~40 vs ~121 for merkle_height of 5
        PUSHDATA(0) DUP CAT DUP CAT DUP CAT
        DUP DUP CAT
        { DUP CAT SHA256 } for n in 0..(merkle_height + 1) # sneaky
        SWAP DUP 2 PICK OVER
        CAT CAT CAT CAT CAT # cat the 6 parts together

        { SIZE 120 EQUALVERIFY } if we want to be safer
    ELSE

    # deposit path
    DUP 2 EQUAL IF
        DROP TOALTSTACK

        # verify amounts match as expected
        1 INSPECTINPUTVALUE DUP
        0 INSPECTINPUTVALUE ADD
        0 INSPECTOUTPUTVALUE EQUALVERIFY

        # update the stored deposits value
        FROMALTSTACK DUP TOALTSTACK 72 80 SUBSTR # deposits value
        LE64TOSCRIPTNUM ADD SCRIPTNUMTOLE64
        FROMALTSTACK ~BUFUPDATE<72, 80>

        # now insert the new root into the accumulator
        DUP TOALTSTACK 32 64 SUBSTR

        # (stack: <deposit proof> <pubkey> <deposits root>)
        ROT ROT
        # (stack: <deposits root> <deposits proof> <pubkey>)

        # increment the counter
        FROMALTSTACK DUP TOALTSTACK
        64 72 SUBSTR
        LE64TOSCRIPTNUM
        DUP <deposits_limit> LESSTHAN VERIFY # make sure we're not at
                                             # the limit
        1ADD
        DUP FROMALTSTACK ~BUFUPDATE<64, 72> TOALTSTACK
        
        # (stack: <deposits root> <deposits proof> <pubkey> <idx>)
        PUSHDATA1(0) DUP CAT DUP CAT DUP CAT DUP CAT DUP CAT
        SWAP

        # compute the new root
        ~MERKLEUPDATE<?> # log(deposits_limit)

        # and update it in the state
        FROMALTSTACK
        ~BUFUPDATE<32, 64>
    ELSE

    # withdrawal path
    DUP 3 EQUAL IF
        DROP
    # withdraw path
    # TODO (see discussion above)
    ELSE

    # (cleanup, the indents are weird otherwise, pretend this line isn't here)
    RETURN ENDIF ENDIF ENDIF
    
    # TODO see if we can condense cases 0x02 and 0x03 into using the same
    # merkle tree logic and save ~100 bytes

    # assert the new state
    # (stack: <new_state>)
    # (altstack: <program>)
    [push_op] SWAP FROMALTSTACK # TODO decide what this var is
    CAT CAT SHA256
    0 INSPECTOUTPUTSCRIPTPUBKEY 0 EQUALVERIFY EQUALVERIFY
ENDIF

For reference, the structure of the public data passed to the zkp verification logic is structured like this:

[ new_state_root: 32 ]
[ state_root:     32 ]
[ deposits_root:  32 ]
[ tx_outputs:     41 ] * max_withdrawals 
[ batch_txs:     ... ]

# tx output structure
[ present?: 1 ][ taproot_pk: 32 ][ value: 8 ]
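
Assembling that public input in Python might look like this (hypothetical helper, reusing outputs_blob from the earlier sketch):

def public_input(new_state_root: bytes, state_root: bytes,
                 deposits_root: bytes, tx_outputs: bytes,
                 batch_txs: bytes) -> bytes:
    # three 32-byte roots, the fixed-width withdrawal records, then the
    # variable-length batch body
    assert len(new_state_root) == len(state_root) == len(deposits_root) == 32
    assert len(tx_outputs) % 41 == 0
    return new_state_root + state_root + deposits_root + tx_outputs + batch_txs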

And the pending deposit reclaim script, in the general case, looks like this:

# prepare the covenant
DUP SHA256
PUSHCURRENTINPUTINDEX INSPECTINPUTSCRIPTPUBKEY DROP EQUALVERIFY
DUP <program_start_off> RIGHT TOALTSTACK
2 <state_size> SUBSTR

# here is where we'd check if we wanted to do this case or the terminal case

TOALTSTACK

# verify the signature
DUP TUCK CHECKSIGVERIFY

# (stack: <proof> <idx> <amount> <pubkey>)

# make sure it matches the output pubkey
1 INSPECTOUTPUTSCRIPTPUBKEY 1 EQUALVERIFY EQUALVERIFY
DUP TUCK EQUALVERIFY

# make sure it matches the output value
SWAP
1 INSPECTOUTPUTVALUE
DUP TUCK EQUALVERIFY

# also make sure that they leave the rest of the funds in the recursive output
# TODO tweak this math to be aware of tx fees
DUP 0 INSPECTINPUTVALUE SWAP SUB
0 INSPECTOUTPUTVALUE EQUALVERIFY

# now prepare the proof, somehow getting the cur root
CAT SHA256
# (stack: <proof> <idx> <entry leaf hash>)
FROMALTSTACK SWAP TOALTSTACK ROT ROT FROMALTSTACK SWAP
PUSHDATA1(0) DUP CAT DUP CAT DUP DUP CAT DUP CAT DUP CAT CAT # 36 bytes zeroed
SHA256 SWAP
# (stack: <cur root> <proof> <entry leaf hash> <empty leaf hash> <idx>)
~MERKLEUPDATE<n>
# (stack: <new root>)

# apply any changes here, maybe subtracting the number of deposits left

# assert the new state
PUSHDATA1(0x4c78) SWAP FROMALTSTACK # PUSHDATA1, 120
CAT CAT SHA256
0 INSPECTOUTPUTSCRIPTPUBKEY 0 EQUALVERIFY EQUALVERIFY

(Actually we want to tweak it slightly for the last person in the rollup, probably just adding a counter that we decrement.)

It gets used like this in the recursive case, and should be augmented to figure out what to do about the other cases:

<proof> <idx> <amount> <sig> <pubkey> <cur_scriptpubkey>

There are still assorted issues with these scripts as written, beyond the TODOs flagged in the comments.

Potential attacks

There is an issue where an attacker could conduct a fee pinning attack and prevent the rollup from progressing by outbidding the usual rollup update transactions and making excessive deposits. Their cost to attack would probably be cheaper than the sequencer’s cost to defend (as the usual update transactions usually have a large witness), so this could pose an issue. In the worst case, a well-capitalized attacker may be able to prevent updates all the way until the abort deadline (see below). The way we solve this is by limiting the deposit queue size such that only some number of deposits may be made between rollup updates. If many users wish to enter the rollup at the same time, they can coordinate to create a schnorr multisig account inside the rollup that they collaboratively deposit into and pre-sign several rollup transactions to claim their own funds after the deposit is accepted.

This does seem to reintroduce the issue where a well-capitalized attacker can still conduct a fee pinning attack and prevent other users from depositing, but I’m not sure what the best way to solve this is. Maybe a VDF or fidelity bond?

There’s a similar fee pinning attack in withdrawals as above, and we can resolve it in a similar way as before by limiting the number of withdrawals between update batches. It’s still possible to prevent honest users from performing forced withdrawals, but it won’t prevent them from doing normal rollup-initiated withdrawals.

These fee pinning weaknesses are the primary concern with implementing a protocol that takes this kind of approach. They could be solved if we had negative timelocks (also as discussed on bitcoin-dev a few times) that cut off a spend path after a certain deadline, but this is generally considered to be dangerous so might end up being harder to fork in than the covenants.

Variations

Optimistic-ZK Rollup

The idea with optimistic rollups is that if we build proper infrastructure on-chain to verify a challenge, we can just skip the actual verification step. The verification infrastructure is some kind of sandboxed runtime that executes smart contracts to find where the state stops matching up with what was posted to chain as part of the batch. When a batch is posted, everyone can inspect the data that was posted to ensure that it’s correct, and if it is not, then anyone can challenge it and receive part of the batch submitter’s fidelity bond. This maintains the security of the whole system as long as at least one party is honest.

In Ethereum, with its rich access to state, this complicated machinery is in principle relatively easy to build. Bitcoin does not have that luxury, but we can take a combined approach: we maintain the process of putting the batch data on-chain, but introduce a fraud game where the sequencer is forced to submit the vk and a proof (and so only spend the kilobytes(!) on it in that case), paid for by the reporter.

This is more involved, but not unduly so. It involves a rotating commitment to recent state roots that we can pull data out of if someone reports a recently-buried state transition. This should be something like 16 in order to make the merkle tree symmetric and make writing the code simpler. If a mismatch is reported incorrectly then the sequencer should be able to submit a valid proof within a deadline and restore operations after the current most recent batch. If they’re unable to post a proof, then after a deadline another timelocked spend path becomes available where we can abort the rollup and spend its funds into a CHECKTEMPLATEVERIFY-style withdrawal tree that can be pulled from the history commitment.

// TODO some script snippets!

This also complicates the withdrawal process since withdrawals need to mature before they can be spent. We can use a similar process as with deposits where they’re put into a list that the sequencer is forced to act on. Since there’s a delay before we can actually create a set of withdrawals, we can use another CTV withdrawal tree that gets kept in the rotating history commitment that we use to track recent state roots. This has a slightly higher overall on-chain footprint, but the savings in bytes from not having to post the vk and proof to chain every time is totally worth it.

Deposits function very similarly to the basic case, users just have to be aware of the delay before they can accept funds or verify the rollup themselves.

This would probably work okay with Lightning (see Applications) since the semi-happy path has delays that would probably be less than what Lightning already exposes.

Small group multisig happy path

If the rollup involves a small set of parties that make many transactions, and for some reason “just using Lightning” isn’t an option, then it may make sense to have an alternate spending path that just takes a signature from all of them, so they can all consent to the state transition without having to rely on the chain to enforce the logic.

This would be improved by using a taproot output, which would increase privacy and save chain space.

Failing that, we could still rely on script to encourage liveness.

Validium-like

Instead of actually putting the batch contents on-chain, just provide its hash and a threshold schnorr signature from a large number of parties, who attest that they have been given a copy of the data. But this is a rather naive way to do it.

Alternatively, use some kind of data custody proof that gets rolled into the SNARK circuit. With more work, a scheme where data custodians can be challenged to produce new data custody proofs on random data (similar to how Siacoin functions) could be introduced to help ensure that custodians actually keep a copy of the data.

By omitting the transaction data that gets posted to chain, the fee floor goes to zero since the marginal cost of each additional transaction goes to zero. This obviously comes with higher risks in that all data custodians may lie, although it only requires that a single data custodian remains honest in order for the system to remain functional. But in order to realize this you’d have to have a pretty high level of throughput in the rollup to justify the SNARK costs, at which point it may be more attractive to use Lightning anyways.

It may be unwise to combine this with the Optimistic-ZK approach. The “only one other user must be honest” assumption is actually slightly different in the two cases: in that case it’s “one of any user”, but in this case it’s “one of any data custodian”, and the difference in incentive alignment may lead to data custodians colluding to steal funds from non-custodian users by not announcing the data, prompting them to trigger proof verification. If the data isn’t accessible to users off-chain to verify, then triggering verification is the only option they have to ensure that the rollup state is correct.

This is related to the “data availability problem”.

Replacements

Stack element size

For the verification key and proof, we can split them out into several stack elements, probably just 2 each actually (see below). This makes stack manipulation much more complicated but manageable. But this stack manipulation increases the size of the actual script program, which is problematic since it’s already pretty large and we have an effective size cap of ~430 bytes, since we lose ~90 to the state we carry forward. If this ends up being a limitation, then having something like OP_EVAL would make things a lot nicer since we could hide the code for the different spend branches behind hash reveals.

The batch body is trickier because it’s variable size. One way to solve it is to just hash it immediately and pass the hash to ZKP verification, requiring the hash be revealed as part of the private input.

Transaction inspection

If we aren’t able to directly inspect the transaction’s outputs, we could still use CHECKTEMPLATEVERIFY like I mentioned above. It would be less than ideal since the weight on-chain would be a bit more, but strictly speaking we could get away with it.

Applications

Lightning

Rollups could work super well for Lightning.

I could imagine a rollup design with ledger entries that function like dumb accounts, but also special-purpose “channel” entries that act basically like lightning funding outputs, but with all the signatures omitted and proved as part of the SNARK circuit, including the ones for the close tx. One common criticism of Lightning is that if we tried to onboard millions of users opening a few channels each, it would eat up the full chain capacity. While systems like channel factories can mitigate a lot of this (and rather elegantly using CTV, too), if we find ourselves in a situation where users open and close channels more often than we anticipate, it would be nice to have a layer where we can do that more cheaply.

So close txs could be reduced in size to maybe 14 bytes (4 byte channel idx, Alice funds, Bob funds, 2 bytes feerate?) at best in the cooperative case and without HTLCs.

Uncooperative closes would be trickier and we would have to introduce some mechanism to keep time in the rollup and allow us to build a construction like Eltoo where newer states can be posted after a close is initiated. There would be a slight communication overhead since on every update the users would have to exchange signatures not just for the rollup transactions but also for the outputs that would be theoretically created if the rollup aborts and they have to recover their funds from the CTV withdrawal tree.

The really tricky part is if there are any unsettled HTLCs in the channel when it’s closed. So we’d also have to introduce a new kind of entry to represent unsettled HTLCs that get spilled out of a channel when it’s closed; how those get represented as “real” HTLCs at “real” Bitcoin addresses if the rollup gets aborted should be pretty straightforward. It seems like it should be possible to omit the preimages from what gets posted to the blockchain, but there’s a bit of nuance here. It’s possible for the payment backpropagation to break if an HTLC makes its way into the rollup and the downstream party colludes with the sequencer to claim the HTLC without revealing the preimage to other rollup participants, stealing from their upstream counterparty. This is a little tricky to resolve, but a compromise could be to have a similar data custodian concept as discussed before. The risks don’t stop here though, since the security policies of two nodes downstream in a payment can impact upstream channels. The destination party would still get their money, but this makes it possible that the payment appears to fail to the original sender, who never gets the preimage, so it also breaks protocols that rely on obtaining it.

Once we have a mechanism for setting up a Lightning channel within a rollup, other users that aren’t involved in it would still be able to route through it just the same as any other channel. We can extend the BOLT7 channel announcement spec to include merkle proofs to prove that the channel really was created and verifying the proof would be relatively easy and not require all of the rollup client logic.

Cross-chain bridges

It’s been conjectured that rollups could settle onto multiple chains in parallel. If the rollup natively supported some rich smart contracting system, users could build protocols on top of them that interact with funds from both chains relatively seamlessly.

The difficulty arises in synchronizing the settlement properly. A misbehaving sequencer could submit differing batches to the two chains and end up in a split-brained nightmare scenario.

Solving this requires that the two chains have some way to communicate with each other. People have built Ethereum smart contracts that follow the Bitcoin header chain and stick it all into a big accumulator so that you can make merkle proofs and prove transactions happened at a certain point in time. But I can’t imagine how we’d do that in both directions for two Bitcoin-like chains without some serious changes to the protocol that would make it very much unlike Bitcoin. So it’s probably not worth pursuing this avenue further at the moment.

Performance analysis

We may be able to reduce the number of rollup transactions per batch needed to cover the costs to around 50. Since we could support a couple hundred before the batch body starts to be bigger than the size of the vk, this seems like a win, especially when combined with a validium-like approach.

According to the zkSync contracts, the verification key and the proof each come out to around a kibibyte, so let’s just round both of those numbers to 1 KiB. We can expect the full verification script to be around 500 bytes. (If we had an OP_EVAL then we could omit lots of it like we do for the verification key, but let’s not worry about that now.)

Since that’s all going to be in the witness, the real weight cost is actually not too bad. If we expect the rest of the transaction structure to be on the order of 200 bytes (rough estimate), then that brings a conservative estimate for the total base cost to 2848 WU as the minimum base per update that has to be amortized across all the rollup transactions.

The equation looks like this:

cost_per_tx = fee_rate * (tx_size + (base_cost / num_txs_per_batch))

We conservatively assume 16 bytes per rollup tx. To actually figure out the break-even count, we have to assume a cost for an effective normal transaction, for which 150 WU is a pretty good figure.

Setting the amortized per-tx weight (base_cost / num_txs) + tx_size equal to cost_norm and solving for the batch size, the equation turns into:

breakeven = base_cost / (cost_norm - tx_size)

That works out to a breakeven of ~21.3 transactions per batch, which seems pretty good!
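
Plugging the numbers in, under the same assumptions:

base_cost = 2848  # WU per update: vk + proof + script + tx overhead
tx_size = 16      # WU of posted data per rollup tx
cost_norm = 150   # WU of a typical standalone transaction
print(base_cost / (cost_norm - tx_size))  # ~21.3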

Obviously, more is better. And if we do the ZK-Optimistic design then the fixed costs go wayyy down since we can just avoid putting the verification key and the proof on-chain.

If we assume a moderately high fee of 25 sat/vB (it’s a rollup tx so we want to set it moderately high so it confirms quickly), then we get a relationship like this:

In that last case we’d pay the equivalent of 3.1 sat/vB even if the current fee rate is 25 sat/vB. And that’s with a really conservative size for the rollup tx; we could reduce it some more. That tx would be around 18 KiB though, which is definitely on the larger side.

Conclusion

It would be a great idea if Bitcoin could support rollups some day.

Bitcoin really doesn’t need much to achieve some kind of “Functionality Escape Velocity”. Projects like zkSync are nearing the capability to support most EVM contracts with relatively minor modifications. An approach to system design like this seems fairly in line with the Bitcoin design philosophy of maintaining a relatively simple settlement layer and shifting complexity off to users, who have the freedom to design systems that suit their own needs. Plus, the primitives that would be necessary to build rollups are very general and would enable a wealth of other use cases, which would supercharge what projects like Sapio aim to accomplish.

The biggest single barrier I see is probably political, as Bitcoiners understandably have an abundance of caution about adding new protocol features, especially since SNARKs rely on new cryptographic assumptions (pairings) beyond what Bitcoin already uses.

