Building a read-only P2P SQLite Database in the browser

The Harvard Library Innovation Lab recently wrote about serving the data.gov archive as static DuckDB files: put the database on ordinary static hosting, load DuckDB-WASM in the browser, and rely on HTTP Range requests so queries fetch only the byte ranges they need. With this, no query server, no backend database and no per-user compute on the origin are needed!

There exist similar projects like “Hosting SQLite databases on GitHub Pages” which mostly use similar techniques.

That is already an interesting project - but it leaves one remaining cost point: every browser still goes back to the origin for the same popular chunks, costing bandwidth over and over.

This made me wonder if we can peer-to-peer serve those blocks between browsers to other browsers over WebRTC. The static host remains the source of truth, but active readers become a swarm cache.

The result is a small prototype of P2P-accelerated SQLite in the browser: static file hosting as the fallback, WebRTC for peer transfers, Nostr relays for peer discovery, and hashes to protect against corrupted peer data.

TL;DR
I added a P2P layer (WebRTC via trystero + public Nostr relays for discovery) that can serve cached database blobs between browsers.
Blocks received from peers are verified using a Merkle Tree proof so bad peers can’t silently corrupt data.
This doesn’t make bandwidth/compute “free” (clients still pay it), but it can reduce origin traffic for popular datasets with overlapping reads.

I uploaded the final code at https://github.com/vantezzen/swarm-sqlite or you can try the result yourself at https://vantezzen.github.io/swarm-sqlite/ - though be aware this will open WebRTC connections to other active peers.

Finding an SQLite-WASM base

There are tons of existing SQLite-WASM solutions like the official build. However most of these don’t seem to make it easy to build own storage adapters - requiring custom Emscripten implementations, no Promise support etc.

For this project I wanted an SQLite engine that:

Allows declaring filesystem adapters with pure JavaScript No custom C-Adapter, no Emscripten FS interactions etc.

Allows byte-level reads We want to work on arbitrary byte ranges instead of loading the whole file into RAM

Full promise support Since I’ll be using fetch and P2P requests it would be easiest to just await those requests

After comparing different ones, I settled on https://github.com/rhashimoto/wa-sqlite. So I installed their npm package, followed their GitHub docs and implemented a POC filesystem.

However, various TypeScript errors kept popping up and everything I tried resulted in cryptic Emscripten errors. After a day of debugging, it turns out their npm version is around 2 years outdated and uses a wildly different codebase, so I needed to manually copy the required code pieces into my project.

Even still though, the API for creating custom filesystems in wa-sqlite is still very low-level, working on file IDs, bitmap flags and split int64 numbers for byte offsets. I thus worked on a base Filesystem class that abstracts away that complexity and simply tells us what byte offset of which file to read into which buffer


/**
 * Read up to `dst.byteLength` bytes starting at `offset` into `dst`,
 * and return the number of bytes actually read. The base class handles
 * zero-filling and `SHORT_READ` reporting when fewer bytes than requested
 * are available.
 */
protected abstract read(
  filename: string,
  dst: Uint8Array,
  offset: number
): MaybePromise<number>

If you ever want to create a different SQLite-WASM filesystem yourself, feel free to use this as your base.

HTTP Filesystem

With a base Filesystem in place, it was time to write a HTTP-based filesystem. Though the unique aspect is P2P transfers, we’ll still need to fall back to HTTP if no peers are available or no peers have this chunk:


// We can just use a HEAD request to get the whole file size
async size(filename: string): Promise<number> {
  const response = await fetch(filename, {
    method: "HEAD",
  })
  const filesize = Number(response.headers.get("content-length") ?? 0)
  return filesize
}

// Use Range requests to get specific sections of the file
async read(
  filename: string,
  dst: Uint8Array,
  offset: number
): Promise<number> {
  const response = await fetch(filename, {
    headers: {
      Range: `bytes=${offset}-${offset + dst.byteLength - 1}`,
    },
  })

  const data = await response.arrayBuffer()
  readIntoBuffer(data, dst);
  return data.byteLength;
}

For testing, I created a simple database of books and reviews based on a kaggle dataset. The resulting database is around 180MB, with often only some parts needed for search operations.

I created a super basic UI for showing and searching the books:

Internally, three SQL queries are run to incrementally show more info as new data loads in:

The top 50 books are shows first, which only requires the first few MB

Then we are running a SELECT AVG(rating), COUNT(*) AS num_ratings FROM reviews to also show reviews. This requires additional data from the reviews table

Simultaneously, we’re running a simple SELECT count(*) FROM books for the total tally at the top, which requires again some different chunks of the file

And we have an in-browser SQLite working off of a HTTP-hosted static database file!

Chunking and caching

I noticed that even on my local machine searches sometimes took a long time, although HTTP request should be almost instant. Looking into the network tab, I can see why:

SQLite has a default page size of 4096 bytes - this can be increased to%20interface.-,Maximum%20Number%20Of%20Pages%20In%20A%20Database%20File,-SQLite%20is%20able) 65KB, but this will still cause almost 2800 pages even for a medium-sized database like our books DB.

To work around this I added an additional file content cache system: Instead of just the 4KB page, we’ll align to 1MB blocks - an arbitrary number I chose in this case based on no research at all - and instead fetch those, caching those as the original 4KB pages SQLite will request from us for easier access.

The P2P part

I knew I wanted to use WebRTC for true P2P transfers, but doing the WebRTC handshake is always a pain. I thought about a basic socket.io server that manages these, but instead settled on trystero.

There are other interesting projects like WebTorrent but those come with their own logic and ideas of how things work, so for this POC I chose something that just gave a simple communication interface.

Trystero uses nostr (or other methods like MQTT, BitTorrent etc.) for decentralized handshakes over public nostr relays. This way I don’t need to set up and run any custom server infrastructure beyond hosting the static source files (the SQLite DB + a small info/proof file), but it’s worth noting that the bandwidth and compute costs don’t disappear - they shift to participating clients.

This was my first time using trystero and I was positively surprised: You define a unique App ID, define an “action” which acts as a data stream for a specific data type and trystero does the rest - discovering other peers, doing the WebRTC handshake and handling the low-level WebRTC transfers. The only critical point is the built-in list of nostr endpoints: I’m not too familiar with nostr, but some of these endpoints seem a little sketchy (though that is not too critical as we’re just handshaking) and I frequently see errors in the console about different nostr relays being down.

Here’s a high-level view of how reads flow from SQLite to the network:

Overview


+----------------------------------------+
| SQLite asks for 4KB page at offset O   |
+-------------------+--------------------+
                    |
                    v
+----------------------------------------+
| Filesystem adapter                     |
| read(filename, dst, O)                 |
+-------------------+--------------------+
                    |
                    v
+----------------------------------------+
| Map O to 1MB-aligned block             |
| floor(O / 1MB) * 1MB                   |
+-------------------+--------------------+
                    |
                    v
          +----------------------+
          | Block in local cache?|
          +----------+-----------+
                     |
          +----------+----------+
          |                     |
        yes                    no
          |                     |
          v                     v
+---------------------+     +----------------+
| Serve 4KB slice     |     | Try P2P first? |
| from cached 1MB     |     +-------+--------+
| block               |             |
+---------------------+     +-------+--------+
                            |                |
                          yes               no
                            |                |
                            v                |
+--------------------------------------------------------+
| Broadcast request via trystero action                  |
| peer discovery via Nostr relays                        |
+---------------------------+----------------------------+
                            |
                            v
                  +------------------------------+
                  | Peer responds before timeout?|
                  +--------------+---------------+
                                 |
                    +------------+------------+
                    |                         |
                  yes                        no
                    |                         |
                    v                         v
+----------------------------------+     +-----------------------------+
| Verify block with Merkle proof   |     | HTTP Range request          |
| root hash from server            |     | for 1MB block               |
+----------------+-----------------+     +--------------+--------------+
                 |                                      |
                 v                                      |
          +-------------+                               |
          | Valid?      |                               |
          +------+------+                               |
                 |                                      |
        +--------+--------+                             |
        |                 |                             |
      yes                no                             |
        |                 |                             |
        v                 v                             |
+------------------+  +-----------------------------+   |
| Store 1MB block  |  | Discard and fall back        |   |
| in cache         |  | to HTTP                      |   |
+--------+---------+  +--------------+--------------+   |
         |                           |                  |
         |                           v                  |
         |              +-----------------------------+ |
         |              | HTTP Range request          |<+
         |              | for 1MB block               |
         |              +--------------+--------------+
         |                             |
         +-------------+---------------+
                       |
                       v
              +---------------------+
              | Serve 4KB slice     |
              | from cached 1MB     |
              | block               |
              +---------------------+

With that in place we can make our P2P SQLite connection! Here is the overly simplified way of how this looks like:


// We're basing on the HTTP Filesystem
class SwarmFS extends HttpFS {

	protected async read(
    filename: string,
    dst: Uint8Array,
    offset: number
  ): Promise<number> {
		// Use trystero actions to send a request out  
    const [sendDataRequest] =
      this.trystero.makeAction<DataRequest>("data-request") 
    sendDataRequest({ filename, offset });
    
    // Wait if we get some response back from another peer
    const readBytes = await this.waitForResponseOrTimeout(dst);
    if (readBytes > 0) {
	    return readBytes
    }

		// Fall back to using HTTP range requests by just calling the super class
    return await super.read(filename, dst, offset)
  }
}

And it works! Data is sent, otherwise we’ll fall back to using HTTP after a timeout. Here we can add some more optimizations: If we don’t have any peers, we can fall back directly. We should again use the 1MB caching, etc.

Additionally, instead of a full JSON object for the returned object, we instead neatly pack it into a UInt8Array so we can minimize the overhead used:


Binary Blob contents:
[Byte 0 - 3] Offset of the data sent in the full file
[Byte 4 - 7] Length of the data blob (usually 1MB)
[Byte 8 - 9] Proof length (explained in next section)
[Bytes] Proof data
[Bytes] Block data

The bad actor

By now we get database file chunks from unknown peers and blindly trust them - but of course these peers could be sending us manipulated or simply corrupted data!

My naive approach at first was to simply hash each of the 1MB chunks, download that file of chunks from the trusted server and check against it - this file size scales linearly with a growing database file.

It seems the go-to for other P2P systems like BitTorrent is to use a Merkle Tree instead - I definitely heard of that during university before.

Source: https://en.wikipedia.org/wiki/Merkle_tree#/media/File:Hash_Tree.svg

For this, we’ll hash every chunk but then build a tree from those hashes, combining 2 hashes until we have one top hash. With this, we only need the top hash as 32 bytes from the trusted server. Other peers can then send us the data plus the path of hashes to reach back to the top hash, and then we can securely verify it with minimal server load.

However this presented a bit of a problem: We can get and save this path of hashes we get from other peers - but where do we get it from when we fetch the data directly from the server instead?

We again want this to be static and not some dynamic API endpoint, and we want to prevent needing to download the full tree of hashes just to get the path of hashes for a single block.

To solve this, I generated a binary file containing the path of hashes for each block at a specific offset. This way we just need the proof depth (i.e. how many hashes are in each path) and can then calculate the byte offset for each path.


Proof for block N at byte offset:
  N * (proofDepth * 32 bytes per SHA-256 hash)

This way, instead of one huge blob of hashes the client can again use a HTTP Range request to fetch the specific bytes of the hashes it needs.

For the additional info about the file (root hash and proof depth) I simply created an “info.json” file hosted alongside the SQLite file as a central place to get all info.

One downside of this is the size of that proofs file: Since we’re storing the full path for each block separately instead of as a real tree structure, for our 180MB file the proofs file is almost 50KB large. But since clients only ever fetch small pieces of this, the effect should be smaller.

Looking back, my naive solution of a central list of hashes would’ve probably sufficed and barely increased traffic bandwidth on the server. However, with this Merkle Tree in place, theoretically the clients can work completely independently over P2P after getting the one top hash from the server without needing more proof details from the central server.

Running it on GitHub Pages

Uploading to GitHub Pages was easy by just using the NextJS workflow template provided by GitHub. I decided to upload the SQLite DB itself to Git LFS so the hosting can also be done easily over GitHub Pages without polluting the repo size or needing a separate CDN somewhere.

Everything worked flawlessly locally, but after deploying I kept getting this error before anything displayed:

“database disk image is malformed”? I downloaded the whole SQLite file without the fancy WASM interface and it worked flawlessly so what happened?

After debugging between localhost and GitHub I found the issue: Remember this small comment?


// We can just use a HEAD request to get the whole file size

Turns out CDNs like the one on GitHub Pages returns the gzipped, compressed size when asking for it via a HEAD request. Because of this, SQLite only made enough space for the smaller, compressed byte number but the file header then contains the actual, full file size causing SQLite to crash to prevent overflows.

So instead I added the real filesize in bytes to the “info.json” file used for the proof depth earlier so the infos remain in one place.

Finale

This was a very interesting project to implement. I love going down to byte-level, emulating a filesystem-read over a P2P network, diving deep into details about how SQLite loads file and making it all run in the browser.

There are tons of aspects that can still be improved - some that come to mind:

P2P is done rather naively The P2P connection just sends out wild data requests and waits for the first peer to respond. In larger networks this could maybe cause some problems and unnecessary load on peers.

There is no versioning If I change the base SQLite file on the server, there is no real way to prevent old database blocks to fly around the system. The hash validation would throw them away on devices with an up-to-date top hash, but that is only after the blob has already been transferred fully.

Trying with a larger, more interesting database The books DB was chosen based on the original blog post about data.gov. As this is mainly one big “books” table, the browser loads most of the 180MB in the end anyway after loading the previews, search index, and entity count. To truly test the capabilities, a dataset with more separated table and different functionality could be tested.

Block size and requests Currently I chose 1MB blocks for no real reason. The 4KB of SQLite were tiny and caused too much overhead, so 1MB seemed like an acceptable size to push over WebRTC. This could probably be chosen much better, maybe adapted dynamically based on network conditions, peers etc.

Merkle Tree The Merkle Tree is currently done extremely simple, not handling things like non-power-of-two trees and other pitfalls

Still, it serves as a fun research project of minimizing deployment costs on static, public databases.