technique

Why We Wrote a C SQLite Driver and Got 4.7× Faster

2026-03-25 13:52:48

The problem with modernc

modernc.org/sqlite is a pure-Go translation of the SQLite C source. No CGO, no linker headaches, go build just works. For two years it carried every HOROS service in production.

Then production started pushing back.

Seven pain points accumulated:

  1. No usable -race support. The transpiled C code trips the race detector on every concurrent access, so every test run had to leave -race off or rely on creative workarounds. We shipped a queue library (squeueHA) with 31 tests, and none of them ran with -race.

  2. Allocation overhead. Every query allocates intermediate Go objects to bridge Go values into the transpiled C code. On hot paths (Publish/Claim/Ack in squeueHA), this added up to 57 allocations per cycle where the C driver needs 11.

  3. No WAL callbacks. modernc doesn't expose sqlite3_wal_hook. Our observability layer (tracqlite) needed to know when WAL checkpoints happened. We had to poll.

  4. No sqlite3_trace_v2. Same story. The trace driver that powers tracqlite was a wrapper around modernc's query interceptor, which doesn't see PRAGMA execution or internal statements.

  5. Schema validation gap. We wanted to enforce STRICT tables and column constraints at open time. modernc doesn't expose sqlite3_table_column_metadata.

  6. Static linking friction. modernc produces a Go binary, but the translation layer has its own memory model. Mixing it with musl for static builds required careful GOOS/GOARCH matching.

  7. Performance ceiling. On the squeueHA hot path (the pattern every HOROS pipeline uses), modernc was 2-6× slower than native C SQLite, depending on the query shape.
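To make point 5 concrete, here is a minimal Go sketch of the kind of open-time check it describes. It assumes the driver has already obtained per-column metadata (sqlite3_table_column_metadata reports a column's declared type, collation, NOT NULL, primary-key and autoincrement flags, and is available when SQLite is compiled with SQLITE_ENABLE_COLUMN_METADATA) and flags declared types that a STRICT table would reject: STRICT tables accept only INT, INTEGER, REAL, TEXT, BLOB and ANY. ColumnMeta, validateStrict and the jobs table are hypothetical names, not part of modernc or cwasq.

```go
package main

import (
	"fmt"
	"strings"
)

// ColumnMeta mirrors part of what sqlite3_table_column_metadata returns.
// Hypothetical Go type: a real driver would fill it through cgo.
type ColumnMeta struct {
	Table, Column string
	DeclType      string // declared column type, e.g. "INTEGER"
	NotNull       bool
}

// strictTypes is the set of declared types a SQLite STRICT table accepts.
var strictTypes = map[string]bool{
	"INT": true, "INTEGER": true, "REAL": true,
	"TEXT": true, "BLOB": true, "ANY": true,
}

// validateStrict returns one message per violation: a declared type that a
// STRICT table would reject, or a column listed in requireNotNull that
// lacks a NOT NULL constraint.
func validateStrict(cols []ColumnMeta, requireNotNull map[string]bool) []string {
	var problems []string
	for _, c := range cols {
		if !strictTypes[strings.ToUpper(c.DeclType)] {
			problems = append(problems, fmt.Sprintf(
				"%s.%s: type %q not allowed in STRICT tables", c.Table, c.Column, c.DeclType))
		}
		if requireNotNull[c.Column] && !c.NotNull {
			problems = append(problems, fmt.Sprintf(
				"%s.%s: NOT NULL required", c.Table, c.Column))
		}
	}
	return problems
}

func main() {
	cols := []ColumnMeta{
		{Table: "jobs", Column: "id", DeclType: "INTEGER", NotNull: true},
		{Table: "jobs", Column: "payload", DeclType: "BLOB", NotNull: true},
		{Table: "jobs", Column: "created", DeclType: "DATETIME"}, // not a STRICT type
	}
	for _, p := range validateStrict(cols, map[string]bool{"created": true}) {
		fmt.Println(p)
	}
}
```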

The decision

We wrote cwasq: a thin CGO wrapper around unmodified SQLite 3.48 (amalgamation), compiled with musl-gcc for static linking. The wrapper exposes exactly what we need: database/sql driver, trace callbacks, WAL hooks, PRAGMA introspection, and a policy table (_horosqlite) for per-database configuration.
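The WAL hook is what replaces polling (pain point 3). SQLite calls the function registered with sqlite3_wal_hook after each commit in WAL mode, passing the database name and the number of pages currently in the WAL. The pure-Go sketch below assumes a cgo trampoline (not shown) forwards those two values to a Go method. walNotifier, onCommit and the buffer size are illustrative assumptions, not cwasq's actual API; the 1000-page threshold matches SQLite's default auto-checkpoint interval.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// WALEvent is what a (hypothetical) cgo trampoline for sqlite3_wal_hook
// would hand to Go after each commit in WAL mode.
type WALEvent struct {
	DB    string // schema name, usually "main"
	Pages int    // pages currently in the WAL file
}

// walNotifier turns WAL-hook callbacks into events on a channel, so an
// observer reacts to commits instead of polling.
type walNotifier struct {
	events    chan WALEvent
	threshold int          // WAL size (pages) at which a checkpoint is due
	dropped   atomic.Int64 // events discarded because the channel was full
}

func newWALNotifier(buf, threshold int) *walNotifier {
	return &walNotifier{events: make(chan WALEvent, buf), threshold: threshold}
}

// onCommit runs synchronously inside the commit, so it must never block:
// a full channel drops the event and counts it. It reports whether the
// WAL has reached the checkpoint threshold.
func (n *walNotifier) onCommit(db string, pages int) bool {
	select {
	case n.events <- WALEvent{DB: db, Pages: pages}:
	default:
		n.dropped.Add(1)
	}
	return pages >= n.threshold
}

func main() {
	n := newWALNotifier(8, 1000)
	for _, p := range []int{120, 640, 1010} {
		if n.onCommit("main", p) {
			fmt.Println("checkpoint due at", p, "pages")
		}
	}
	close(n.events)
	for ev := range n.events {
		fmt.Printf("%s: %d pages in WAL\n", ev.DB, ev.Pages)
	}
}
```

One detail worth knowing: SQLite builds its own auto-checkpoint (sqlite3_wal_autocheckpoint and PRAGMA wal_autocheckpoint) on the same hook, so registering a custom sqlite3_wal_hook replaces it. A driver that installs its own hook then has to trigger checkpoints itself, for example when onCommit reports one is due.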

The design is "closed-world": every database opened through cwasq gets a hidden _horosqlite table that stores driver-level policies (encryption keys, trace config, schema version). The application never sees it. The driver reads it at Open() and enforces it silently.
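A minimal sketch of the Open()-time policy read, assuming _horosqlite is a simple key/value table whose rows the driver has already fetched. The article does not describe the table's schema, so the key names (trace, schema_version), the Policy struct and loadPolicy are all hypothetical. Rejecting unknown keys is a design choice of this sketch: a mistyped policy fails at open instead of being silently ignored.

```go
package main

import (
	"fmt"
	"strconv"
)

// Policy holds driver-level settings read from the hidden policy table.
// Field and key names are illustrative assumptions.
type Policy struct {
	Trace         bool
	SchemaVersion int
}

// loadPolicy converts key/value rows (as they might be read from
// _horosqlite at Open time) into a Policy, failing on unknown keys or
// malformed values.
func loadPolicy(rows map[string]string) (Policy, error) {
	var p Policy
	for k, v := range rows {
		switch k {
		case "trace":
			b, err := strconv.ParseBool(v)
			if err != nil {
				return Policy{}, fmt.Errorf("policy %q: %w", k, err)
			}
			p.Trace = b
		case "schema_version":
			n, err := strconv.Atoi(v)
			if err != nil {
				return Policy{}, fmt.Errorf("policy %q: %w", k, err)
			}
			p.SchemaVersion = n
		default:
			return Policy{}, fmt.Errorf("unknown policy key %q", k)
		}
	}
	return p, nil
}

func main() {
	p, err := loadPolicy(map[string]string{"trace": "true", "schema_version": "3"})
	fmt.Printf("%+v %v\n", p, err) // {Trace:true SchemaVersion:3} <nil>

	_, err = loadPolicy(map[string]string{"tracing": "true"})
	fmt.Println(err) // unknown policy key "tracing"
}
```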

The numbers

Machine: Intel i9-14900K, Linux, WAL mode, single writer. All benchmarks run with -race (only possible with cwasq).

Workload: squeueHA hot path — the pattern that every HOROS pipeline runs millions of times per day.

| Operation | modernc (ns/op) | cwasq (ns/op) | Speedup | modernc allocs/op | cwasq allocs/op |
|---|---|---|---|---|---|
| Publish (INSERT) | 15,300 | 7,265 | 2.1× | 13 | 5 |
| Claim (UPDATE+RETURNING) | 910,000 | 388,600 | 2.3× | 37 | 4 |
| Ack (DELETE) | 56,000 | 9,514 | 5.9× | 8 | 2 |
| Full cycle (P+C+A) | 106,000 | 22,465 | 4.7× | 57 | 11 |
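The Speedup column is modernc ns/op divided by cwasq ns/op. This short check recomputes it from the table's own figures; it adds no new measurements.

```go
package main

import "fmt"

// row holds one line of the benchmark table above.
type row struct {
	name      string
	moderncNs float64 // modernc ns/op
	cwasqNs   float64 // cwasq ns/op
}

// speedup is how many times faster cwasq is than modernc for one operation.
func speedup(moderncNs, cwasqNs float64) float64 { return moderncNs / cwasqNs }

func main() {
	rows := []row{
		{"Publish (INSERT)", 15300, 7265},
		{"Claim (UPDATE+RETURNING)", 910000, 388600},
		{"Ack (DELETE)", 56000, 9514},
		{"Full cycle (P+C+A)", 106000, 22465},
	}
	for _, r := range rows {
		fmt.Printf("%-26s %.1fx\n", r.name, speedup(r.moderncNs, r.cwasqNs))
	}
	// Prints 2.1x, 2.3x, 5.9x and 4.7x, matching the Speedup column.
}
```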

The allocation reduction matters as much as the raw speed. On a pipeline processing 50,000 payloads per hour, cutting allocations from 57 to 11 per cycle leaves the GC less garbage to collect, so it runs less often and causes fewer latency spikes under load.
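Putting numbers on that, using only the table and the 50,000 payloads/hour figure: each cycle saves 46 allocations, so such a pipeline performs 2.3 million fewer heap allocations per hour, roughly 640 per second. Go's GC is paced by heap growth relative to GOGC rather than by allocation count, so the exact effect on GC frequency also depends on allocation sizes, which are not reported here.

```go
package main

import "fmt"

// allocsPerHour is the number of heap allocations a pipeline performs per
// hour, given its throughput and the allocations per full queue cycle.
func allocsPerHour(payloadsPerHour, allocsPerCycle int) int {
	return payloadsPerHour * allocsPerCycle
}

func main() {
	const payloads = 50_000
	before := allocsPerHour(payloads, 57) // modernc, full cycle
	after := allocsPerHour(payloads, 11)  // cwasq, full cycle
	saved := before - after

	fmt.Println(before, after, saved)                                 // 2850000 550000 2300000
	fmt.Printf("%.0f fewer allocations per second\n", float64(saved)/3600) // 639
}
```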

The migration

19 services. Zero downtime. One session.

The migration was mechanical:

  1. Replace import _ "modernc.org/sqlite" with import _ "github.com/hazyhaar/horos48/cwasq".
  2. Change build command to CC=musl-gcc CGO_ENABLED=1 go build -ldflags '-linkmode external -extldflags "-static"'.
  3. Run go test -race ./... — now possible for the first time.
  4. Rebuild all 76 binaries with predeploy.sh.
  5. Deploy with service-by-service restart (no batch, each service gets its own systemctl restart).

The entire migration — driver development, 40+ tests, 10 bugs found by three LLM auditors, full deployment to production — took one session. Sixteen hours. Three worker terminals running in parallel.

After deployment, tracqlite confirmed: zero ghost writers, zero WAL anomalies, all 19 services reporting traces through the native sqlite3_trace_v2 callback.
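For reference, sqlite3_trace_v2 takes a bitmask that selects which events reach the callback: SQLITE_TRACE_STMT (0x01, a statement starts running), SQLITE_TRACE_PROFILE (0x02, a statement finishes, with an elapsed-time estimate in nanoseconds), SQLITE_TRACE_ROW (0x04, a statement produces a row) and SQLITE_TRACE_CLOSE (0x08, the connection closes). The Go sketch below mirrors those constants so a trace sink can describe or choose what it records; the traceMask type and its String method are illustrative, not tracqlite's or cwasq's code.

```go
package main

import (
	"fmt"
	"strings"
)

// traceMask mirrors the SQLITE_TRACE_* bits accepted by sqlite3_trace_v2.
type traceMask uint32

const (
	traceStmt    traceMask = 0x01 // SQLITE_TRACE_STMT: statement starts running
	traceProfile traceMask = 0x02 // SQLITE_TRACE_PROFILE: statement finished, elapsed ns
	traceRow     traceMask = 0x04 // SQLITE_TRACE_ROW: statement produced a row
	traceClose   traceMask = 0x08 // SQLITE_TRACE_CLOSE: connection closed
)

// String lists the enabled events, e.g. "STMT|PROFILE".
func (m traceMask) String() string {
	names := []struct {
		bit  traceMask
		name string
	}{
		{traceStmt, "STMT"}, {traceProfile, "PROFILE"},
		{traceRow, "ROW"}, {traceClose, "CLOSE"},
	}
	var parts []string
	for _, n := range names {
		if m&n.bit != 0 {
			parts = append(parts, n.name)
		}
	}
	if len(parts) == 0 {
		return "NONE"
	}
	return strings.Join(parts, "|")
}

func main() {
	// ROW fires once per result row, so a production sink might leave it
	// off and record statements, their timings, and connection closes.
	mask := traceStmt | traceProfile | traceClose
	fmt.Println(mask, uint32(mask)) // STMT|PROFILE|CLOSE 11
}
```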

What we kept

modernc.org/sqlite is excellent software. The pure-Go constraint it solves is real. If a project needs go build without a C toolchain, modernc is the right choice.

For HOROS, the production pain points outweighed the build simplicity. A C driver with musl static linking gives the same deployment story (single binary, no shared libs) while unlocking -race, native callbacks, and measurable performance gains on the hot path.

The 40+ tests in cwasq run with -race on every CI push. That alone justified the migration.

Source

Benchmark code: cwasq/bench_squeueha_test.go in the horos48 monorepo (private).

Tags: cwasq, sqlite, benchmark, cgo, performance