Hosted on editbitcoin.com via the Hypermedia Protocol

Performance Gabo Laptop 2026-05-26 21:08

Yes, the fresh run shows the situation has changed.

Current status

Now there are SQLite busy events.

SQLite

| Metric | Current |
|---|---:|
| Snapshot time | 2026-05-26 20:52:25 |
| Write tx currently in flight | None right now |
| begin_busy events | 55 |
| Caller hitting busy | hmnet.(*Node).connect |
| hmnet.(*Node).connect p90 total | 20.74 s |
| hmnet.(*Node).connect p99 total | 20.85 s |
| hmnet.(*Node).connect begin_wait p90 | 10.37 s |
| hmnet.(*Node).connect begin_wait p99 | 10.42 s |
| WAL size on disk | 0 B |
| DB size | 4.0 GiB |

This is the important line:

hmnet.(*Node).connect
busy: 55
begin_wait p90/p99: ~10.4s

That means connect() is now repeatedly issuing BEGIN IMMEDIATE, waiting out the roughly 10 s SQLite busy timeout, and failing with “database is locked”.

But the lock holder is unclear

The debug page’s begin-busy attribution shows only:

sqlitex.WALCheckpointer: 25 events, Σ held during wait 6.9ms

That is not enough to explain 10-second busy waits. WAL checkpoint holds are tiny here.

So this suggests one of these:

  1. A lock holder is missing from instrumentation

    • some SQLite writer path not going through the instrumented wrapper;

    • or a bare connection / external process.

  2. hmnet.(*Node).connect is stampeding

    • many connect attempts try to write peer state at once;

    • they pile up behind one writer;

    • some hit the 10s timeout.

  3. SQLite sees contention even though the debug page doesn’t show a long holder

    • possible if the holder is outside the tracked transaction system.
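For hypothesis 1, one way to narrow the attribution gap is to route every write path through a wrapper that records who currently holds the write transaction. A busy waiter can then name the holder. This is a sketch with hypothetical names (`holderTracker`, `Acquire`, `Current`). It is not the app's actual instrumented wrapper, and by construction it still cannot see writers that bypass it, such as a bare connection or an external process.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// holderInfo describes the caller currently inside a write transaction.
type holderInfo struct {
	Caller string
	Since  time.Time
}

// holderTracker records the current write-transaction holder. Every
// instrumented write path must call Acquire/release around its transaction.
type holderTracker struct {
	mu     sync.Mutex
	holder *holderInfo
}

// Acquire marks caller as the current holder and returns a release func.
func (t *holderTracker) Acquire(caller string) (release func()) {
	h := &holderInfo{Caller: caller, Since: time.Now()}
	t.mu.Lock()
	t.holder = h
	t.mu.Unlock()
	return func() {
		t.mu.Lock()
		if t.holder == h { // never clear a newer holder by mistake
			t.holder = nil
		}
		t.mu.Unlock()
	}
}

// Current reports the recorded holder, if any. If it returns ok=false while
// BEGIN IMMEDIATE keeps timing out, the lock is held by something untracked.
func (t *holderTracker) Current() (holderInfo, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.holder == nil {
		return holderInfo{}, false
	}
	return *t.holder, true
}

func main() {
	var tr holderTracker
	release := tr.Acquire("sqlitex.WALCheckpointer")
	if h, ok := tr.Current(); ok {
		fmt.Printf("held by %s for %v\n", h.Caller, time.Since(h.Since).Round(time.Millisecond))
	}
	release()
	_, ok := tr.Current()
	fmt.Println("holder after release:", ok)
}
```

An empty holder during a long busy wait would point toward hypothesis 1 or 3. A named, long-lived holder would point toward hypothesis 2.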

The debug page says “no write transactions in flight right now”, but the recent history clearly recorded busy failures around 20:51–20:52.

Network

The network side is still slow, too.

| Metric | Current |
|---|---:|
| Uptime | 29m38s |
| connected_sync p50 | 10.06 s |
| connected_sync p90 | 19.44 s |
| connected_sync p99 | 21.23 s |
| Discovery connected p50 | 17.95 s |
| Discovery connected p99 | 21.36 s |
| Dial p99 | 14.95 s |
| Reconcile RPC p99 | 3.11 s |

Sync outcomes:

| Outcome | Count |
|---|---:|
| ok | 8305 |
| dial_failed | 19265 |
| preempted | 3309 |
| rpc_error | 799 |
| putmany_failed | 0 |
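The sync outcomes look different as proportions. Of 31,678 recorded syncs, about 26.2% succeeded and about 60.8% failed at the dial stage. The small program below recomputes those shares from the table. Nothing in it is app code.

```go
package main

import (
	"fmt"
	"sort"
)

// outcomeShares converts raw sync-outcome counts into fractions of the total.
func outcomeShares(counts map[string]int) (total int, shares map[string]float64) {
	for _, n := range counts {
		total += n
	}
	shares = make(map[string]float64, len(counts))
	if total == 0 {
		return total, shares
	}
	for k, n := range counts {
		shares[k] = float64(n) / float64(total)
	}
	return total, shares
}

func main() {
	counts := map[string]int{
		"ok": 8305, "dial_failed": 19265, "preempted": 3309,
		"rpc_error": 799, "putmany_failed": 0,
	}
	total, shares := outcomeShares(counts)

	// Print largest share first so the output order is stable.
	keys := make([]string, 0, len(shares))
	for k := range shares {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return shares[keys[i]] > shares[keys[j]] })

	fmt.Println("total syncs:", total)
	for _, k := range keys {
		fmt.Printf("%-15s %5.1f%%\n", k, 100*shares[k])
	}
}
```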

So the app has two active problems now:

  1. Network discovery is very slow.

  2. hmnet.(*Node).connect is now producing SQLite busy failures.

What changed vs previous run

Previous run:

SQLite busy: 0
hmnet.connect looked fine

Current run:

SQLite busy: 55
hmnet.connect begin_wait p99: ~10.42s
hmnet.connect total p99: ~20.85s

So the SQLite problem reappeared, and the visible failing caller is:

backend/hmnet/connect.go
hmnet.(*Node).connect

My current diagnosis

The current app performance issue is likely:

network peer churn / many syncs / many dials
        ↓
many hmnet connect() calls
        ↓
connect() writes peer state directly to SQLite
        ↓
many concurrent BEGIN IMMEDIATE attempts
        ↓
some wait ~10s and fail SQLITE_BUSY
        ↓
daemon/network flow slows or errors

This matches the earlier suspicion: peerWriter batches some peer writes, but connect() still has a direct synchronous DB write path.
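One way to remove the direct path is for connect() to hand its peer update to a single writer goroutine instead of opening its own transaction. The sketch below uses hypothetical types (`peerUpdate`, `handoff`). The real peerWriter API may look nothing like this. The non-blocking send means a full queue drops the update rather than stalling connect(). That is only acceptable under the assumption that these peer-state writes are best-effort.

```go
package main

import "fmt"

// peerUpdate is a hypothetical record of what connect() wants to persist.
type peerUpdate struct {
	PeerID string
	Addrs  []string
}

// handoff enqueues u for the writer goroutine without blocking the caller.
// It returns false if the queue is full, so connect() never waits on SQLite.
func handoff(queue chan<- peerUpdate, u peerUpdate) bool {
	select {
	case queue <- u:
		return true
	default:
		return false // queue full: drop (and ideally count) instead of blocking
	}
}

func main() {
	queue := make(chan peerUpdate, 2)
	for i := 0; i < 3; i++ {
		ok := handoff(queue, peerUpdate{PeerID: fmt.Sprintf("peer-%d", i)})
		fmt.Println("queued:", ok)
	}
	close(queue)

	// The single writer goroutine would drain the queue inside one write tx.
	for u := range queue {
		fmt.Println("writer persists", u.PeerID)
	}
}
```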

Most suspicious fix target

hmnet.(*Node).connect should probably not directly perform synchronous peer-state write transactions under high network churn.

Likely options:

  • route connect() peer DB updates through peerWriter;

  • dedupe/throttle connect writes per peer;

  • skip redundant writes before opening a write tx;

  • avoid writing on every connect event;

  • make peer-state writes single-owner/batched.

Recap

The fresh run shows SQLite busy events are back. The failing caller is hmnet.(*Node).connect, with 55 busy events and ~10.4 s begin_wait. The WAL is still 0 B, and the WAL checkpointer does not hold the lock long enough to explain the busy waits. Network discovery is also still slow: connected_sync p50 10.06 s, p99 21.23 s, with many dial_failed and preempted syncs.

  • Main new finding: hmnet.(*Node).connect is now the SQLite-busy hotspot.

  • WAL file is not the issue right now: db.sqlite-wal = 0B.

  • Network churn likely drives many connect() writes.

  • Next code area to inspect/fix: backend/hmnet/connect.go, especially direct DB writes not routed through peerWriter.
