LearningCircuit d18887df24 fix(auth): atomic post-login settings + regression test, supersedes #3487 (#3502)
* fix(auth): atomic settings reload + app.version update on login

Previously, the post-login settings-version-mismatch path committed
twice: once after load_from_defaults_file() wrote ~498 default
setting rows, and again after update_db_version() wrote the
app.version marker. app.version is NOT in default_settings.json —
it is only ever written by update_db_version(). Any failure between
the two commits (crash, lock timeout, engine dispose mid-transaction)
left app.version unwritten, so db_version_matches_package() kept
returning False and every subsequent login re-ran the 498-row bulk
insert. This is the "sticky loop" that made container restarts
ineffective for the reported login-hang-after-idle symptom.
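A minimal sketch of the old two-commit flow. It uses stdlib `sqlite3` in place of the project's SQLAlchemy/SQLCipher session. `two_commit_login`, `make_db`, `DEFAULTS` and the five-row default set are hypothetical stand-ins, and the raised exception simulates a crash or lock timeout between the two commits. Because `app.version` never lands, every login fails the version check and re-runs the bulk insert:

```python
import sqlite3

# Hypothetical stand-ins: the real code uses SQLAlchemy sessions and ~498
# rows from default_settings.json; five rows keep the sketch small.
DEFAULTS = {f"setting.{i}": str(i) for i in range(5)}
PACKAGE_VERSION = "1.0.0"


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
    conn.commit()
    return conn


def db_version_matches(conn: sqlite3.Connection) -> bool:
    row = conn.execute(
        "SELECT value FROM settings WHERE key = 'app.version'"
    ).fetchone()
    return row is not None and row[0] == PACKAGE_VERSION


def two_commit_login(conn: sqlite3.Connection, fail_between: bool) -> bool:
    """Old flow. Returns True if the bulk defaults insert ran on this login."""
    if db_version_matches(conn):
        return False
    conn.executemany(
        "INSERT OR REPLACE INTO settings VALUES (?, ?)", DEFAULTS.items()
    )
    conn.commit()  # commit 1: defaults are now durable on their own
    try:
        if fail_between:
            # Simulated crash / lock timeout between the two commits.
            raise RuntimeError("failure before app.version was written")
        conn.execute(
            "INSERT OR REPLACE INTO settings VALUES ('app.version', ?)",
            (PACKAGE_VERSION,),
        )
        conn.commit()  # commit 2: only now does the version check pass
    except RuntimeError:
        conn.rollback()  # nothing left to undo: commit 1 already happened
    return True


conn = make_db()
# Every failing login re-runs the bulk insert: the "sticky loop".
print([two_commit_login(conn, fail_between=True) for _ in range(3)])  # [True, True, True]
print(db_version_matches(conn))  # False
```

Catching the exception only keeps the demo short. A real crash would end the process, but it would leave the same on-disk state: defaults present, `app.version` missing.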

Changes:

1. SettingsManager.update_db_version now accepts a commit parameter
   (default True, so existing callers are unaffected). Passing
   commit=False stages the version row in the session without
   committing it, so the caller can combine it with other writes in
   one atomic transaction.

2. Step 1 of _perform_post_login_tasks now uses that flag to run
   load_from_defaults_file + update_db_version in a single
   transaction, with one session.commit() at the end. Either both
   persist or neither does, so no partial state is left behind.
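The single-transaction version of step 1 can be sketched the same way. This is a hypothetical illustration of the commit-flag pattern, not the project's `SettingsManager`: a `sqlite3` connection stands in for the SQLAlchemy session, and `post_login_step1`, `fail_midway` and `row_count` are invented for the demo. A failure between the two writes now rolls both back:

```python
import sqlite3

# Hypothetical stand-ins for default_settings.json and the package version.
DEFAULTS = {f"setting.{i}": str(i) for i in range(5)}
PACKAGE_VERSION = "1.0.0"


class SettingsManager:
    """Hypothetical sketch of the commit-flag pattern, not the project's class."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def load_from_defaults_file(self, commit: bool = True) -> None:
        self.conn.executemany(
            "INSERT OR REPLACE INTO settings VALUES (?, ?)", DEFAULTS.items()
        )
        if commit:
            self.conn.commit()

    def update_db_version(self, commit: bool = True) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO settings VALUES ('app.version', ?)",
            (PACKAGE_VERSION,),
        )
        if commit:  # default True keeps existing callers' behaviour
            self.conn.commit()


def post_login_step1(manager: SettingsManager, fail_midway: bool = False) -> None:
    """Both writes are staged, then persisted by a single terminal commit."""
    try:
        manager.load_from_defaults_file(commit=False)
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        manager.update_db_version(commit=False)
        manager.conn.commit()  # the only commit: both writes or neither
    except Exception:
        manager.conn.rollback()  # discards the staged defaults as well
        raise


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM settings").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
manager = SettingsManager(conn)
try:
    post_login_step1(manager, fail_midway=True)
except RuntimeError:
    pass
print(row_count(conn))  # 0: the failed attempt left no partial state
post_login_step1(manager)
print(row_count(conn))  # 6: five defaults plus app.version
```

Keeping `commit=True` as the default is what makes the change backward-compatible: only callers that opt into `commit=False` take on responsibility for the terminal commit.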

Test plan:
- Existing test_update_db_version tests still pass (default
  commit=True preserves the old behaviour).
- New test_update_db_version_commit_false verifies that passing
  commit=False stages the row but does not call session.commit().
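A test in the spirit of `test_update_db_version_commit_false` can use `unittest.mock.MagicMock` to record calls on the session. The `Setting` dataclass and the free-function `update_db_version` below are hypothetical simplifications. The real method lives on `SettingsManager`, and its body is not shown here:

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class Setting:
    """Hypothetical settings row; the real model is a SQLAlchemy ORM class."""
    key: str
    value: str


def update_db_version(session, version: str, commit: bool = True) -> None:
    """Hypothetical stand-in for SettingsManager.update_db_version."""
    session.add(Setting(key="app.version", value=version))  # stage the row
    if commit:
        session.commit()


def test_update_db_version_commit_false() -> None:
    session = MagicMock()
    update_db_version(session, "1.2.3", commit=False)
    # The row is staged in the session...
    session.add.assert_called_once_with(Setting(key="app.version", value="1.2.3"))
    # ...but committing is left to the caller.
    session.commit.assert_not_called()


def test_update_db_version_default_commits() -> None:
    session = MagicMock()
    update_db_version(session, "1.2.3")
    session.commit.assert_called_once_with()


if __name__ == "__main__":
    test_update_db_version_commit_false()
    test_update_db_version_default_commits()
    print("ok")
```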

Part of the login-hang series. Independent of the other PRs.

* test(auth): lock in post-login atomicity + dispose-survival invariants

Follow-on to the atomic settings reload in the previous commit. Three
load-bearing properties are now guarded by regression tests and in-code
invariants:

1. Mid-write failure rolls back to a clean pre-write state — the next
   login retries fresh instead of entering the sticky loop that PR
   #3487 tried to prevent with a speculative dispose skip guard.
2. Happy-path atomic block restores both defaults and `app.version`
   together.
3. `engine.dispose()` does NOT break a thread holding a checked-out
   connection. This is SQLAlchemy (SA) 2.0's documented contract
   (`QueuePool.dispose` drains only idle entries; `Engine.dispose` then
   calls `pool.recreate()`). Covered by a 20-iteration stress test
   against a real SQLCipher+WAL engine.

Also:

- Strengthened the comment on the post-login atomic block
  (`routes.py`), stating it as an explicit ATOMICITY INVARIANT:
  splitting it into two commits regresses to the sticky loop.
- Documented the caller contract for `load_from_defaults_file` and
  `update_db_version` (`settings/manager.py`): pass `commit=False`
  and own the terminal commit yourself.
- Rewrote the dispose-loop comment in `connection_cleanup.py` to
  record the SA 2.0 safety argument, so nobody re-adds a
  `checkedout() > 0` skip guard without a real reproducer (see PR
  #3487 discussion).
- Added an addendum to ADR-0004 summarising the PR #3487
  investigation and pointing at the regression guard.

No change to `connection_cleanup.py` logic — dispose remains
unconditional. Supersedes PR #3487.
2026-04-16 23:04:01 +02:00