Commit Graph

163 Commits

Author SHA1 Message Date
Ishitta
cc98aa95f5 docs: add interpretation guide to BENCHMARKING.md (#3723) 2026-04-29 00:35:21 +02:00
github-actions[bot]
1126e3747d chore: auto-bump version to 1.6.4 (#3682)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-04-28 19:31:07 +02:00
LearningCircuit
2f7056a52c feat(notifications): default-off + env-only master switch for SSRF rebinding risk (#3675)
* docs(security): document DNS-rebinding TOCTOU window in notification SSRF

The notification URL validator (PR #3092 / #3311) resolves hostnames once
at validation time and checks resolved IPs against private ranges, but
Apprise re-resolves at send time -- a DNS-rebinding attacker can serve a
public IP at validation and a private IP at send. Apprise exposes no
DNS/Session hook to close this in code without fragile monkey-patching of
its plugin internals.

Given LDR's threat model (single-tenant local app, @login_required on
settings routes, per-user encrypted SQLCipher DBs), the residual risk is
acceptable as long as it's visible. This change makes it visible:

- Updated the inline comment in NotificationURLValidator._is_private_ip
  to describe the TOCTOU window and recommend plugin schemes
  (discord://, slack://, ntfy://, etc.) over raw http(s):// webhooks.
- Added a parallel comment in ssrf_validator.validate_url, since
  safe_requests has the same pattern.
- Added a "Notification Webhook SSRF" subsection to SECURITY.md with
  the rebinding window, the rationale for not closing it in code, the
  threat-model factors that make it acceptable, and operator-side
  mitigations (prefer plugin schemes, restrict egress).

No behavior change.
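
The check-then-use gap described above can be illustrated with a small sketch. Everything here is hypothetical (`is_private_ip`, `validate_host`, and `RebindingResolver` are illustration names, not the project's validator), and the resolver is injected so the example runs without network access:

```python
import ipaddress
from typing import Callable, List


def is_private_ip(ip_text: str) -> bool:
    """Return True for addresses a webhook should not be allowed to reach."""
    ip = ipaddress.ip_address(ip_text)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def validate_host(host: str, resolve: Callable[[str], List[str]]) -> bool:
    """Resolve once and accept the host only if every address is public."""
    return not any(is_private_ip(ip) for ip in resolve(host))


class RebindingResolver:
    """Hypothetical attacker DNS: public answer first, private answer afterwards."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, host: str) -> List[str]:
        self.calls += 1
        return ["93.184.216.34"] if self.calls == 1 else ["127.0.0.1"]


resolver = RebindingResolver()
passed_validation = validate_host("attacker.example", resolver)  # time of check
address_at_send = resolver("attacker.example")[0]                # time of use
```

In this sketch validation succeeds, yet the second resolution (standing in for the send-time lookup) returns a loopback address, which is the window the commit documents.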

* feat(notifications): default-off + env-only master switch (LDR_NOTIFICATIONS_ENABLED)

Outbound notifications via Apprise carry a known DNS-rebinding TOCTOU
window: the URL validator resolves once at config time, but Apprise
re-resolves at send time, and a logged-in user with a controllable domain
can rebind to internal services on the LDR server (e.g.
127.0.0.1:<internal-port>) or the local network. The window cannot be
closed in code without fragile monkey-patching of Apprise's plugin
internals (HTTPS-only, doesn't follow redirects).

Since LDR is multi-user (per-user SQLCipher DBs behind @login_required),
the right default is to keep outbound notifications off until the
operator explicitly opts in -- a server-level decision, not something a
logged-in user can flip via the settings API.

Changes:
- Add notifications.enabled env-only setting (default False), registered
  alongside notifications.allow_private_ips in env_definitions/security.py.
  Auto-mapped to LDR_NOTIFICATIONS_ENABLED.
- NotificationManager reads the env at __init__ and gates send_notification
  before any other check; force=True bypasses per-user toggles only, never
  the operator switch.
- NotificationService now takes enabled=False; test_service refuses with
  a clear error pointing at LDR_NOTIFICATIONS_ENABLED. The settings route
  /api/notifications/test-url passes the env-read value through.
- Refresh inline TOCTOU comment in NotificationURLValidator._is_private_ip
  to reflect the new gate, and add a parallel comment near getaddrinfo in
  ssrf_validator.py for cross-cutting consistency (same TOCTOU pattern).
- Rewrite the SECURITY.md "Notification Webhook SSRF" subsection: lead
  with "disabled by default", explain how to enable, document the residual
  risk operators are accepting when they flip the switch.
- Tests:
  - tests/notifications/conftest.py autouse-enables the gate so existing
    tests exercising the inner logic still work.
  - TestMasterSwitchEnvGate covers the gate behavior explicitly: env unset
    => send_notification returns False (even with force=True), test_service
    returns a disabled error.
  - TestNotificationManager in test_notification_coverage.py gets a
    class-scoped autouse fixture for the same reason.
  - Existing NotificationService(...) calls in tests pass enabled=True so
    their inner-logic assertions keep working.

This is a behavior change. Existing users with notifications working will
need to set LDR_NOTIFICATIONS_ENABLED=true on upgrade.

* fix(notifications): rename env gate to allow_outbound + clearer logs + docs

Two issues with the previous commit:

1. Key collision. The env gate was named notifications.enabled, which is
   already a (currently dormant) per-user DB setting in
   default_settings.json. Renaming the env-only setting to
   notifications.allow_outbound (env: LDR_NOTIFICATIONS_ALLOW_OUTBOUND)
   keeps the two layers distinct. Symmetric with the existing
   notifications.allow_private_ips env-only setting.

2. Log levels. The gate-closed paths logged at DEBUG, which is invisible
   under default log configuration. An operator wondering why
   notifications aren't firing wouldn't see the actionable signal.
   Upgrade to WARNING with messages that explicitly name the env var and
   point at SECURITY.md.

Also:

- Regenerate docs/CONFIGURATION.md (auto-generated from env definitions
  + default_settings.json) so LDR_NOTIFICATIONS_ALLOW_OUTBOUND appears in
  the env-only table at line 52.
- Add a "Server-Side Opt-In Required" section at the top of
  docs/NOTIFICATIONS.md, including the symptoms an operator would see
  when the gate is closed (so debugging "why isn't this working?" is a
  one-step lookup).
- Rename NotificationService kwarg enabled -> outbound_allowed and the
  manager's self._notifications_enabled -> self._outbound_allowed for
  internal consistency with the new setting name.
- Update tests + conftest accordingly. 507 tests pass, pre-commit clean.

* fix(notifications): defense-in-depth gate in service.send() + module-scope test fixture

Two non-blocker recommendations from code review on PR #3675:

1. service.send() did not enforce the outbound_allowed gate itself --
   the manager always wraps it, but a future direct caller could bypass.
   Add the same WARNING-level guard at the top of send() that
   test_service already has, so the security boundary lives at the
   service layer (one place) instead of relying on call-chain
   discipline.

2. Promote tests/web/services/test_notification_coverage.py's autouse
   gate-opening fixture from class-scope (TestNotificationManager only)
   to module-scope, so any future test class added to the file picks it
   up automatically. Drop the now-redundant class-scoped duplicate.

Tests:
- TestSendOutboundGate in tests/notifications/test_service.py covers the
  new gate: outbound_allowed=False => send() returns False without
  touching Apprise (.notify and .add must not be called); gate open =>
  the existing send path runs.
- _make_service helper in test_service_extra_coverage.py now sets
  outbound_allowed=True so the SSRF/Apprise-failure tests exercise the
  inner logic, not the gate.

509 passed, 1 skipped, pre-commit clean.
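
A minimal sketch of the two-layer gate described in this commit series, under stated assumptions: `SketchService` and `SketchManager` are hypothetical stand-ins for the real classes, the set of accepted truthy strings is a guess, and delivery is replaced by appending to a list.

```python
import os
from typing import List, Mapping


def read_outbound_gate(environ: Mapping[str, str] = os.environ) -> bool:
    """Operator switch: off unless the env var holds an explicit truthy value."""
    raw = environ.get("LDR_NOTIFICATIONS_ALLOW_OUTBOUND", "")
    return raw.strip().lower() in {"1", "true", "yes"}  # accepted values are assumed


class SketchService:
    """Stand-in service that enforces the gate itself (defense in depth)."""

    def __init__(self, outbound_allowed: bool = False) -> None:
        self.outbound_allowed = outbound_allowed
        self.delivered: List[str] = []

    def send(self, body: str) -> bool:
        if not self.outbound_allowed:
            return False  # refuse before touching any delivery backend
        self.delivered.append(body)
        return True


class SketchManager:
    """Stand-in manager: reads the env once, then checks the per-user toggle."""

    def __init__(self, environ: Mapping[str, str] = os.environ) -> None:
        self._outbound_allowed = read_outbound_gate(environ)
        self._service = SketchService(outbound_allowed=self._outbound_allowed)

    def send_notification(self, body: str, user_enabled: bool = True,
                          force: bool = False) -> bool:
        if not self._outbound_allowed:
            return False  # force never bypasses the operator switch
        if not user_enabled and not force:
            return False  # force bypasses only the per-user toggle
        return self._service.send(body)
```

With the variable unset, `SketchManager({}).send_notification("hi", force=True)` returns `False`; with it set to `true`, `force=True` overrides only the per-user toggle.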
2026-04-27 23:18:00 +00:00
LearningCircuit
08969f5ad2 config: disable general.enable_fact_checking by default (#3672)
* config: disable general.enable_fact_checking by default

Flip the default for general.enable_fact_checking from true to false.

When enabled, the citation handler does an extra LLM call per
analyze_followup that re-analyzes sources for consistency and injects
the result into the synthesis prompt. For agentic strategies (LangGraph
in particular) this is largely duplicative — the agent already
cross-references sources during its tool loop — and adds cost,
latency, and prompt bloat without clear quality gains.

Users who rely on the extra validation pass can re-enable it via the
setting. See #3671 for the discussion and trade-offs.

Refs #3671

* config: sync docs and edge-case test to new fact-check default

- Regenerate docs/CONFIGURATION.md so the documented default for
  general.enable_fact_checking reflects the new value (false).
- Update test_fact_checking_enabled_by_default in
  test_citation_handler_edge_cases.py: with no settings_snapshot, the
  handler now sees fact-checking as disabled, so analyze_followup
  invokes the LLM only once. Renamed to
  test_fact_checking_disabled_by_default and flipped the call-count
  assertion accordingly.

Refs #3671
2026-04-26 16:03:01 +02:00
github-actions[bot]
c45785dc63 chore: auto-bump version to 1.6.2 (#3639)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-04-26 12:11:39 +02:00
LearningCircuit
d3570c355a refactor: remove dead benchmark and citation functions (#3187)
* refactor: remove dead benchmark and citation functions

* cleanup: drop orphan cli.py stub, orphaned tests, stale docs

Follow-up to #3187 addressing djpetti's review and the failing
All Pytest Tests + Coverage check.

- Delete benchmarks/cli.py entirely. The file was already shadowed by the
  benchmarks/cli/ package (same import path), so the deprecation stub was
  unreachable dead code.
- Remove test classes that imported now-deleted functions:
  check_system_resources, plot_parameter_importance, plot_quality_vs_speed,
  CitationFormatter._to_superscript. This is what the pytest lane was
  failing on.
- Update docs/cli-tools.md and benchmarks/metrics/README.md to drop
  references to the removed CLI module and plot helpers.
2026-04-26 10:58:31 +02:00
LearningCircuit
ac23d3f847 docs(websocket): document auth requirement for WS handshake (#3658)
Two small additions reflecting PR #3127:

- troubleshooting.md: add an "Authentication / expired session" bullet
  to the WebSocket Progress-Updates-Not-Showing section so users know
  the handshake can be rejected after session expiry or server restart.

- env_configuration.md: clarify in the CORS / WebSocket Security section
  that WebSocket connections require an authenticated session in
  addition to passing the CORS check.
2026-04-26 08:42:59 +02:00
github-actions[bot]
503d244562 chore: auto-bump version to 1.6.0 (#3399)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-04-21 01:46:19 +02:00
LearningCircuit
3b1d6c6b2f feat: redesign journal quality system with data-driven scoring and predatory auto-removal (#3081)
* feat: redesign journal quality system with data-driven scoring and predatory auto-removal

Replace the expensive LLM-based journal scoring (SearXNG + AdvancedSearchSystem
per journal) with a tiered data-driven approach:

Tier 0: DB cache (instant, from previous runs)
Tier 1: Predatory check — auto-removes results from blacklisted journals/publishers
Tier 2: OpenAlex snapshot — h-index + DOAJ from ~217K sources (downloaded at runtime)
Tier 3: DOAJ check — quality floor for open access journals (downloaded at runtime)
Tier 4: LLM analysis — SearXNG fallback (now optional, not required)
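
The tier order above amounts to a first-hit-wins cascade. The sketch below is illustrative only: `score_journal`, the tier labels, and every score and journal name in the sample data are hypothetical, and Tier 4 is omitted because it needs an LLM.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Tier = Tuple[str, Callable[[str], Optional[int]]]


def score_journal(name: str, tiers: Sequence[Tier]) -> Tuple[Optional[int], Optional[str]]:
    """Return (score, tier label) from the first tier that recognizes the journal."""
    for label, lookup in tiers:
        score = lookup(name)
        if score is not None:
            return score, label
    return None, None  # no tier matched


# Made-up sample data standing in for the cache and the bundled/downloaded sets.
cache: Dict[str, int] = {}
predatory: Set[str] = {"fake journal of everything"}
openalex: Dict[str, int] = {"well known journal": 9}
doaj: Dict[str, int] = {"open example letters": 6}

tiers = [
    ("tier0_cache", cache.get),
    ("tier1_predatory", lambda n: 1 if n in predatory else None),
    ("tier2_openalex", openalex.get),
    ("tier3_doaj", doaj.get),
]
```

Cheap lookups come first, so an expensive fallback only runs for names that no earlier tier recognizes.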

Bundled data:
- Stop Predatory Journals: 6K predatory publishers/journals (MIT license)

Downloadable data (CC0, loaded if present):
- OpenAlex sources snapshot: 217K journals/conferences with h-index, impact factor
- DOAJ journals: 22K+ journals with DOAJ Seal status

Key changes:
- Extended Journal DB model with bibliometric fields (h-index, impact factor,
  DOAJ, predatory status, provenance tracking) + Alembic migration
- JournalReputationFilter now uses tiered scoring with journal dedup
- SearXNG no longer required — filter works with bundled data alone
- Predatory journals auto-removed (with whitelist override for false positives)
- Added journal filter to Semantic Scholar (was the only scientific engine without it)
- OpenAlex results now include source_id and source_type for direct lookups
- Fixed score parsing (regex instead of strict int()), prompt truncation,
  fail-fast on SearXNG failures, lru_cache on name cleaning

* fix: address code review findings from Round 1

- Remove dead __check_result method, update tests to use filter_results
- Fix predatory substring matching (min length guard prevents false positives)
- Add name parameter to is_whitelisted for journals without ISSN
- Fix migration: server_default for Booleans, correct index creation logic
- Improve safety net logging in filter_results

* fix: forward journal quality fields through _get_full_content (Round 2 review)

OpenAlex _get_full_content was constructing a new result dict without
forwarding journal_ref, openalex_source_id, and source_type from the
preview. This effectively disabled journal quality filtering for all
OpenAlex results since the content filters run after full content
retrieval and couldn't find the journal_ref key.

* fix: address Round 3 review findings — bugs, thread safety, tests

Critical bug fixes:
- Add missing quality_model column to migration 0005
- Fix dedup to use richest metadata (two-pass approach)
- Predatory cache entries no longer expire via normal TTL

Performance:
- Build indexed sets for predatory data at load time (O(1) exact match)
- Add threading.Lock for singleton and lazy property loading

Data quality:
- Deduplicate predatory.json (removed 21 dupes)

Test coverage (38 new tests):
- JournalDataManager: derive_quality_score, is_predatory, is_whitelisted,
  lookup_openalex, lookup_doaj, _expand_openalex_record, singleton

* fix: address all review findings — critical bugs, security, performance

Critical bugs: NASA ADS journal_ref, empty string guard, regex name
cleaning with LLM fallback, DOAJ field overwrite protection, predatory
cache TTL re-evaluation.

Security: prompt injection sanitization, log injection prevention,
Unicode NFKC normalization for predatory lookups.

Important bugs: predatory publish-after-indexes race fix, Tier 0 DB
error handling.

Performance: regex-based name cleaning eliminates ~5 LLM calls/batch.

* fix: .text() → .content for LangChain, improve regex name cleaning

Critical runtime fix:
- LangChain AIMessage has .content attribute, not .text() method.
  Both LLM calls in the filter (name cleaning and Tier 4 scoring)
  would crash with AttributeError at runtime. Fixed both occurrences
  and updated all test mocks.

Regex improvements:
- Add bare trailing citation number stripping (", 95, 146802")
- Add volume(issue) pattern stripping ("141(5)")
- Fix month regex: require at least 1 digit after month name and
  add word boundaries (prevents "May" in journal names being stripped)
- Only skip LLM when regex result has no residual numerics — complex
  citation strings like "Phys. Rev. Lett. 95, 146802 (2005)" correctly
  fall through to LLM instead of returning partially-cleaned name
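
A rough sketch of the regex rules listed above. The patterns are written for illustration and are assumptions, not the project's actual expressions; `regex_clean` and `needs_llm` are hypothetical names.

```python
import re

MONTHS = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
    r"Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)
# A month only counts as a date when at least one digit follows it.
MONTH_DATE = re.compile(rf"\b{MONTHS}\.?\s+\d{{1,4}}\b", re.IGNORECASE)
# volume(issue), e.g. "141(5)"
VOLUME_ISSUE = re.compile(r"\b\d+\(\d+\)")
# bare trailing citation numbers, e.g. ", 95, 146802"
TRAILING_NUMBERS = re.compile(r"(?:,\s*\d+)+\s*$")


def regex_clean(raw: str) -> str:
    """Strip date, volume(issue), and trailing-number debris from a journal_ref."""
    cleaned = MONTH_DATE.sub("", raw)
    cleaned = VOLUME_ISSUE.sub("", cleaned)
    cleaned = TRAILING_NUMBERS.sub("", cleaned)
    return re.sub(r"\s{2,}", " ", cleaned).strip(" ,;")


def needs_llm(cleaned: str) -> bool:
    """Residual digits mean the regex pass was incomplete; defer to the LLM."""
    return bool(re.search(r"\d", cleaned))
```

Under these patterns "Journal of May Studies" is left untouched, while "Phys. Rev. Lett. 95, 146802 (2005)" still contains digits after cleaning and would be handed to the LLM.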

* feat: add journal quality dashboard at /metrics/journals

Dashboard with summary stats, quality distribution chart, score source
doughnut, sortable/filterable journal table with pagination, quality
badges, trust signal icons, empty state, help panel, mobile responsive.

API: GET /metrics/api/journals — all journals + summary in one call.

* fix: XSS prevention, missing API fields, sort null handling in dashboard

Security:
- Add escHtml() helper for HTML entity escaping in all innerHTML
  injections (journal names, publishers, predatory_source, source badges)
- Prevents XSS via crafted journal names containing HTML/JS

API:
- Add works_count and cited_by_count to journal API response
  (bibliometric fields useful for dashboard display)

UX:
- Fix sort comparison with null values: nulls pushed to end consistently
  instead of unpredictable placement from mixed Infinity/string comparison

* fix: dashboard null-quality filter, avg h-index N/A, core label

- Fix null-quality journals appearing in predatory tier filter
  (quality || 0 coerced null to 0, which passed predatory check)
- Fix avg h-index showing "0" when no journals have h-index data
  (API now returns null, frontend shows "—")
- Rename "Scopus Indexed" to "Core Indexed" (OpenAlex is_core
  is CWTS core status, not Scopus indexing)

* feat: SQLite reference DB for dashboard with server-side pagination

Replace client-side 212K journal array with a shared read-only SQLite
database built from bundled JSON on first access. Near-zero RAM usage.

* perf: split summary from pagination queries in journal dashboard

Summary stats + chart data (3 SQL queries, ~130ms) are now fetched
only on initial page load via include_summary=true param. Subsequent
pagination, sorting, and filter changes only fetch the journal page
(1 query, ~7ms), making navigation feel instant.

* fix: expose Chart.js globally, split summary from pagination queries

- Add window.Chart = Chart in app.js so inline scripts can use Chart.js
  (was imported but never exposed on window — caused ReferenceError)
- Split summary from pagination: include_summary=true only on initial
  load, page/filter/sort skip the 3 extra SQL queries
- NOTE: run `npm run build` to rebuild the Vite bundle

* fix: guard Chart.js usage and defer initial load for module script timing

The Vite bundle loads as type="module" (deferred), but the inline
script in journal_quality.html runs immediately. Chart is not yet on
window when the script executes, causing ReferenceError that kills
the entire script block including the data loading call.

Fix: guard Chart usage with typeof checks, defer loadJournalPage
to window.onload so module scripts have finished executing.

* feat: upgrade journal filter logs from debug to info level

Users can now see the tiered scoring process in their logs:
- Tier 0: cache hit with score
- Tier 1: predatory detection + whitelist override
- Tier 2: OpenAlex match with h-index
- Tier 3: DOAJ match with seal status
- Tier 4: LLM analysis result
- Summary: passed/below-threshold/predatory breakdown

* fix: add 'the' prefix fallback for journal name lookups, add lookup logs

Many OpenAlex journals start with 'The ' (e.g., 'The Astrophysical
Journal Letters') but ArXiv journal_ref omits it. Now tries with/without
'the ' prefix when exact match fails — fixes ~5K potential Tier 2 misses
that would unnecessarily fall through to expensive Tier 4 LLM analysis.

Applied to both JournalDataManager (in-memory) and JournalReferenceDB
(SQLite). Added debug-level logs for lookup hits/misses.
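
The prefix fallback can be sketched as a second lookup attempt. `lookup_with_the_fallback` and the index contents are hypothetical; the real lookups run against the in-memory manager and the SQLite reference DB.

```python
from typing import Dict, Optional


def lookup_with_the_fallback(name: str, index: Dict[str, dict]) -> Optional[dict]:
    """Try the exact lowercase name, then retry with 'the ' added or removed."""
    key = name.strip().lower()
    if key in index:
        return index[key]
    alternate = key[4:] if key.startswith("the ") else "the " + key
    return index.get(alternate)


# Made-up index entry keyed by lowercase name.
index = {"the astrophysical journal letters": {"h_index": 100}}
```

Here `lookup_with_the_fallback("Astrophysical Journal Letters", index)` finds the entry stored under the "the "-prefixed key.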

* feat: quality tags in sources, sidebar menu, documentation

- Attach journal quality score to each result in filter_results
- Display quality tags in research output source lists:
  [Q1 ★★★★★] for elite, [Q2 ★★★] for moderate, etc.
- Add "Journals" item to sidebar under Analytics section
- Create docs/journal-quality.md with full system documentation

* fix: restore docstrings, increase DOAJ Seal score, fix truncated file

Address djpetti's review comments:
- Restore full Args/Returns docstrings on __init__, create_default,
  __db_session, __make_search_system, __clean_journal_name,
  __analyze_journal_reputation, __save_journal_to_db
- Remove "unlike the previous version" reference from create_default
- Add clarifying comment on regex vs LLM name cleaning tradeoff
- Increase DOAJ Seal score from 6 to 7 (2-point spread vs 1-point)
- Fix file truncation from disk-full error (line 763)

* refactor: move build logic into journal_reference_db module

Eliminate sys.path hack, make build logic importable. Script is now
a thin CLI wrapper. derive_quality_score imported from data_manager
(canonical copy) instead of duplicating.

* fix: review findings — docs, sidebar, dashboard, test gaps

Address final review round findings:
- Fix DOAJ Seal score in docs (6→7)
- Sidebar: use url_for() instead of hardcoded URL
- Template: set active_page='journal-quality' for sidebar highlight
- Rename stat-scopus to stat-seal with label "DOAJ Seal" (was mislabeled)
- Always use window.onload for initial load (readyState fast path unsafe)
- Add tests for _format_quality_tag (6 tests, all 5 tier branches + None)
- Add tests for "the" prefix fallback in lookup_source (2 tests)

* feat: add CORE conference rankings (795 CS conferences)

Bundle CORE Rankings (ICORE2026) for automatic conference scoring:
A*→9, A→7, B→5, C→4. Acronym + proceedings prefix matching.
Eliminates Tier 4 LLM calls for major CS conferences.

* feat: add data source attribution to journal quality dashboard

Credit the open academic data projects that make the dashboard possible:
OpenAlex (CC0), DOAJ (CC0), CORE Rankings, Stop Predatory Journals (MIT).
Displayed as an attribution section at the bottom of the page.

* fix: remove CORE conference data (no open license)

CORE Rankings are copyrighted (c) 2013 Computing Research & Education
with no published open license. Redistribution in an MIT project is
not permitted without explicit permission.

Removed core_conferences.json from bundled data. The build function
_load_core_conferences gracefully returns {} when the file is absent.
Conference matching still works via OpenAlex data + proceedings prefix
stripping.

Verified remaining data licenses:
- OpenAlex: CC0 Public Domain (confirmed)
- DOAJ metadata: CC0 (confirmed on doaj.org)
- Stop Predatory Journals: MIT License (confirmed in GitHub LICENSE)

* docs: add data source attribution to README, docs, code, and dashboard

Credit open academic data projects at multiple touchpoints:
- README.md: Journal Quality feature links to data sources
- docs/journal-quality.md: expanded attribution table with websites
- data/__init__.py: license details per bundled file
- journal_reference_db.py: data sources in module docstring
- Dashboard: attribution section with links (already added)

All bundled data verified: OpenAlex (CC0), DOAJ metadata (CC0),
Stop Predatory Journals (MIT).

* fix: DOAJ Seal score consistency across all tiers

Tier 2 (OpenAlex) now cross-references DOAJ for Seal status via
dm.has_doaj_seal(issn). Tier 3 now calls derive_quality_score
instead of hardcoding score=6. All tiers consistently score
DOAJ Seal at 7. Fixed docs inconsistency.

* feat: add CitationMetadata model for structured academic metadata

New citation_metadata table stores bibliographic data on academic
research sources using CSL-JSON vocabulary. 1:1 with ResearchResource.

- CitationMetadata model: doi, arxiv_id, pmid, authors, year,
  volume, issue, pages, container_title, journal_id FK, csl_json
- Migration 0006: create table + indexes
- citation_normalizer.py: engine-specific → CSL-JSON normalization
- extract_links: preserve citation fields (was dropping 90% of data)
- research_sources_service: create CitationMetadata for academic sources
- Quality never stored — derived via journal_id at query time

* refactor: simplify Journal table to only cache Tier 4 LLM results

Tiers 1-3 use bundled data (instant, no caching needed). Only Tier 4
(LLM) results cached in DB. Wire up journal_id FK on CitationMetadata.

* feat: auto-download journal data from GitHub Releases

Replace bundled data files with on-demand download:
- journal_data_downloader.py: fetch from GitHub Releases on first use
- Data in user dir (not package dir, read-only in pip installs)
- Dashboard shows download banner when data missing
- API: GET/POST /metrics/api/journal-data/{status,download}
- predatory.json (307KB) stays bundled, large files never in git

* refactor: fetch journal data from APIs instead of GitHub Releases

Fetch directly from OpenAlex and DOAJ public APIs. No redistribution
concerns — data fetched fresh from CC0 sources (~3 min first run).

* fix: review findings — h_index=0 edge case, dead code, missing field

- derive_quality_score: h_index=0 no longer bypasses DOAJ Seal score
  (0 means newly indexed, not low quality)
- citation_normalizer: remove dead arxiv check in detect_engine
- extract_links: add source_engine to preserved fields
- paths.py: fix stale docstring (GitHub Releases → APIs)

* fix: DB race condition and journal name normalization (Round 3 review)

- Wrap __save_journal_to_db commit in try/except to handle concurrent
  inserts gracefully (rollback + warning) instead of incorrectly
  incrementing the SearXNG failure counter
- Add geographic qualifier stripping to regex cleaner: "(London)",
  "(New York)", "(US)" etc. are now stripped deterministically,
  preventing duplicate scoring of the same journal under variant names

* fix: DB race condition and journal name normalization (Round 3 review)

- S2 close() now calls super().close() to properly clean up the
  JournalReputationFilter (SearXNG engine + LLM). Before this fix,
  adding content_filters to S2 created a resource leak since S2's
  close() override didn't delegate to BaseSearchEngine.close().

* fix: DB race condition and journal name normalization (Round 3 review)

- Fix predatory substring matching: check both directions for renamed
  publisher variants while keeping >= 10 char guard
- DB cache read: logger.exception for stack trace preservation
- Model Boolean columns: add server_default=sa_false()
- Migration downgrade: drop indexes before columns

* fix: correct url_to_quality type annotation after merge (Round 4 review)

Type was `dict[str, dict]` but values are `int` scores from the journal
quality filter. Changed to `dict[str, int]`.

* fix: CI failures — sensitive logging and file write allowlist

- journal_data_downloader: use logger.exception() instead of f-string
  with exception variable (sensitive-logging check)
- Add journal_data_downloader.py to file-write security check allowlist
  (writes public CC0/MIT journal metadata, not user data)

* fix: skip journal reference DB tests when DB not built (CI timeout fix)

The test fixture was calling db.available which triggers _get_conn()
which auto-downloads 200K+ sources from OpenAlex API. In CI this caused
60s timeouts on 26 tests. Now checks db_path.exists() directly.

* fix: renumber migration 0005 → 0007 to resolve multiple-heads conflict

Main already has 0005_add_resource_document_id and 0006_add_citation_metadata.
Our migration was also numbered 0005, causing Alembic to reject login with
"multiple heads" error. Renumbered to 0007 with down_revision=0006.

* fix: align test mock chains with real Tier 0 DB query pattern

Tests were mocking .filter_by().first() but real code does
.filter_by().filter(score_source=="llm").first(). Fixed mock chains
to match. Also fixed docs typo: reanalysis_period default 265 → 365.
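
Why the mock chain has to mirror the real call sequence can be shown with `unittest.mock`. `cached_llm_entry` is a hypothetical stand-in for the Tier 0 lookup, and plain strings replace the model class and filter expression:

```python
from unittest.mock import MagicMock


def cached_llm_entry(session, name):
    """Stand-in for the Tier 0 lookup: filter_by, then filter, then first."""
    query = session.query("Journal").filter_by(name=name)
    return query.filter("score_source == 'llm'").first()


entry = {"name": "Example Journal", "quality": 7}

# Mock chain that matches the real call sequence.
good = MagicMock()
good.query.return_value.filter_by.return_value.filter.return_value.first.return_value = entry

# Stale chain that skips .filter(): the code never reaches this return value.
stale = MagicMock()
stale.query.return_value.filter_by.return_value.first.return_value = entry

good_result = cached_llm_entry(good, "Example Journal")
stale_result = cached_llm_entry(stale, "Example Journal")
```

With the stale chain the function returns an auto-created `MagicMock` rather than `entry`, so assertions on the cached value would be checking the wrong object.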

* fix: journal dashboard showing "not installed" when reference DB exists

get_journal_data_status() only checked for raw JSON source files, not
the compiled journal_reference.db. If the DB existed without source
JSONs (e.g., after cleanup), the dashboard refused to load.

* feat: add DOI-based venue identification and conference detection

Adds a pre-enrichment layer that resolves paper DOIs to OpenAlex source
IDs via batch lookup (up to 50 DOIs per HTTP request). This gives the
journal quality filter a reliable ID-based lookup path instead of
fragile name matching.
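
The batching step can be sketched as plain chunking. `chunked` is a hypothetical helper; the HTTP request itself is left out because its exact shape is not given here.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Made-up DOIs purely for illustration.
dois = [f"10.1234/example.{i}" for i in range(120)]
batches = list(chunked(dois))  # one lookup request would be issued per batch
```

For 120 DOIs this produces three batches of 50, 50, and 20, so three requests instead of 120.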

Changes:
- New: openalex_enrichment.py — batch DOI → source_id resolution
- Integration hook in search_engine_base.py for scientific engines
- Conference detection heuristic as fallback for papers without DOI
- Year stripping in OpenAlex lookup: "NeurIPS 2023" → "NeurIPS"
- NASA ADS now extracts DOI to result dict
- Fix stale AdvancedSearchSystem mocks in tests

* fix: handle missing thread context in preview filter phase

The journal filter runs as a preview_filter (before LLM relevance) for
instant data lookups. But DB operations (Tier 0 cache, save) require
thread context which isn't available in the preview phase.

Fix: __db_session() returns None when no context available. Callers
skip DB operations gracefully — data-only tiers (1-3) still work.

* feat: disable Tier 4 LLM journal scoring by default (too slow)

* feat: institution scoring tier + DataSource refactor

- New DataSource ABC + registry under utilities/data_sources/ unifying
  openalex, doaj, jabref, predatory, and institutions sources
- Add InstitutionSource (OpenAlex Institutions, ~123K records) for
  affiliation-based scoring of preprints
- Add Tier 3.5 (institution lookup) to journal_reputation_filter
  for the no-journal_ref salvage path and as a max() lift for
  preprint repositories with weak Tier-2 scores
- Extract author affiliations in OpenAlex search engine
- Wire JournalReputationFilter into PubMed engine and fix journal_ref
  field aliasing
- Tighten regex cleaner for journal_ref (year/month/volume debris)
- Delete bundled src/local_deep_research/data/ — all sources now
  fetched at runtime with shared auto_download policy
- Dashboard banner shows all academic data sources with license + status

* refactor: consolidate journal-quality system into one package with SQLAlchemy

- New package src/local_deep_research/journal_quality/ groups all
  journal-related modules (downloader, db, models, scoring, data_sources)
- Single source of truth: gz files compile into one journal_quality.db
  via build_db(); JournalDataManager dict-based loader is deleted
- SQLAlchemy 2.0 ORM throughout (models.py + db.py); filter call sites
  unchanged thanks to dict-shaped lookup return values
- Read-only enforcement at three layers: SQLite mode=ro&immutable=1,
  POSIX chmod 0o444 after build, and a pre-commit hook that bans
  cross-module writable opens of journal_quality.db
- Downloader rebuilds the DB synchronously after each successful fetch
- New tables: predatory_journals/_publishers/_hijacked, institutions,
  abbreviations
- Tests migrated to tests/journal_quality/; 207 tests pass
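
Two of the read-only layers mentioned above (the SQLite URI flags and the chmod) can be tried with the standard library. This is a sketch with a throwaway database; `open_read_only` and the `sources` table layout are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file with mode=ro&immutable=1 via a file: URI."""
    uri = f"{db_path.resolve().as_uri()}?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "journal_quality.db"

    # Build step: the only place that opens the file writable.
    builder = sqlite3.connect(db_path)
    builder.execute("CREATE TABLE sources (name TEXT)")
    builder.execute("INSERT INTO sources VALUES ('Example Journal')")
    builder.commit()
    builder.close()
    os.chmod(db_path, 0o444)  # POSIX-level read-only after the build

    reader = open_read_only(db_path)
    rows = reader.execute("SELECT name FROM sources").fetchall()
    try:
        reader.execute("INSERT INTO sources VALUES ('should fail')")
        write_rejected = False
    except sqlite3.OperationalError:
        write_rejected = True  # SQLite refuses writes on a read-only connection
    reader.close()
```

Reads succeed as usual, while the attempted insert is rejected by SQLite itself, independent of the file permissions.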

* fix: P0/P1 bugs from journal-quality code review

- P0: flag hijacked journals as predatory in _populate_sources
  (loaded into predatory_hijacked but never checked against sources)
- P0: insert DOAJ-only journals (~8K rows) via second pass over
  doaj_data; previously only OpenAlex venues entered the DB
- P0: replace `mod._ref_db = None` with `reset_db()` in metrics
  rebuild route (the singleton attr is `_db`, not `_ref_db`)
- P0: change JournalQualityDB._lock to RLock to prevent first-run
  deadlock (_ensure_engine → build_db → reset_db re-acquires lock)
- P1: dedup sources on (name_lower, issn) so print + electronic
  ISSN variants both survive; drop unique=True on Source.name_lower
- tests: cover hijacked, DOAJ-only, and dual-ISSN cases
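
The deadlock fixed by the RLock change comes from re-acquiring a lock on the same thread. The sketch below uses a hypothetical `LazyDB` class that mirrors the call shape named in the commit (ensure, then build, then reset):

```python
import threading


class LazyDB:
    """Stand-in for a lazily built singleton guarded by a re-entrant lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()  # a plain Lock would deadlock below
        self._engine = None

    def reset(self) -> None:
        with self._lock:  # re-acquired while ensure_engine already holds it
            self._engine = None

    def _build(self) -> str:
        self.reset()
        return "engine"

    def ensure_engine(self) -> str:
        with self._lock:
            if self._engine is None:
                self._engine = self._build()
            return self._engine


engine = LazyDB().ensure_engine()

# For contrast: a non-reentrant Lock cannot be taken twice by the same thread.
plain = threading.Lock()
plain.acquire()
second_acquire = plain.acquire(blocking=False)
plain.release()
```

`second_acquire` is `False`, which is the condition that would have blocked forever with a blocking acquire.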

* fix: resolve CI failures on journal-quality refactor

- pre-commit: add missing .pre-commit-hooks/check-journal-quality-readonly.py
  to git (file existed locally but was never committed, so CI couldn't
  exec it)
- file-writes scan: extend allowlist to cover the new
  journal_quality/downloader.py and journal_quality/data_sources/*.py
  paths (the old `journal_data_downloader.py` entry no longer matches
  after the package move)
- mypy: fix 12 errors in journal_quality/db.py
  - explicit list[] annotation on `wheres`
  - dict comprehension on Row sequence in get_source_distribution
  - wrap loader returns in dict() so SQLAlchemy stub Any-types resolve
  - type: ignore[arg-type] on bulk_insert_mappings (known stub gap;
    SQLAlchemy 2.x types accept type[T] at runtime but stubs say Mapper)
- CodeQL py/incomplete-url-substring-sanitization: anchor doi.org URL
  parsing on scheme prefixes instead of substring `in` check
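
Anchoring on the scheme prefix instead of using a substring test might look like the following. The prefix list and `doi_from_url` are assumptions for illustration, not the project's code:

```python
from typing import Optional

# Assumed set of accepted resolver prefixes.
DOI_URL_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_from_url(url: str) -> Optional[str]:
    """Return the DOI only when the URL starts with a known resolver prefix."""
    stripped = url.strip()
    lowered = stripped.lower()
    for prefix in DOI_URL_PREFIXES:
        if lowered.startswith(prefix):
            return stripped[len(prefix):] or None
    return None
```

A check such as `"doi.org" in url` would also accept `https://evil.example/?next=doi.org/10.1000/xyz123`; the prefix-anchored version returns `None` for it.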

* refactor: address djpetti review comments on journal quality system

Tier 4 LLM scoring is now opt-in via the new
search.journal_reputation.enable_llm_scoring setting (default off) instead
of being unreachable behind a hardcoded flag. The redundant in-process
lru_cache on the LLM analyzer is gone - Tier 0 (DB cache) already covers
repeat lookups, and keeping the cache only masked DB write failures.

Trailing-year stripping for conference names ("NeurIPS 2023" -> "NeurIPS")
moves into __regex_clean_journal_name where it belongs, replacing the
post-hoc retry block in __score_journal.

DOAJ Seal score bumped 7 -> 8 to reflect the certification meaning more
faithfully (top ~10% of DOAJ journals, curated against best OA practices).
The h-index >= 7 tier mapping is unchanged so no test fixtures break.

Adds /api/journals/research/<id> + a "View Journals" button on the research
details page so users can see the journals encountered in a single research
session, not just the cross-research aggregate. Joins through
CitationMetadata -> ResearchResource without schema changes.

Adds quartile (Q1-Q4) as a display-only signal on Source rows, derived at
build time from cited_by_count percentile within each source_type. Quality
scoring is unchanged - h-index remains the canonical bibliometric.

Magic numbers in scoring.py / db.py extracted into a Journal Quality
Scoring Thresholds section in constants.py. Institution scoring is now
consolidated to scoring.py::institution_score_from_h_index, fixing an
unreachable branch in db.py::score_from_affiliations along the way.

Misc:
- OPENALEX_ENRICHMENT_API_TIMEOUT lifted into constants.py (was hardcoded 15)
- Deleted scripts/build_journal_reference_db.py - auto-build on first
  access plus the dashboard rebuild button cover all use cases

* perf(journal-quality): switch data sources to bulk dumps + release-gate test

Replace paginated REST API fetches with public bulk snapshots:
- OpenAlex Sources: S3 manifest + parts (~280K, ~270s vs 5-10min)
- OpenAlex Institutions: S3 manifest + parts (~120K, ~156s vs 5-10min)
- DOAJ: single CSV dump (~22K, ~2s)

Bulk paths are the OpenAlex/DOAJ-recommended way to pull the full
dataset and eliminate hundreds of rate-limited requests on every
"Download Data" click. Compact output formats are preserved so the
build pipeline and runtime accessors are unchanged.

Add a release-gate integration test + dedicated workflow that
downloads all 5 sources in parallel, builds the reference DB end
to end, and scores a real journal. Catches upstream schema breaks
(renamed fields, restructured dumps) before we cut a release.

* test(journal-quality): exercise dashboard query methods in release gate

* docs(journal-quality): credit upstream data providers on dashboard

* docs(journal-quality): add 'How It Works' tab explaining tiered scoring

* fix(journal-quality): score unknown journals as 3, log institution names

- Lower truly-unknown journals (no OpenAlex/DOAJ/Tier 3.5 hit) from
  pass-through to score 3 so the default threshold (4) actually filters
  them. Distinct from predatory (1) — these are merely unknown.
- Fix AttributeError in OpenAlex search engine when work has DOI key
  with explicit None value: use `work.get('doi') or work_id` instead
  of `work.get('doi', work_id)`. Was dropping ~14% of results per
  search before they reached the filter.
- Include matching institution names in Tier 3.5 log lines so the
  affiliation salvage path is debuggable.
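
The difference between the two `dict.get` forms can be seen in a small standalone sketch (the `work` dict here is made-up sample data, not a real OpenAlex response):

```python
# Sample record where the key exists but holds an explicit None.
work = {"id": "https://openalex.org/W123", "doi": None}
work_id = work["id"]

# The default only applies when the key is missing, so None leaks through.
with_default = work.get("doi", work_id)

# `or` also covers present-but-None (and empty-string) values.
with_or = work.get("doi") or work_id
```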

* refactor(journal-quality): demote per-journal scoring logs to DEBUG, log institutions on score-3

* fix(openalex): handle None values for display_name, id, source.id

OpenAlex routinely returns these keys with explicit null values, which
bypassed the dict.get default and crashed downstream string operations
(slicing, split). Same antipattern as the 'doi' fix in b4f43f3e6.

Errors were causing whole search batches to fail with TypeError:
'NoneType' object is not subscriptable at line 222.

* fix(journal-quality): handle MEDLINE name format + publisher suffixes

PubMed serves journal names in MEDLINE format which OpenAlex doesn't
match directly:
- '[Original-language] English title' → strip leading bracket
- 'Title : long subtitle' → fall back to the head segment
- 'Title. Section name' → fall back to the head segment (>=6 chars)

Also strip trailing publisher names (Elsevier, Springer, Wiley, etc.)
that some engines glue onto the journal_ref.

Was causing Molecular Therapy, Journal of Alzheimer's Disease, and
~6 other major biomed journals to be dropped as score-3 unknowns on
PubMed searches.

* feat(journal-quality): default threshold to 2 (predatory-only)

Drop the default from 4 to 2 so the filter's out-of-the-box behavior
is conservative: predatory journals are still auto-removed, but
unknown/low-confidence venues (score 3) are kept. Users who want
stricter filtering can raise the slider in Settings.

Avoids the 'silently delete sources we don't have data on' problem
that the threshold=4 default was causing on PubMed and arxiv searches.

* docs(journal-quality): document threshold semantics + link to docs from dashboard

- Update docs/journal-quality.md with new tier pipeline (Tier 3.5 + score-3
  floor + Tier 4 off by default), bulk-dump source counts, and threshold table
- Add 'Threshold setting' card to dashboard 'How It Works' tab
- Link to docs/journal-quality.md from the dashboard help tab

* feat(journal-quality): add threshold slider to dashboard help tab

Live slider 1-10 with per-level explanations. Loads the current
value from /settings/api/search.journal_reputation.threshold on
first tab open and saves on change via PUT (debounced 300ms).

* feat(journal-quality): hoist threshold slider to top of dashboard

Compact slider widget below the data sources banner, always visible.
Synchronized with the full slider in the How It Works tab so changing
either updates both. Loads on page open instead of lazy-loading on
tab switch.

* feat(journal-quality): show global toast when threshold slider saves

* feat(journal-quality): make Global Database the default tab

Combines naturally with the threshold slider above — users can
immediately see the score distribution they're filtering against.
Your Research tab moved to second position and lazy-loads on switch.

* feat(journal-quality): show direct dataset links on dashboard sources cards

* fix(journal-quality): point DOAJ dataset link to docs page, not raw CSV

* fix(journal-quality): use DOAJ FAQ for dataset link (public-data-dump 404)

* fix(journal-quality): correct DOAJ dataset link to public-data-dump page

* review(djpetti): address PR review comments

- filter: drop @lru_cache on __clean_journal_name (DB cache covers it)
- filter: fix __db_session docstring (returns None, never raises)
- filter: restore long-form Tier 4 LLM prompt (avoid silent calibration regressions)
- filter: add Tier 3.6 LLM name-cleanup salvage that retries OpenAlex with a
  canonicalised name (gated behind enable_llm_scoring opt-in)
- filter: bump Tier 4 LLM scores by +1 when the journal has the DOAJ Seal
- filter: persist quartile + DOAJ status in __save_journal_to_db so the
  dashboard and Tier 0 cache see the same metadata Tier 2 used
- scoring: derive_quality_score now honours quartile directly (Q1→strong,
  +elite when h-index also tops the threshold)
- model: add Journal.sjr_quartile column + Alembic 0008 migration
- citation_normalizer: take over the canonical _extract_doi
- openalex_enrichment: use project-level USER_AGENT constant
- journal_quality dashboard: default to "Your Research" tab

* review(djpetti): inject project User-Agent into safe_get/safe_post

djpetti's openalex_enrichment.py:124 comment was specifically about
"injected into safe_get", not just using the constant. Make safe_get,
safe_post, and SafeSession.request auto-set User-Agent from the
project-level USER_AGENT constant when the caller didn't supply one.
Drops the manual override in openalex_enrichment except for the email
polite-pool variant.

* review(round-2): six correctness fixes + dashboard quartile + tests

Six confirmed bugs from the 25-agent merge-readiness review (tracked
in plans/spicy-finding-wreath.md), all surgical and confined to files
already touched by this PR:

A. filter: stop losing the negative DOAJ signal
   journal_reputation_filter.py:778-779 (Tier 2) and 908-909 (Tier 4
   DOAJ-Seal bonus) used `is_in_doaj=oa_doaj or None`. `False or None`
   evaluates to None, and __save_journal_to_db treats None as "don't update",
   leaving the column NULL after a Tier 2 hit even when OpenAlex told
   us the answer. The bug was not just observability — it broke
   `not is_in_doaj` (scoring.py:82, predatory branch), the predatory
   whitelist override (db.py:1024), and the dashboard trust icon. Tier
   2 now passes the boolean directly; Tier 4 uses `True if seal_bonus
   else None` so the no-bonus case is silent instead of clobbering
   Tier 2 data with a guessed False.
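
A small sketch of the tri-state problem, assuming a hypothetical `save_flag` stand-in for the save path where None means "don't update" (the real __save_journal_to_db does more than this):

```python
from typing import Optional


def save_flag(stored: Optional[bool], new: Optional[bool]) -> Optional[bool]:
    """Hypothetical stand-in: keep the stored value when `new` is None."""
    return stored if new is None else new


oa_doaj = False  # the upstream source says "not in DOAJ"

# Old Tier 2 pattern: a definite False collapses to None and is never stored.
old_tier2 = save_flag(None, oa_doaj or None)

# New Tier 2 pattern: the boolean is passed through unchanged.
new_tier2 = save_flag(None, oa_doaj)

# Tier 4 pattern: only a positive bonus writes, so a stored False survives.
seal_bonus = False
tier4 = save_flag(new_tier2, True if seal_bonus else None)
```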

A2. journal_quality.db.reset_db() now holds _db_lock
   The /api/journal-data/download HTTP handler called reset_db()
   concurrently with searches in flight. Without the lock, a third
   thread calling get_db() could pass `if _db is None` while reset_db()
   was disposing, then short-circuit in _ensure_engine on the still-
   set _engine attribute and return a disposed pool.

A3. __searxng_consecutive_failures is now per-thread
   The filter instance is cached and reused across concurrent searches
   by parallel_search_engine.py. The shared mutable counter was
   clobbered by Thread B's reset, defeating the fail-fast that's
   supposed to disable Tier 4 after 2 consecutive failures. Replaced
   with threading.local() + three private accessors so each thread
   gets its own counter, reset at the top of every filter_results().
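
The per-thread counter idea can be sketched as follows. The class and method names are hypothetical; the commit only says the filter uses threading.local() plus three private accessors:

```python
import threading


class FailureCounter:
    """Consecutive-failure counter whose value is private to each thread."""

    def __init__(self) -> None:
        self._local = threading.local()

    def get(self) -> int:
        # Threads that never touched the counter start at 0.
        return getattr(self._local, "failures", 0)

    def increment(self) -> int:
        self._local.failures = self.get() + 1
        return self._local.failures

    def reset(self) -> None:
        self._local.failures = 0
```

With a shared integer attribute, a reset from one search thread would wipe the failure streak another thread was accumulating; here each thread sees only its own value.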

A4. PNAS-class journals are now exempt from the conference heuristic
   "Proceedings of the National Academy of Sciences" matched the bare
   `proceedings` regex and was auto-classified as a Q3 conference,
   throwing away its real h-index ~1,400. Same for the Royal Society,
   AMS, LMS, etc. Added a `lower().lstrip().startswith("proceedings
   of ")` guard before the heuristic.
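
A sketch of the guard in isolation. The regex below is a simplified, hypothetical stand-in for the real conference heuristic; only the `startswith` check is taken from the description above:

```python
import re

# Hypothetical simplified heuristic: any name containing "proceedings".
_CONFERENCE_RE = re.compile(r"\bproceedings\b", re.IGNORECASE)


def looks_like_conference(name: str) -> bool:
    """Apply the conference heuristic, exempting "Proceedings of ..." journals."""
    if name.lower().lstrip().startswith("proceedings of "):
        return False
    return bool(_CONFERENCE_RE.search(name))
```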

A6. downloader.needs_update logic is no longer inverted
   The check was `installed_version is not None and != latest`, so it
   returned False when no data was installed at all — first-run users
   never saw the "download data" CTA. Changed `and` to `or`. The
   test_no_files test that was catching this now passes.

B. __regex_clean_journal_name strips leading ordinal markers
   "12th International Conference on Machine Learning" now cleans to
   "International Conference on Machine Learning" — has a fighting
   chance of matching OpenAlex.

Polish D. Surface sjr_quartile on the dashboard
   /api/journals/user-research and the per-research endpoints now
   include sjr_quartile on the journal row dict. The Your Research
   and Global Database tables both gain a Quartile column rendered as
   a colored chip (Q1=green, Q2=blue, Q3=yellow, Q4=orange) via a new
   getQuartileChip() helper. Quartile was the entire point of
   migration 0008 + the recent scoring work, and it had been
   computed and persisted but never displayed.

Polish E. Promote "python-requests" literal to _DEFAULT_REQUESTS_UA_PREFIX
   constant in safe_requests.py so a future requests-library rename is
   a one-line edit.

Test C. 30 new unit tests covering the 6 PR fixes
   - test_scoring.py: TestDeriveQualityScoreQuartile (13 tests) —
     Q1/Q2/Q3/Q4 mapping, case insensitivity, Q1 + elite h-index → 10,
     fall-through on unknown quartile, predatory override.
   - test_citation_normalizer.py: extended TestExtractDoi with 7 cases
     (external_ids / externalIds / lowercase / dx.doi.org / http /
     doi field priority / SSRF guard).
   - test_safe_requests.py: TestUserAgentInjection (6 tests) — auto-
     inject when missing, preserve explicit UA, case-insensitive
     header check, no caller-dict mutation, both safe_get and safe_post.
   - test_journal_reputation_coverage.py: TestTier4DoajSealBonus
     (3 tests — bumped, capped at 10, no-bump silent) and
     TestTier36LlmNameCleanup (2 tests — relabel hits OpenAlex on
     retry, relabel-then-miss falls through to Tier 4).

341 tests pass across the affected suite (was 273 before this
commit). No new failures.

* fix(tests): update migration head revision assertions to 0008

The migration chain now has 8 migrations (0001-0008). Tests that
hardcoded "0005" as the expected head revision now correctly expect
"0008". Also renamed test functions to be version-agnostic
(test_head_revision_is_current instead of test_head_revision_is_0005).

* test(security): add tests for 6 critical pre-commit security hooks

Adds 74 tests verifying the security hooks enforce data protection:

- test_deprecated_db_hook: Detects get_db_connection() and raw
  db_manager.get_session() that bypass per-user encrypted databases
- test_ldr_db_hook: Detects shared DB references that would leak data
- test_sensitive_logging_hook: Detects password/API key/token logging
- test_env_vars_hook: Enforces SettingsManager for LDR_* env vars
- test_journal_quality_readonly_hook: Enforces read-only DB access
- test_silent_exceptions_hook: Detects silent except:pass patterns

Test strings use dynamic construction to avoid triggering the very
hooks they test (e.g., _DEPRECATED_DB = "ldr" + ".db").

* docs: fix module docstring to match actual scoring tiers

* fix: move DB cache check from position 0 to before LLM tiers

The DB cache only stores Tier 4 (LLM) results. Tiers 1-3 use bundled
data that is instant and doesn't need caching. Moving the DB cache
check to right before the LLM tiers avoids a needless DB query for
journals that will be scored instantly by the bundled data tiers.

* fix: resolve CI test failures after merge from main

- Fix _content_filters → _preview_filters in arxiv, openalex, and
  arxiv_coverage tests (engines moved journal filter to preview phase)
- Restore migration test assertions from main (0005 not 0008)
- Add citation_metadata to EXPECTED_TABLES in schema stability test
- Wrap create_default settings read in try/except to prevent propagation
  when settings_snapshot raises (fixes S2 coverage test)

* fix(security): prevent exception details from leaking to API responses

CodeQL flagged that raw exception text (e.g. stack traces, internal paths)
was flowing from download_journal_data's error message to the JSON API
response at /api/journal-data/download.

Two fixes:
1. Route handler: separate success/failure paths — on failure, return
   generic "Download failed" to user, log full details internally
2. Downloader: remove {e} from return message, use logger.exception
   instead (logs full traceback server-side without exposing to user)

* refactor: deduplicate papers + add 50 tests (#3446)

* refactor: deduplicate citation_metadata into papers + paper_appearances

Replace the 1:1 citation_metadata table with a properly deduplicated
schema: papers (unique per paper, deduped by DOI/arXiv/PMID waterfall)
+ paper_appearances (junction table linking papers to research resources).

Fixes inflated paper counts in dashboard queries. Migration 0006
rewritten since it hasn't been released yet.

* test: add 28 tests for journal filter tiers, scoring, and new fields

- test_journal_filter_tiers.py: predatory auto-removal, whitelist override,
  OpenAlex/DOAJ tiers, dedup, fail-fast, stale cache, DB error safety net
- test_scoring_edge_cases.py: negative h-index, invalid quartile, Q1+h=0,
  normalize_name edge cases, three-way priority
- test_openalex_new_fields.py: source_id extraction, field forwarding,
  S2 venue→journal_ref mapping

* refactor: slim Paper model to indexed columns + JSON metadata blob

Out of 16 columns on Paper, only 4 are ever queried: doi, arxiv_id,
pmid, journal_id. The other 12 were dead storage. Collapse them into
a single paper_metadata JSON blob (hybrid relational-JSON pattern used
by OpenAlex/Crossref).

SQLCipher compatibility verified: JSON1 extension enabled by default,
LDR already uses 34 JSON columns in encrypted DBs successfully.

Python attribute `paper_metadata` maps to SQL column `metadata`,
mirroring ResearchResource.resource_metadata pattern to avoid
SQLAlchemy's reserved `metadata` attribute.

- citation.py: 13 columns → 4 + 1 JSON blob
- migration 0006: matching slim schema (unreleased, no data migration)
- research_sources_service.py: splits fields into indexed vs metadata
- _merge_identifiers: new signature (paper, indexed, metadata); merges
  missing keys into paper_metadata without overwriting

All 309 tests pass including encrypted DB ORM tests.

* fix: address Round 1+2 review findings on Paper schema slim

1. datetime.date JSON serialization: convert publication_date to ISO
   string in normalize_citation after _build_csl_json consumes it
2. _merge_identifiers SQLAlchemy dirty tracking: copy dict before
   mutating so reassignment is detected by plain JSON column
3. UNIQUE constraints on doi/arxiv_id/pmid to prevent concurrent
   duplicate writes; handle IntegrityError via rollback + refetch
4. container_title lookup chain: add container_title/container-title
   keys for CSL-style callers
5. Per-source exception logging: warning → exception for stack traces

* fix: address Round 3 review findings on journal quality data flow

Critical bugs:

1. Journal name case mismatch broke Paper.journal_id linking
   - research_sources_service.py: _resolve_journal_id used .lower()
     but the filter writes Journal.name in mixed case. Every Paper
     got journal_id=None silently.
   - Fix: use func.lower() on both sides for case-insensitive match

2. AttributeError crash when source["metadata"] is a non-dict
   - citation_normalizer.py: source.get("metadata", {}).get("journal")
     crashes when metadata is a string (default only applies when key
     is absent/None). Fix: explicit isinstance check before .get().

3. Author dict passthrough allows non-JSON-serializable fields
   - citation_normalizer.py: engines like OpenAlex/S2 return author
     dicts with nested affiliation objects, ORCIDs, etc. that may
     not be JSON-safe. Whitelist only CSL name fields (family, given,
     suffix) when passing through existing CSL-format author dicts.

4. predatory_source missing from API response
   - metrics_routes.py: template reads j.predatory_source for the
     tooltip but the route didn't emit it. Added to both journal
     aggregation responses.

* fix: address Round 4 review findings on transaction safety and JSON sanitization

Critical bugs:

1. resource_metadata stores raw untrusted source dict
   - Engine result dicts can contain non-JSON-serializable values
     (nested objects, numpy types, affiliations, date objects). Raw
     embedding would crash json.dumps() at flush time and silently
     lose the source via the per-source except catch.
   - Fix: new _json_safe() recursive sanitizer coerces everything to
     JSON primitives before embedding in resource_metadata.

2. db_session.rollback() wiped entire batch, not just failed source
   - The IntegrityError retry path and per-source except used a full
     session rollback, which lost every previously flushed source in
     the same batch. Also left stale resource.id references that
     pointed to rolled-back rows.
   - Fix: wrap each source in db_session.begin_nested() savepoint.
     Per-source rollback only affects that source. Earlier successes
     stay persisted. IntegrityError retry restarts a new savepoint
     and recreates the ResearchResource cleanly.
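
The per-source savepoint pattern can be illustrated with the standard-library sqlite3 module instead of SQLAlchemy's begin_nested(). Table names and the `save_batch` helper are hypothetical, and the sketch simply skips a failed source rather than retrying it:

```python
import sqlite3


def save_batch(conn: sqlite3.Connection, sources: list) -> list:
    """Insert each (url, doi) pair inside its own savepoint."""
    saved = []
    conn.execute("BEGIN")
    for i, (url, doi) in enumerate(sources):
        name = f"src_{i}"
        conn.execute(f"SAVEPOINT {name}")
        try:
            conn.execute("INSERT INTO resources (url) VALUES (?)", (url,))
            conn.execute("INSERT INTO papers (doi) VALUES (?)", (doi,))
        except sqlite3.IntegrityError:
            # Undo only this source's rows; earlier sources stay in place.
            conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
        else:
            saved.append(url)
        finally:
            conn.execute(f"RELEASE SAVEPOINT {name}")
    conn.execute("COMMIT")
    return saved


# isolation_level=None: the sketch issues BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE resources (url TEXT NOT NULL)")
conn.execute("CREATE TABLE papers (doi TEXT NOT NULL UNIQUE)")
saved = save_batch(
    conn,
    [("u1", "10.1/a"), ("u2", "10.1/b"), ("u3", "10.1/a"), ("u4", "10.1/c")],
)
resource_count = conn.execute("SELECT COUNT(*) FROM resources").fetchone()[0]
```

The duplicate DOI on the third source rolls back that source's resource row as well, while the first two sources and the fourth remain committed.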

* test: add Paper dedup integration tests + harden _json_safe

Round 5 additions:

1. tests/database/test_paper_dedup_integration.py — 3 integration
   tests using a real encrypted SQLCipher database:
   - Paper created with indexed columns + metadata blob
   - Same DOI deduped across two sources (1 Paper, 2 PaperAppearances)
   - Metadata blob survives JSON round-trip through SQLCipher

2. _json_safe hardening: depth limit (32) + id()-based cycle
   detection to prevent RecursionError on pathological input.
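
A minimal sketch of a sanitizer along these lines, assuming a hypothetical public name `json_safe` and a fallback of None for over-deep or cyclic values (the real _json_safe may coerce differently):

```python
import datetime
import json
import math
from typing import Any

_MAX_DEPTH = 32  # depth limit named in the commit


def json_safe(value: Any, _depth: int = 0, _seen: frozenset = frozenset()) -> Any:
    """Recursively coerce a value into JSON-serializable primitives."""
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # NaN and infinities are not valid JSON numbers.
        return value if math.isfinite(value) else None
    if isinstance(value, datetime.date):  # also covers datetime.datetime
        return value.isoformat()
    if isinstance(value, (dict, list, tuple, set)):
        # Stop on over-deep nesting or when a container contains itself.
        if _depth >= _MAX_DEPTH or id(value) in _seen:
            return None
        seen = _seen | {id(value)}
        if isinstance(value, dict):
            return {str(k): json_safe(v, _depth + 1, seen) for k, v in value.items()}
        return [json_safe(v, _depth + 1, seen) for v in value]
    # Anything else (custom objects, numpy scalars, ...) becomes its string form.
    return str(value)


cyclic = {"name": "x"}
cyclic["self"] = cyclic
cleaned = json_safe({"date": datetime.date(2024, 1, 2), "loop": cyclic, "pair": (1, 2)})
encoded = json.dumps(cleaned)  # no TypeError, no RecursionError
```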

* fix: harden DB session handling and ArXiv journal_ref forwarding (Round 3)

- Wrap __save_journal_to_db in try/except to handle DB session failures
  gracefully (e.g., encrypted DB with wrong password). Score is still
  valid but won't be cached until next successful DB access.
- Explicitly forward journal_ref in ArXiv _get_full_content to prevent
  fragile reliance on item.copy() preserving the field.

* fix: preview_filters resource leak, DOAJ Seal scoring, close() warning (Round 4-5)

Three fixes from code review rounds 4-5:

1. CRITICAL: BaseSearchEngine.close() now also closes _preview_filters.
   Previously only _content_filters were closed, but the journal filter
   is registered as a preview_filter — its SearXNG engine and LLM client
   were never released.

2. DOAJ Seal scoring: use max(h_index_score, doaj_score) instead of
   strict h_index priority. 5,882 DOAJ Seal journals with moderate
   h-index were penalized because h-index score (e.g., 7) overrode the
   Seal floor (8). The DOAJ Seal represents OA best practices compliance,
   an orthogonal quality signal that should reinforce, not conflict.

3. Suppress spurious close() warning when SearXNG is None (normal case
   when SearXNG is not configured). Pass allow_none=True to safe_close.

* fix: S2 publicationVenue, NASA ADS ArXiv preprints, test gaps

1. S2: request publicationVenue (structured, with ISSN) from API
2. NASA ADS: set journal_ref=None for ArXiv preprints (is_arxiv=True)
3. Fix vacuous test_doaj_with_seal assertion (was always true)
4. Add fail-fast behavioral test (verify Tier 4 skipped after 2 failures)
5. Clarify pyproject.toml setuptools sections

* fix: Round 4 review findings

Critical:
- build_db now writes to tmp path and uses os.replace() for atomicity.
  Prevents corrupt DB on disk if build crashes mid-way.
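
The write-to-temp-then-os.replace() pattern looks roughly like this. `write_atomically` is a hypothetical helper that writes raw bytes; the real build_db builds a SQLite file at the temp path instead:

```python
import contextlib
import os
import tempfile
from pathlib import Path


def write_atomically(target: Path, payload: bytes) -> None:
    """Write payload to a sibling temp file, then swap it into place."""
    # Same directory as the target, so os.replace() stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".tmp-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        # A failed build leaves the previous target untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as workdir:
    db_path = Path(workdir) / "journal_quality.db"
    write_atomically(db_path, b"build 1")
    write_atomically(db_path, b"build 2")
    final_bytes = db_path.read_bytes()
    leftovers = sorted(os.listdir(workdir))
```

Readers see either the complete old file or the complete new one, never a half-written file.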

Scoring correctness:
- Tier 3.6 (LLM cleanup → OpenAlex retry) now passes quartile to
  derive_quality_score. Previously Q1 journals found via this tier
  scored 8 instead of 10.

Consistency:
- PubMed journal_ref now uses None (not '') for missing journals,
  matching all other engines.
- NASA ADS, OpenAlex, Semantic Scholar _get_full_content now forward
  all quality-relevant metadata fields (doi, affiliations, citations)
  to final results for downstream consumers.

* fix: Round 5 review — scoring correctness and data source safety

Scoring (scoring.py):
- Apply DOAJ Seal floor in quartile branch via max() so Q4+Seal returns 8
  instead of 5. Previously the Seal signal was silently discarded when
  quartile was present.
- Treat negative h-index as no signal (return None for fall-through)
  instead of JOURNAL_QUALITY_DEFAULT=4. Consistent with h_index=0/None.

DB build (db.py):
- Recompute `quality` column after quartile assignment, so the stored
  quality agrees with the live-filter score.

Data source safety:
- OpenAlex: refuse to overwrite if fetched < 10K records.
- JabRef: refuse to overwrite if fetched < 100 abbreviations.

* fix: Round 6 review — concurrency, pool, and edge cases

DB engine pool:
- Use StaticPool for immutable=1 SQLite (was default QueuePool/15 conn).
- Acquire lock before reading _engine to remove DCLP hazard.

Downloader:
- Atomic O_CREAT|O_EXCL sentinel instead of exists()+touch() race.
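
A sketch of the atomic sentinel, assuming a hypothetical helper name and sentinel filename. Unlike an exists() check followed by touch(), the create-if-absent step here is a single system call:

```python
import os
import tempfile


def try_acquire_sentinel(path: str) -> bool:
    """Create the sentinel file; return False if another caller already holds it."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as workdir:
    sentinel = os.path.join(workdir, "download.lock")  # hypothetical name
    first = try_acquire_sentinel(sentinel)
    second = try_acquire_sentinel(sentinel)
    os.remove(sentinel)  # releasing is just deleting the file
    third = try_acquire_sentinel(sentinel)
```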

Filter:
- Strip whitespace journal_ref; ' ' no longer bypasses the guard.
- Handle clean_name == '' as no-venue instead of degenerate key.
- Predatory removal log includes original journal_ref, cleaned name, URL.

* fix: Round 7 review — caching, error visibility, SSRF hardening

- Tier 3.6 now saves to DB so future queries skip LLM cleanup step
- __save_journal_to_db warning passes exc_info=True for debuggability
- OpenAlex manifest URLs validated against expected s3://openalex/ prefix

* fix(journal-quality): atomic rename, engine reset on error, LIKE escape

- build_db writes to a tmp path and os.replace()s at the end so a
  crash mid-build or a concurrent Windows reader (unlink-on-open
  fails on Windows) can no longer leave a corrupt file that blocks
  every subsequent query.
- _ensure_engine validates PRAGMA user_version and integrity before
  wiring the RO engine so stale-schema or corrupt files get rebuilt
  at open time instead of erroring at first query.
- session() drops the cached engine on OperationalError/DatabaseError
  so a transient corruption no longer wedges the process.
- get_journals_page / get_institutions_page escape LIKE metachars
  and cap search length to close an authenticated CPU-DoS surface.
- Startup sweep clears stale journal_quality.db.tmp-* files left by
  prior crashed builds.
- Corrects stale entry in custom-checks raw-SQL allowlist (this file
  was renamed since the allowlist was written).
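
The LIKE escaping mentioned above can be sketched with sqlite3. The helper name and the 100-character cap are assumptions, not the values used in get_journals_page:

```python
import sqlite3


def like_pattern(term: str, max_len: int = 100) -> str:
    """Build a contains-pattern with LIKE metacharacters escaped by backslash."""
    term = term[:max_len]  # hypothetical cap on search length
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journals (name TEXT)")
conn.executemany(
    "INSERT INTO journals VALUES (?)", [("100% Science",), ("Nature",)]
)
# The ESCAPE clause tells SQLite that backslash introduces a literal % or _.
rows = conn.execute(
    "SELECT name FROM journals WHERE name LIKE ? ESCAPE '\\'",
    (like_pattern("%"),),
).fetchall()
```

Without escaping, a search for `%` would become the pattern `%%%` and match every row.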

* fix(db): enable PRAGMA foreign_keys = ON on every connection

SQLite defaults foreign_keys to OFF, which meant every ondelete=CASCADE
and ondelete=SET NULL declared on an FK was inert. Bulk Query.delete()
calls — which bypass ORM cascade — then silently orphaned child rows,
and Paper.journal_id would not NULL out when a Journal was deleted.

Wiring the pragma into apply_performance_pragmas (which is already
registered via event.listen(engine, "connect")) makes every pooled
connection honor DDL-level cascade.
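
The effect of the pragma can be reproduced with plain sqlite3. The schema below is a reduced, hypothetical version of the journals/papers relationship, and it assumes a SQLite build with foreign-key support compiled in (the usual default):

```python
import sqlite3


def linked_papers_after_delete(enable_fk: bool) -> int:
    """Delete a journal and count papers still pointing at it."""
    conn = sqlite3.connect(":memory:")
    if enable_fk:
        # Must be set per connection; SQLite defaults to OFF.
        conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE journals (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE papers ("
        "id INTEGER PRIMARY KEY, "
        "journal_id INTEGER REFERENCES journals(id) ON DELETE SET NULL)"
    )
    conn.execute("INSERT INTO journals VALUES (1)")
    conn.execute("INSERT INTO papers VALUES (1, 1)")
    conn.execute("DELETE FROM journals WHERE id = 1")
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM papers WHERE journal_id IS NOT NULL"
    ).fetchone()
    conn.close()
    return count
```

With the pragma off the paper keeps a dangling journal_id; with it on, ON DELETE SET NULL fires and the reference is cleared.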

* fix(migrations): 0007 index guard, remove redundant Paper indexes, add 0009

- 0007 now gates index creation on index existence (via inspector)
  instead of on whether the column was added this run. A DB where
  the columns already existed from a prior partial upgrade or from
  ORM create_all will now get the named indexes.
- 0007 docstring header had stale revision IDs from a copy-paste.
- Drop the redundant explicit Index() entries and index=True on
  Paper's doi/arxiv_id/pmid and PaperAppearance.resource_id — these
  columns already carry UNIQUE, which creates a backing index.
- New migration 0009 backfills journal indexes that the old 0007
  guard skipped, adds ix_research_resources_research_id (previously
  unindexed FK forced a full scan on every research-detail join),
  and adds the journals.name_lower column + index that
  _resolve_journal_id needs to avoid func.lower() expression scans.

* perf(journals): name_lower column, indexed research_id, load_only on dedup

- Journal gains a name_lower column, populated on every write by the
  reputation filter and used by _resolve_journal_id for an indexed
  equality lookup instead of func.lower(Journal.name), which defeats
  the name index.
- research_resources.research_id declared with index=True so every
  research-detail join uses the index instead of a full scan. The
  matching migration that creates it on existing DBs is 0009.
- _find_existing_paper applies load_only(id, doi, arxiv_id, pmid,
  journal_id) to the three dedup lookups so they no longer fetch the
  paper_metadata JSON blob (which can be multi-KB) just to check an
  identifier match.

* fix(tests): bump head revision asserts + relax llm_utils header check

- test_migration_0005_resource_document_id.py asserted the full-chain
  head is still "0005", which broke as soon as 0006/0007/0008 landed
  (now 0009). Bump the three full-chain asserts to "0009" and keep the
  targeted upgrade-to-0005 asserts at "0005" since those call
  _run_upgrade_to(..., "0005") explicitly. Also rename the two
  head-revision tests to match.
- test_uses_auth_headers mocked requests.get and asserted an exact
  header dict, but safe_get wraps requests and injects a project
  User-Agent. Check that the Authorization header survives instead of
  doing a full dict equality.
- Relax _validate_existing_db: PRAGMA user_version = 0 is the
  pre-stamping default, so treat it as grandfathered-in rather than
  triggering a rebuild. Only non-zero, non-current values force a
  rebuild. This keeps CI environments with pre-built DBs working.

* ci: retrigger after Round 7 fixes

* fix: Round 8 review — data source safety, DB validation, error visibility

db.py:
- Remove duplicate safe_close() in _validate_existing_db schema-mismatch
  branch. The finally block already handles closing; the extra call
  produced a spurious "Cannot operate on a closed database" warning
  on every schema-triggered rebuild.
- Move reset_db() to before os.replace() so no new engine can latch
  onto the file mid-swap and then get disposed out from under an
  in-flight query.

doaj.py:
- Add _MIN_DOAJ_JOURNALS=5,000 floor. Prevents overwriting good data
  with {} if DOAJ CSV schema changes upstream (column rename breaks
  ISSN lookups, parser silently produces zero entries).

institutions.py:
- Add _ALLOWED_PREFIX="s3://openalex/" manifest validation loop
  matching openalex.py — defense-in-depth SSRF block.
- Add _MIN_INSTITUTIONS=50,000 floor (snapshot has ~120K).

jabref.py:
- logger.warning → logger.exception for per-file fetch failures so
  tracebacks are preserved. Operators diagnosing partial fetches
  need the exception type, not just the filename.

StaticPool kept as-is — the tradeoff (immutable=1 + single conn vs
QueuePool overhead) was settled in prior rounds; reviewer's concern
was theoretical and hasn't materialized.

* fix: CI failures — raw SQL allowlist + filter test data-download stub

Two concrete CI fixes after investigating the PR 3081 pytest failures:

1. test_no_raw_sql was flagging journal_quality/db.py line 207 for
   `conn.execute("PRAGMA user_version")`. This is a legitimate read-
   only schema-version check (cheap, no SQLAlchemy overhead, matches
   the pattern already skipped for database/initialize.py). Added
   journal_quality/db.py to the skip list.

2. Many filter unit tests were timing out at 60s in CI because they
   hit the real data-download path on a fresh container. Trace:
   filter_results → __clean_journal_name → expand_abbreviation →
   _ensure_engine → _build_or_raise → ensure_journal_data →
   download_journal_data (OpenAlex + DOAJ + JabRef fetch).

   Added tests/advanced_search_system/filters/conftest.py with an
   autouse fixture that stubs _build_or_raise to raise FileNotFound.
   expand_abbreviation already catches that and returns None, so
   the filter falls through to its own scoring tiers without
   touching the network.

Tests now run in 5.5s locally (they were already passing there because my local DB is built).

* fix(tests): use ResearchHistory UUID for ResearchLog FK

test_research_logs was inserting Integer research.id into ResearchLog.research_id,
which is a String(36) FK to research_history.id (UUID). Previously latent because
SQLite FK enforcement was off; commit 5078c867e turned PRAGMA foreign_keys = ON
on every connection, exposing the pre-existing mismatch. Production log_utils
already writes UUIDs, so the FK is correct — the test was wrong.

* fix(migrations): timezone-aware DateTime in 0006 + extend hook to scan migrations

Migration 0006_add_citation_metadata declared three sa.DateTime() columns
without timezone=True, contradicting the ORM (citation.py uses UtcDateTime).
Add timezone=True to the three columns (papers.created_at, papers.updated_at,
paper_appearances.created_at).

The check-datetime-timezone pre-commit hook missed this because its path
filter only scanned src/.../database/models/. Extend the path filter to
include database/migrations/versions/, and teach the AST walker to also
recognise sa.Column()/sa.DateTime() (attribute-style) — not just the
bare Column()/DateTime() form used in ORM models — and accept
sa.DateTime(timezone=True) as valid for migration files.

* fix(citations): support old-format arXiv IDs in URL extraction

The regex r"arxiv\.org/abs/(\d+\.\d+)" only matched new-format IDs
(YYMM.NNNN). Pre-2007 papers with identifiers like cond-mat/0501001,
math.AG/0601001, and hep-th/9802150 silently returned None.

New regex accepts:
  - Old-style archive(.SubjectClass)?/YYMMNNN (with optional uppercase
    subject class like math.AG); archive can contain hyphens like
    cond-mat / hep-th
  - New-style YYMM.NNNN or YYMM.NNNNN (5-digit seq from 2015)
  - Optional vN version suffix (2501.12345v2)

Also adds 5 new tests in TestExtractArxivId covering all three
old-format variants plus version suffix and 5-digit sequence.
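
One way to write a pattern covering the three cases listed above. This is an illustrative regex, not the one committed, and it assumes the version suffix is matched but dropped from the returned ID:

```python
import re
from typing import Optional

_ARXIV_URL_RE = re.compile(
    r"arxiv\.org/abs/"
    r"("
    r"[a-z][a-z-]*(?:\.[A-Z]{2})?/\d{7}"  # old style: archive(.SC)?/YYMMNNN
    r"|"
    r"\d{4}\.\d{4,5}"                     # new style: YYMM.NNNN or YYMM.NNNNN
    r")"
    r"(?:v\d+)?"                          # optional version, not captured
)


def extract_arxiv_id(url: str) -> Optional[str]:
    """Return the arXiv identifier embedded in an abs/ URL, or None."""
    match = _ARXIV_URL_RE.search(url)
    return match.group(1) if match else None
```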

* fix(journal_quality): surface build_db failure to downloader caller

Previously download_journal_data swallowed any build_db() exception with a
log-and-continue, then returned (True, "Fetched ...") as if everything
worked. The dashboard saw a green success toast even when no DB was built.

Capture the exception and return (False, msg) carrying the reason, while
preserving the "lazy-build on next access" design — the runtime accessor
still rebuilds from the downloaded .gz files on next access if the DB is
absent. The existing callers (ensure_journal_data, metrics_routes.py)
already pivot correctly on the bool, so this only flips a misleading
green to an honest red.

Tests:
- test_successful_fetch now patches build_db to a no-op so the happy-path
  assertion is deterministic regardless of whether the minimal fixture
  is buildable end-to-end.
- Adds test_build_db_failure_returns_false covering the new (False, msg)
  contract.

* docs(journal-quality): clarify score scale is non-contiguous

The docs and settings description previously advertised a "1-10 scale"
and referenced score 3 ("Unknown") in the threshold table, but the
code only emits {1, 4, 5, 6, 7, 8, 10}. Values 2, 3, and 9 are never
assigned (the default/unknown case emits 4, not 3).

- Fix the opening scale claim to note the non-contiguous emission.
- Replace the "Score 3 = Unknown" row with "Score 4 = Default" so the
  table matches constants.py (JOURNAL_QUALITY_DEFAULT=4).
- Correct the threshold table: thresholds 3 and 4 now behave the same
  as 2 (since 2 and 3 aren't emitted scores), and raising to 5 is
  what starts dropping default/unknown venues.
- Update default_settings.json description and regenerate golden
  master to match.
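
The threshold equivalence above can be checked with a few lines. This is
an illustrative sketch only; it assumes the filter keeps a result when
its score is greater than or equal to the threshold, and the helper name
`surviving_scores` is hypothetical.

```python
# Scores the pipeline can emit, as listed in the commit message.
EMITTED_SCORES = (1, 4, 5, 6, 7, 8, 10)


def surviving_scores(threshold):
    """Emitted scores that pass a threshold (assumed rule: score >= threshold)."""
    return tuple(score for score in EMITTED_SCORES if score >= threshold)
```

Under that assumption, thresholds 2, 3, and 4 all keep the same set, and
the default score 4 is first dropped at threshold 5.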

* fix(journal-quality): remove score-3 references (score 3 is never emitted)

Scoring pipeline emits {1, 4, 5, 6, 7, 8, 10}; value 3 is reserved but
never returned. Completes the cleanup begun in 0fe435bfc, which fixed
the table and settings description but left three residuals:

- search_utilities.py::_format_quality_tag — the `>= 3` branch was
  unreachable for score 3 but caught score 4 (JOURNAL_QUALITY_DEFAULT),
  silently rendering unknown/default venues as [Q3 ★★]. Give score 4
  a dedicated [Unranked ★] label so Q-tier labels stay truthful to
  SCImago quartile semantics.
- docs/journal-quality.md step 7 "Score 3 floor" — the code actually
  returns None on no-signal. Rewrite as "No-signal pass-through".
- journal_quality.html threshold descriptions — thresholds 3 and 4
  both behave identically to threshold 2 (no emitted score falls in
  the 2–4 gap); score 4 only starts being dropped at threshold 5.
  Corrected both the HTML list and the JS threshold-detail map.

Tests updated: test_default_unknown_tier asserts [Unranked ★] for
score 4; test_score_boundary_5_is_q2_not_unranked pins the boundary.

* fix(journal-quality): simplify Tier 0 cache to LLM-only and fix 9 correctness bugs (#3510)

* feat(journal-quality): fix cache bugs and simplify to LLM-only

Stacked on PR #3081. Review of #3081 surfaced 10 issues in the journal
quality system. The dominant bug: the Tier 0 cache read predicate
filters on `score_source == "llm"`, so Tier 2 (OpenAlex) and Tier 3
(DOAJ) scores were written to the user DB but never read back. This PR
scopes the cache to LLM-only (per user direction: "we don't even need
to cache [Tier 2/3]") and fixes the remaining 4 functional bugs.

Bugs fixed:
* Tier 0 cache broken for Tier 2/3 → drop Tier 2/3/3.6 write-back;
  keep Tier 4 LLM cache; migration 0010 drops 16 cache-only columns.
* Paper dedup waterfall → single OR query; logs warning on conflict.
* ISSN dashes not normalized → new normalize_issn() in citation_normalizer,
  applied at both reference-DB lookup and ingestion (openalex, doaj).
* Migration 0009 SQL backfill wrong for diacritics → Python name.lower()
  batch loop matches runtime insert path exactly.
* LLM out-of-set scores silently accepted → raise ValueError; existing
  failure counter + circuit breaker surface prompt drift.
* quality_model not in cache predicate → add get_model_identifier helper
  and filter on it so cache invalidates across LLM upgrades.
* Journal upsert race → savepoint + IntegrityError + refetch pattern
  mirroring the Paper upsert.
* Cache-read validates cached quality ∈ VALID_QUALITY_SCORES; evicts
  pre-fix 2/3/9 values.
* OpenAlex JSON parse now try/except + malformed-line counter; existing
  MIN_OPENALEX_SOURCES floor still aborts catastrophic failures.
* Per-user metrics dashboard rewritten to join user Journal with the
  reference DB by name for display bibliometric fields.
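
As an illustration of the ISSN fix in the list above, here is one way a
normalizer could work. It is a sketch, not the code in
citation_normalizer: the canonical `NNNN-NNNX` output form and the
None-on-invalid behavior are assumptions.

```python
import re


def normalize_issn(raw):
    """Return an ISSN as 'NNNN-NNNX', or None if it is not 8 valid characters.

    Hypothetical behavior: everything except digits and a check-character X
    is stripped before validation.
    """
    if not raw:
        return None
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    return f"{compact[:4]}-{compact[4:]}"
```

Applying the same function at ingestion and at lookup means both sides
compare the same canonical string.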

Schema: migration 0010 drops 16 bibliometric columns from journals
(h_index, sjr_quartile, is_predatory, …); keeps name, name_lower,
quality, score_source, quality_model, quality_analysis_time.

Tests: 298 tests green across filters, citation_normalizer, llm_utils,
paper dedup. Existing cached-quality test updated for new predicate
chain; LLM clamp test now asserts ValueError instead of silent clamp.

* fix(journal-quality): bundle migration 0010 drops into single batch + docs

Bundle all 19 ops (3 index drops + 16 column drops on upgrade, 16 column
adds + 3 index creates on downgrade) into a single `batch_alter_table`
block each. On SQLite, Alembic's batch mode drops columns by recreating
the whole table once per block, so the previous per-op loop paid
that cost 19 times. Bundling also makes each direction atomic: an error
mid-batch rolls back cleanly, eliminating partial-schema states the
per-op version could leave behind.

Also update docs/journal-quality.md to reflect the LLM-only cache scope:
the old docs claimed "Tier 0 — Database Cache: Instant lookup from
previous scoring. Journals are scored once and cached." which describes
the pre-fix behavior. The new description positions Tier 0 between
3.5 and 3.6 (where it actually fires) and explains that only Tier 4
results are persisted — reference-DB lookups for Tiers 1–3.5 are already
instant and get re-checked every query.

No behavior change beyond the migration perf win.

* fix(journal-quality): address 100-agent review feedback

P1 — predatory_blocked global count:
The Tier 0 cache rewrite in /api/journals/user-research turned
`predatory_blocked` from a global count across all user journals into
an in-page count (top 200). AI code reviewer and R10-4 both flagged
this as a semantic regression — summary stats are expected to be
global, matching `total_journals` which is still global. Fix: add
`JournalQualityDB.count_predatory_by_names(names)` helper that issues
one `WHERE name_lower IN (…) AND is_predatory = TRUE` query, call it
with ALL user journal names from `/api/journals/user-research`. The
per-research endpoint is already correctly scoped to the research
(no 200-limit) and is left unchanged.
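
A sketch of the single-query count described for P1, using the stdlib
sqlite3 module instead of the project's DB layer. The table layout
(`journals` with `name_lower` and `is_predatory`) and the free-function
signature are assumptions; the real helper is a method on
JournalQualityDB.

```python
import sqlite3


def count_predatory_by_names(conn, names):
    """Count predatory journals among `names` with one IN (...) query."""
    lowered = sorted({name.lower() for name in names})
    if not lowered:
        return 0
    # One placeholder per distinct name; very large lists would need chunking
    # to stay under SQLite's bound-parameter limit (omitted in this sketch).
    placeholders = ",".join("?" for _ in lowered)
    query = (
        "SELECT COUNT(*) FROM journals "
        f"WHERE name_lower IN ({placeholders}) AND is_predatory = 1"
    )
    return conn.execute(query, lowered).fetchone()[0]
```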

P2 — Journal schema stability test:
R1-3 and R9-10 both flagged that tests/database/test_schema_stability.py
verifies table names but not column-level shape. Migration 0010
deliberately trims Journal to 7 columns; an accidental model addition
without a matching migration would slip through silently. Added
TestCriticalColumns.test_journal_has_exact_column_set asserting the
exact column set {id, name, name_lower, quality, score_source,
quality_model, quality_analysis_time}.
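
The exact-column-set assertion could look roughly like this. The sketch
reads columns through SQLite's `PRAGMA table_info` rather than whatever
inspection the real test uses, and `table_columns` is a hypothetical
helper.

```python
import sqlite3

# Column set taken from the commit message.
EXPECTED_JOURNAL_COLUMNS = {
    "id", "name", "name_lower", "quality",
    "score_source", "quality_model", "quality_analysis_time",
}


def table_columns(conn, table):
    """Return the set of column names for `table`.

    PRAGMA arguments cannot be bound as parameters, so `table` must be a
    trusted identifier.
    """
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
```

Comparing with `==` rather than a subset check is what makes an
unmigrated extra column fail the test.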

P3 — polish:
- Add `# noqa: silent-exception` + explanatory comments to
  `_ref_db_lookup` and `_get_ref_db_or_none` (project convention for
  best-effort broad catches).
- Update `logs.py` module docstring to explain Journal's LLM-only
  cache scope after migration 0010.
- Clarify `quality_analysis_time` column comment is "Unix seconds
  (not ms)" and rationale for Integer (vs UtcDateTime) typing.
- Add `__all__` declarations to `utilities/citation_normalizer.py`
  and `utilities/llm_utils.py` codifying the public API surface.

No behavior change beyond P1. 305 tests still green across filters,
citation_normalizer, llm_utils, paper dedup, schema stability; 54
metrics route tests still green.

* fix(journal-quality): prod-ready polish for PR #3081 — migration squash + ops hardening (#3513)

* feat(journal-quality): clearer log milestones around first-run DB build

The "Building X ..." message is too terse — on a fresh install the
~30s download + insert looks like a hung process. Expand the start
message to mention the one-time nature + the download size, and
include the source count in the completion log so the server log
tells operators when the DB is ready to serve scoring.

Addresses the UX gap previously considered a blocker: users already
see the server log, so a milestone log line is enough (no UI progress
event needed).

* fix(journal-quality): set Windows readonly attribute after chmod

chmod 0o444 is a no-op on Windows — the compiled journal-quality
reference DB stays writable on Windows installs, violating the
read-only invariant. Combine the POSIX chmod with a best-effort
SetFileAttributesW(FILE_ATTRIBUTE_READONLY) on win32. Log a warning
if SetFileAttributesW fails; the check-journal-quality-readonly.py
pre-commit hook still enforces read-only opens in consumer code.

* feat(journal-quality): pre-check free disk space before bulk download

The five journal-quality data sources uncompress to ~1 GB of
intermediate working set plus the compiled reference DB. On a
small-disk machine, a mid-stream failure can leave an orphan
.tmp-* file that blocks the next build. Fail fast with a clear
"X.X GB available, 2 GB required" message before touching the
sentinel or the network.

Threshold is exposed as JOURNAL_QUALITY_MIN_FREE_DISK_BYTES in
constants.py so ops can tune it if needed. OSError from
shutil.disk_usage is non-fatal (logged, build proceeds) — don't
block a download just because disk stats are unavailable.
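
A minimal sketch of such a pre-check. The 2 GiB default mirrors the
"2 GB required" message above but is an assumption about the constant's
value; `check_free_disk` and its injectable `disk_usage` parameter are
hypothetical and exist only to make the sketch testable.

```python
import logging
import shutil

logger = logging.getLogger(__name__)

# Assumed value; the commit only says the threshold lives in constants.py.
JOURNAL_QUALITY_MIN_FREE_DISK_BYTES = 2 * 1024**3


def check_free_disk(path, required=JOURNAL_QUALITY_MIN_FREE_DISK_BYTES,
                    disk_usage=shutil.disk_usage):
    """Return (ok, message); an unreadable disk stat does not block the build."""
    try:
        free = disk_usage(path).free
    except OSError:
        logger.warning("Could not read disk usage for %s; continuing", path)
        return True, ""
    if free < required:
        gib = 1024**3
        return False, f"{free / gib:.1f} GB available, {required / gib:.0f} GB required"
    return True, ""
```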

* security(journal-quality): stop leaking exception text into HTTP path

CodeQL alerts 7650 and 7684 flagged that str(exc) from a build_db
failure in download_journal_data() flows into the tuple's message
string, and from there through to the /api/journal-data/download
response. SQLAlchemy errors embed SQL statements and file paths —
sanitize at the source by returning only the exception class name.
Full traceback remains in logger.exception (server-side only).

Add tests/journal_quality/test_downloader_exception_sanitization.py
asserting that a simulated build_db error whose message contains
stack-trace-shaped substrings never reaches the caller.
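
The sanitization pattern, sketched under simplifying assumptions: the
real download_journal_data does not take `build_db` as a parameter, and
the message wording here is invented. Only the idea of returning the
exception class name while logging the full traceback comes from the
commit message.

```python
import logging

logger = logging.getLogger(__name__)


def download_journal_data(build_db):
    """Run `build_db` and return (ok, message) without leaking exception text."""
    try:
        build_db()
    except Exception as exc:
        # Full details stay in the server log only.
        logger.exception("Journal database build failed")
        return False, f"Journal database build failed ({type(exc).__name__})"
    return True, "Fetched journal data"
```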

* feat(safe-requests): add safe_get_with_retries and wire into journal-quality downloads

Bulk journal-data downloads currently abort on the first transient
network failure: a packet drop or short AWS S3 hiccup forces the
user to restart from scratch. Add a safe_get_with_retries wrapper
with exponential backoff (1/2/4s, 3 attempts by default), retrying
on ConnectionError, Timeout, HTTP 429, and HTTP 5xx. Honors the
Retry-After header when present. SSRF ValueErrors and non-429 4xx
responses are passed through unchanged.

The five journal-quality data sources (OpenAlex, DOAJ, predatory,
JabRef, institutions) now import the retry wrapper instead of the
bare safe_get. Call sites are unchanged beyond the import alias.
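
A rough sketch of the retry loop, not the actual safe_get_with_retries.
It takes the fetch function and the sleep function as parameters (both
hypothetical), uses the built-in ConnectionError and TimeoutError as
stand-ins for the HTTP library's exception types, assumes `attempts` is
at least 1, and leaves out Retry-After handling.

```python
import time


def get_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        is_last = attempt == attempts - 1
        try:
            response = fetch(url)
        except (ConnectionError, TimeoutError):
            if is_last:
                raise
        else:
            status = response.status_code
            retryable = status == 429 or 500 <= status < 600
            if not retryable or is_last:
                return response
        # Delays double each round: base_delay, 2 * base_delay, ...
        sleep(base_delay * 2 ** attempt)
```

Any other exception, including a ValueError from SSRF validation, is not
caught and propagates on the first attempt.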

* feat(journal-quality): detect OpenAlex field-level schema drift

OpenAlex occasionally renames snapshot fields (the Works schema has
seen h-index and ref-count migrations in the last year). The existing
row-count floor catches a collapsed fetch but cannot tell the
difference between "212K journals with h_index correctly populated"
and "212K journals all silently None because the field was renamed".

Sample the first 100 parsed rows after the parse loop and refuse to
overwrite the snapshot if every one of them has h_index == None or
every one has cited_by_count == None. Raise a new SchemaDriftError
so operators can grep for it in logs and the CI release-gate job
can fail fast on upstream breakage.
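
The sampling check described above, as a sketch. Rows are assumed to be
plain dicts, and the function name and its boolean return value are
illustrative; only the 100-row sample, the two field names, and the
SchemaDriftError name come from the commit message.

```python
class SchemaDriftError(RuntimeError):
    """Raised when a sampled field is None in every row."""


def check_schema_drift(rows, fields=("h_index", "cited_by_count"), sample_size=100):
    """Return True if the check ran, False if the sample was too small to judge."""
    sample = rows[:sample_size]
    if len(sample) < sample_size:
        return False
    for field in fields:
        if all(row.get(field) is None for row in sample):
            raise SchemaDriftError(f"{field} is None in all {sample_size} sampled rows")
    return True
```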

* fix(migrations): squash the journal-model churn in 0007 + keep 0008/0010 as stubs

The pre-squash chain had 0007 add 17 bibliometric / trust-signal
columns + 3 indexes to the per-user journals table, 0008 add a
sjr_quartile column + index, and 0010 drop all of 0007/0008's
additions except three. On SQLite every batch_alter_table is a
full-table rebuild, so every live user pays for TWO back-to-back
rebuilds on the journals table within a single release for no
net schema gain.

New shape: 0007 adds only the columns the final form keeps —
name_lower, score_source, quality_model — plus their indexes and
the name_lower Python-side backfill (moved from 0009, because a
Unicode-correct backfill belongs with the column that needs it).
Downgrade drops the three it added.

0008 and 0010 remain as no-op stubs. A user whose alembic_version
row reads "0008" or "0010" from a prior upgrade still needs a
revision to walk through; deleting the files would strand them.
Stubs are cheap, one return statement each, and keep the chain
contiguous without forcing anyone to rewrite history.

0009 is simplified to its one remaining unique responsibility
(ix_research_resources_research_id); the journals.name_lower
work it used to duplicate now lives in the squashed 0007.

Verified end-to-end against 206 existing migration + schema tests
(including the full chain's up/down/up stairway per revision) and
four new squash-specific regressions in
tests/database/test_journal_migration_squash.py:
  - chain reaches head 0010 with the 7-column final shape
  - name_lower backfill handles diacritics (Café → café)
  - re-running run_migrations is idempotent
  - squashed 0007 is a no-op on a DB already stamped at 0010

* fix(safe-requests): cap Retry-After + parse HTTP-date form

A hostile or misconfigured upstream returning a large `Retry-After`
integer can pin a Flask worker via `time.sleep()` — the call chain
from `/api/journal-data/download` to `safe_get_with_retries` is
fully synchronous. Cap at 300 s and extend the parser to the
RFC 7231 HTTP-date form (previously the `ValueError` from `int()`
was silently swallowed). Negative values clamp to 0 to avoid
`time.sleep(-5)`, which raises ValueError in CPython.

Also drops dead `last_response` bookkeeping from the retry loop —
the path that referenced it was removed two commits back.

tests/security: add four retry tests — cap enforced, HTTP-date
parsed, unparseable falls back to schedule, negative clamps.
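
One possible shape for the header parsing, as a sketch rather than the
project's code. `parse_retry_after` and its `now` parameter are
hypothetical; returning None stands for "fall back to the backoff
schedule".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_RETRY_AFTER_SECONDS = 300.0  # cap taken from the commit message


def parse_retry_after(value, now=None, cap=MAX_RETRY_AFTER_SECONDS):
    """Return a delay in seconds clamped to [0, cap], or None if unparseable."""
    if value is None:
        return None
    text = value.strip()
    try:
        seconds = float(int(text))  # delta-seconds form, e.g. "120"
    except ValueError:
        try:
            when = parsedate_to_datetime(text)  # HTTP-date form
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    return min(max(seconds, 0.0), cap)
```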

tests/database: replace the squash-scenario test with one that
actually creates the pre-squash 17-column journals shape via
`ALTER TABLE`, stamps at `0006` so 0007 runs (including the
`name_lower` backfill), walks to head, and verifies both column
preservation and the diacritic backfill. The prior test only
proved Alembic's built-in "don't re-run at head" guarantee; its
docstring is tightened to match.

* chore(pr-feedback): document orphan-column intent + log skipped drift check

Follow-up to the Friendly AI Reviewer pass on #3513. Two substantive
nits addressed, three stylistic ones deferred (see /plans in review
thread for the full breakdown).

tests/database: the pre-squash walk test asserts `"issn" in cols`
as a success condition. Without context, that reads as "orphan
columns are fine" rather than "orphan columns are the intended
trade-off of the stub-based squash". Expand the docstring and the
inline comment so future maintainers don't misread the intent.

journal_quality: the schema-drift check is a no-op when the parsed
sample has < _SCHEMA_SAMPLE_SIZE entries (a branch that only fires
on truncated test snapshots or aggressive parse filters — the
10k-row floor above catches a collapsed fetch). Previously silent;
now logs at debug so operators can see it was bypassed.

* chore(pr-feedback): surface orphan-column trade-off in migration docstring

Second AI-reviewer pass asked for the orphan-column note to live in
the migration docstring (where maintainers look first during a
schema-change investigation), not just the regression test. Copy
the trade-off rationale into 0007's header.

Also promote the "schema-drift check skipped" log from debug to
info — debug-level messages are typically filtered out in production
log configs, which defeats the observability goal of the branch.
The skip is rare (OpenAlex ships ~280K sources; the `<100` sample
only arises from truncated test snapshots or aggressive parse
filters), so info-level noise is negligible.

* refactor(journal-quality): cleanup + preventative security (stacked on #3513) (#3514)

* feat(journal-quality): clearer log milestones around first-run DB build

The "Building X ..." message is too terse — on a fresh install the
~30s download + insert looks like a hung process. Expand the start
message to mention the one-time nature + the download size, and
include the source count in the completion log so the server log
tells operators when the DB is ready to serve scoring.

Addresses the UX gap previously considered a blocker: users already
see the server log, so a milestone log line is enough (no UI progress
event needed).

* fix(journal-quality): set Windows readonly attribute after chmod

chmod 0o444 is a no-op on Windows — the compiled journal-quality
reference DB stays writable on Windows installs, violating the
read-only invariant. Combine the POSIX chmod with a best-effort
SetFileAttributesW(FILE_ATTRIBUTE_READONLY) on win32. Log a warning
if SetFileAttributesW fails; the check-journal-quality-readonly.py
pre-commit hook still enforces read-only opens in consumer code.

* feat(journal-quality): pre-check free disk space before bulk download

The five journal-quality data sources uncompress to ~1 GB of
intermediate working set plus the compiled reference DB. On a
small-disk machine, a mid-stream failure can leave an orphan
.tmp-* file that blocks the next build. Fail fast with a clear
"X.X GB available, 2 GB required" message before touching the
sentinel or the network.

Threshold is exposed as JOURNAL_QUALITY_MIN_FREE_DISK_BYTES in
constants.py so ops can tune it if needed. OSError from
shutil.disk_usage is non-fatal (logged, build proceeds) — don't
block a download just because disk stats are unavailable.

* security(journal-quality): stop leaking exception text into HTTP path

CodeQL alerts 7650 and 7684 flagged that str(exc) from a build_db
failure in download_journal_data() flows into the tuple's message
string, and from there through to the /api/journal-data/download
response. SQLAlchemy errors embed SQL statements and file paths —
sanitize at the source by returning only the exception class name.
Full traceback remains in logger.exception (server-side only).

Add tests/journal_quality/test_downloader_exception_sanitization.py
asserting that a simulated build_db error whose message contains
stack-trace-shaped substrings never reaches the caller.

* feat(safe-requests): add safe_get_with_retries and wire into journal-quality downloads

Bulk journal-data downloads currently abort on the first transient
network failure: a packet drop or short AWS S3 hiccup forces the
user to restart from scratch. Add a safe_get_with_retries wrapper
with exponential backoff (1/2/4s, 3 attempts by default), retrying
on ConnectionError, Timeout, HTTP 429, and HTTP 5xx. Honors the
Retry-After header when present. SSRF ValueErrors and non-429 4xx
responses are passed through unchanged.

The five journal-quality data sources (OpenAlex, DOAJ, predatory,
JabRef, institutions) now import the retry wrapper instead of the
bare safe_get. Call sites are unchanged beyond the import alias.

* feat(journal-quality): detect OpenAlex field-level schema drift

OpenAlex occasionally renames snapshot fields (the Works schema has
seen h-index and ref-count migrations in the last year). The existing
row-count floor catches a collapsed fetch but cannot tell the
difference between "212K journals with h_index correctly populated"
and "212K journals all silently None because the field was renamed".

Sample the first 100 parsed rows after the parse loop and refuse to
overwrite the snapshot if every one of them has h_index == None or
every one has cited_by_count == None. Raise a new SchemaDriftError
so operators can grep for it in logs and the CI release-gate job
can fail fast on upstream breakage.

* fix(migrations): squash the journal-model churn in 0007 + keep 0008/0010 as stubs

The pre-squash chain had 0007 add 17 bibliometric / trust-signal
columns + 3 indexes to the per-user journals table, 0008 add a
sjr_quartile column + index, and 0010 drop all of 0007/0008's
additions except three. On SQLite every batch_alter_table is a
full-table rebuild, so every live user pays for TWO back-to-back
rebuilds on the journals table within a single release for no
net schema gain.

New shape: 0007 adds only the columns the final form keeps —
name_lower, score_source, quality_model — plus their indexes and
the name_lower Python-side backfill (moved from 0009, because a
Unicode-correct backfill belongs with the column that needs it).
Downgrade drops the three it added.

0008 and 0010 remain as no-op stubs. A user whose alembic_version
row reads "0008" or "0010" from a prior upgrade still needs a
revision to walk through; deleting the files would strand them.
Stubs are cheap, one return statement each, and keep the chain
contiguous without forcing anyone to rewrite history.

0009 is simplified to its one remaining unique responsibility
(ix_research_resources_research_id); the journals.name_lower
work it used to duplicate now lives in the squashed 0007.

Verified end-to-end against 206 existing migration + schema tests
(including the full chain's up/down/up stairway per revision) and
four new squash-specific regressions in
tests/database/test_journal_migration_squash.py:
  - chain reaches head 0010 with the 7-column final shape
  - name_lower backfill handles diacritics (Café → café)
  - re-running run_migrations is idempotent
  - squashed 0007 is a no-op on a DB already stamped at 0010

* refactor(journal-quality): lookup_institution returns full-name keys

The on-disk JSON snapshot uses one-character keys (n, c, t, h, if,
w, cb, r) to save bytes across ~200K institutions. That's fine
on-disk but a bad Python API — callers have to memorize the
mapping, and a future schema change breaks every caller silently.

_institution_to_dict now returns full names (name, country, type,
h_index, impact_factor, works_count, cited_by_count, ror_id). The
snapshot-reading code in _populate_institutions keeps the compact
keys — only the public accessor changes.
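
A sketch of the key translation. The pairing of compact keys to full
names below follows the order in which the two lists appear above, which
is an assumption; the function is a simplified stand-in for
_institution_to_dict.

```python
# Assumed pairing: compact on-disk key -> public full-name key.
COMPACT_TO_FULL = {
    "n": "name",
    "c": "country",
    "t": "type",
    "h": "h_index",
    "if": "impact_factor",
    "w": "works_count",
    "cb": "cited_by_count",
    "r": "ror_id",
}


def institution_to_dict(compact):
    """Translate a compact snapshot record into a dict with full-name keys."""
    return {full: compact.get(short) for short, full in COMPACT_TO_FULL.items()}
```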

Grep confirms zero live callers today (only a comment mention in
search_engine_openalex.py), so no migration needed.

* refactor(journal-quality): extract _openalex_common for shared S3 helpers

openalex.py and institutions.py duplicated three symbols:
_OPENALEX_S3_BASE, the `s3://openalex/` manifest prefix check, and
the s3_to_https translator. djpetti flagged this in PR #3081 review.

Move them to data_sources/_openalex_common.py (stdlib-only, no
circular imports) and import from both data-source modules. The
on-disk compact key format and manifest fetch URLs stay where they
are; only the duplicated helpers move.
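
For illustration, a shared translator along these lines would cover the
prefix check and the URL rewrite. The HTTPS base URL below is an
assumption about the value of _OPENALEX_S3_BASE, and raising ValueError
on a foreign prefix is likewise a guess at the behavior.

```python
OPENALEX_S3_PREFIX = "s3://openalex/"
# Assumed value; the commit does not state the actual base URL.
OPENALEX_HTTPS_BASE = "https://openalex.s3.amazonaws.com/"


def s3_to_https(s3_url):
    """Rewrite an s3://openalex/... manifest URL to its HTTPS equivalent."""
    if not s3_url.startswith(OPENALEX_S3_PREFIX):
        raise ValueError(f"Unexpected manifest URL: {s3_url!r}")
    return OPENALEX_HTTPS_BASE + s3_url[len(OPENALEX_S3_PREFIX):]
```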

* test(safe-requests): cover redirect-hop SSRF validation + DNS rebinding

safe_requests.py has always validated every redirect hop against the
SSRF allowlist (lines 208–250), but the existing test suite only
exercised the initial request. These five new tests drive the
redirect loop itself:

- redirect target is a private IP → blocked
- redirect target is AWS metadata (169.254.169.254) → blocked
- redirect loop exceeds 10 hops → raises ValueError("Too many")
- DNS-rebinding case (first hop passes validation, the redirect to the
  same hostname fails it) → blocked on the second hop
- a legitimate redirect from one public URL to another is followed

* feat(search-utilities): HTML-safe variant of the journal quality tag

_format_quality_tag emits plaintext like "[Q1 ★★★★★]" which is fine
when the caller renders the containing string as Markdown or plain
text. Today every caller does that, so there's no live XSS. But the
tag is typically embedded alongside a search-result title that came
from an external search engine, and the first HTML-rendered consumer
that does {{ title + quality_tag | safe }} or equivalent would leak
any tags in the title.

Add _format_quality_tag_html(quality, *, title), which escapes the
title with html.escape (angle brackets, ampersands, quotes) and appends
the plaintext
tag. Existing callers are unchanged — this is the safe variant any
future HTML-rendered caller should reach for.

The existing helper gets a docstring warning so reviewers of future
PRs know which variant is appropriate.
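
A sketch of the escaping step. Unlike the real helper, which takes a
quality score, this hypothetical version takes an already-rendered tag so
that no score-to-label mapping has to be invented here.

```python
import html


def format_quality_tag_html(quality_tag, *, title):
    """Escape an untrusted title and append an already-rendered plaintext tag."""
    return f"{html.escape(title, quote=True)} {quality_tag}"
```

With `quote=True`, html.escape also converts double and single quotes, so
the result is safe inside an attribute value as well as in element text.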

* test(db): migrations 0006-0010 on a SQLCipher-encrypted DB

The existing test_encrypted_database_orm.py exercises ORM CRUD over
an encrypted DB but never explicitly walks the new journal-quality
chain. This test creates a fresh keyed DB via DatabaseManager (which
runs the full migration chain as part of create_user_database),
inserts a Journal row with every kept column, closes the engine,
reopens with the same key, and reads the row back.

The second test asserts the final journals column set (id, name,
name_lower, quality, score_source, quality_model,
quality_analysis_time) is exactly what test_schema_stability expects.

Guards against SQLCipher key-ordering regressions where a future
change to sqlcipher_utils would let batch_alter_table's rebuild
path see a non-keyed connection.

* test(db): data preservation across journals-table rebuild

Adding name_lower + its index in the squashed 0007 triggers a
SQLite batch_alter_table rebuild under the hood (ALTER ADD COLUMN
is implemented as a full copy). The rebuild runs inside a single
Alembic transaction, so SQLite guarantees atomicity — either the
new table is fully populated or the original stays untouched.

The test validates what successful output must look like:

- 100 rows with a mix of ASCII, diacritics, CJK, and whitespace-
  wrapped names all survive the chain
- name / quality_analysis_time values are preserved verbatim
- name_lower is backfilled via Python's str.lower() (Unicode-
  correct, unlike SQLite's ASCII LOWER())
- no _alembic_tmp_journals orphan table is left behind

Complements test_journal_migration_squash.py (which covers the
simpler idempotency + head-stamp cases).
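
The Python-side backfill can be sketched with the stdlib sqlite3 module.
The function name, batch size, and minimal table layout are assumptions;
the migration itself runs through Alembic rather than a raw connection.

```python
import sqlite3


def backfill_name_lower(conn, batch_size=500):
    """Fill journals.name_lower using str.lower(); return the rows updated."""
    # Read all pending rows first so the UPDATEs never run against an open
    # SELECT cursor on the same table.
    rows = conn.execute(
        "SELECT id, name FROM journals "
        "WHERE name_lower IS NULL AND name IS NOT NULL ORDER BY id"
    ).fetchall()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "UPDATE journals SET name_lower = ? WHERE id = ?",
            [(name.lower(), row_id) for row_id, name in batch],
        )
    conn.commit()
    return len(rows)
```

Note that str.lower() does not strip whitespace, so padded names keep
their padding in name_lower.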

* refactor(jabref): log abbreviation collisions at debug level

The jabref downloader loads 14 CSV files in order and silently
overwrites on duplicate keys. For abbreviations like "J Org Chem"
that appear in multiple vocabularies (general + ACS) the last
file loaded wins, with no audit trail.

Emit a debug-level log line on each overwriting collision,
mentioning the source filename, abbreviation, and the two
competing full names. Debug level (not info/warning) because the
collisions are expected — the current "last writer wins" behavior
is kept, this is purely observability for operators who care to
tail the log.
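
A sketch of last-writer-wins merging with a debug log on each overwrite.
The function name, the (abbreviation, full name) pair format, and the
returned collision count are illustrative assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def merge_abbreviations(target, entries, source_file):
    """Merge (abbreviation, full_name) pairs into `target`; later entries win."""
    collisions = 0
    for abbreviation, full_name in entries:
        previous = target.get(abbreviation)
        if previous is not None and previous != full_name:
            logger.debug(
                "%s: abbreviation %r overwrites %r with %r",
                source_file, abbreviation, previous, full_name,
            )
            collisions += 1
        target[abbreviation] = full_name
    return collisions
```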

* docs(doaj): flag ternary-to-binary seal-field collapse

The DOAJ public CSV distinguishes three seal states: "yes", "no",
and blank (application never submitted). scoring.py only needs
the boolean floor today, so the importer collapses blank and "no"
into has_seal=False. A future tier that rewards "applied and was
denied" differently from "never applied" would need to preserve
the raw ternary — add a comment so that future change isn't
stalled rediscovering this.

No functional change; code path unchanged.

* chore(review-feedback): four follow-ups from the #3514 fixup review

Addresses the must-fix + two should-fix items surfaced by a 3×10
subagent review pass. Three other flagged items (HTML-safe scaffold,
_make_engine tempdir, fake_validate flag threading) are deferred
with rationale noted in the planning file.

db.py: the `lookup_institution` docstring advertised compact-format
keys (n, c, t, h, …) left over from the pre-refactor dict layer.
The accessor actually returns full-name keys via `_institution_to_dict`
— update the docstring so the caller contract matches reality.

test_safe_requests_redirects: the `test_dns_rebinding_case_blocked_on_second_hop`
test does not model DNS rebinding; it mocks `validate_url` to return
[True, False] for two distinct URLs. That's a per-hop re-evaluation
test, not a rebinding one (which would require same hostname with
different getaddrinfo results across calls). Rename to
`test_second_hop_blocked_when_validator_rejects_redirect_target` and
rewrite its docstring + the module docstring so the label stops
overstating the coverage. Real rebinding coverage belongs alongside
the validator unit tests and is flagged there as a follow-up.

test_journal_migrations_encrypted: the test module had no sqlcipher3
guard — on a platform where sqlcipher3 is missing and
`LDR_BOOTSTRAP_ALLOW_UNENCRYPTED=true` is set, `DatabaseManager` falls
back to plain SQLite and the test silently passes. Add
`pytest.importorskip("sqlcipher3", ...)` at module top to skip
cleanly when the package is missing, and `assert
db_manager.has_encryption` at the top of each test function to fail
loudly when sqlcipher3 imports but the manager has turned encryption
off for any reason.

test_journal_rebuild_data_preservation: docstring claimed "every
column value intact" but only `name` and `quality_analysis_time` are
seeded and checked. Tighten the claim to what the test actually
covers without reducing the real value the test adds (diacritic +
CJK + padded-whitespace backfill coverage).

* docs(journal-quality): predatory policy, release notes, and durability comment (#3516)

* feat(journal-quality): clearer log milestones around first-run DB build

The "Building X ..." message is too terse — on a fresh install the
~30s download + insert looks like a hung process. Expand the start
message to mention the one-time nature + the download size, and
include the source count in the completion log so the server log
tells operators when the DB is ready to serve scoring.

Addresses the UX gap previously considered a blocker: users already
see the server log, so a milestone log line is enough (no UI progress
event needed).

* fix(journal-quality): set Windows readonly attribute after chmod

chmod 0o444 is a no-op on Windows — the compiled journal-quality
reference DB stays writable on Windows installs, violating the
read-only invariant. Combine the POSIX chmod with a best-effort
SetFileAttributesW(FILE_ATTRIBUTE_READONLY) on win32. Log a warning
if SetFileAttributesW fails; the check-journal-quality-readonly.py
pre-commit hook still enforces read-only opens in consumer code.

* feat(journal-quality): pre-check free disk space before bulk download

The five journal-quality data sources uncompress to ~1 GB of
intermediate working set plus the compiled reference DB. On a
small-disk machine, a mid-stream failure can leave an orphan
.tmp-* file that blocks the next build. Fail fast with a clear
"X.X GB available, 2 GB required" message before touching the
sentinel or the network.

Threshold is exposed as JOURNAL_QUALITY_MIN_FREE_DISK_BYTES in
constants.py so ops can tune it if needed. OSError from
shutil.disk_usage is non-fatal (logged, build proceeds) — don't
block a download just because disk stats are unavailable.

* security(journal-quality): stop leaking exception text into HTTP path

CodeQL alerts 7650 and 7684 flagged that str(exc) from a build_db
failure in download_journal_data() flows into the tuple's message
string, and from there through to the /api/journal-data/download
response. SQLAlchemy errors embed SQL statements and file paths —
sanitize at the source by returning only the exception class name.
Full traceback remains in logger.exception (server-side only).

Add tests/journal_quality/test_downloader_exception_sanitization.py
asserting that a simulated build_db error whose message contains
stack-trace-shaped substrings never reaches the caller.

* feat(safe-requests): add safe_get_with_retries and wire into journal-quality downloads

Bulk journal-data downloads currently abort on the first transient
network failure: a packet drop or short AWS S3 hiccup forces the
user to restart from scratch. Add a safe_get_with_retries wrapper
with exponential backoff (1/2/4s, 3 attempts by default), retrying
on ConnectionError, Timeout, HTTP 429, and HTTP 5xx. Honors the
Retry-After header when present. SSRF ValueErrors and non-429 4xx
responses are passed through unchanged.

The five journal-quality data sources (OpenAlex, DOAJ, predatory,
JabRef, institutions) now import the retry wrapper instead of the
bare safe_get. Call sites are unchanged beyond the import alias.

* feat(journal-quality): detect OpenAlex field-level schema drift

OpenAlex occasionally renames snapshot fields (the Works schema has
seen h-index and ref-count migrations in the last year). The existing
row-count floor catches a collapsed fetch but cannot tell the
difference between "212K journals with h_index correctly populated"
and "212K journals all silently None because the field was renamed".

Sample the first 100 parsed rows after the parse loop and refuse to
overwrite the snapshot if every one of them has h_index == None or
every one has cited_by_count == None. Raise a new SchemaDriftError
so operators can grep for it in logs and the CI release-gate job
can fail fast on upstream breakage.
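
The sampling check could look roughly like this. The function name and the dict-shaped rows are assumptions for illustration; only SchemaDriftError, the two field names, and the sample size of 100 come from the description above:

```python
class SchemaDriftError(RuntimeError):
    """Raised when a watched field is missing from every sampled row."""


def check_field_drift(rows, fields=("h_index", "cited_by_count"), sample_size=100):
    """Raise if any watched field is None across the whole sample."""
    sample = rows[:sample_size]
    if not sample:
        return
    for field in fields:
        if all(row.get(field) is None for row in sample):
            raise SchemaDriftError(
                f"{field} is None in all {len(sample)} sampled rows"
            )
```

A single populated value in the sample is enough to pass, so sparse but healthy data does not trip the check.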

* fix(migrations): squash the journal-model churn in 0007 + keep 0008/0010 as stubs

The pre-squash chain had 0007 add 17 bibliometric / trust-signal
columns + 3 indexes to the per-user journals table, 0008 add a
sjr_quartile column + index, and 0010 drop all of 0007/0008's
additions except three. On SQLite every batch_alter_table is a
full-table rebuild, so every live user pays for TWO back-to-back
rebuilds on the journals table within a single release for no
net schema gain.

New shape: 0007 adds only the columns the final form keeps —
name_lower, score_source, quality_model — plus their indexes and
the name_lower Python-side backfill (moved from 0009, because a
Unicode-correct backfill belongs with the column that needs it).
Downgrade drops the three it added.

0008 and 0010 remain as no-op stubs. A user whose alembic_version
row reads "0008" or "0010" from a prior upgrade still needs a
revision to walk through; deleting the files would strand them.
Stubs are cheap, one return statement each, and keep the chain
contiguous without forcing anyone to rewrite history.

0009 is simplified to its one remaining unique responsibility
(ix_research_resources_research_id); the journals.name_lower
work it used to duplicate now lives in the squashed 0007.

Verified end-to-end against 206 existing migration + schema tests
(including the full chain's up/down/up stairway per revision) and
four new squash-specific regressions in
tests/database/test_journal_migration_squash.py:
  - chain reaches head 0010 with the 7-column final shape
  - name_lower backfill handles diacritics (Café → café)
  - re-running run_migrations is idempotent
  - squashed 0007 is a no-op on a DB already stamped at 0010

* refactor(journal-quality): lookup_institution returns full-name keys

The on-disk JSON snapshot uses one-character keys (n, c, t, h, if,
w, cb, r) to save bytes across ~200K institutions. That's fine
on-disk but a bad Python API — callers have to memorize the
mapping, and a future schema change breaks every caller silently.

_institution_to_dict now returns full names (name, country, type,
h_index, impact_factor, works_count, cited_by_count, ror_id). The
snapshot-reading code in _populate_institutions keeps the compact
keys — only the public accessor changes.
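
As an illustration only: the sketch below pairs the compact and full key lists from this message by position, which is an assumption (the real mapping lives in the module and may order them differently). The function name is hypothetical.

```python
# Assumed pairing: the two key lists above, matched by position.
_COMPACT_TO_FULL = {
    "n": "name",
    "c": "country",
    "t": "type",
    "h": "h_index",
    "if": "impact_factor",
    "w": "works_count",
    "cb": "cited_by_count",
    "r": "ror_id",
}


def institution_to_dict(compact):
    """Translate an on-disk compact record into the full-name public shape."""
    return {full: compact.get(short) for short, full in _COMPACT_TO_FULL.items()}
```

Keeping the translation in one table means a snapshot-format change touches one place instead of every caller.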

Grep confirms zero live callers today (only a comment mention in
search_engine_openalex.py), so no migration needed.

* refactor(journal-quality): extract _openalex_common for shared S3 helpers

openalex.py and institutions.py duplicated three symbols:
_OPENALEX_S3_BASE, the `s3://openalex/` manifest prefix check, and
the s3_to_https translator. djpetti flagged this in PR #3081 review.

Move them to data_sources/_openalex_common.py (stdlib-only, no
circular imports) and import from both data-source modules. The
on-disk compact key format and manifest fetch URLs stay where they
are; only the duplicated helpers move.
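
A sketch of what the shared translator might look like. The HTTPS base URL below is an assumption standing in for the real `_OPENALEX_S3_BASE` value, and the `_OPENALEX_S3_PREFIX` name is hypothetical:

```python
_OPENALEX_S3_PREFIX = "s3://openalex/"
# Assumed HTTPS form of the bucket; the real constant may differ.
_OPENALEX_S3_BASE = "https://openalex.s3.amazonaws.com/"


def s3_to_https(url):
    """Translate an s3://openalex/... manifest entry to an HTTPS URL."""
    if not url.startswith(_OPENALEX_S3_PREFIX):
        raise ValueError(f"unexpected manifest URL: {url!r}")
    return _OPENALEX_S3_BASE + url[len(_OPENALEX_S3_PREFIX):]
```

Rejecting any other prefix keeps a tampered manifest from steering downloads to an arbitrary host.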

* test(safe-requests): cover redirect-hop SSRF validation + DNS rebinding

safe_requests.py has always validated every redirect hop against the
SSRF allowlist (lines 208–250), but the existing test suite only
exercised the initial request. These five new tests drive the
redirect loop itself:

- redirect target is a private IP → blocked
- redirect target is AWS metadata (169.254.169.254) → blocked
- redirect loop exceeds 10 hops → raises ValueError("Too many")
- DNS-rebinding case (first hop passes validation, the redirect to the
  same hostname fails it) → blocked on the second hop
- a legitimate redirect from one public URL to another is followed

* feat(search-utilities): HTML-safe variant of the journal quality tag

_format_quality_tag emits plaintext like "[Q1 ★★★★★]" which is fine
when the caller renders the containing string as Markdown or plain
text. Today every caller does that, so there's no live XSS. But the
tag is typically embedded alongside a search-result title that came
from an external search engine, and the first HTML-rendered consumer
that does {{ title + quality_tag | safe }} or equivalent would render
any HTML tags in the title unescaped.

Add _format_quality_tag_html(quality, *, title) that html.escape's the
title (angle brackets, ampersands, quotes) and appends the plaintext
tag. Existing callers are unchanged — this is the safe variant any
future HTML-rendered caller should reach for.
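
A simplified sketch of the HTML-safe variant. Unlike the real helper, which takes the quality value, this version takes an already-rendered plaintext tag so the example stays self-contained; the function name without the leading underscore is illustrative:

```python
import html


def format_quality_tag_html(quality_tag, *, title):
    """Escape the externally sourced title, then append the plaintext tag."""
    safe_title = html.escape(title, quote=True)
    return f"{safe_title} {quality_tag}"
```

The tag itself is generated locally, so only the title needs escaping.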

The existing helper gets a docstring warning so reviewers of future
PRs know which variant is appropriate.

* test(db): migrations 0006-0010 on a SQLCipher-encrypted DB

The existing test_encrypted_database_orm.py exercises ORM CRUD over
an encrypted DB but never explicitly walks the new journal-quality
chain. This test creates a fresh keyed DB via DatabaseManager (which
runs the full migration chain as part of create_user_database),
inserts a Journal row with every kept column, closes the engine,
reopens with the same key, and reads the row back.

The second test asserts the final journals column set (id, name,
name_lower, quality, score_source, quality_model,
quality_analysis_time) is exactly what test_schema_stability expects.

Guards against SQLCipher key-ordering regressions where a future
change to sqlcipher_utils would let batch_alter_table's rebuild
path see a non-keyed connection.

* test(db): data preservation across journals-table rebuild

Adding name_lower + its index in the squashed 0007 triggers a
SQLite batch_alter_table rebuild under the hood (ALTER ADD COLUMN
is implemented as a full copy). The rebuild runs inside a single
Alembic transaction, so SQLite guarantees atomicity — either the
new table is fully populated or the original stays untouched.

The test validates what successful output must look like:

- 100 rows with a mix of ASCII, diacritics, CJK, and whitespace-
  wrapped names all survive the chain
- name / quality_analysis_time values are preserved verbatim
- name_lower is backfilled via Python's str.lower() (Unicode-
  correct, unlike SQLite's ASCII LOWER())
- no _alembic_tmp_journals orphan table is left behind

Complements test_journal_migration_squash.py (which covers the
simpler idempotency + head-stamp cases).
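
The Python-side backfill can be sketched against an in-memory SQLite DB. The three-column table here is a reduced stand-in for the real journals schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journals (id INTEGER PRIMARY KEY, name TEXT, name_lower TEXT)"
)
conn.executemany(
    "INSERT INTO journals (name) VALUES (?)",
    [("Café Scientifique",), ("NATURE",), ("  Ärzteblatt ",)],
)

# Lowercase in Python so non-ASCII letters are folded too; SQLite's
# built-in LOWER() only folds ASCII unless built with ICU.
rows = conn.execute("SELECT id, name FROM journals").fetchall()
conn.executemany(
    "UPDATE journals SET name_lower = ? WHERE id = ?",
    [(name.lower(), row_id) for row_id, name in rows],
)
conn.commit()

backfilled = [
    r[0] for r in conn.execute("SELECT name_lower FROM journals ORDER BY id")
]
```

Whitespace is left untouched by the backfill, which matches the "preserved verbatim" expectation for the name column.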

* refactor(jabref): log abbreviation collisions at debug level

The jabref downloader loads 14 CSV files in order and silently
overwrites on duplicate keys. For abbreviations like "J Org Chem"
that appear in multiple vocabularies (general + ACS) the last
file loaded wins, with no audit trail.

Emit a debug-level log line on each overwriting collision,
mentioning the source filename, abbreviation, and the two
competing full names. Debug level (not info/warning) because the
collisions are expected — the current "last writer wins" behavior
is kept, this is purely observability for operators who care to
tail the log.
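
A sketch of the collision logging, assuming the parsed CSVs arrive as an ordered list of `(filename, {abbreviation: full_name})` pairs; that input shape and the function name are hypothetical:

```python
import logging

logger = logging.getLogger(__name__)


def merge_abbreviations(files):
    """Merge abbreviation maps in load order; the last writer wins."""
    merged = {}
    for filename, mapping in files:
        for abbrev, full_name in mapping.items():
            previous = merged.get(abbrev)
            if previous is not None and previous != full_name:
                # Expected collision, so debug level rather than warning.
                logger.debug(
                    "%s: %r overwrites %r with %r",
                    filename, abbrev, previous, full_name,
                )
            merged[abbrev] = full_name
    return merged
```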

* docs(doaj): flag ternary-to-binary seal-field collapse

The DOAJ public CSV distinguishes three seal states: "yes", "no",
and blank (application never submitted). scoring.py only needs
the boolean floor today, so the importer collapses blank and "no"
into has_seal=False. A future tier that rewards "applied and was
denied" differently from "never applied" would need to preserve
the raw ternary — add a comment so that future change isn't
stalled rediscovering this.

No functional change; code path unchanged.

* docs(journal-quality): document the predatory-list whitelist override

Tier 1's auto-removal has a deliberate rescue clause: a journal
flagged by Stop Predatory Journals is kept if it's listed in DOAJ
or has h-index > PREDATORY_WHITELIST_HINDEX (default 10). This
deliberately lets mainstream publishers who occasionally appear
on community predatory lists (Frontiers, MDPI, Sage) through.

The behavior has been in the code since the feature shipped, but
it was undocumented — users seeing a flagged-but-not-removed
journal had no way to tell whether that was a bug or a policy
call. Add a "Predatory-List Overrides" section to
docs/journal-quality.md explaining the rule, the rationale, and
how to tighten or loosen it via PREDATORY_WHITELIST_HINDEX.

* docs(release): pending notes for the journal-quality redesign

Staging file documenting the changes introduced by #3081 so they
can be folded into the next tagged version's release-notes file.
Key entries:

- Major features: tiered scoring, journal dashboard, quality tags
- BREAKING: lists the 16 `journals` columns removed and points
  custom SQL consumers at the new reference DB accessor
- Upgrade cost note (one-time per-user table rebuild, typically
  <1 s, 2–5 s on very large libraries)
- Settings introduced (both opt-in)
- Operational improvements carried by the PR A fix-up stack
  (Windows readonly, disk-space pre-check, download retries)

* docs(journal-quality): explain synchronous=OFF durability tradeoff

The reference-DB build sets PRAGMA synchronous=OFF during bulk
insert. That looks scary at a glance because elsewhere in the
codebase the same pragma would risk corruption, but here it's
correct — the build writes to a unique .tmp-PID-RAND path, and
any crash mid-build orphans that temp file while leaving the
live DB untouched. The atomic os.replace() at the end of
build_db is what provides durability, not synchronous=NORMAL.

Add an inline comment so reviewers and grep-forensics readers
don't need to reconstruct this from the surrounding code.
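
The build-to-temp-then-rename pattern can be sketched as follows. The one-column table and the function name are illustrative, and the temp-name format only approximates the `.tmp-PID-RAND` shape mentioned above:

```python
import os
import secrets
import sqlite3


def build_db_atomically(final_path, rows):
    """Bulk-load into a unique temp file, then swap it into place."""
    tmp_path = f"{final_path}.tmp-{os.getpid()}-{secrets.token_hex(4)}"
    conn = sqlite3.connect(tmp_path)
    try:
        # Safe only because a crash here orphans tmp_path, never final_path.
        conn.execute("PRAGMA synchronous = OFF")
        conn.execute("CREATE TABLE journals (name TEXT)")
        conn.executemany(
            "INSERT INTO journals VALUES (?)", [(r,) for r in rows]
        )
        conn.commit()
    finally:
        conn.close()
    # Readers see either the old DB or the complete new one.
    os.replace(tmp_path, final_path)
```

If the load raises, `os.replace` is never reached and the live file is left as it was.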

* fix(journal-quality-docs): six accuracy fixes surfaced by 30-agent review

docs/journal-quality.md
- h-index quality bands: replace ≥ with strict > in the Quality Scale
  table and the Tier 2 threshold listing. scoring.py uses strict >
  at every boundary, so h=150 scores 8 (Strong), not 10 (Elite);
  the doc was off-by-one at every tier boundary.
- Quality Scale "Strong" row: change "h-index 40-149" to "41-150"
  to match the actual band (`> 75` through `> 150` inclusive-ish).
- Data-sources table: DOAJ row `~35K` → `~22K`. The code's three
  count claims (doaj.py docstring, description, _MIN_DOAJ_JOURNALS
  floor) all correctly say 22K, which matches the upstream DOAJ
  size. 35K overstates coverage by ~60%.
- Predatory-list override rationale: drop "Frontiers, MDPI, Sage"
  from the false-positive example. Only Frontiers is actually in
  the Stop Predatory Journals CSVs this code ingests; MDPI and
  Sage are not. Neutral phrasing preserves the argument without
  misattributing flag status to specific publishers.

docs/release_notes/pending-journal-quality-redesign.md
- Settings section: "both opt-in" was wrong. The per-engine toggles
  default `true` (opt-out), and three sibling toggles (arxiv,
  openalex, nasa_ads) ship alongside the one the notes named.
  Rewrite as "1 opt-in + 4 opt-out" listing all five keys.
- First-use download timing: "10-30 s" is under OpenAlex's own
  30-60 s floor, and the five sources fetch sequentially in
  downloader.py. Widen to "1-2 minutes" with the OpenAlex-alone
  baseline called out so operators don't expect 10 s.

src/local_deep_research/journal_quality/db.py
- Broaden the synchronous=OFF comment's lede to include
  `journal_mode = OFF`. The atomic-rename invariant actually
  protects the whole pragma set, not just synchronous; the final
  "Do NOT copy this pragma set" warning was body-mismatched.

* test(journal-quality): update stale assertions to match recent fixes

Three tests lagged behind earlier commits on this branch:

- test_journal_reputation_coverage: mock chain missed the new
  quality_model filter added in 55a99a7f2 (Tier 0 LLM-only cache).
  Both above/below-threshold cases get the extra .filter link.
- test_db::test_print_and_electronic_issn_both_survive: ISSNs are
  stored in canonical no-dash form (normalize_issn) as of 55a99a7f2;
  assertion updated to match.
- test_downloader::test_build_db_failure_returns_false: exception
  message is no longer surfaced to callers (info-disclosure
  hardening in da803376d); assert on exception class name instead.

* fix(journal-quality-ui): correct whitelist copy + h-index band operators (#3525)

Two UI-copy drifts surfaced by the review pass on #3516:

- Trust-signals bullet for "Predatory" described the flag without
  mentioning the whitelist carveout, so a user seeing a predatory
  journal in their results had no way to tell why it survived.
  Add the DOAJ-or-PREDATORY_WHITELIST_HINDEX rescue clause.
- Threshold-2 description had the same gap; match the trust-signal
  wording.
- Threshold-slider descriptions for 7 / 8 / 9 / 10 used `≥` for the
  h-index bands, but `scoring.py` uses strict `>` (matches the doc
  fix made in #3516 for `docs/journal-quality.md`). At each
  boundary value the UI overstated what the threshold keeps —
  e.g. threshold 10 described h-index ≥ 150 keeps Nature/Science,
  but a journal with h=150 exactly would score 8, not 10.

Pure template/string change; no JS logic touched.

* fix: Round 6-7 follow-ups — thread safety, resource leak, perf (#3452)

* fix: add lock around shared SearXNG engine in journal filter (Round 6)

The JournalReputationFilter instance is cached inside the parallel
search engine and shared across worker threads. When Tier 4 (LLM
analysis) is enabled, two concurrent filter_results calls could both
invoke self.__engine.run(query) on the same SearXNG instance, causing
the engine's mutable bookkeeping state (_last_results_count,
_search_results, rate tracker) to race.

Tier 4 is disabled by default and rarely hit, so contention cost is
negligible compared to the correctness guarantee.

* fix: Round 7 — resource leak + perf hotspots

1. BaseSearchEngine.close() now closes _preview_filters too (journal
   reputation filter is registered as preview, not content)
2. __clean_journal_name memoized per batch via local dict
3. _resolve_journal_id memoized per batch via journal_id_cache

* test: add savepoint isolation and _json_safe integration tests

- test_batch_with_failing_source_savepoint_isolation: verifies
  a 3-source batch persists all 3 when using savepoints
- test_json_safe_rejects_non_serializable_source: verifies a
  source containing a datetime object (non-JSON-safe) is
  correctly sanitized via _json_safe and the Paper row is
  persisted without crashing json.dumps() at flush time

* refactor(journal-quality): collapse migrations 0006-0010 into one

The journal-quality feature has not shipped, so its five-migration
history (with two no-op stubs at 0008 and 0010 preserved for mid-chain
dev DBs) is debt that protects a user population that doesn't exist.

Collapses into a single 0006_journal_quality_system.py that creates
the papers/paper_appearances tables, adds the three kept columns and
two indexes to journals (with the diacritic-safe name_lower backfill),
and adds ix_research_resources_research_id — the net effect of the
pre-squash chain. Deletes test_journal_migration_squash.py along with
its mid-chain regression tests (no longer reachable).

All migration test suites pass locally (271 tests across 7 files).

Dev databases on the branch stamped at 0006-0010 will need to be
reset — delete the file and let the app re-initialize on next start.

* fix(search): remove duplicate _preview_filters close loop

The close() method iterated _preview_filters twice — once before and once
after the _content_filters loop. safe_close() logs a warning on the second
invocation against an already-closed resource; keep a single pass.

* fix(migrations): use UtcDateTime in 0006 journal quality

Migration 0006 used sa.DateTime(timezone=True) on three timestamp columns.
Main's new check_datetime_timezone.py hook (commit bab0f61b6) rejects that
pattern outside tests, so the migration would fail pre-commit on rebase.
Switch to UtcDateTime with server_default=utcnow() to match the rest of
the codebase.

* fix(security): rate-limit journal-data download + CSRF header

- Add journal_data_limit (2/hour per authenticated user) in rate_limiter.py
- Decorate POST /api/journal-data/download to cap manual rebuilds
- Send X-CSRFToken in the dashboard's fetch; Flask-WTF already enforces
  CSRF at the blueprint level, so without this header the button would
  start returning 400

* test(arxiv): assert journal_ref is forwarded in previews

Parametrize test_paper_in_cache_no_pdf over journal_ref so the result
dict's journal_ref key is checked both when absent (None) and when
present (a realistic citation string). Guards against accidental removal
of the forwarding added in d88de731d4.

* fix(openalex): detect id rename, journal-only drift sample, surface SchemaDriftError

Three related drift-detection gaps:

- An ``id``→``source_id`` rename causes every record to be dropped at
  parse time, hitting the row-count floor with a generic RuntimeError
  that hides the cause. Track raw parse counts and raise a specific
  SchemaDriftError when parsed_records is healthy but parsed_with_id
  is zero. The check runs before the row-count floor so it wins.
- The ``h_index``/``cited_by_count`` drift sample scanned all source
  types, which would false-trigger on snapshots skewed to conferences
  or other types that legitimately lack ``h_index``. Filter the sample
  to ``type == "journal"`` records only.
- ``downloader.py`` collapsed ``SchemaDriftError`` into its class name
  as part of CodeQL info-disclosure hardening. Drift messages are
  developer-authored string literals with no SQL/path/stack content,
  so surface them verbatim while keeping generic exceptions scrubbed.

Also updates existing drift assertions to the new "journal sample"
phrasing and adds end-to-end tests for the id-rename and
conference-only-snapshot paths.

* test(metrics): cover journal-quality endpoints + cross-user isolation

Adds targeted coverage for the four endpoints the PR introduces, plus
an ownership test on the per-research endpoint:

- TestApiJournalQuality: auth, per_page clamp to 200, sort-injection
  pass-through to the DB-layer allowlist. Mocks
  get_journal_reference_db so the route logic runs without triggering
  the lazy network-fetch build.
- TestApiJournalDataStatus: auth check, dict-shape response.
- TestApiJournalDataDownload: auth check, authenticated POST reaches
  the handler (mocked downloader, no network).
- TestApiResearchJournals.test_other_users_research_id_returns_404:
  registers a second test user in a fresh client and confirms they
  cannot fetch user A's research_id — the per-user encrypted DB is
  the ownership boundary. Gracefully skips if multi-user registration
  is unavailable in the env.

* fix(db): validate order param in get_institutions_page

Matches the existing defensive guard in get_journals_page. The current
ternary is safe via ORM (.asc() / .desc() only), but the explicit
allowlist prevents future refactors from accidentally interpolating a
tainted value into raw SQL.
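
The guard amounts to something like the sketch below. The helper name is hypothetical, and the fallback to "desc" follows the regression-test description later in this log:

```python
_ALLOWED_ORDERS = ("asc", "desc")


def normalize_order(order):
    """Collapse anything outside the allowlist to the default direction."""
    return order if order in _ALLOWED_ORDERS else "desc"
```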

* refactor(db): drop redundant index=True on research_resources.research_id

Migration 0006 already creates `ix_research_resources_research_id` on
this column. Leaving `index=True` on the model means `create_all()`
(e.g. in ad-hoc tests or tooling) would add a second unnamed index on
the same column — wasted storage + write cost.

* fix(filter): strip zero-width and bidi chars in _sanitize_name

Replace the narrow C0/C1-only regex with log_sanitizer.strip_control_chars,
which covers C0/C1 + Arabic letter mark + zero-width space/joiner/mark +
bidi override (U+202A-E) + isolate (U+2066-9) + digit shape controls + BOM.

Tier 4 (LLM) is opt-in and the score is strictly validated, so the real
exploit surface is minimal — but a crafted bidi-override in a quoted
journal name could still confuse LLM or log rendering. Using the
comprehensive, audited pattern eliminates a regex drift point.

* fix(engines): forward ISSN from PubMed and OpenAlex previews

The journal reputation filter already reads `result.get("issn")` for
Tier 2/3 lookups, but neither OpenAlex nor PubMed was forwarding it.

- OpenAlex: extract `source.issn_l` (linking ISSN) and add to the
  preview dict alongside the existing `openalex_source_id`.
- PubMed: esummary already extracted `issn` / `essn` into `summaries`
  (line 766). Forward to the preview (prefer issn, fall back to essn).

NASA-ADS is not included — the esummary API we call does not return
ISSN (the field list uses bibstem codes instead).

Without ISSN, the filter falls back to name-only matching which is
slower and less reliable on journal-name variants ("Nat Commun" vs
"Nature Communications"). With ISSN the lookups hit the indexed column.

* fix(filter): propagate settings-read errors in create_default

The inner ``except Exception: enabled = True`` wrapped only the settings
snapshot read and silently defaulted the filter to enabled if anything
went wrong — a corrupted snapshot, a DB lock, an import error — all of
which should surface, not be masked. Per CLAUDE.md: no silent fallbacks.

Merge the inner catch into the outer one. Any error (settings read or
filter init) returns None, and ``logger.exception`` records the real
cause so operators can see what broke.

Adds a regression test asserting create_default returns None when
get_setting_from_snapshot raises.

* docs(journal-quality): troubleshooting, DB management, Tier 4 cost

- Add Tier 4 Cost & Latency callout (latency ~3-10s per unknown
  journal, ~300-500 tokens per analysis, cached 365 days by default).
- Add Troubleshooting section covering the common questions:
  low score, missing journal, performance.
- Add Database Management section with per-OS DB path, read-only
  enforcement notes, and force-rebuild instructions.
- Rename pending release note to 1.6.0.md (current version 1.5.6;
  this PR bumps minor because it adds a new dashboard + changes the
  journals table schema).

* test(migrations): dedicated upgrade/downgrade roundtrip for 0006

Migration 0006 consolidates five originally-separate revisions into one
atomic change. The existing generic alembic test doesn't exercise the
specific objects this migration creates.

Covers:
- papers table with doi/arxiv_id/pmid UNIQUEs and journal_id FK with
  ON DELETE SET NULL (preserves paper provenance when journals are
  removed).
- paper_appearances join table with both FKs set to ON DELETE CASCADE
  and resource_id UNIQUE (dedup guard at the schema level).
- journals.name_lower backfill — diacritics survive Python str.lower.
- upgrade → downgrade → upgrade roundtrip asserts downgrade removes
  every object upgrade created, and that upgrade idempotently rebuilds.

The paper_appearances index test checks by column coverage rather than
index name: the ORM pre-creates the table via Base.metadata.create_all
elsewhere, so the migration's explicit idx_* name isn't what ends up
in the DB. That's a separate pre-existing issue, not regressed here.

* test(db): regression guard for get_institutions_page order allowlist

Exercises the defensive guard added in commit 23b57a054. A tainted
``order`` string must not crash or leak into SQL; the DB layer treats
anything other than "asc"/"desc" as "desc", so the two calls below
must return identical institution lists.

Mirrors the style of test_invalid_sort_column_defaults_to_quality in
TestGetJournalsPage.

* fix(journal-quality): stale sentinel recovery + live download progress

Two related problems the user hit on a fresh install:

1. Stale `.downloading` sentinel blocked every retry. When the download
   thread dies mid-way (HTTP timeout, client disconnect, SIGKILL) the
   `finally` cleanup never runs and the sentinel lingers. The next
   request got "Download already in progress" forever. Add a stale-age
   check (20 min > expected 7 min wall-clock) that reclaims the
   sentinel instead of refusing.

2. The progress UI was fake: jumped to 30% and sat there for ~7 minutes
   with no indication of what's happening or what source is being
   fetched. When the download died silently the user saw "Download
   failed" with zero context.

   Add a module-level `_download_state` dict updated at every phase
   transition (per-source start, DB build, success, failure). Expose
   it via the existing `/metrics/api/journal-data/status` endpoint
   under a `download_progress` key. The dashboard polls it every 2 s
   while a download is in flight and renders real text like
   "[23%] Downloading OpenAlex — source 1 of 5".

   Also probe the status on page load: if a download started elsewhere
   (background init, another tab) the dashboard shows the live
   progress instead of a stale "Not downloaded" banner with a fresh
   button.

The download is still a synchronous HTTP POST (closing the tab doesn't
cancel the server work), so the CTA text is updated to tell the user
they can close the tab and the download continues server-side.
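
The stale-age reclaim from item 1 might be sketched like this. The function name and the injectable `now` are assumptions for testability; the 20-minute window is the one stated above:

```python
import os
import time

_SENTINEL_STALE_SECS = 20 * 60


def try_acquire_sentinel(path, *, now=None, stale_secs=_SENTINEL_STALE_SECS):
    """Create the sentinel, reclaiming it first if it is older than the window."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        age = None
    if age is not None:
        if age < stale_secs:
            return False  # a download is plausibly still running
        try:
            os.unlink(path)  # stale: the owning thread is presumed dead
        except FileNotFoundError:
            pass
    try:
        # O_EXCL makes creation atomic, so only one caller can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True
```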

* feat(journal-quality): parallel source downloads + per-source progress rows

Parallelize the 5 source downloads via ThreadPoolExecutor; restructure
the shared download state into per-source entries so the dashboard can
render one progress row per source (+ a sixth for the DB build step).

Each source streams from a different host (OpenAlex S3, DOAJ, GitHub
raw, OpenAlex REST for institutions) so there's no single-host
contention; wall-clock is now bound by the slowest source rather than
the sum. release-gate.yml already uses this pattern for the integration
test.

Also fixes a UX bug: the journals-table API returns 503 when the
reference DB isn't built yet, which the dashboard rendered as a scary
red "Failed to load journal data" box. The install CTA banner above
already communicates the state, so we silently ignore the 503.

test_openalex_failure now mocks all 5 sources because in parallel mode
non-OpenAlex workers still run (the mocks just make them return 0 quickly).

* feat(journal-quality): per-partition progress callback for live bars

Dashboard feedback: the two OpenAlex sources sat at "running" for
30-60 s each with the bar showing a frozen 50% — no sense of motion
even though the server logs "5/39 parts" periodically.

- DataSource.fetch gains optional `progress_cb(done, total, detail)`.
- openalex.py + institutions.py call it on every partition (not just
  every 5th like the human-readable log).
- One-shot sources (doaj, jabref, predatory) take the kwarg but
  ignore it — they finish in <10 s so the 0 → 100 snap is fine.
- downloader._fetch_one wraps the callback to write a per-source
  `percent` field in _download_state; the status endpoint carries
  it to the dashboard.
- Frontend row bar uses that percent instead of the 50% placeholder
  it had for the running state.

11 downloader tests green; no test changes needed (mocks pass
through kwargs transparently).
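
A sketch of the callback plumbing. The `progress_cb(done, total, detail)` shape and the per-source `percent` field are from the list above; `fetch_partitions`, `make_state_writer`, and the plain-dict state are hypothetical:

```python
def fetch_partitions(partitions, progress_cb=None):
    """Consume partitions in order, reporting after each one."""
    total = len(partitions)
    records = []
    for done, part in enumerate(partitions, start=1):
        records.extend(part)
        if progress_cb is not None:
            progress_cb(done, total, f"{done}/{total} parts")
    return records


def make_state_writer(state, source):
    """Wrap a shared state dict so a source can report its own percent."""
    def _cb(done, total, detail):
        percent = int(100 * done / total) if total else 0
        state[source] = {"percent": percent, "detail": detail}
    return _cb
```

Making the callback optional is what lets one-shot sources accept the kwarg and ignore it.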

* feat(journal-quality): pending marker when ref DB still downloading

On a fresh install the search can fire before the reference DB finishes
building — every journal then falls through the "no scoring data"
branch and gets marked score 3 ("low-confidence unknown"), which is
misleading because we don't actually know the journal is unknown; we
just haven't loaded the data yet.

Introduce a QUALITY_PENDING = "pending" sentinel in search_utilities.
filter_results checks `data_manager.available` at the top of the
batch; if False, it skips all scoring and tags each result with the
sentinel instead. The tag renderer recognizes the sentinel and emits:

  [journal quality data still downloading — check /metrics/journals
   and re-run the search once the build finishes]

This only fires during the narrow window between "user kicks off
install" and "reference DB built" — once the DB exists, normal
scoring resumes on every subsequent search. 63 filter + tag tests
still green (accept string sentinel alongside int|None).

* fix(filter): probe DB file directly instead of .available

The ``.available`` property on JournalQualityDB has side effects — it
calls ``_ensure_engine()`` which tries to lazy-build the DB and, when
a download is in flight, blocks for several minutes waiting for the
build to finish. That defeats the pending-marker logic I just added:
.available would eventually return True once the build completed, so
the fail-soft branch never fired.

Check ``journal_quality.db`` file existence directly (a cheap stat)
before deciding whether to mark results as pending. If the file isn't
on disk yet, we're still in the fresh-install window — skip scoring,
return results with the QUALITY_PENDING sentinel.

This also avoids the thundering herd of 30+ filter workers each
triggering a build attempt via ``.available``.

* docs(filter): clarify pending-marker copy — download in flight

Earlier copy said "check /metrics/journals and re-run once the build
finishes", which could be read as "the download hasn't started yet —
go trigger it". Reassure the user: the download IS already running in
parallel and may even be complete by the time they click through.
This avoids the "did my search error out?" reaction.

* fix(downloader): 30s cooldown cache on ensure_journal_data

Thundering-herd guard. During a search, every search engine's
reputation-filter worker (~30 threads) called ensure_journal_data
concurrently. On a fresh install (no data files yet) they all raced
to create the .downloading sentinel; one won, 29 got rejected and
each logged a WARNING. Observed: 30 identical warnings in a single
millisecond.

Module-level tuple cache: (timestamp, result). Successful calls
(data files already present) are still fast and uncached — that's a
single stat() and the caller gets the real answer. Only the
negative/"download failed or still running" result is cached, for
30 seconds. First caller does the real work; the other 29 within
the window get the cached (None, False) and move on. Cache entry
naturally self-expires, so subsequent batches re-check.
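
A minimal sketch of the negative-result cooldown. It assumes the result is a 2-tuple whose second element is the success flag, and that `check` returns quickly; the names here are illustrative:

```python
import threading
import time

_COOLDOWN_SECS = 30.0
_negative_cache = None  # (timestamp, result) or None
_cache_lock = threading.Lock()


def ensure_with_cooldown(check, *, now=time.monotonic):
    """Run check(), but replay a recent failure instead of retrying it."""
    global _negative_cache
    with _cache_lock:
        if _negative_cache is not None:
            ts, cached = _negative_cache
            if now() - ts < _COOLDOWN_SECS:
                return cached
            _negative_cache = None  # entry expired
        result = check()
        if not result[1]:
            # Only failures are cached; successes are always re-checked.
            _negative_cache = (now(), result)
        return result
```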

* fix(filter): strip arxiv journal_ref edge cases + respect exclude_non_published in pending mode

Two concrete gaps the fresh-install test surfaced:

1. Trailing empty parens. ArXiv journal_refs sometimes end with "()"
   when the citation year got stripped upstream, e.g.
   "Physical Review Research ()". Regex-strip whitespace-only trailing
   parens.

2. Truncated volume/page markers. ArXiv preview cuts citations
   mid-keyword: "Plasma Physics and Controlled Fusion, vol. 63" →
   "Plasma Physics and Controlled Fusion, v". Strip trailing
   ", v" / ", vol" / ", p" / ", pp" / ", no".

Also refines the pending-marker fail-soft path: when
exclude_non_published is True, results without a journal_ref are
still dropped even in pending mode. Only venued results carry the
marker. Previously the pending early-return short-circuited the
exclude-non-published check and returned all results.

9 new parametrized regex cases guard the two fixes + 3 regressions.
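
The two strips could be expressed roughly as below. These patterns are a sketch written from the examples in this message, not the patterns in the filter itself:

```python
import re

# Trailing "()" or "( )" left behind when the year was stripped upstream.
_TRAILING_EMPTY_PARENS = re.compile(r"\s*\(\s*\)\s*$")
# Citation cut mid-keyword: ", v" / ", vol" / ", p" / ", pp" / ", no".
_TRUNCATED_MARKER = re.compile(r",\s*(?:vol|pp|no|v|p)\.?\s*$", re.IGNORECASE)


def clean_journal_ref(ref):
    """Remove trailing empty parens and truncated volume/page markers."""
    ref = _TRAILING_EMPTY_PARENS.sub("", ref)
    ref = _TRUNCATED_MARKER.sub("", ref)
    return ref.strip()
```

Both patterns are anchored at the end of the string, so a complete citation such as ", vol. 63" is left alone.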

* debug(filter): log db_ready probe + pending-tag counts

Make the pending-marker path visible in the log. Previous code logged
a single generic WARNING without counts, so operators couldn't tell
whether the path fired or which results got the marker.

- Log on_disk / engine_cached / db_ready values at the probe site.
- Log exception stack if the probe raises.
- Log tagged / kept / dropped counts at the end of the pending branch.

* fix(filter): kick off background fetch when pending path fires

The pending-marker copy tells the user "by the time you check
/metrics/journals it may already be complete" — but that was a lie.
When I replaced the side-effect-ful ``.available`` check with the
cheap file-existence probe, I also removed the code path that
indirectly triggered ``ensure_journal_data``. Net result: the filter
correctly tagged results as pending but never started the download.
Users would see "pending" forever unless they manually clicked the
Download button on the dashboard.

Spawn the download in a daemon thread on first hit of the pending
path. A module-level threading.Lock guards the spawn — 30 concurrent
filter workers can't each start their own thread (the first one gets
through, rest see ``_bg_fetch_thread.is_alive()`` and bow out). The
30-second TTL cache in ``ensure_journal_data`` is a second line of
defence.

Daemon thread so it doesn't block process exit.

* docs(journal-quality): add 5th help step on data storage + refresh

Existing 4-step panel explains scoring but says nothing about where
the data lives or how to refresh it. User feedback asked for that
context. Add a 5th step scoped to admins:

- Path on Linux/macOS + Windows.
- Explicit note that the data is shared across all users on the
  server — a forced refresh affects everyone.
- Refresh recipe: stop server → delete files → restart → next
  search or dashboard visit re-downloads in the background.
- Marks it as "typically an admin task; normal users don't need to
  refresh" to discourage casual reloads.

No refresh button — it would affect all users and mid-search quality
scores would disappear, which is multi-tenant hostile. The existing
Download button already force-refreshes when needed.

* fix(downloader): clear orphan .downloading sentinel on startup

If the previous server process got killed mid-download (SIGKILL, crash,
restart during a fresh install), the ``finally`` cleanup in
download_journal_data never runs and the sentinel file survives on
disk. The new process then sees the orphan sentinel on every retry
and bows out with "Download already in progress" — but nothing is
actually downloading. The user is blocked for up to 20 minutes
(the _SENTINEL_STALE_SECS stale-reclaim window).

A fresh process cannot own an in-flight download, so any sentinel
present at module import time is by definition orphan debris. Unlink
it on startup with a clear WARNING log line so operators can see
what happened.

Hit this during fresh-install testing: restarted the server while
the OpenAlex sources download was still streaming; the sentinel survived; the
next search's background-fetch attempt was stuck waiting for a
download that no longer existed.

* fix(downloader): PID-based sentinel liveness check on every call

Complement to the startup orphan cleanup: even within a single server
process lifetime, a download thread can crash mid-flight and leave
the sentinel behind. The startup hook only catches cross-restart
orphans; this handles same-process ones.

Stamp the sentinel with the owner process's PID at creation time.
When a new call sees an existing sentinel it:

  1. Reads the PID.
  2. If the PID is not alive (ProcessLookupError from os.kill(..., 0))
     or the sentinel is malformed → reclaim immediately.
  3. If the PID matches our own → don't nuke self; treat as alive
     (the module-level lock should prevent this, but err safe).
  4. Otherwise → still owned, bow out with "already in progress".
  5. The 20-minute age-based reclaim remains as a last-resort fallback.

Update test_concurrent_download_blocked to stamp the current PID into
the simulated sentinel so the liveness check returns "alive" instead
of treating an empty sentinel as orphan and falling through to real
network calls.
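
Rough sketch of the liveness check in steps 1-4, assuming a POSIX
host (on Windows ``os.kill(pid, 0)`` does not act as a harmless probe).
The function name and the PermissionError handling are assumptions of
this example; the age-based fallback from step 5 is left out:

```python
import os
from pathlib import Path


def sentinel_owner_alive(sentinel: Path) -> bool:
    """Return True if the PID stamped in ``sentinel`` looks alive."""
    try:
        pid = int(sentinel.read_text().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed -> orphan
    if pid <= 0:
        return False  # never probe process groups via os.kill
    if pid == os.getpid():
        return True  # don't reclaim our own sentinel
    try:
        os.kill(pid, 0)  # signal 0 only checks existence on POSIX
    except ProcessLookupError:
        return False  # owner is gone -> safe to reclaim
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

A caller would reclaim (unlink) the sentinel when this returns False
and bow out with "already in progress" when it returns True.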

* docs(journal-quality): rewrite step-5 help without HTML entities

The help_step macro renders its body as plain text, so <code>...</code>,
&amp;, &mdash;, and &apos; showed up as literal strings in the UI. Strip
the HTML and use plain Unicode characters (& instead of &amp;, — instead
of &mdash;, straight apostrophe instead of &apos;). Inline code becomes
plain monospaced-looking text — close enough given the surrounding
steps have no inline-code formatting either.

* feat(filter): QUALITY_PREPRINT sentinel + explicit per-score tag mapping

User feedback: the [Unranked ★] tier never appeared in reports, and
arXiv preprints had no tag at all, so users couldn't tell whether the
quality column was blank because of "no venue" or "DB failed to load".

Two changes:

1. Add QUALITY_PREPRINT = "preprint" sentinel. The filter's
   _handle_no_venue path now sets result["journal_quality"] to this
   when Tier 3.5 (institution salvage) doesn't produce a numeric
   score. The tag renderer emits "[preprint — not in journal
   catalog]".

2. Rewrite _format_quality_tag with an explicit branch per score
   (1 through 10) instead of >= ranges. Adjusts:
   - Score 3 ("no scoring data" fallback) now renders [Unranked ★]
     instead of [Q4 ★]. Semantically correct: we don't know the
     venue, we're not claiming it's low-quality.
   - Score 4 still renders [Unranked ★] (DEFAULT for "in catalog,
     no h-index signal").
   - Out-of-set values fall through to f"[quality={value!r}]" so a
     broken scoring-logic change surfaces the raw value instead of
     silently bucketing into Q4.

Adds tests:
- score 3 → Unranked (the user-visible change)
- QUALITY_PREPRINT → preprint tag
- QUALITY_PENDING → existing downloading message
- out-of-range values surface raw in [quality=…]
- every VALID_QUALITY_SCORES member maps to a real tier tag

Also: downgrade() docstring gains a data-loss warning; release notes
update the outdated "sources are fetched sequentially" claim to
reflect the parallel ThreadPoolExecutor we shipped.

* feat(journals): denormalize container_title + journal_quality onto Paper

The "Your Research" tab of /metrics/journals was empty for every user
in the default config. Root cause: filter's Tier 1-3 scoring
(predatory/OpenAlex/DOAJ/institutions, covering >99% of lookups)
reads the bundled read-only reference DB and never writes Journal
rows to the per-user encrypted DB. Only Tier 4 (LLM, opt-in, default
OFF) creates them. With no Journal rows, the dashboard's
SELECT COUNT(*) FROM journals hit 0 and returned the empty state.

Fix: promote two fields to first-class Paper columns:

- container_title (String(500), indexed) — the cleaned name that
  keyed the filter's successful score. Always populated when the
  filter scored the journal. Dashboard GROUP BY key.
- journal_quality (Integer) — populated ONLY by the Tier 4 LLM path
  (expensive + non-deterministic → worth freezing). Tier 1-3 scores
  are deterministic and recomputed live from the ref DB so the
  dashboard tracks upstream data updates without staleness.

The existing Journal table + journal_id FK + LLM cache path are
unchanged.

Dashboard endpoints (/api/journals/user-research and /api/journals/
research/<id>) now group on Paper.container_title, batch-enrich via a
new ref-DB helper lookup_sources_batch (one SQL round-trip per page
load instead of N per-row lookups), and pick quality via a precedence
order: frozen LLM verdict → live ref DB score → NULL if neither.

Migration 0006 modified in place — it hasn't shipped yet. Dev DBs
stamped at 0006 need to be reset (as the migration's existing header
already notes).

* fix(scoring): cap preprint repositories at ACCEPTABLE (5)

arXiv (Cornell) and other preprint repositories were being rated Q1
ELITE (10) because derive_quality_score treated every source_type
the same — arXiv has h_index=674 + Q1 in OpenAlex, so it hit the
elite branch. But repositories aren't peer-reviewed: their h-index
reflects aggregate citation accumulation across the thousands of
papers they host, not venue rigor.

Fix: short-circuit source_type=="repository" to
REPOSITORY_QUALITY_DEFAULT (5) right after the predatory check, same
pattern as conferences. The filter's existing Tier 3.5 institution
salvage can still lift this to 6 when authors are at a strong
institution. Q-tier semantics stay meaningful for the 234K real
journals in the ref DB.

Bumped JOURNAL_DATA_VERSION v3→v4 so existing installs rebuild the
ref DB and pick up the corrected scores for the ~6,789 repository
entries.

* fix(normalizer): strip 'unknown' placeholder from container_title

OpenAlex and NASA ADS search engines emit journal="unknown" when the
upstream record has no venue indexed. The citation normalizer's
waterfall fallthrough picked that up as container_title, which then
(a) leaked into Paper.container_title so the dashboard showed a
literal "unknown" row grouping across papers from multiple real
journals, and (b) matched a real OpenAlex source actually named
"unknown" (Q1, h_index=5, score=8) during name-based ref-DB lookup,
producing a nonsensical Q1 rating.

Fix at both layers:
- OpenAlex + NASA ADS engines now emit None for both journal and
  journal_ref when the underlying venue is missing, matching what
  journal_ref already did.
- Normalizer strips literal "unknown" / empty values from the
  container waterfall defensively in case any other engine ever
  emits the same sentinel.

Covers "Unknown" / "UNKNOWN" / whitespace-padded variants.
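
Sketch of the defensive normalizer-side strip, assuming the waterfall
is an ordered list of candidate values. Both function names are
hypothetical:

```python
from typing import Iterable, Optional

_PLACEHOLDERS = {"", "unknown"}


def clean_container_title(value: Optional[str]) -> Optional[str]:
    """Return None for missing, empty, or literal "unknown" values."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped.lower() in _PLACEHOLDERS:
        return None
    return stripped


def first_real_container(candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Walk the waterfall and return the first non-placeholder value."""
    for candidate in candidates:
        cleaned = clean_container_title(candidate)
        if cleaned is not None:
            return cleaned
    return None
```

The comparison is against the whole stripped value, so a real title
that merely contains the word "unknown" is kept.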

* fix(journals): tag score source + fail-closed predatory on filter crash

Two correctness blockers surfaced by the post-rewire audit.

B2 — llm_cached gate was unsound. save_research_sources used
`journal_id is not None` as the gate for persisting Paper.journal_quality,
but _resolve_journal_id matches by name_lower alone — so any prior LLM-
enabled session's Journal cache row made the gate True, and a subsequent
Tier-2 score got stamped as if it were an LLM verdict. Violated the
"only LLM verdicts are frozen" invariant documented at citation.py:49-54.

Fix: __score_journal now returns (score, source_tag) where source_tag
identifies which tier produced the value — "openalex", "doaj",
"institution", "llm" (Tier 4 live scoring OR cache hit on a prior LLM
row), "conference", "low_confidence", or None for predatory. The filter
attaches source_tag to each scored result, and save_research_sources
gates journal_quality persistence on `source_tag == "llm"` instead of
the FK-presence check.

S4 — filter top-level except re-admitted predatory. The catch-all at
the end of filter_results returned `results` (raw input) instead of
`filtered` (predatory-free), so any Tier 1 auto-removed predatory
journals would leak back into the output when the filter hit an
unexpected exception mid-batch.

Fix: return `filtered` instead. Initialize `filtered = []` before the
try so the except branch can always reference it even if the crash
fires before Pass 1 populates anything. Losing in-flight non-predatory
results is preferable to breaking the predatory-removal safety contract.
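
Simplified sketch of the S4 fail-closed shape. The real filter has
more passes; ``is_predatory`` is a stand-in callable for this example:

```python
from typing import Callable, List


def filter_results(results: List[str], is_predatory: Callable[[str], bool]) -> List[str]:
    """Drop predatory entries; on a crash return only what was vetted."""
    filtered: List[str] = []  # bound before the try so except can use it
    try:
        for result in results:
            if is_predatory(result):
                continue
            filtered.append(result)
    except Exception:
        # Fail closed: never fall back to the raw ``results`` list, which
        # may still contain predatory entries.
        return filtered
    return filtered
```

Results after the crash point are lost, which is the trade-off the
fix accepts.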

* fix(journals): schema-drift, container_title dedup, stale-version warn

B.1 — drop server_default=utcnow() from migration 0006's papers +
paper_appearances timestamp columns. The Paper model uses Python-side
default=utcnow() + onupdate=utcnow() (citation.py:102-105), but
0001_initial_schema.py's create_all() path renders tables from the
model (no SQL-level default), while the migration-replay path was
getting server_default. Two environments, two schemas. Align on
the client-side default.

B.2 — pop container_title from citation_fields before building the
Paper row so the value lives only in the indexed column, not
duplicated into the paper_metadata JSON blob. The CSL-JSON exporter
already captures the raw value inside citation_fields["csl_json"]
during normalize_citation, so bibliography export is unaffected.

B.3 — add stale-data-version warning. JOURNAL_DATA_VERSION bumps
(v3→v4 in the repository-cap fix) went unnoticed by every code path
except the admin dashboard banner: _ensure_engine only
checked PRAGMA user_version (schema), not version.json (data). The
filter hot path served stale scores until a user visited
/metrics/journals. Now _warn_on_stale_data_version fires once per
engine lifetime at WARNING level — no auto-rebuild (user consent via
the dashboard's Download button remains the explicit refresh), just
visibility.

B.4 — drop idx_paper_appearances_paper from the migration. The
model's index=True on PaperAppearance.paper_id is the single source
of truth, matching the existing ResearchResource.research_id pattern
at research.py:186-189.

C.3 — docstring polish + FP-protection comments so future audits
don't re-flag these as bugs.

* fix(journals): failed-count log + Journal module + name_lower UNIQUE

B.5 — save_research_sources now tracks per-source failure count and
emits a summary WARNING at end-of-batch when drops occurred. The
broad per-source except is intentional (isolation), but previously
`saved_count` couldn't distinguish "all saved" from "some silently
dropped".

C.1 — move `Journal` out of `logs.py` into its own
`database/models/journal.py`. Re-export from logs.py keeps the
existing `from ...database.models.logs import Journal` compat path
used by test_schema_stability.py.

C.2 — add UNIQUE on `Journal.name_lower`. Two rows with different-
cased `name` values (e.g. "Nature Medicine" vs "NATURE MEDICINE")
would both pass the existing `name` UNIQUE check while agreeing on
`name_lower`, splitting the LLM cache. Narrow but real because the
Tier 3.6 LLM-relabel path can produce different casings.

Migration 0006 pre-dedupes case-folded `name` collisions BEFORE the
batch_alter_table column add — SQLite enforces the new UNIQUE during
the table-copy step of batch_alter, so collision cleanup had to
happen first. Keep lowest id per group (first-writer-wins); the
cache is reproducible.

Migration uses SQLAlchemy Core (reflected Tables + sa.select /
sa.update / sa.delete) rather than raw sa.text() strings per project
preference.

* test(downloader): stamp live PID in sentinel fixtures

Two disk-check tests broke when the PID-based sentinel liveness check
shipped (commit f4cfc9d25) — they ``touch()``'d an empty ``.downloading``
file, which the new ``_sentinel_owner_alive`` correctly treats as
orphan (empty read_text().strip() fails int() parse → ValueError →
orphan). The downloader then reclaims the sentinel and doesn't
short-circuit with "already in progress".

Fix: add ``_stamp_live_sentinel`` helper that writes the current PID
so the liveness check sees an alive owner and the download refuses
as expected by the test's assertion.

Pre-existing failure, not from this audit's work — spotted while
running the broader regression suite.

* test(journals): update fixtures for new signatures + safety contract

Fixes 12 CI test failures introduced by this audit's changes:

- **nasa_ads engine tests** (2) — updated to expect ``None`` (not the
  ``"unknown"`` literal) when no pub/bibstem is available. The engine
  now emits None at both ``journal`` and ``journal_ref``; the old
  sentinel was leaking through the normalizer's container_title
  fallback and matching a real OpenAlex source named "unknown".

- **schema parity test** (1) — added explicit
  ``UniqueConstraint(..., name="uq_journals_name_lower")`` in the
  Journal model's ``__table_args__`` so ``compare_metadata`` sees the
  same constraint name the migration creates. Without the explicit
  name, SQLAlchemy auto-generated a different constraint name and
  ``test_migrations_produce_schema_matching_models`` reported drift.

- **coverage + tiers tests** (~9) — the filter's ``db_ready`` probe
  was blocking every scoring-path test in CI (no ``journal_quality.db``
  file present). Added an autouse fixture to the filters directory's
  conftest that patches ``Path.exists/stat`` for that specific file
  so the probe returns True. Individual tests can still override if
  they want to exercise the pending path.

- **2 tests of the old safety-contract inversion** — renamed and
  updated to expect ``filtered`` (predatory-free) instead of the raw
  input list on filter crash. The S4 fix in this PR's main commits
  changed that behavior deliberately to prevent predatory re-admission.

Merging `main` into the branch picked up 5 unrelated commits; no
conflicts.

* fix(journals): log dropped DOAJ Seal +1 bump

When the LLM score is 8 and the journal has the DOAJ Seal, the +1 bump
lands on 9 — which is not in VALID_QUALITY_SCORES {1,4,5,6,7,8,10} —
so the bump is dropped. Previously this was silent, hiding the fact
that the Seal had no effect on Strong-tier journals. Add a debug log
so operators can see the skip, and a regression test locking the
behavior in.

* fix(journals): clamp echoed dashboard page to total_pages

An attacker could request /api/journals?page=10**9 and the route
would echo the unbounded page number in the JSON response, making the
UI render nonsense pagination state. SQLite's OFFSET on the indexed
ORDER BY caps work at total rows so there is no DoS, but the UX bug
is real. Clamp the echoed page at the route layer (no DB-method
signature change) and reuse the already-computed total_pages.
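
Sketch of the clamp, assuming 1-based pages. The helper name and the
way ``total_pages`` is derived here are assumptions; the route reuses
its own already-computed value:

```python
import math


def clamp_page(requested: int, total_rows: int, per_page: int) -> int:
    """Clamp a requested page number into [1, total_pages]."""
    total_pages = max(1, math.ceil(total_rows / per_page))
    return min(max(1, requested), total_pages)
```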

* chore(hooks): make mode=ro readonly regex case-insensitive

SQLite accepts case-insensitive URI parameter values, so mode=RO,
mode=Ro, etc. are all valid read-only opens. The pre-commit hook's
regex was case-sensitive and would have missed those forms. Add the
IGNORECASE flag and cover the new forms with tests.
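
Sketch of a case-insensitive check for the read-only parameter. The
pattern below is an assumption for illustration and may differ from
the hook's actual regex:

```python
import re

# Matches mode=ro as a whole query parameter, in any letter case.
_READONLY_MODE = re.compile(r"[?&]mode=ro(?:&|$)", re.IGNORECASE)


def is_readonly_uri(uri: str) -> bool:
    """Return True if the URI carries a mode=ro query parameter."""
    return _READONLY_MODE.search(uri) is not None
```

Note that the flag applies to the whole pattern, so the parameter
name is matched case-insensitively as well.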

* a11y(journals): add th scope and sr-only labels on dashboard

The journal-quality dashboard tables were missing scope="col" on
their <th> cells, so screen readers could not announce column context
for each data cell. The filter inputs (search box + tier/source
selects) also had no associated <label>, leaving them unnamed for
assistive tech. Use the existing .sr-only class from styles.css.

* chore(templates): use url_for for journals link in metrics.html

The sidebar already routes the Journals nav via url_for(). The
metrics.html nav bar was the lone outlier with a hardcoded path,
which would silently break if the route prefix ever changed.

* docs(release): note dashboard 503 and filter-warmup trick on first launch

The Journals dashboard page loads fine on a fresh install, but the
/api/journals data endpoint returns 503 until the reference DB
finishes building. Document the exact response and the warmup tip:
kick off a research request in parallel to spawn the background
build thread.

* test(journals): tier fallthrough and short-circuit regression tests

Two new regression tests close gaps in the tier-pipeline coverage:

1. When every tier (predatory, OpenAlex, DOAJ, institution) misses and
   Tier 4 LLM scoring is disabled, the low-confidence floor should
   tag the result with score=3 and source='low_confidence'. Guards
   the only explicit 'no data at all' output path.

2. When Tier 2 (OpenAlex) produces a score, later tiers (DOAJ,
   institution salvage) must not run. Asserts call_count==0 on the
   downstream lookups so any future refactor that accidentally calls
   them unconditionally is caught.

* fix(journal-quality): merge-readiness polish + pytest scheduler teardown

- Docstring: DOAJ Seal → 8 (was stale "→ 6") in
  advanced_search_system/filters/journal_reputation_filter.py. Constants,
  scoring.py, docs/journal-quality.md, the dashboard template, and tests
  all already use 8. Closes the outstanding docstring-accuracy thread.
- Dashboard: allow `quartile` as a sort column in journal_quality/db.py
  `_SORT_COLUMNS` allowlist. The clickable "Quartile" header in
  templates/pages/journal_quality.html silently fell back to sort-by-quality
  because the backend rejected the column. `quartile` is indexed
  (models.py:64) and get_journals_page already applies .nulls_last().
- Docs: docs/journal-quality.md says "Analytics → Journals" to match the
  actual sidebar section (components/sidebar.html:71); release notes were
  already correct.
- CI: drop phantom `journal_data_downloader.py` whitelist entry from
  .github/scripts/check-file-writes.sh — file does not exist; real path
  `journal_quality/downloader.py` is already matched on the same line.
- Style: collapse redundant `except (ValueError, Exception)` → `except
  Exception` in Tier 4 of the filter (`ValueError` is a subclass).
- Tests: stop BackgroundJobScheduler before dropping its singleton in the
  `reset_all_singletons` autouse fixture, so the APScheduler thread does
  not emit to a closed pytest stderr sink during teardown. Fixes the
  "ValueError: I/O operation on closed file." failure on "All Pytest
  Tests + Coverage" that this PR's expanded test count reliably reproduces.

* fix(migration): NFKC-normalize name_lower; highest-quality wins dedupe

The migration backfill and the filter's cache-write paths previously
used bare str.lower() while the reference-DB scoring.normalize_name()
uses NFKC+lower+strip. For names with Unicode compatibility characters
(e.g. "Physics Letters TM"), these produce different name_lower values,
causing silent cache misses and — when a normalized form ever meets a
bare form — UNIQUE-constraint violations that would abort the upgrade.

Also fixes the dedupe tiebreaker: previously picked lowest-id (first-
writer-wins), which can discard a quality=9 LLM verdict in favor of an
older quality=5 row. Now sorts by -quality (highest first), then id ASC.

Changes:
- migration 0006: import unicodedata; NFKC-normalize the dedupe grouping
  key and backfill expression; select quality column and rewrite dedupe
  sort to prefer highest-quality row with lowest-id tiebreaker.
- filter: import normalize_name from journal_quality.scoring; replace
  three call sites of name.lower() in the cache-write path.
- tests: flip assertion in existing dedupe-collision test (now verifies
  highest quality wins); add NFKC roundtrip test, NFKC-variant dedupe
  test, downgrade-preserves-data test, and filter NFKC-import guard.
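
Sketch of the normalization and the new tiebreaker, assuming rows are
(id, name, quality) tuples. ``normalize_name`` follows the NFKC + lower
+ strip description above; ``pick_survivors`` is a hypothetical helper,
not the migration's code:

```python
import unicodedata
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[int, str, Optional[int]]


def normalize_name(name: str) -> str:
    """NFKC-normalize, lowercase, then strip surrounding whitespace."""
    return unicodedata.normalize("NFKC", name).lower().strip()


def pick_survivors(rows: Iterable[Row]) -> Dict[str, Row]:
    """Keep one row per normalized name: highest quality, then lowest id."""
    survivors: Dict[str, Row] = {}
    # Sort by -quality (NULL treated as 0), then id ascending.
    for row in sorted(rows, key=lambda r: (-(r[2] or 0), r[0])):
        survivors.setdefault(normalize_name(row[1]), row)
    return survivors
```

With bare ``str.lower()`` a name containing U+2122 (the trade mark
sign) keeps that character, while NFKC folds it to "TM" first, which
is how the two forms ended up under different keys.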

* fix(schema): align paper_id + research_resources indexes across migration and model

The model declared index=True on PaperAppearance.paper_id (citation.py:159)
but the migration never called op.create_index for it, so alembic-upgrade
paths had no index while create_all paths did. Similarly, research_id
had the opposite asymmetry: migration created ix_research_resources_research_id
but the model avoided index=True with a stale comment, leaving create_all
paths without the index. Result: 20+ call sites filtering by research_id
ran full-table scans on fresh installs / test fixtures.

Changes:
- migration 0006: add explicit op.create_index for
  ix_paper_appearances_paper_id with _index_exists idempotency guard
- research.py: replace stale comment with __table_args__ that declares
  Index("ix_research_resources_research_id", "research_id") so both paths
  produce the same named index
- tests: assert named paper_id index exists after migration; add
  create_all coverage test for research_resources index

* fix(dashboard): escape data source fields in renderSourcesBanner

Template interpolated s.name, s.url, s.license, s.license_url, and
s.dataset_url raw into innerHTML; s.description had only a partial
"<" escape. DataSource attribute values come from hardcoded Python
string literals today, so this is defense-in-depth rather than an
exploitable vuln — but any future DataSource subclass whose fields
originate from network or DB input would become a stored XSS vector.

Changes:
- Add safeHref() helper next to escHtml(): allowlists http(s):,
  mailto:, and rooted paths. .trim() + ^ anchor reject leading-
  whitespace javascript:/data: bypasses. Returns '#' on failure
  (never '#...' — fragment-injection vector).
- renderSourcesBanner: wrap text interpolations with escHtml(), URL
  interpolations with safeHref(). Drop the intermediate `desc`
  variable and its incomplete .replace(/</g, '&lt;') — escHtml()
  handles all five dangerous characters.
- Add a function-level comment establishing the invariant: every s.*
  field MUST go through escHtml or safeHref.

Also documents why the IntegrityError retry branch in
research_sources_service (lines 228-268) is not unit-tested: a
mock-based approach hits PendingRollbackError before the retry runs,
because SQLAlchemy savepoint rollback does not fully reset session
state after a constraint violation. A real concurrency test would
need threading infrastructure that does not exist in this suite.

* docs(code): annotate known-deferred issues at their sites

Adds KNOWN-DEFERRED comments at each site that the 5-round review
flagged as lower-priority, so future reviewers understand the
reasoning instead of re-investigating:

- metrics_routes.py: unbounded SELECT DISTINCT container_title (reject
  .limit because it silently undercounts predatory journals); MAX
  journal_quality aggregation semantics (stability-over-freshness by
  design, not a stale-score bug); DEBUG log left in during development.
- citation.py: doi String(255) length rationale (CrossRef recommends
  <=200; pathological >2000 chars fails insert rather than corrupts);
  source_engine retained for future per-engine analytics; resource_id
  UNIQUE semantics (one resource → one paper, intentional).
- journal.py: name index=True redundant with unique=True, deferred;
  name_lower index=True redundant with UNIQUE constraint, deferred;
  score_source always "llm" today, retained for future multi-source.
- journal_quality/models.py: quartile index=True unused today;
  Institution.impact_factor always NULL from OpenAlex.
- 0006 downgrade: uq_journals_name_lower not explicitly dropped —
  SQLite batch_alter_table rebuilds the table anyway; Postgres would
  need drop_constraint, tracked as portability follow-up.
- constants.py: invariant that score 9 is intentionally absent from
  VALID_QUALITY_SCORES, paired with a matching note on the dead
  branch in search_utilities._format_quality_tag.
- sidebar.html: aria-label accessibility TODO; added aria-hidden on
  the icon so this commit actually improves screen-reader output.
- docs/journal-quality.md: 212K vs 280K number reconciliation note.

* test(migration): update rebuild data-preservation test for NFKC backfill

The migration's Step 3 backfill now uses NFKC + lower + strip (see
0006_journal_quality_system.py and f6cb349a0). The existing test
asserted row.name_lower == seed_name.lower(), which is bare
lowercase and leaves surrounding whitespace intact, so the assertion
only held for the old buggy behavior.

Add a _expected_name_lower helper that mirrors the migration's
backfill expression so the assertion locks in NFKC semantics rather
than bare .lower(). This is the same invariant tested in
test_migration_0006.py::test_backfill_nfkc_roundtrip at a different
granularity (100 mixed-Unicode rows through the full migration
chain, not a single row through step-3 alone).

* test(openalex): expect None for missing venue in _format_work_preview

The "unknown" sentinel is intentionally stripped at the engine boundary
so it never reaches the citation normalizer or matches a real OpenAlex
source named "unknown" (Q1, h_index=5). Tests were stale — update both
to match the documented contract.

* fix(nasa-ads): preserve "Last, First" author pairs through to CSL normalizer

NASA ADS returns each name as "Last, First". The previous code
comma-joined them for display, then citation_normalizer split that
string back on commas — turning two authors into four literal
singletons. Add a structured authors_csl field at the engine
boundary and have normalize_citation prefer it over the display
string fallback.
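
Sketch of the round-trip problem and the structured alternative. The
helper name is hypothetical; ``family`` / ``given`` / ``literal`` are
the standard CSL-JSON name keys:

```python
from typing import Dict, List


def authors_to_csl(names: List[str]) -> List[Dict[str, str]]:
    """Convert "Last, First" strings into CSL-JSON name objects."""
    csl: List[Dict[str, str]] = []
    for raw in names:
        family, sep, given = raw.partition(",")
        if sep and family.strip() and given.strip():
            csl.append({"family": family.strip(), "given": given.strip()})
        else:
            # No usable comma: keep the name as a single literal.
            csl.append({"literal": raw.strip()})
    return csl


names = ["Einstein, A.", "Bohr, N."]
# The old display string loses the pairing once it is split on commas:
lossy = ", ".join(names).split(", ")
```

Here ``lossy`` has four entries for two authors, while
``authors_to_csl(names)`` keeps two structured entries.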

* fix(institutions): skip malformed JSON lines instead of aborting fetch

Mirrors the openalex.py pattern: a single bad line in any partition
must not kill the whole monthly rebuild. Wrap json.loads in
try/except (json.JSONDecodeError, ValueError); count + log first 10
malformed lines, suppress further warnings; the existing
_MIN_INSTITUTIONS floor still aborts if too many records were lost.
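
Sketch of the skip-and-count loop, assuming the input is an iterable
of text lines. ``parse_jsonl`` and ``_MAX_WARNINGS`` are names invented
for this example:

```python
import json
import logging
from typing import Iterable, List, Tuple

logger = logging.getLogger(__name__)
_MAX_WARNINGS = 10


def parse_jsonl(lines: Iterable[str]) -> Tuple[List[object], int]:
    """Parse JSON lines, skipping malformed ones instead of aborting."""
    records: List[object] = []
    malformed = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except (json.JSONDecodeError, ValueError):
            malformed += 1
            if malformed <= _MAX_WARNINGS:
                logger.warning("Skipping malformed JSON on line %d", lineno)
            elif malformed == _MAX_WARNINGS + 1:
                logger.warning("Further malformed-line warnings suppressed")
    return records, malformed
```

The caller can compare ``len(records)`` against its own floor to
decide whether too much was lost.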

* fix(metrics): return 400 for non-integer page/per_page params

Previously a query like ``?page=abc`` raised ValueError out of the
``int(...)`` calls, which the broad outer except caught and turned
into a generic 500. Wrap the conversion in a narrow try/except so
client mistakes surface as 400 (Bad Request) with a clear message,
and keep the outer 500 path for genuine internal errors.
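
Framework-free sketch of the narrow conversion guard. The function
name, the default values, and the error wording are assumptions; the
route's real response shape may differ:

```python
from typing import Mapping, Optional, Tuple

ErrorResponse = Tuple[dict, int]


def parse_page_params(
    args: Mapping[str, str], default_page: int = 1, default_per_page: int = 50
) -> Tuple[Optional[Tuple[int, int]], Optional[ErrorResponse]]:
    """Return ((page, per_page), None) or (None, (body, 400))."""
    try:
        page = int(args.get("page", default_page))
        per_page = int(args.get("per_page", default_per_page))
    except (TypeError, ValueError):
        # Client mistake: report 400 here so it never reaches the
        # broad outer handler that maps everything to 500.
        return None, ({"error": "page and per_page must be integers"}, 400)
    return (page, per_page), None
```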

* fix(institutions): NFKC-normalize names in build and lookup paths

Canonical name normalization is normalize_name (NFKC + lower + strip)
in journal_quality/scoring.py — used for sources, predatory tables,
and abbreviations. Institutions diverged: bare .lower().strip() was
applied symmetrically on both writer and reader sides, so lookups
worked for ASCII but Unicode-equivalent inputs (ligatures, fullwidth,
NFKD-decomposed accents) silently missed across the index.

Replace the bare normalization at every institution writer/reader
site with normalize_name() to match the canonical contract. Snapshot
rebuild on next data download will re-normalize stored name_lower
values; intermediate lookups remain symmetric.

* test(quality): make test_orm_imports_used assert; clarify mock test docstring

test_orm_imports_used previously only printed a count and never
asserted — a phantom test that could never fail. Add a sanity check
that DB-operation patterns still match anything, plus an 80% ratio
guard so a regression where files stop using the ORM would surface.

Also clarify test_save_research_sources_success: the 1:1 add-count
holds only for non-academic URLs (the test inputs). Academic sources
trigger a 3:1 add ratio (ResearchResource + Paper + PaperAppearance);
that path is integration-tested in test_paper_dedup_integration.py.

* fix(ui): use local escape helpers consistently in details.js and journal_quality.html

details.js defines escapeHtml/escapeHtmlFallback as a closure at the
top of the file, then ignores it 130 lines down by using
``window.escapeHtml ? window.escapeHtml(x) : x`` ternaries. The
intent was a fallback when the global helper hasn't loaded — but the
local closure already provides that fallback, so the ternary's
else-branch silently emits unescaped HTML when window.escapeHtml is
missing. Switch to the local escapeHtml so escaping is unconditional.

journal_quality.html: ``${t.label}`` interpolated into innerHTML
without escHtml. Numeric today, but the explicit escHtml(String(...))
contract guards against future API changes that emit a string field
under the same name.

* chore(ci): align journal-data-integration action pins with rest of repo

The new workflow pinned harden-runner@v2.16.0 and setup-pdm@v4.4
while every other workflow in the repo uses v2.17.0 / v4.5. Align
both pins so the audit trail across the 50+ workflows stays
consistent and the new workflow picks up the same upstream fixes.

* fix(quality): narrow LLM exception handling and add predatory min-record floor

__llm_clean_journal_name caught bare Exception and logged at DEBUG —
silently absorbed every failure including programming errors that
deserve a stack trace. Narrow to the recoverable network/parse
errors (ConnectionError, TimeoutError, ValueError) and surface them
at WARNING so they're visible during triage. Log the exception class
name only (not the message) to satisfy the sensitive-logging hook.

Predatory data source previously wrote whatever it fetched, even if
the upstream returned 0 rows on 2 of the 3 CSVs. That silently
disabled predatory filtering for everyone. Add a 100-entry floor
that raises before overwriting the on-disk snapshot — the previous
good build stays in place when the upstream is partially broken.

* feat(papers): promote publication year to indexed first-class column

Year is a natural filter/group axis for the journal dashboard —
"papers in journal X from 2020-2024" — but living inside the
metadata JSON blob meant every such query paid for json_extract on
every row and could not use an index.

Migration 0006: papers.year INTEGER NULL + idx_papers_year added at
table-creation time. No in-place upgrade branch for pre-release
installs — keeps the migration simple; a fresh install or clean
re-stamp reaches the right schema.

Model: Paper.year declared alongside the other indexed columns;
kept ALSO in paper_metadata JSON so the CSL-JSON blob stays
complete and existing JSON readers keep working.

Write path: save_research_sources now copies citation_fields["year"]
into indexed["year"] (column) while leaving the original in the
metadata blob. _merge_identifiers uses the same first-write-wins
semantics already applied to doi/arxiv_id/pmid.

Dashboard: per-research and user-aggregate journal endpoints now
return year_min/year_max per journal (MIN/MAX over Paper.year),
and the per-research table gains a "Years" column rendering
"2020–2024" or "2023" or "—".

* fix(search): run OpenAlex enrichment before preview filters so Tier 2 can use source_id

The JournalReputationFilter is registered as a preview filter on
every scientific engine (arxiv, pubmed, openalex, nasa_ads,
semantic_scholar) and uses result["openalex_source_id"] for Tier 2
journal lookups (filter.py:868). Previously
enrich_results_with_source_ids ran AFTER _get_full_content — after
the preview filters had already fired with empty source_ids. Tier 2
silently degraded to fragile name matching.

Move the enrichment step between _get_previews and the preview
filter loop so the field is populated by the time the filter reads
it. Non-scientific engines still skip the enrichment entirely.

* docs(filter): clarify __clean_journal_name is regex-only, not LLM (djpetti review)

The prior docstring read "Uses regex ... followed by JabRef
abbreviation expansion ... the expensive Tier 4 LLM result is cached
at the DB layer instead" which implied this method coordinates with
or includes the LLM path. It does not — __llm_clean_journal_name is
a separate salvage step invoked only when bundled tiers miss and
enable_llm_scoring is on.

Update the docstring to state explicitly: this method is regex-only
and returns unexpanded abbreviations / location suffixes unchanged;
the LLM path is separate and opt-in.

* refactor(data-sources): extract shared manifest iteration helper (djpetti review)

openalex.py and institutions.py both download OpenAlex S3 snapshots
and shared identical code for:
- manifest URL allowlist validation
- per-partition tmp-file download + cleanup lifecycle
- per-line malformed-JSON suppression (first-10 warnings + 1 notice)

Centralize in _openalex_common.py via ``validate_manifest_entries``
and ``iter_partitions`` so the two callers stay aligned (and can't
drift) on the lifecycle and suppression policies. Each caller still
owns its own record-handling logic, progress reporting, and per-
source floor checks — those have caller-specific state that doesn't
belong in the helper.

Adds tests/journal_quality/test_openalex_common.py covering the
helper directly (allowlist accept/reject, per-partition yields,
tmp cleanup on happy path AND on exception, malformed-line
suppression).
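
A rough sketch of the lifecycle such a helper centralizes: download each partition to a temp file, yield parsed records, cap malformed-line warnings, and always clean up. The function name, the `download` callback, and the plain-text JSON-lines format are assumptions; the real `iter_partitions` signature is not shown in this message.

```python
import json
import logging
import os
import tempfile
from typing import Callable, Iterable, Iterator

logger = logging.getLogger(__name__)
MAX_WARNINGS = 10  # first-10 warnings, then one suppression notice


def iter_partition_records(
    urls: Iterable[str], download: Callable[[str, str], None]
) -> Iterator[dict]:
    """Yield JSON records from each partition, removing temp files afterwards."""
    bad_lines = 0
    for url in urls:
        fd, tmp_path = tempfile.mkstemp(suffix=".jsonl")
        os.close(fd)
        try:
            download(url, tmp_path)  # hypothetical callback: fetch url into path
            with open(tmp_path, encoding="utf-8") as handle:
                for line in handle:
                    if not line.strip():
                        continue
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        bad_lines += 1
                        if bad_lines <= MAX_WARNINGS:
                            logger.warning("malformed JSON line in %s", url)
                        elif bad_lines == MAX_WARNINGS + 1:
                            logger.warning("further malformed-line warnings suppressed")
                        continue
                    yield record
        finally:
            # Runs on the happy path, on exceptions, and on early generator close.
            os.remove(tmp_path)
```

Record handling, progress reporting, and floor checks stay with the caller, which simply iterates over the generator.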

* docs(quality): record design decisions for predatory threshold and CodeQL-reviewed sites

Three places attracted repeat attention during PR #3081 review but
landed with "keep as-is" decisions. Drop a comment at each site so
future reviewers (human or AI) don't re-derive the same conclusion.

1. PREDATORY_WHITELIST_HINDEX (constants.py): h-index is not an
   evidence-based predatory signal per mBio 2019 / PMC 2020.
   Tuning the `>` / 10 boundary changes behavior only at the
   boundary and has no literature support. Real improvement is
   more signals (JCR, OASPA), not this constant.

2. _normalize_doi (openalex_enrichment.py): the anchored
   ``startswith`` pattern is the CodeQL-recommended mitigation
   for py/incomplete-url-substring-sanitization. A prior bot
   comment (alert 7635) against an older snapshot is no longer
   raised; refactoring to bare-first is equivalent for every
   URL shape OpenAlex actually returns.

3. Journal-download success response (metrics_routes.py):
   ``message`` is trace-free by construction (downloader.py
   guarantees class-name-only for exception derivatives). CodeQL
   alerts 7650/7684 cited by a stale bot comment are no longer
   raised; replacing with a fixed literal would regress the
   dashboard popup which renders the per-source counts verbatim.
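
For item 2, the anchored-prefix idea can be sketched like this. The prefix list and function name are assumptions; the real `_normalize_doi` may accept other shapes.

```python
# Hypothetical prefix list; the real helper may handle more or fewer forms.
_DOI_URL_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalize_doi(value: str) -> str:
    """Strip a resolver prefix only when it is at the very start of the value."""
    doi = value.strip()
    lowered = doi.lower()
    for prefix in _DOI_URL_PREFIXES:
        # Anchored check: a substring test ("doi.org/" in value) would also
        # match URLs that merely mention the resolver somewhere inside.
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()
```

A value such as `https://evil.example/?u=https://doi.org/10.1/x` is left unstripped because the prefix is not at position zero.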

* docs(ci): document why journal_quality is in check-file-writes allowlist

Adds a block comment above the allowlist regex explaining that each
entry writes to disk without encryption by design, what kind of data
it writes, and the rule for adding new entries (public data, not
user-specific, justification required).

* refactor(citation): drop Paper.journal_quality; resolve quality live

A frozen per-paper Tier 4 score creates a real staleness footgun: if
the LLM re-scores a journal later (new model, manual override, bug
fix) the per-paper snapshot goes stale and the only way to fix it is
to delete and re-ingest.

Resolve current quality live in the dashboard:
- Tier 4: batch-look up the user's journals.quality by NFKC-normalized
  container_title after the container_title GROUP BY aggregation.
- Tier 1-3: bundled reference DB (unchanged path).
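
A minimal sketch of the Tier 4 batch lookup, assuming the user's scores arrive as a plain mapping from journal name to score. The function names are hypothetical; only the NFKC normalization of `container_title` comes from this message.

```python
import unicodedata
from typing import Optional


def normalize_title(title: str) -> str:
    """NFKC-normalize a container title so visually equal names compare equal."""
    return unicodedata.normalize("NFKC", title).strip()


def resolve_quality(
    container_titles: list[str], user_journal_quality: dict[str, int]
) -> dict[str, Optional[int]]:
    """Look up the current score for each aggregated title in one pass."""
    lookup = {
        normalize_title(name): score
        for name, score in user_journal_quality.items()
    }
    return {
        title: lookup.get(normalize_title(title)) for title in container_titles
    }
```

Titles with no match return `None`, which a caller could treat as the signal to fall through to the bundled Tier 1-3 path.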

The papers table is brand new in migration 0006, so we remove the
column from the migration and model rather than creating it and
dropping it later. Inline comments in both files document the
deliberate absence so the column isn't re-introduced.

User-visible behavior: unchanged. The UI only ever shows a single
resolved quality — it never distinguished frozen vs live. Responses
still emit "quality" and "score_source" with labels llm/openalex/doaj.

Tests: removes three Paper.journal_quality persistence tests in
TestMergeIdentifiersJournalColumns (first-write-wins no longer
applies to a column that doesn't exist), renames the column-nullable
migration test to test_container_title_nullable, and adds a
test_papers_has_no_journal_quality_column regression guard.

---------

Co-authored-by: Daniel Petti <djpetti@gmail.com>
2026-04-20 23:28:03 +02:00
LearningCircuit
b516e5fe34 refactor: delete 6 dead files + 17 test files in advanced_search_system (#3184)
Verified via codebase-wide grep (zero production imports for each):

Source files deleted:
- query_generation/adaptive_query_generator.py - orphaned query generator
- source_management/diversity_manager.py - orphaned diversity system
- search_optimization/cross_constraint_manager.py - orphaned clustering
- constraint_checking/intelligent_constraint_relaxer.py - orphaned relaxer
- evidence/requirements.py - exported but never used
- answer_decoding/browsecomp_answer_decoder.py - exported but never instantiated

Also deleted 17 corresponding test files and updated __init__.py exports.
2026-04-19 12:07:44 +02:00
LearningCircuit
f4fad9196c refactor: delete dead entity_aware_source_strategy + clean stale conftest (#3205)
* refactor: delete dead entity_aware_source_strategy + clean stale conftest entries

Verified: EntityAwareSourceStrategy has zero production usage - not in
search_system_factory.py, not in strategies/__init__.py, not imported
by any other strategy. Only referenced in source_based_strategy.py
docstring comments.

Also cleaned 4 stale entries from tests/strategies/conftest.py
STRATEGY_IMPORTS list for strategies already deleted or being deleted.

* docs(notes): rewrite pr-3205 notes — reference git, don't duplicate

Notes are commentary on the code that lives in git, not a mirror of it.
Drop the verbatim prompt blocks and the NER code snippet; keep a
short prose summary per novel idea plus a pointer to PR #3205 for
the pre-deletion code.

Net effect: LOC down, density up. Someone who wants the exact
EntityAwareQuestionGenerator prompts can `git show 032b22232^:src/...`
or read the PR diff.
2026-04-19 11:01:30 +02:00
LearningCircuit
3c66fa0ec3 feat: add strategy-deletion documentation hook (#3529)
* feat: add strategy-deletion documentation hook

Any commit that deletes a .py file under
src/local_deep_research/advanced_search_system/strategies/ now requires
adding or updating a .md file under docs/strategies/deleted/ in the
same commit. This preserves novel prompts, heuristics, and thresholds
before they disappear from the living tree.

The hook exempts __init__.py and base_strategy.py (infra, not
strategies) and reads git diff --cached directly so it catches
deletions (pre-commit's default file list omits them).

docs/strategies/deleted/README.md explains the convention and
includes a file template. Existing deleted strategies aren't
retroactively flagged — the hook is forward-looking.

* refactor(hook): broaden scope to entire advanced_search_system tree

The hook now triggers on deletions of any .py file under
src/local_deep_research/advanced_search_system/ — not just under
strategies/. Question generators, constraint checkers, filters,
candidate explorers, and other components under that tree also carry
novel ideas worth documenting before deletion.

Exempt list narrowed to __init__.py aggregators only. base_*.py files
are NOT exempt: deleting a base class is a significant refactor that
deserves a notes file.

README updated to reflect the broader scope.

* refactor(hook): handle rename-out-of-scope, case-insensitive exempt, inline template

Three improvements based on AI code review on PR #3529:

1. Rename-out-of-scope is now treated as a deletion. `git mv` on a
   strategy/question-generator out of src/local_deep_research/advanced_search_system/
   removes it from the tracked module even though the file survives
   elsewhere; the hook now catches that case. Renames *within* the scope
   are legitimate refactors and continue to pass. Copies (C status) are
   also handled cleanly — a copy leaves the original, so it doesn't
   count as a deletion.

2. The exempt check now lowercases filenames, defending against case-
   insensitive filesystems (macOS, Windows) where __Init__.py would
   otherwise slip past.

3. The blocking error message now prints the full notes-file template
   and a short checklist of what "Novel ideas preserved here" should
   contain. Previously the hook said "see README"; now the developer
   can copy the skeleton directly from the terminal output and start
   filling it in without context-switching.

No behaviour change for the common path (simple deletion of a .py file
in scope without a notes file still blocks with the same exit code).
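
The deletion and rename-out-of-scope rules can be sketched as a pure function over `git diff --cached --name-status` lines. The function name is hypothetical and the real hook may be structured differently; the tab-separated status format (`D`, `R<score>`, `C<score>`) is git's.

```python
SCOPE = "src/local_deep_research/advanced_search_system/"


def deletions_needing_notes(name_status_lines: list[str]) -> list[str]:
    """Return in-scope .py paths that count as deleted for the hook."""
    flagged = []
    for line in name_status_lines:
        parts = line.rstrip("\n").split("\t")
        status, paths = parts[0], parts[1:]
        if status == "D" and paths:
            candidate = paths[0]
        elif status.startswith("R") and len(paths) == 2:
            # A rename that stays inside the scope is a normal refactor.
            if paths[1].startswith(SCOPE):
                continue
            candidate = paths[0]
        else:
            # Adds, modifications, and copies leave the original in place.
            continue
        filename = candidate.rsplit("/", 1)[-1].lower()
        if (
            candidate.startswith(SCOPE)
            and candidate.endswith(".py")
            and filename != "__init__.py"
        ):
            flagged.append(candidate)
    return flagged
```

The hook would then block the commit when this list is non-empty and no notes file under `docs/strategies/deleted/` is staged.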

* docs(hook): notes should reference git, not duplicate it

Rewrite the README and the inline error-message template so the notes
convention is clearly "commentary on the code in git" — not a mirror
of it.

Before: authors were asked to paste verbatim prompts, numeric
constants, hardcoded lists, and heuristic recipes into the notes
file. That re-hosts what git already stores permanently and makes the
notes files long and tedious.

After: each novelty bullet is 1-2 sentences explaining what the
component did that was different from the successor, why the
difference was interesting, and whether it was validated. Readers
who want the exact prompt follow the deletion PR link or
`git show <sha>:<path>`.

The hook template and error message both explicitly warn against
pasting code blocks. The README rewritten around the "reference,
don't duplicate" principle with a worked example of the intended
shape.
2026-04-19 10:50:38 +02:00
LearningCircuit
bab0f61b66 chore(hooks): require UtcDateTime in migrations too (#3523)
Tighten check-datetime-timezone so the UtcDateTime rule applies to
both models and migrations. Supersedes the inverted approach in #3515,
which tried to accept sa.DateTime(timezone=True) inside migrations.

- Rewrite the AST walker: handle sa.Column / bare Column, positional
  type arg at any index, bare Column(UtcDateTime) without parens (the
  hook's own example), and ast.IfExp with both branches inspected
  independently so a violation in either arm is still flagged.
- Anchor the path filter on src/local_deep_research/ to stop
  false-positives on tests/database/models/ and partial-name matches
  like database/models_backup/.
- Update .pre-commit-config.yaml name/description and the stale
  CI_CD_INFRASTRUCTURE.md hook table entry.
- Add tests/hooks/test_check_datetime_timezone.py with 20 cases:
  violations (models / migrations / conditional types / batch runs /
  bare names), allows (UtcDateTime with import, combo import order,
  empty / syntax-error files), and path-filter boundaries.
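
A simplified sketch of the kind of AST walk described above, using only the standard-library `ast` module. It is not the hook's actual code: the function names are hypothetical and it only reports line numbers of `Column(...)` calls whose type argument is a plain `DateTime`.

```python
import ast
from typing import Optional


def _simple_name(node: ast.AST) -> Optional[str]:
    """Return 'Column' for both `Column` and `sa.Column` style references."""
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return None


def find_datetime_columns(source: str) -> list[int]:
    """Line numbers of Column(...) calls that use DateTime as a positional type."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or _simple_name(node.func) != "Column":
            continue
        for arg in node.args:  # the type may sit at any positional index
            # Inspect both arms of a conditional expression independently.
            branches = [arg.body, arg.orelse] if isinstance(arg, ast.IfExp) else [arg]
            for branch in branches:
                # Accept both DateTime(...) calls and the bare DateTime name.
                target = branch.func if isinstance(branch, ast.Call) else branch
                if _simple_name(target) == "DateTime":
                    hits.add(node.lineno)
    return sorted(hits)
```

Files that fail to parse raise `SyntaxError` here; a hook would need to decide whether to skip or report them.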
2026-04-18 21:47:17 +02:00
LearningCircuit
285eb07fb7 fix(journal-reputation): sync stale threshold default 4→2 (#3524)
Two sites still document / read the legacy default of `4` even though
the authoritative default in `src/local_deep_research/defaults/
default_settings.json` has been `2` since the journal-quality
redesign (PR #3081 family) lowered it.

- `docs/CONFIGURATION.md:534`: table cell documented default `4`;
  corrected to `2` and added the "drops predatory (score 1) only"
  note already used in `docs/journal-quality.md` and the JSON
  description.
- `advanced_search_system/filters/journal_reputation_filter.py:72`:
  `get_setting_from_snapshot("search.journal_reputation.threshold",
  4, ...)` — the fallback is effectively unreachable in production
  (settings are seeded from `default_settings.json` on first-run),
  but the mismatch was misleading to readers and would silently
  change filter behavior for any caller that bypasses the snapshot.
2026-04-18 21:38:47 +02:00
LearningCircuit
ce0fdf2fdd chore(python): bump supported floor from 3.11 to 3.12 (#3518)
## Root cause of PR #3480 failure

The weekly PDM update bot (`update-dependencies.yml`) ran on Python 3.x
(latest, currently 3.13/3.14) while the project declared
`requires-python = ">=3.11,<3.15"`. PDM's resolver evaluates candidates
against the interpreter it's running on, not the project's
`requires-python` floor. That let the bot pick packages that recently
dropped 3.11 support:

- arxiv 3.0.0 (requires >=3.10, breaks on 3.11 install attempt)
- rich 15.0.0 (requires >=3.9.0 per new metadata)
- virtualenv 21.2.4 (dropped 3.11)
- importlib-resources 7.1.0 (requires >=3.10)

The resulting `pdm.lock` was valid on 3.13 but would fail to install on
3.11/3.12, so a downstream `pdm lock --check` caught the mismatch and
the bot PR needed a manual `pdm lock` follow-up commit.

A prior attempt (PR #3507) tried to patch this with `pdm lock --refresh`
in the bot — that only rewrites hashes; it can't un-pick packages that
violate the floor. The real fix is to align the resolver's interpreter
with the `requires-python` floor.

## What this PR does

1. **Raises the floor to 3.12** in `pyproject.toml` (`requires-python`,
   `[tool.mypy] python_version`). Python 3.11 goes EOL Oct 2027 and
   ecosystem packages are already dropping it; 3.12 has the largest
   PyPI install share (~30%) and upstream support through Oct 2028.
2. **Pins the bot runner to '3.12'** (was `3.x`) — resolver now runs at
   the floor, guaranteeing chosen versions install across the whole
   supported range.
3. **Bumps all other CI workflows from 3.11 → 3.12** so they stay at or
   above the new floor (17 workflows).
4. **Regenerates `pdm.lock`** under Python 3.12 — this naturally drops
   pins of packages whose new versions require >3.11. Net: 1003 lines
   removed (no more 3.11 wheel entries).
5. **Updates docs**: `docs/developing.md` prereq, `docs/SQLCIPHER_INSTALL.md`
   Dockerfile snippet.

## Breaking change

Users on Python 3.11 can no longer `pip install local-deep-research`.
Python 3.11 users should upgrade to 3.12+ before taking future releases.

## Replaces

Closes #3507 (the `pdm lock --refresh` band-aid).
2026-04-18 13:48:26 +02:00
LearningCircuit
d18887df24 fix(auth): atomic post-login settings + regression test, supersedes #3487 (#3502)
* fix(auth): atomic settings reload + app.version update on login

Previously, the post-login settings-version-mismatch path committed
twice: once after load_from_defaults_file() wrote ~498 default
setting rows, and again after update_db_version() wrote the
app.version marker. app.version is NOT in default_settings.json —
it is only ever written by update_db_version(). Any failure between
the two commits (crash, lock timeout, engine dispose mid-transaction)
left app.version unwritten, so db_version_matches_package() kept
returning False and every subsequent login re-ran the 498-row bulk
insert. This is the "sticky loop" that made container restarts
ineffective for the reported login-hang-after-idle symptom.

Changes:

1. SettingsManager.update_db_version now accepts commit=True
   (default, backward-compatible). Passing commit=False stages
   the version row in the session but does not commit, so the
   caller can combine it with other writes into one atomic
   transaction.

2. _perform_post_login_tasks step 1 now uses that flag to run
   load_from_defaults_file + update_db_version in a single
   session.commit() at the end. Either both persist or neither
   does — no more partial state.
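
The single-commit pattern can be sketched with the standard-library `sqlite3` module. The table layout and function bodies are illustrative assumptions; only the `commit=False` flag and the "one terminal commit" rule come from this message.

```python
import sqlite3
from typing import Optional


def load_defaults(conn: sqlite3.Connection, defaults: dict, commit: bool = True) -> None:
    """Stage default setting rows (hypothetical stand-in for load_from_defaults_file)."""
    conn.executemany(
        "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
        list(defaults.items()),
    )
    if commit:
        conn.commit()


def update_db_version(
    conn: sqlite3.Connection, version: Optional[str], commit: bool = True
) -> None:
    """Stage the app.version marker; commit only when the caller asks for it."""
    conn.execute(
        "INSERT OR REPLACE INTO settings (key, value) VALUES ('app.version', ?)",
        (version,),
    )
    if commit:
        conn.commit()


def reload_settings_atomically(
    conn: sqlite3.Connection, defaults: dict, version: Optional[str]
) -> None:
    """Both writes persist together or not at all."""
    try:
        load_defaults(conn, defaults, commit=False)
        update_db_version(conn, version, commit=False)
        conn.commit()  # the only commit in the block
    except Exception:
        conn.rollback()
        raise
```

If the version write fails, the rollback also discards the staged default rows, so the next login starts from the pre-write state.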

Test plan:
- Existing test_update_db_version tests still pass (default
  commit=True preserves the old behaviour).
- New test_update_db_version_commit_false verifies that passing
  commit=False stages the row but does not call session.commit().

Part of the login-hang series. Independent of the other PRs.

* test(auth): lock in post-login atomicity + dispose-survival invariants

Follow-on to the atomic settings reload in the previous commit. Three
load-bearing properties are now guarded by regression tests and in-code
invariants:

1. Mid-write failure rolls back to a clean pre-write state — the next
   login retries fresh instead of entering the sticky loop that PR
   #3487 tried to prevent with a speculative dispose skip guard.
2. Happy-path atomic block restores both defaults and `app.version`
   together.
3. `engine.dispose()` does NOT break a thread holding a checked-out
   connection — SA 2.0's documented contract (`QueuePool.dispose`
   drains only idle entries, `Engine.dispose` calls `pool.recreate()`).
   20-iteration stress test against a real SQLCipher+WAL engine.

Also:

- Strengthened the comment on the post-login atomic block
  (`routes.py`) as an explicit ATOMICITY INVARIANT: splitting into
  two commits regresses to the sticky loop.
- Documented the caller contract for `load_from_defaults_file` and
  `update_db_version` (`settings/manager.py`): pass `commit=False`
  and own the terminal commit yourself.
- Rewrote the dispose-loop comment in `connection_cleanup.py` to
  record the SA 2.0 safety argument, so nobody re-adds a
  `checkedout() > 0` skip guard without a real reproducer (see PR
  #3487 discussion).
- Added ADR-0004 addendum summarising the PR #3487 investigation and
  pointing at the regression guard.

No change to `connection_cleanup.py` logic — dispose remains
unconditional. Supersedes PR #3487.
2026-04-16 23:04:01 +02:00
LearningCircuit
bc3680d21c docs: update pool-sizing comments, FD calculations, and create ADR-0004 (#3477)
Follow-up to the NullPool removal in d796a240. Addresses stale
comments and documentation that still referenced the old pool_size=10,
max_overflow=20, per-thread NullPool engines, and dead-thread engine
sweeps — all removed by the refactor.

- encrypted_db.py: update __init__ comment block to reflect
  pool_size=20, max_overflow=40, pool_timeout=10 (60 total), and
  drop the inline NullPool reference at the session creation site.
- connection_cleanup.py: replace stale thread_engines metric with
  pool_checked_out in the "WHAT IT LOGS" documentation block.
- processor_v2.py: rewrite NullPool reference in daemon cleanup
  comment to describe the QueuePool session return pattern.
- architecture.md: update FD budget table (21→41 steady, 81→121
  peak), remove NullPool row, fix Mermaid flowchart node.
- troubleshooting.md: update FD formula (81→121), replace dead-thread
  engine sweep description with periodic pool dispose.
- Create docs/decisions/0004-nullpool-for-sqlcipher.md — the 5
  existing ADR-0004 references across the codebase pointed to a
  non-existent file; now they resolve.
2026-04-14 21:09:42 +02:00
LearningCircuit
37a87297c3 docs: fix stale pool-size comments and NullPool references after #3441 (#3462)
PR #3441 removed per-thread NullPool engines and changed pool_size
from 10→20 / max_overflow from 20→40, but several comments and docs
still referenced the old values and removed infrastructure.

- Update pool_size/max_overflow numbers in encrypted_db.py comments
- Remove dead ADR-0004 path reference (file never existed)
- Remove redundant has_per_database_salt() warning that fired on
  every cache-hit call (open_user_database already covers cache miss)
- Fix NullPool reference in processor_v2.py comment
- Fix stale thread_engines metric doc in connection_cleanup.py
- Fix stale dead-thread engine sweep comment in connection_cleanup.py
- Update architecture.md flowchart, FD budget table math (21→41,
  81→121), and key files table roles
- Update troubleshooting.md sweep description
2026-04-14 00:01:25 +02:00
LearningCircuit
8e11dcf729 refactor(db): remove per-thread NullPool engines to fix FD leak (#3441)
Previously DatabaseManager kept a dedicated per-(username, thread_id)
NullPool engine in `_thread_engines` for background-thread metric
writes, alongside the per-user QueuePool engine in `connections`.
Orphaned entries leaked SQLCipher+WAL file handles (3 FDs per active
connection) when @thread_cleanup did not fire, eventually exhausting
the 1024 FD soft limit and causing werkzeug's per-request selector to
fail on every request.

Route metric writes through the shared per-user QueuePool engine, which
is already created with check_same_thread=False and is safe to use
from background threads. FD usage is now bounded by
pool_size + max_overflow per user instead of scaling with background
thread count.

Also:
- Bump pool_size=20, max_overflow=40, add pool_timeout=10 to absorb
  concurrent research + HTTP + metric writers against the shared pool.
- Add pool_checked_out observability to the periodic Resource monitor.
- Delete ~200 lines of thread-engine bookkeeping:
  cleanup_thread_engines, cleanup_dead_thread_engines,
  maybe_sweep_dead_engines, cleanup_all_thread_engines, _sweep_lock,
  _last_sweep_time, _thread_engine_lock, _thread_engines.
- Force QueuePool on the SQLCipher integration-test fixture so
  concurrent-write tests exercise real pooling (not StaticPool).
- Update docs/architecture.md and web/database/README.md.

Known follow-up: parallel_constrained_strategy.py uses max_workers=100
which could spike pool pressure under worst-case load; sessions are
short-lived so sustained contention is unlikely, and pool_timeout=10
will surface it as errors rather than deadlock.

1996 passed, 8 skipped across tests/database and tests/web/auth.
2026-04-12 20:02:22 +02:00
LearningCircuit
061cd83dd4 feat: add is_lexical flag to auto-enable LLM relevance filtering for keyword-based engines (#3403)
* feat: add needs_reranking flag to auto-enable LLM relevance filtering for keyword-based engines

Engines with poor native relevance ranking (arXiv, PubMed, Wikipedia,
GitHub, Mojeek, etc.) now auto-enable LLM-based result filtering via
a new `needs_reranking` class attribute. This fixes the priority bug
where the global `skip_relevance_filter=True` incorrectly overrode
auto-detection for engines that genuinely need filtering.

Priority is now: per-engine setting > needs_reranking > global skip.
The global skip only affects unclassified engines.
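
The priority order can be written as a small decision function. This is a sketch under assumptions: the function name is hypothetical, and treating an unclassified engine as filtered when the global skip is off is a guess, not something this message states.

```python
from typing import Optional


def should_filter(
    per_engine_setting: Optional[bool],
    needs_reranking: bool,
    global_skip: bool,
) -> bool:
    """Resolve whether the LLM relevance filter runs for one engine."""
    # 1. An explicit per-engine setting always wins.
    if per_engine_setting is not None:
        return per_engine_setting
    # 2. Engines flagged as needing reranking filter regardless of the global skip.
    if needs_reranking:
        return True
    # 3. Only unclassified engines fall through to the global switch (assumed default).
    return not global_skip
```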

Closes #2297

* fix: address 7 code-review issues on needs_reranking branch

1. Rename needs_reranking → needs_llm_relevance_filter for consistency
   with enable_llm_relevance_filter and skip_relevance_filter naming
2. Fix Paperless dead code: replace non-existent _apply_content_filters
   with proper _filter_for_relevance() call in custom run() override
3. Fix misleading skip_relevance_filter description to accurately
   reflect checkbox behavior and keyword engine exceptions
4. Delete 4 vacuously-true inline tests that duplicated factory logic
   instead of calling the real factory (coverage tests already exist)
5. Add needs_llm_relevance_filter to EXTENDING.md and OVERVIEW.md
6. Clarify is_generic comment: generic does not imply good ranking
7. Upgrade no-LLM log from debug to warning when filtering was
   requested but no LLM is available (with should_filter guard)

* fix: remove Paperless fallback that overrode valid empty LLM filter results

Replace the fallback that restored all previews when the LLM filter
returned empty with an info log. The base class _filter_for_relevance()
already handles errors internally (returns previews[:5] on exception
or JSON parse failure). An empty result means the LLM legitimately
found nothing relevant — trust it, don't override it.

* refactor: rename needs_llm_relevance_filter → is_lexical

The flag describes what the engine IS (lexical/keyword-based search)
rather than what it needs. This is a general classification that can
drive multiple behaviors beyond just the relevance filter — e.g.
query optimization strategies, result deduplication, or UI hints.
Matches the existing is_* naming pattern (is_scientific, is_generic).

* Revert "refactor: rename needs_llm_relevance_filter → is_lexical"

This reverts commit c322d478a1.

* Reapply "refactor: rename needs_llm_relevance_filter → is_lexical"

This reverts commit 853dfe90bd.

* feat: add is_lexical classification flag alongside needs_llm_relevance_filter

Separates classification from behavior:
- is_lexical: informational flag indicating the engine uses keyword/lexical
  search. Reusable for query optimization, UI hints, deduplication, etc.
- needs_llm_relevance_filter: behavioral flag that the factory reads to
  auto-enable LLM relevance filtering on the engine instance.

Both flags are set on all 15 keyword-based engines. The factory only
checks needs_llm_relevance_filter for filtering decisions.

* fix: improve relevance filter error handling and logging

- Return [] on all error paths instead of hiding failures behind
  previews[:5] fallback — failures should be visible, not masked
- Log errors at error level (not warning) for LLM parse failures
- Add engine name prefix to all log messages for traceability
- Add token estimate debug log to help diagnose context overflow
- Reduce log noise: routine operations are debug, only summary is info
- Consolidate validation into single check

* fix: address PR review findings for relevance filter

- Fix literal \n in EXTENDING.md code block
- Remove 'Maximum results to return' from LLM prompt (LLM decides)
- Add INPUT/KEPT/REMOVED debug logging for filter quality analysis
- Add is_lexical + needs_llm_relevance_filter to ElasticsearchSearchEngine
- Delete vacuously-true test_missing_llm_returns_none test
- Downgrade no-op skip_relevance_filter log from info to debug

* refactor: extract relevance filter into dedicated module

Pull the inline _filter_for_relevance() logic out of BaseSearchEngine
into a new web_search_engines/relevance_filter.py module.

- Use with_structured_output() with Pydantic schema; let LangChain
  pick the per-provider default method (JSON schema on Ollama,
  tool-calling on Anthropic, responseSchema on Gemini).
- Trim prompt: drop URLs, cap snippets at 200 chars.
- Suppress reasoning on Ollama thinking-by-default models via
  reasoning=False — saves 30-60s per call on qwen3 dense variants.
- Treat empty LLM responses as valid judgments; log a warning on
  batches >2 so users notice a misbehaving model.
- On exception or parse failure, return first N previews (cap=5 or
  max_filtered_results) to avoid overwhelming downstream.

* refactor(relevance_filter): cleanup + add direct tests

* feat(relevance_filter): batch previews in parallel for speed and reliability

Adds two tunable parameters to the LLM relevance filter:

- batch_size: split previews into chunks before sending to the LLM.
  Each batch uses local indices [0..batch_size-1] mapped back to
  global. Default 10. Smaller batches are faster per call AND more
  reliable on weaker models that struggle with many indices in one
  context.

- max_parallel_batches: dispatch batches concurrently via a
  ThreadPoolExecutor. Default 4. Result order is preserved across
  parallel batches.

Both exposed as BaseSearchEngine class attributes
(relevance_filter_batch_size, relevance_filter_max_parallel_batches)
so individual engines can override.

Failure semantics:
- Hard exception on any batch -> capped slice fallback (unchanged).
- Parse failure on a single batch -> skip that batch only, keep
  results from successful batches.

Adds 4 direct unit tests covering chunk/index mapping, batch_size=None
single-call mode, failed-batch-skip-keeps-others, and parallel dispatch
order preservation. All 120 tests pass.
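
A minimal sketch of the chunking and index mapping, assuming a hypothetical `judge_batch` callable that returns the local indices it considers relevant. The failure handling described above is left out to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def filter_in_batches(
    previews: Sequence,
    judge_batch: Callable[[Sequence], list[int]],
    batch_size: int = 10,
    max_parallel_batches: int = 4,
) -> list:
    """Split previews into batches, judge them concurrently, keep input order."""
    starts = list(range(0, len(previews), batch_size))
    batches = [previews[start:start + batch_size] for start in starts]
    with ThreadPoolExecutor(max_workers=max_parallel_batches) as pool:
        # Executor.map returns results in submission order, not completion order.
        local_hits = list(pool.map(judge_batch, batches))
    kept = []
    for start, batch, hits in zip(starts, batches, local_hits):
        for local_index in hits:
            # Map the batch-local index back to the global preview list.
            if 0 <= local_index < len(batch):
                kept.append(previews[start + local_index])
    return kept
```

Out-of-range indices from a misbehaving model are dropped silently in this sketch; a real filter would probably log them.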

* refactor(relevance_filter): drop structured output, parse plain text

The Pydantic with_structured_output() path had several issues:
- qwen3 dense models returned prose instead of JSON, raising
  OutputParserException and disabling the filter for that call
- grammar-constrained output on Ollama was 6-10x slower than plain
  text generation (~24s vs ~4s for 50 previews)
- per-provider quirks (function_calling latency, schema bikeshedding)

Switch to plain llm.invoke() and parse integers from the response with
a tightened regex (word-boundary, no decimal fractions). The prompt
now instructs the model to output ONLY the indices, which combined
with the regex is robust against prose-injection of small numbers.

Removes RelevanceResult Pydantic class, _invoke_structured, the
_BATCH_FAILED_PARSE sentinel, and the "all batches failed" branch
(all dead under the new contract). Updates tests to mock llm.invoke
directly. Tightens default batch_size to 5 and parallel batches to 10
based on benchmark runs against Ollama.
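
One way to parse indices from plain text with a word-boundary pattern that skips decimal fractions is sketched below. The exact regex used in the module is not shown in this message, so this pattern and the function name are assumptions.

```python
import re

# Whole integers only: not glued to letters, and not part of a decimal like 3.5.
_INDEX_RE = re.compile(r"(?<![\d.])\b\d+\b(?!\.\d)")


def parse_indices(response_text: str, batch_len: int) -> list[int]:
    """Extract unique in-range indices from an LLM reply, in order of appearance."""
    indices = []
    for match in _INDEX_RE.finditer(response_text):
        value = int(match.group())
        if value < batch_len and value not in indices:
            indices.append(value)
    return indices
```

The range check against `batch_len` is what keeps stray numbers in a chatty reply from selecting results that do not exist.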

* docs: fix stale _filter_for_relevance docstring after text-parsing rewrite
2026-04-06 23:04:47 +02:00
LearningCircuit
83f632e069 fix: treat empty environment variables as unset to fix provider selection (#3362)
* fix: treat empty environment variables as unset to fix provider selection

When deploying via Docker/Unraid templates, all environment variables
are created even when left blank (e.g. LDR_LLM_ANTHROPIC_API_KEY="").
The check_env_setting() function previously treated these empty strings
as valid overrides, which caused provider settings to be blanked out
and prevented proper provider selection on fresh installs.

Empty env vars are now treated as unset, allowing database defaults to
take effect normally.
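
The rule can be sketched in a few lines. The helper name `env_override` and its signature are hypothetical; the real `check_env_setting()` may map setting keys to variable names differently.

```python
import os
from typing import Mapping, Optional


def env_override(name: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the env value, or None when the variable is missing or empty."""
    value = environ.get(name)
    if value is None or value == "":
        # Templates that create every variable blank should not override the DB.
        return None
    return value
```

With `LDR_LLM_ANTHROPIC_API_KEY=""` this returns `None`, so the caller falls back to the database value.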

Fixes #3339

* fix(tests): update test to match empty env var behavior

Update test_env_override_empty_string to assert that empty environment
variables are treated as unset (returning DB value) rather than
overriding with empty string. This aligns with the fix for #3339.

* docs: add ecosystem context for empty env var handling decision

Document that treating empty environment variables as unset is standard
practice across major projects (botocore, viper, Turborepo, Go stdlib,
Docker Compose) with references to the PR discussion.

* feat: add warning log for empty env vars, fix references, add tests and docs

- Log warning when empty env vars are detected (helps users diagnose
  Unraid/Docker template issues)
- Replace misleading viper/Docker Compose references with CPython
  official docs and Pallets/Click PR #2223
- Add unit tests: empty string returns None, warning is logged,
  provider/model/multiple keys handled
- Add integration tests: empty string with no DB value, checkbox,
  number settings
- Document empty env var behavior in unraid.md, docker-compose-guide.md,
  and env_configuration.md

* docs: recommend DISABLED instead of Web UI for blocking settings

Users can set env vars to a non-empty invalid value like "DISABLED"
to explicitly block a key, which is simpler than navigating the UI.
2026-04-05 12:19:44 +02:00
github-actions[bot]
0ad4529b7e chore: auto-bump version to 1.5.6 (#3364)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-04-04 14:13:57 +02:00
github-actions[bot]
7131d82596 chore: auto-bump version to 1.5.3 (#3345)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-04-01 20:32:38 +02:00
github-actions[bot]
69bd0c67de chore: auto-bump version to 1.5.2 (#3333)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-04-01 02:20:27 +02:00
github-actions[bot]
b608714698 chore: auto-bump version to 1.5.1 (#3320)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-03-30 22:53:33 +02:00
github-actions[bot]
1cd3f18250 chore: auto-bump version to 1.5.0 (#3071)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-03-30 14:56:51 +02:00
LearningCircuit
75467eee13 docs: ADR-0003 reject universal raise-without-from enforcement (#3266)
* docs: add ADR-0003 rejecting universal raise-without-from enforcement

Document the decision to reject PR #3225's check-raise-without-from hook.
Enforcing raise...from e everywhere conflicts with the codebase's existing
PII protection strategy: check-sensitive-logging and fix-exception-logging
hooks prevent exception chain leakage, and several files intentionally
break chains when wrapping user-facing exceptions.

* docs: add exception handling policy to pre-commit hooks directory

Quick-reference guide for developers working near the hooks, covering
when to use raise...from e, from None, or omit chaining entirely.
Cross-references ADR-0003.

* docs: add ADR-0003 references to exception logging hooks

Add inline notes to check-sensitive-logging.py and fix-exception-logging.py
explaining why raise...from e is not enforced universally — exception chains
would re-expose the PII these hooks are designed to strip.

* docs: remove redundant EXCEPTION_POLICY.md

The inline docstring comments in check-sensitive-logging.py and
fix-exception-logging.py plus ADR-0003 already cover this. No other
hooks have companion markdown files in .pre-commit-hooks/.
2026-03-28 15:15:39 +01:00
LearningCircuit
81a5498e77 docs: add ADR-0002 documenting pre-commit hook review decisions (#3251)
* docs: add ADR-0002 documenting pre-commit hook review decisions

Document the batch review of 9 pre-commit hook PRs (#3218-#3231):
4 accepted, 5 rejected with specific technical rationale.

Add rejected-hooks comment block to .pre-commit-config.yaml linking
to the ADR, preventing re-proposals without new information.

* docs: add Related PRs table to ADR-0002 for quick navigation

* fix(docs): address review findings in ADR-0002

- Fix hook count: 43 (not 38), increases to 45 (not 41)
- Remove redundant date from title
- Move Related PRs to end of document (after Consequences)
- Fix principles: narrow file-scoped and non-duplicative rules,
  reframe AI-aware as identifier false-positive concern,
  add self-contained principle
- Make codespell#196 a proper hyperlink
- Backtick-quote raise...from as code
2026-03-28 13:00:37 +01:00
LearningCircuit
819fafe8c2 feat: add automatic database backup system (#3006)
* feat: add automatic database backup system with review fixes

Adds encrypted database backups triggered on login, based on PR #2565
with critical fixes from code review applied.

New backup module:
- BackupService: encrypted backups via sqlcipher_export(), atomic
  rename, per-user locking, disk space validation, backup verification
- BackupScheduler: singleton with ThreadPoolExecutor (max 2 workers),
  non-blocking background backup, atexit shutdown
- Configurable via settings: backup.enabled, backup.max_count (3),
  backup.max_age_days (7)

Review fixes applied (not in original PR):
- Add PRAGMA busy_timeout = 10000 to prevent instant failure on
  concurrent writer lock contention
- Use settings defaults (or 3/7) instead of raising ValueError when
  backup settings are missing (djpetti's review feedback)
- Integrate into _perform_post_login_tasks background thread pattern
- Add stale .tmp file cleanup in _cleanup_old_backups
- Fix stat() TOCTOU in cleanup loop with FileNotFoundError handling
- Enforce directory permissions with os.chmod after mkdir
- Use safe_close() instead of bare .close() in finally blocks
- Fix .gitignore to not ignore backup source code

Includes 94 tests (4523 lines) and security documentation.

* fix: update key derivation API and add crash recovery tests

- Replace _get_key_from_password (private, old 1-arg API) with
  get_key_from_password (public, with db_path for per-DB salt)
  to match current main's key derivation interface
- Add 3 end-to-end crash recovery tests using real SQLCipher:
  1. Full round-trip: backup, delete original, open backup, verify
     all rows and integrity_check pass
  2. Wrong password rejection: backup can't be decrypted with wrong key
  3. Encryption verification: backup file has no plaintext SQLite header
- Tests skip when SQLCipher is not installed (CI Docker image has it)

* feat: purge old-key backups on password change + 9 new tests

Security fix: after a password change, old backups remain encrypted
with the old (potentially compromised) password. Per NIST SP 800-57,
OWASP A02, and patterns from VeraCrypt/Bitwarden/Signal, old backups
should be purged and replaced with a fresh backup using the new key.

Changes:
- Add BackupService.purge_and_refresh() method that deletes all
  existing backups and creates a fresh one with the current password
- Integrate into change_password route (auth/routes.py)
- Add empty-file check to _verify_backup (0-byte files were passing)
- Add gitleaks allowlist entry for auth/routes.py

New tests (9):
- TestPasswordChangeBackupSecurity (3 real SQLCipher tests)
- TestBackupCorruptionDetection (3 real SQLCipher tests)
- TestBackupRetentionEnforcement (3 mocked tests)

* test: rewrite crash recovery test with correct SQLCipher connection API

Fixes from 6-agent verification round:
- Use create_sqlcipher_connection() instead of manual connect+key+pragmas
- Wrap wrong-password checks in pytest.raises around connection factory
- Add @pytest.mark.timeout(120) for CI stability
- Add encryption header check for fresh backup after purge_and_refresh
- Use inline patches, fix docstring step count

* test: add 15 more backup system tests

New test classes:
- TestBackupDiskSpaceAndAtomicity (3): missing source DB, atomic
  rename pattern, size_bytes accuracy
- TestBackupFilePermissionsExtended (1): backup file 0o600 mode
- TestPurgeAndRefreshEdgeCases (6): no existing backups, multiple
  old backups, .tmp cleanup, list ordering, get_latest edge cases
- TestBackupServiceInitValidation (3): boundary values for
  max_backups, max_age_days, empty username

* feat: reduce backup defaults and add pre-migration backup

- Change max_backups default from 3 to 2 and max_age_days from 7 to 2
  to reduce disk usage for databases with large PDF BLOBs while keeping
  a safety net against corruption overwriting the only backup.

- Add synchronous pre-migration backup in open_user_database() that
  triggers before Alembic migrations run. Only fires when
  needs_migration() returns True (version upgrades), not on every login.
  Backup failure is logged as error but does not block migration.

* fix: use get_setting default parameter for backup.enabled

The expression `sm.get_setting("backup.enabled") or True` always
evaluates to True (`False or True` is `True`), making it impossible for
users to disable backups. Use the get_setting default parameter
instead, which is the established pattern throughout the codebase.
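
A minimal reproduction of the pitfall, using a hypothetical stand-in for `get_setting` (the real settings manager is not shown in this commit):

```python
def get_setting(settings, key, default=None):
    """Hypothetical stand-in: return the stored value, or default if the key is absent."""
    return settings.get(key, default)


settings = {"backup.enabled": False}

# Buggy: a stored False is falsy, so `or` replaces it with True.
buggy = get_setting(settings, "backup.enabled") or True

# Fixed: the default only applies when the key is missing.
fixed = get_setting(settings, "backup.enabled", True)
missing = get_setting({}, "backup.enabled", True)
```

Here `buggy` is True even though the user stored False, while `fixed` respects the stored value and `missing` falls back to the default.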

* fix: address review findings from 6-round 30-agent review

Critical fixes:
- Fix _verify_backup() salt mismatch: pass db_path=self.db_path to
  set_sqlcipher_key so backup verification uses the correct per-database
  salt instead of the legacy salt. Without this, all v2 database backups
  fail verification and are silently deleted.
- Fix purge_and_refresh() race condition: hold per-user lock for the
  entire purge+create operation to prevent a concurrent backup from
  writing an old-key backup between purge and fresh backup creation.
- Fix DETACH not in finally: wrap DETACH DATABASE in its own finally
  block so the attached backup file is always released even if
  sqlcipher_export() raises. Remove no-op conn.commit() after DETACH.

Important fixes:
- Fix _cleanup_old_backups/list_backups/get_latest_backup TOCTOU: use
  safe_mtime helper that catches FileNotFoundError in sort key lambda.
- Fix list_backups timezone: use tz=UTC consistent with codebase.
- Fix get_backup_scheduler() thread safety: remove redundant module-
  level singleton; rely on thread-safe __new__.
- Fix docs: replace VACUUM INTO with sqlcipher_export() throughout.
- Fix test_no_raw_sql.py: add backup_service.py to skip list.
- Fix test readonly dir: skip when running as root in Docker.
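
The safe_mtime idea from the TOCTOU fix above might look roughly like this. The fallback value of 0.0 is an assumption made for the sketch; the commit does not say what the helper returns for a vanished file.

```python
from pathlib import Path


def safe_mtime(path):
    """Return the file's mtime, or 0.0 if it disappeared between listing and stat().

    The 0.0 fallback is an assumption for illustration.
    """
    try:
        return Path(path).stat().st_mtime
    except FileNotFoundError:
        return 0.0


def newest_first(paths):
    """Sort paths by mtime without crashing when a file is deleted concurrently."""
    return sorted(paths, key=safe_mtime, reverse=True)
```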

* fix: address djpetti review + add 6 high-value backup tests

Review feedback (djpetti):
- Restore max_age_days default to 7 (2 days was too aggressive — a
  weekend gap would delete all backups)
- Replace `or 2`/`or 7` fallbacks with `get_setting(key, default)`
  which is the established codebase pattern (30+ uses)
- Keep max_backups=2 for disk space savings

New integration tests (real SQLCipher, in test_backup_crash_recovery_ci.py):
- test_backup_preserves_all_schema_objects: compare sqlite_master
- test_backup_passes_foreign_key_check: PRAGMA foreign_key_check
- test_restored_backup_accepts_new_writes: INSERT/UPDATE + durability

New unit tests (mocked, in test_backup_service.py):
- test_backup_created_when_migration_needed
- test_no_backup_when_no_migration_needed
- test_migration_proceeds_when_backup_raises

* feat: limit backups to one per calendar day to prevent corruption propagation

A corrupted database that overwrites all backups via rapid login cycles
is the primary risk for a 2-backup rotation. Now create_backup() skips
if a backup with today's date prefix already exists in the backup dir.

Exceptions that always create a backup regardless:
- Pre-migration backups (force=True) — schema changes are the highest
  risk moment and must always have a safety net
- purge_and_refresh() on password change — calls _create_backup_impl()
  directly, bypassing the daily check (security requirement)
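
A rough sketch of the daily-limit decision, assuming backup filenames start with a YYYYMMDD date prefix (the exact filename format is an assumption; `should_create_backup` is a hypothetical name):

```python
from datetime import date


def should_create_backup(existing_names, today, force=False):
    """Return True unless a backup for `today` already exists.

    Assumes filenames begin with a YYYYMMDD prefix (illustrative only).
    force=True always allows a backup, as for pre-migration backups.
    """
    if force:
        return True
    prefix = today.strftime("%Y%m%d")
    return not any(name.startswith(prefix) for name in existing_names)
```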

* fix: sort daily backup glob + wrap DETACH in try/except

- Use max(existing_today, key=lambda p: p.name) instead of
  existing_today[0] for the daily backup limit check, since glob()
  returns results in arbitrary filesystem order.
- Wrap DETACH DATABASE in try/except inside the finally block to
  prevent masking the original sqlcipher_export exception if DETACH
  also fails.
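
The DETACH handling can be illustrated with the standard-library sqlite3 module standing in for SQLCipher. `export_with_safe_detach` and its `export` callback are hypothetical; the real code calls sqlcipher_export().

```python
import sqlite3


def export_with_safe_detach(conn, backup_path, export):
    """Attach backup_path, run export(conn), and always try to detach.

    Sketch only: plain sqlite3 stands in for SQLCipher, and `export`
    is a hypothetical callback replacing sqlcipher_export().
    """
    conn.execute("ATTACH DATABASE ? AS backup", (backup_path,))
    try:
        export(conn)
    finally:
        try:
            conn.execute("DETACH DATABASE backup")
        except sqlite3.Error:
            # Swallow DETACH failures so an export error (if any)
            # is the exception the caller sees.
            pass
```

The sketch assumes an autocommit connection (`isolation_level=None`), since SQLite refuses to DETACH while a transaction is open.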

* fix: check purge_and_refresh result instead of logging unconditional success

The return value of svc.purge_and_refresh() was discarded, so a failed
fresh backup after password change logged "Backups refreshed" falsely.
Now checks result.success and logs error if backup creation failed,
making it visible that the user has zero backups after purge.

* test: add daily backup limit tests + add missing warning log

New tests (TestDailyBackupLimit, 3 tests):
- test_skips_when_backup_exists_for_today: verify create_backup skips
  when a backup with today's date already exists
- test_force_bypasses_daily_limit: verify force=True enters
  _create_backup_impl even when today's backup exists
- test_proceeds_normally_for_different_day: verify yesterday's backup
  doesn't trigger the daily skip

Also: add logger.warning for failed .tmp file deletion in
purge_and_refresh (was silently swallowed with bare except pass).

* docs: add disk space warning and disable instructions to backup settings

Update backup.enabled description to mention disk usage and how to
disable. Update docs with clearer disk space guidance noting that
backups can be disabled via settings if space is limited.

* fix: reduce default max_backups from 2 to 1

Encrypted backups are effectively incompressible (AES-256 ciphertext is
close to maximum entropy), so each backup equals the full database size.
With large databases containing PDFs (100s of MB), keeping 2 backups
doubles disk usage.

The daily backup limit already prevents the corruption-overwrite
scenario that was the original justification for 2 backups. Users
who want extra safety can increase max_backups in settings.

* feat: add backup status warnings to research page

Add two dismissible warnings to the existing warning system:
- "Database Backups Disabled" when backup.enabled is False
- "No Backups Found" when enabled but none exist yet

Uses the existing warning_checks infrastructure (yellow alert boxes
on the research page). Backup check uses a lightweight filesystem
glob — no password or encryption needed.

Removes flash-based approach from login (research page doesn't
render flash messages).
2026-03-26 10:52:29 +01:00
github-actions[bot]
0ea808fb04 chore: auto-bump version to 1.4.0 (#2714)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-03-25 23:01:03 +01:00
LearningCircuit
86898fb071 feat: give MCP agent control over sub-research iterations and search engine (#3067)
* feat: give MCP agent control over sub-research iterations and search engine

The MCP agent can now configure sub-research tools with optional parameters:

- **iterations** (1-15): Control how many search rounds the sub-strategy
  runs. Use fewer (2-3) for quick checks, more (10-15) for exhaustive
  research. Previously hardcoded to 8 for focused-iteration.

- **search_engine**: Override which search engine the sub-research uses.
  For example, the agent can now run `focused_research` with
  `search_engine: "pubmed"` to do deep iterative research specifically
  against PubMed, or `"arxiv"` for scientific papers. Previously,
  sub-research always used the primary configured engine (usually SearXNG),
  losing access to specialized databases.

This lets the agent make smarter delegation decisions, e.g.:
  ACTION: focused_research
  ARGUMENTS: {"query": "mRNA cancer vaccine clinical trials",
              "iterations": 5, "search_engine": "pubmed"}

The system prompt is updated to teach the agent about these options.
Override engines are properly cleaned up after use.

* refine: prefer more iterations over breadth, raise limit to 25

Updated system prompt and tool descriptions to guide the agent toward
using more iterations (10-20+) rather than broader queries. Raised the
iteration clamp from 15 to 25 to support exhaustive deep dives.
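
The clamp could be sketched as below. `clamp_iterations` is a hypothetical name, and using 8 as the fallback is an assumption based on the previously hardcoded value mentioned above.

```python
MIN_ITERATIONS = 1
MAX_ITERATIONS = 25  # raised from 15 in this commit


def clamp_iterations(requested, default=8):
    """Clamp an agent-supplied iteration count into [1, 25].

    Hypothetical helper; the default of 8 is an assumption.
    """
    if requested is None:
        return default
    return max(MIN_ITERATIONS, min(MAX_ITERATIONS, int(requested)))
```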

* docs: note web_search gives precise control over search queries

Highlight in the system prompt that web_search is also useful for
crafting specific search queries with exact phrases, date ranges, or
site filters — giving the agent more direct control over what reaches
the search engine.

* refactor: simplify MCP tools — web_search default, only focused_research for deep work

Removed source_research and quick_research tools from the MCP agent.
These were legacy strategies that added confusion without clear value:
- source_research (source-based) overlapped with focused_research
- quick_research (standard) was an old strategy with no clear advantage

The agent now has a cleaner toolset:
- web_search: primary tool for most queries, gives precise control
- focused_research: deep iterative research with configurable iterations
  and search_engine override (e.g. pubmed, arxiv)
- search_[engine]: quick single queries against specific databases
- download_content: fetch full article text

System prompt rewritten to emphasize web_search as the default action
and focused_research for complex topics needing depth.

* prompt: recommend specialized engines for domain-specific questions

Added explicit guidance in the system prompt mapping question domains
to the best engine: medical → pubmed, scientific → arxiv, background →
wikipedia, news → wikinews. This helps the agent pick the right tool
instead of defaulting to web_search for everything.

* fix: increase observation limits 10x so agent sees full sub-research output

OBSERVATION_MAX_LENGTH: 5000 → 50000
HISTORY_OBSERVATION_MAX_LENGTH: 1000 → 10000

The previous limits severely truncated focused_research output — the
agent would run 8+ iterations of deep PubMed research but only see
1000 chars of the result when synthesizing the final answer. Now the
agent retains much more context from sub-research for better synthesis.

* feat: enable semantic_scholar and openalex for auto search by default

Added use_in_auto_search=true to both settings files so the MCP agent
can use them as specialized search tools. Updated system prompt to
mention them alongside arxiv and pubmed.

* fix(test): update tool count after removing source_research and quick_research

* prompt: guide agent to do quick searches first before focused_research

* prompt: recommend search_arxiv as quick first step for scientific topics

* fix: address all 8 review rounds — prompt, tests, docs, limits, progress

- Rewrote system prompt: one clear decision table, no contradictions
- Fixed docstring (1-15 → 1-25), progress message shows actual engine
- Added comments explaining MCP is for large-context LLMs (32k+)
- Added 10 tests for iterations/search_engine/clamping/cleanup
- Updated docs/mcp-server.md for removed tools
2026-03-25 22:12:14 +01:00
LearningCircuit
40b885e2f2 feat: add semantic search over research history (#1475)
* feat: add semantic search over research history

Add a Research History collection that indexes all research reports
and sources for semantic search:

- Add ResearchDocumentLink model to track research → document mappings
- Add research_report and research_source source types
- Create ResearchHistoryIndexer service for on-demand indexing
- Add API endpoints for collection status, indexing, and search
- Add semantic search UI panel on history page with progress tracking
- Add "Save to Collection" feature on results page

The indexer creates Document entries from research reports (markdown)
and sources with content, links them to the Research History collection,
and triggers FAISS indexing for semantic search.

Users can also save individual research to custom collections via
the new "Save to Collection" button on the results page.

* fix: add research_document_links to expected tables

Document new table added for semantic search over research history.

* fix: address review findings for research history semantic search

- Fix exception detail leaks in 5 error handlers (CodeQL flagged)
- Fix N+1 query in search (60 queries → 1 batch join query)
- Fix type matching bug: use ResearchDocumentLink.link_type instead
  of broken UUID string matching on source_type_id
- Fix memory issue: use count() + yield_per(50) instead of .all()
- Extract get_rag_service() to rag_service_factory.py to fix circular
  import (service → routes)
- Add LinkType enum for ResearchDocumentLink.link_type column
- Replace magic strings with constants in indexer
- Wrap history_search.js in IIFE, use shared window.escapeHtml
- Remove duplicate escapeHtml and unused isSafeUrl from save_to_collection.js
- Add 9 tests (5 indexer service + 4 route tests)

* fix: address pre-commit failures and review findings for semantic search

Blocking fixes:
- Remove unused `to_bool` import (pre-commit failure)
- Remove redundant `{e}` from logger.exception() calls (pre-commit)
- Add CSRF token to POST fetch in history_search.js
- Escape API error messages in innerHTML (XSS)
- Guard division by zero in streaming progress when total==0
- Fix getResearchIdFromUrl() to use URLBuilder.extractResearchIdFromPattern
  instead of URLSearchParams (route is /results/<uuid>, not ?id=)
- Fix JS field name mismatch: documents_created→documents_added,
  documents_linked→sources_indexed to match API response

Important fixes:
- Use LinkType.SOURCE enum instead of string literal "source"
- Use standard CSRF pattern (window.api.getCsrfToken) in save_to_collection
- Add warning log in streaming method for failed indexing (matches non-streaming)
- Remove dead `seen_research_ids` set (populated but never used for filtering)
- Fix similarity score formula: use distance directly for IndexFlatIP (cosine),
  1/(1+distance) only for L2

* fix: address code review findings for research history semantic search

Bug fixes:
- Use LinkType.SOURCE enum instead of "source" string literal (indexer:552)
- Use LinkType.REPORT enum instead of "report" string literal (rag_routes)
- Strip exception text from error messages to prevent info leakage (indexer:262)
- Fix distance-to-similarity formula for cosine/IP metric (rag_routes)
- Pass db_password through rag_service_factory to LibraryRAGService
- Add session password retrieval to rag_routes get_rag_service wrapper

Security (XSS):
- Escape data.error in innerHTML (history_search.js:236)
- Replace onclick injection with addEventListener pattern (save_to_collection.js)
- Remove window.saveToCollection global exposure

Hygiene:
- Export ResearchDocumentLink and LinkType from models/__init__.py
- Validate limit param as int with bounds [1, 50] (rag_routes)
- Remove duplicate sqlcipher_utils.py entry in .gitleaks.toml
- Remove dead code: seen_research_ids set (populated but never read)
- Fix redundant exception vars in logger.exception() calls

* fix: address round 2 code review findings for semantic search

- Add None guard for chunk_text in search results (crash fix)
- Remove redundant rag_service.db_password override in index_collection
- Add @require_json_body to 3 new POST endpoints (CSRF mitigation)
- Guard research.query against None/empty at all title derivation sites
- Export RagDocumentStatus from database models __init__.py

* fix: address review findings rounds 7-9 for semantic search

- Fix URL pattern: use URLBuilder.resultsPage() instead of query param (404 fix)
- Fix XSS: replace onclick inline handler with event delegation in save_to_collection
- Fix similarity scoring: use (distance+1)/2 mapping for cosine [-1,1] to [0,1]
- Escape dateStr fallback in history_search for defense-in-depth
- Use LinkType.REPORT enum instead of string literal
- Add @require_json_body to add_research_to_collection and search_research_history
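
The score mappings mentioned in this and the earlier review rounds can be written out as follows. Function names are hypothetical, and the clipping step is an addition for safety against floating-point drift, not something the commit states.

```python
def cosine_to_unit_interval(score):
    """Map a cosine similarity in [-1, 1] linearly onto [0, 1]."""
    clipped = max(-1.0, min(1.0, score))  # clipping is an assumption
    return (clipped + 1.0) / 2.0


def l2_distance_to_similarity(distance):
    """Map an L2 distance in [0, inf) onto (0, 1]; smaller distance scores higher."""
    return 1.0 / (1.0 + distance)
```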

* fix: address round 3 code review findings for semantic search

- Add read-only get_collection() to ResearchHistoryIndexer so search
  route doesn't create collections/indexes as a side effect
- Fix force=True being silently ignored when docs already in collection
  by populating doc_ids_to_index from existing_doc_ids
- Fix escapeHtml fallback treating 0/false as empty string (match
  xss-protection.js null/undefined check + String() coercion)
- Escape research_created_at in catch branch of date formatting

* fix: address review findings rounds 10-11 for semantic search

- Remove @require_json_body from index_single_research (body is optional)
- Remove duplicate LinkType import in search_research_history
- Add query length limit (10000 chars) before embedding call

* fix: address 9 blocking review findings for semantic search PR

Database:
- Revert Document.resource_id FK from CASCADE back to SET NULL (prevents
  destructive deletion of documents when a ResearchResource is removed)
- Restore foreign_keys="[Document.resource_id]" on Document.resource
  relationship (required for SQLAlchemy disambiguation with bidirectional FKs)

Backend (research_history_indexer.py):
- Replace yield_per(50) with materialized ID queries in both
  index_all_research() and index_all_research_streaming() to avoid
  SQLite/SQLCipher locking from nested session writes on an active cursor
- Propagate force parameter to _index_documents_to_rag() so force=True
  actually triggers re-indexing in FAISS
- Replace deprecated session.query(Model).get() with session.get()
  (required for SQLAlchemy 2.0 compatibility)
- Check for existing Document by hash before inserting to avoid
  IntegrityError on unique document_hash constraint (reuses document
  when same content is re-indexed)
- Use len(content.encode("utf-8")) for accurate byte-size file_size
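
Why the byte-size fix matters, in a small illustrative example (the sample string is made up):

```python
# Non-ASCII characters occupy more than one byte in UTF-8,
# so len(content) undercounts the on-disk size.
content = "na\u00efve caf\u00e9 \u2615"

char_count = len(content)                  # number of code points
byte_size = len(content.encode("utf-8"))   # number of bytes when stored as UTF-8
```

For this string the character count is 12 but the UTF-8 size is 16 bytes.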

Frontend:
- Escape result.similarity and result.research_id in innerHTML to
  prevent XSS in history_search.js
- Add response.ok checks before response.json() on all 4 fetch calls
  across history_search.js and save_to_collection.js

* fix: address round 2 review findings for semantic search PR

Critical:
- Normalize query vector (L2) before FAISS search to match indexed
  vectors — langchain's FAISS wrapper normalizes during indexing but
  the route bypassed the wrapper, producing wrong similarity rankings
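
A dependency-free sketch of the normalization step, assuming an inner-product index over unit-length vectors. The helper names are hypothetical; the actual route works with FAISS and langchain objects.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def inner_product(a, b):
    """Plain dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Without normalizing the query, the inner product scales with the query's magnitude and no longer stays within [-1, 1] against unit-length indexed vectors.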

High:
- Guard against IntegrityError when hash-reused Document already has a
  ResearchDocumentLink (unique=True on document_id) or DocumentCollection
  entry — check for existing rows before inserting
- Remove user-supplied collection_id from error message to prevent
  information disclosure

Medium:
- Add ResearchDocumentLink cleanup to delete_document_completely() since
  SQLite FK cascades are not enforced (no PRAGMA foreign_keys = ON)
- Move get_source_type_id() calls inside 'if document is None' block to
  avoid unnecessary DB queries when reusing existing documents
- Use explicit content.encode("utf-8") consistently for hash computation

Low:
- Escape dateStr in innerHTML (history_search.js)
- Escape data.documents_added/sources_indexed in innerHTML
  (save_to_collection.js)

* fix: reuse similarity_search_with_score instead of manual FAISS search

Replace ~80 lines of manual FAISS search (embed_query, L2 normalization,
index.search, docstore ID mapping, custom similarity formula) with
langchain's similarity_search_with_score(), matching the pattern already
used in search_engine_collection.py. Also:
- Remove dead `or {}` after @require_json_body decorator
- Fix DOMContentLoaded race in history_search.js (readyState check)
- Remove duplicate urls.js load in history.html (already in base.html)
- Add 3 tests: hash collision reuse, force=True propagation, index_all

* refactor: replace custom search route with generic collection search endpoint

Replace the 170-line search_research_history() route with a generic
search_collection() endpoint that delegates to CollectionSearchEngine.
This reuses the same search code the research pipeline uses instead of
reimplementing FAISS search in route code.

- POST /api/collections/<id>/search replaces POST /api/research-history/search
- Research metadata enrichment extracted to _enrich_with_research_metadata()
- UI: unified search bar with text/semantic mode toggle (brain icon)
- Semantic search panel slimmed to indexing controls only
- history_search.js exports semanticSearchHistory() instead of managing its own UI

* fix: address 4 integration review findings for collection search refactor

- F1: Add .ldr-btn-outline.active CSS rule so semantic toggle shows visual feedback
- F2: Fall back to source_id when document_id missing in collection search metadata
- F3: Guard window.semanticSearchHistory call with helpful loading message
- F4: Return needsIndexing sentinel when no collection indexed, show guidance UX

* refactor: remove ResearchDocumentLink model — use existing Document columns

Document already has research_id, resource_id, and source_type_id columns
that fully track which research produced which document and whether it's
a report or source. ResearchDocumentLink was a redundant junction table
duplicating these relationships.

- Remove ResearchDocumentLink model and LinkType enum from library.py
- Remove from model exports, cascade_helper, and schema stability test
- Rewrite indexer to query Document.research_id/resource_id directly
- Rewrite _enrich_with_research_metadata to join Document→SourceType
- Extract _ensure_in_collection helper to reduce duplication
- Update tests to assert on Document fields instead of link table

* docs: clarify hash-dedup constraint in _create_document_from_report

document_hash has unique=True, so identical content must share a
Document row. Add comment explaining research_id points to the
first creator in the hash-collision case.

* refactor: restore research_report/research_source as source type categories

These are seed data rows in the existing source_types table, not schema
changes. They give research reports and sources their own category via
Document.source_type_id, which is required (nullable=False).

* feat: hybrid search mode as default on history page

Replace the toggle button with a Bootstrap dropdown showing three modes:
- Hybrid (default): instant text filter + semantic results appended below
- Text Only: title/query filter, no API call
- AI Only: semantic search only

Hybrid mode shows text matches immediately, then appends deduplicated
semantic results after a 500ms debounce. Race conditions are guarded
by a hybridSearchId counter. Not-indexed state silently falls back to
text-only with no error.

* feat: tiered ranked results for hybrid search mode

Replace the two-section layout (text above, semantic below) with a
three-tier ranked list:
- Tier 1: items matching both title and content, sorted by similarity
- Tier 2: text-only matches in recency order
- Tier 3: semantic-only matches below a divider, sorted by similarity

Tier 1 cards show an AI match badge with similarity % and a 2-line
snippet preview. Tier 3 items attempt to find the full history record
for full action buttons, falling back to a simplified View-only card.
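
The three-tier ordering can be sketched as below. The commit implements this in JavaScript (buildTieredResults); the Python version here is only an illustration, and the input shapes (a list of ids in recency order, and dicts with `research_id` and `similarity`) are assumptions.

```python
def build_tiered_results(text_matches, semantic_results):
    """Split results into (tier1, tier2, tier3) lists of research ids.

    text_matches: ids matching the text filter, already in recency order.
    semantic_results: dicts with "research_id" and "similarity" (assumed shape).
    """
    best = {}
    for result in semantic_results:
        rid = result.get("research_id")
        if not rid:
            continue  # skip results without a usable id
        score = result["similarity"]
        if rid not in best or score > best[rid]:
            best[rid] = score

    text_set = set(text_matches)
    # Tier 1: matched by both text and semantic search, best similarity first.
    tier1 = sorted((r for r in text_matches if r in best),
                   key=lambda r: best[r], reverse=True)
    # Tier 2: text-only matches keep their recency order.
    tier2 = [r for r in text_matches if r not in best]
    # Tier 3: semantic-only matches, best similarity first.
    tier3 = sorted((r for r in best if r not in text_set),
                   key=lambda r: best[r], reverse=True)
    return tier1, tier2, tier3
```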

Remove renderHybridSemanticSection from history_search.js (replaced by
buildTieredResults + renderMergedResults in history.js). Split debounce
into separate input and semantic timers.

* feat: render markdown in snippet previews via marked + DOMPurify

Snippets from semantic search often contain markdown (bold, code,
emphasis). Use marked.parseInline() + DOMPurify.sanitize() to render
them as rich inline HTML instead of escaped plaintext. Falls back to
escapeHtml when libraries aren't loaded.

* refactor: extract semantic search into research_library/search/ subpackage

Backend:
- Create research_library/search/ with routes/ and services/ subdirs
- Move research_history_indexer.py to search/services/ (re-export stub
  at old location for backward compatibility)
- Extract 6 research history + collection search routes from rag_routes.py
  (3169→2795 lines) into search/routes/search_routes.py with own blueprint
- Register search_bp in app_factory.py

Frontend:
- Create shared semantic_search.js exposing window.SemanticSearch with:
  renderSnippet, buildTieredResults, createSemanticResultCard, isSafeExternalUrl
- Create semantic-search.css with all semantic search styles (moved from
  history-icons.css + consolidated inline styles)
- history.js and history_search.js now use shared module instead of
  private implementations

Tests:
- Move test files to tests/research_library/search/ parallel structure
- Fix mock paths for rag_service_factory refactor across 3 test files

* fix: guard against non-array API responses and mode-change race in hybrid search

Three bugs found during 8-round architectural review:

1. (CRITICAL) semanticResults = results || [] passes non-array objects
   (like error responses) to buildTieredResults, causing TypeError.
   Fix: use Array.isArray() check.

2. (WARNING) Switching search mode while semantic results are in-flight
   causes hybrid results to overwrite text-only render. Fix: check
   searchMode === 'hybrid' in the async callback.

3. (WARNING) Semantic results with null research_id produce String(null)
   = "null", collapsing to one map entry and creating broken navigation
   links. Fix: skip results with falsy research_id.

* refactor: extract search mode strings into shared LDR_CONSTANTS

Add js/config/constants.js loaded globally via base.html, following
the existing urls.js pattern. Replace magic strings 'hybrid', 'text',
'semantic' in history.js with LDR_CONSTANTS.SEARCH_MODE constants.

* refactor: rename Research History collection display name to History

Users see "History" in the menu, so the collection should match.
Add RESEARCH_HISTORY_COLLECTION_NAME and description as constants
in constants.py rather than hardcoding strings.

* docs: add semantic search architecture guide with mermaid diagrams

Add docs/architecture/SEMANTIC_SEARCH.md covering:
- Indexing pipeline (ResearchHistory → Documents → FAISS)
- Search pipeline (Hybrid/Text/AI-Only modes)
- Three-tier merge algorithm
- File structure (backend + frontend)
- API routes reference
- Reusing on other pages guide

Also: populate search/services/__init__.py with ResearchHistoryIndexer
export, and add cross-references in OVERVIEW.md and DATABASE_SCHEMA.md.

* tests: add route tests for 3 untested research-history endpoints

- TestGetResearchHistoryCollectionRoute: happy path (200 + fields) and
  exception (500) for GET /library/api/research-history/collection
- TestIndexResearchHistorySSERoute: happy path SSE (200 text/event-stream,
  correct data: lines) for GET /library/api/research-history/index
- TestIndexSingleResearchRoute: happy path (200), error status (400), and
  exception (500) for POST /library/api/research-history/index/<id>
- Tighten test_index_research_creates_documents assertion from >= 1 to == 2
  (1 report + 1 source) in test_research_history_indexer.py

* fix: resolve critical bugs and pattern violations in semantic search feature

- Fix CSS never loading: change {% block styles %} to {% block extra_head %}
- Fix undefined var(--primary): replace with var(--primary-color) in 6 locations
- Fix LDR_CONSTANTS block-scoped: use window.LDR_CONSTANTS for global access
- Fix broken /library/api/collections/list URL: use correct /library/api/collections
- Centralize 5 hardcoded API URLs into URLS.LIBRARY_API constants
- Replace 6 console.error calls with SafeLogger.error
- Replace hardcoded "completed" strings with ResearchStatus.COMPLETED enum
- Add threading.Lock concurrency guard inside SSE generate() for bulk indexing
- Eliminate nested get_source_type_id sessions: inline SourceType queries (4 calls)
- Add RAG indexing failure logging and rag_warning in result dict
- Add ARIA accessibility: role=button, tabindex, aria-expanded, keyboard handler
- Replace inline onclick handlers with addEventListener
- Remove dead try/catch around EventSource constructor
- Add beforeunload cleanup for active EventSource
- Apply ruff format to llm_utils.py and settings_routes.py (pre-existing CI fix)

* fix: correct test patch targets after indexer refactoring

- Route tests: patch ResearchHistoryIndexer at definition module
  (lazy imports inside function bodies aren't patchable at the route module)
- Service tests: remove all get_source_type_id patches (function was replaced
  with inline SourceType queries; DB fixtures already seed the source types)

* fix: SSE lock timeout, extract inline styles, correct DATABASE_SCHEMA docs

- Add 10-minute wall-clock timeout to SSE bulk indexing to prevent DoS
  via indefinite lock hold (checked each iteration in generate())
- Extract structural inline styles from history.html into CSS classes:
  ldr-semantic-panel-header, ldr-indexing-status, ldr-progress-track,
  ldr-progress-bar, ldr-search-row
- Fix DATABASE_SCHEMA.md: SourceType is a normalized table (not an enum)
  with values: research_download, user_upload, manual_entry,
  research_report, research_source
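
The wall-clock timeout from the first item above can be sketched as a generator that checks a deadline once per iteration. This is a minimal illustration, not the project's generate(): the function name, the `clock` parameter, and the "timeout" payload are assumptions made for the example.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_deadline(
    items: Iterable[str],
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield SSE-style lines, stopping once the time budget is spent.

    Hypothetical helper: `clock` is injectable so the deadline logic can be
    tested without waiting.
    """
    deadline = clock() + max_seconds
    for item in items:
        # Checked once per iteration, so a single slow item can still
        # overrun the budget by however long that item takes.
        if clock() >= deadline:
            yield "data: timeout\n\n"
            return
        yield f"data: {item}\n\n"
```

A 10-minute budget corresponds to `max_seconds=600`. Because the check only runs between items, it bounds how long a lock is held across iterations but does not interrupt work inside one.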

* fix: resolve test failures and edge cases from review

- Fix test_library_init.py: update source type count from 3 to 5, mock
  ensure_research_history_collection in all initialize_library_for_user tests
- Initialize source_count = 0 before branch to prevent UnboundLocalError
- Filter out empty-string report_content (report_content != "") to prevent
  entries from being stuck as permanently pending
- Sanitize SSE data.percent with Number() clamped to [0,100] to prevent
  CSS injection via style.width

* fix: address review findings — generic errors, RAG cleanup, tests

- Remove research_id from error message (match codebase pattern)
- Wrap get_rag_service() in context manager to release resources
- Clarify force parameter docstring behavior
- Document threading.Lock process-local limitation
- Add 4 tests: SSE lock contention, add-to-collection success/404,
  RAG service cleanup verification

* fix: SQLite 999 var limit, partial RAG recovery, source_type_id warnings

- Fix #3: Replace materialized set + notin_() with subquery in
  get_indexing_status() to avoid SQLite SQLITE_MAX_VARIABLE_NUMBER crash
  when >999 research entries are indexed; use .count() instead of len(set)
- Fix #7: In index_research() existing_docs branch, query full
  DocumentCollection objects to inspect the indexed flag; queue docs
  that are in the collection but have indexed=False for re-indexing;
  update early-return guard to also check doc_ids_to_index
- Fix #14: Add logger.warning() in _create_document_from_report() and
  _create_document_from_source() when the SourceType row is not found
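
The subquery approach in Fix #3 can be illustrated with the standard-library sqlite3 module. The table and column names below are hypothetical and the real code uses the ORM; the point is only that a subquery binds no per-row parameters, so the statement stays within the variable limit however many rows are indexed.

```python
import sqlite3


def count_unindexed(conn: sqlite3.Connection) -> int:
    """Count research rows with no matching document, using a subquery."""
    # No Python-side ID list is bound here, so the statement carries zero
    # parameters regardless of how many rows are already indexed.
    row = conn.execute(
        "SELECT COUNT(*) FROM research "
        "WHERE id NOT IN (SELECT research_id FROM documents)"
    ).fetchone()
    return row[0]


# Hypothetical schema: 1500 research rows, 1200 of them already indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE research (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE documents (research_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO research (id) VALUES (?)", [(i,) for i in range(1, 1501)]
)
conn.executemany(
    "INSERT INTO documents (research_id) VALUES (?)",
    [(i,) for i in range(1, 1201)],
)
conn.commit()
```

The `NOT NULL` constraint on `research_id` matters in this sketch: `NOT IN` against a subquery that returns a NULL matches no rows.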

* fix: per-user lock, SSE safety, race guards, data contracts, and UX bugs

- Per-user index lock instead of global lock (#1)
- Use response.call_on_close for lock release instead of generator finally (#2)
- Add semantic search race guard with semanticSearchId counter (#11)
- Remove hybrid loading indicator on mode-changed early return (#5)
- Close EventSource before overwrite in triggerIndexing (#6)
- Always set research metadata fields in _enrich_with_research_metadata (#10)
- Wrap button text in <span> for mobile CSS (#12)
- Guard SafeLogger usage in constants.js (#16)
- Update tests for per-user lock and partial RAG indexing recovery
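
A per-user lock of the kind described in (#1) might look like the sketch below. The names are illustrative and are not taken from the route module. The locks live in process memory, so they do not coordinate across multiple worker processes.

```python
import threading

# Hypothetical module-level state: one lock per username, plus a guard
# that protects the dictionary itself.
_user_locks: dict[str, threading.Lock] = {}
_user_locks_guard = threading.Lock()


def get_user_lock(username: str) -> threading.Lock:
    """Return the lock for one user, creating it on first use."""
    # The guard serializes dictionary access so two threads cannot end up
    # with different locks for the same user.
    with _user_locks_guard:
        return _user_locks.setdefault(username, threading.Lock())


def try_start_indexing(username: str) -> bool:
    """Non-blocking acquire; False means this user already has a run active."""
    return get_user_lock(username).acquire(blocking=False)


def finish_indexing(username: str) -> None:
    """Release the user's lock once indexing ends."""
    get_user_lock(username).release()
```

With this shape, one user's long indexing run no longer blocks another user, which a single global lock would do.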

* fix: wrap DB session in try/except, guard renderSemanticResults, fix modal leak and transition

- Move try/except to wrap get_user_db_session in add_research_to_collection
  so DB errors return JSON instead of raw 500 HTML
- Add typeof guard for window.renderSemanticResults alongside existing
  semanticSearchHistory check in semantic search mode
- Use bootstrap.Modal.getOrCreateInstance instead of new bootstrap.Modal
  to prevent duplicate instance creation on repeated clicks
- Use double-rAF for settling transition so browser paints the 0.6 opacity
  frame before removing the class

* fix: prevent IntegrityError on missing SourceType, fix Library button class, reset progress color

- Return None from _create_document_from_report/_create_document_from_source
  when SourceType rows are missing instead of proceeding with
  source_type_id=None which violates the NOT NULL constraint
- Fix Library button CSS class: btn-outline → ldr-btn-outline to match
  all other action buttons
- Reset progressText.style.color on new indexing attempt so error red
  doesn't persist into subsequent runs

* test: add partial RAG retry and ensure_research_history_collection tests

- TestIndexResearch: add test_index_research_retries_unindexed_documents
  verifying that a second index_research call with index_to_rag=True
  returns "success" (not "skipped") and calls _index_documents_to_rag
  when DocumentCollection.indexed is False after the first pass

- TestEnsureResearchHistoryCollection: new class with three tests
  covering create-when-missing, return-existing-id, and exception
  re-raise paths of ensure_research_history_collection

* test: add 404 and enrichment-default-fields cases to TestSearchCollectionRoute

- test_collection_not_found_404: mocks get_user_db_session as a context
  manager whose filter_by().first() returns None, asserts 404 with
  success=False and "not found" in error
- test_enrich_default_fields_when_document_not_in_db: mocks two successive
  get_user_db_session calls (collection lookup + enrichment join) and
  CollectionSearchEngine; when the join returns no rows the enrichment
  branch falls into the else path and sets type='source', research_id=None,
  research_title='', etc.

* refactor: use handle_api_error, namespace HistorySearch globals, URL and bootstrap guards

- Replace 4 inline try/except error handlers with handle_api_error()
  from research_library.utils for consistent error response format
- Namespace 5 window.* globals from history_search.js under
  window.HistorySearch to avoid polluting the global scope
- Replace hardcoded /api/delete/ URL with URLBuilder.deleteResearch()
- Add typeof bootstrap guard in save_to_collection.js modal creation

* docs: expand Research History collection description

Clarify that indexing enables AI-powered semantic search and that
the collection is used by the History page search in AI/Hybrid mode.

* fix: remove hardcoded setting fallbacks from rag_service_factory

- Remove inline default values from get_setting() calls — the settings
  system loads defaults from JSON config files automatically
- Replace silent fallback on invalid JSON text_separators with ValueError
- Replace `or` fallbacks on collection fields with proper `is not None`
  checks to avoid swallowing legitimate 0 or empty values
- Fix test mock to use properly escaped JSON string for text_separators
- Update test_invalid_json_text_separators to expect ValueError
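
The difference between an `or` fallback and an `is not None` check, as a small sketch. The function and parameter names are made up for illustration and do not come from rag_service_factory.

```python
from typing import Optional


def resolve_with_or(collection_value: Optional[int], default: int) -> int:
    # Falls back whenever the value is falsy, which includes a legitimate 0.
    return collection_value or default


def resolve_with_none_check(collection_value: Optional[int], default: int) -> int:
    # Falls back only when the value is actually missing.
    return collection_value if collection_value is not None else default
```

For a stored value of 0 and a default of 200, the first function returns 200 and the second returns 0.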

* fix: use DocumentCollection join for research history counts

Replace get_indexing_status() call in get_research_history_collection
route with inline DB queries that count via DocumentCollection join,
matching how the collection page counts. This fixes the mismatch where
the History page showed "1/24 indexed" while the collection page showed
"25 indexed" — the old logic counted by source_type_id which missed
documents added through the collection page directly.

* feat: add convert_all_research() and POST /convert-all route

Adds ResearchHistoryIndexer.convert_all_research(force) which converts
all completed research entries into library Documents within a single DB
session, avoiding the nested-session issues on SQLite that arise when
calling index_research() (which opens its own session) in a loop.

Also adds POST /library/api/research-history/convert-all that delegates
to the new method, accepting an optional `force` JSON field.

Tests cover happy path, already-converted skipping, force re-conversion,
missing SourceType early-return, and source document creation.

* feat: auto-convert research to documents on completion

Thread user_password through cleanup_research_resources →
notify_research_completed so the auto-conversion hook can open
the user's encrypted database.

After research completes, automatically create Document rows in
the History collection (index_to_rag=False — documents only, no
FAISS indexing). Users trigger FAISS via "Index All" on the History
page or the collection page's index button.

* refactor: remove dead code from ResearchHistoryIndexer

Delete get_indexing_status() (replaced by inline queries in the route),
delete index_all_research() (non-streaming, never called in production),
remove rag_indexed/rag_warning from index_research() return dict (no
consumer reads them), drop the unused Callable import, and remove the
corresponding TestIndexAllResearch and TestGetIndexingStatus test
classes.

Also clean up add_research_to_collection route: drop the index_to_rag
pass-through and its docstring entry since the method default (True)
is sufficient.

* feat: chain convert-all before SSE indexing on History page

The "Index All" button now first POSTs to /convert-all to ensure
any unconverted research entries are turned into Documents, then
proceeds with the existing SSE stream for FAISS indexing. The
convert step is fast (~1-2s) and non-fatal — if it fails, FAISS
indexing proceeds anyway since the SSE stream also handles
conversion internally.

* fix: pass user_password on early-termination path, fix skipped counter

- Extract user_password from kwargs before the termination check so
  cleanup_research_resources gets it on all paths (not just normal completion)
- Fix convert_all_research() skipped counter: count total eligible entries
  before filtering so skipped = total_eligible - candidates when force=False

* fix: History page shows same document counts as collection page

- Switch from indexed_research/total_research to indexed_documents/
  total_documents so both pages show the same numbers for the same
  collection
- Auto-backfill unconverted research entries on History page visit
  (idempotent — skips already-converted entries)
- Update label from "research indexed" to "documents indexed"

* refactor: replace SSE EventSource with POST+poll in history_search.js

Switch the History page indexing from a long-lived SSE stream to the
same POST /index/start → poll /index/status pattern already used by
collection_details.js. Removes activeEventSource, adds
indexingPollInterval, startPolling(), and checkAndResumeIndexing()
matching the collection page's field names and 2-second interval.
Also removes the now-unused RESEARCH_HISTORY_INDEX URL constant.

* refactor: remove custom SSE indexing infrastructure from backend

- Remove `index_research_history` SSE endpoint and its per-user lock
  infrastructure (`_user_index_locks`, `_user_index_locks_guard`,
  `_get_user_lock`) from search_routes.py
- Remove `import time`, `import threading`, `stream_with_context`, `Response`,
  and `json` imports that were only needed by the SSE endpoint
- Remove the `indexer.convert_all_research()` auto-call from
  `get_research_history_collection` (now read-only)
- Remove `index_all_research_streaming()` and `_index_documents_to_rag()`
  methods from ResearchHistoryIndexer; FAISS is handled by the collection's
  background worker
- Remove `index_to_rag` parameter from `index_research()` and its
  `if index_to_rag` block; update processor_v2.py call site accordingly

* test: remove tests for deleted SSE endpoint and _index_documents_to_rag

- Delete TestIndexResearchHistorySSERoute (index_research_history SSE
  endpoint, _get_user_lock, and index_all_research_streaming are gone)
- Delete TestRAGServiceCleanup (_index_documents_to_rag is removed)
- Remove all _index_documents_to_rag patch.object calls and index_to_rag
  keyword arguments from remaining TestIndexResearch / TestForcePropagation
  / TestHashCollisionReuse tests
- Strip mock_rag assertions that referenced the removed RAG call

* fix: three frontend robustness fixes in history_search.js

- Add cachedCollectionId null guard in triggerIndexing() with user-facing error message
- Wrap startResp.json() in try/catch to handle non-JSON server responses
- Add pollErrorCount to stop polling after 5 consecutive network errors

* refactor: remove source indexing from History collection, extract auto_convert_research

- Remove source document creation (MIN_SOURCE_CONTENT_LENGTH, SOURCE_TYPE_SOURCE,
  _create_document_from_source, ResearchResource queries) from ResearchHistoryIndexer;
  History collection now only indexes report documents.
- Add module-level auto_convert_research() function to research_history_indexer.py
  with built-in exception handling, replacing the inline try/catch in processor_v2.py.
- Update re-export stub and __init__.py to expose auto_convert_research.
- Allow db_password=user_password pattern and file paths in .gitleaks.toml.

* test: update tests for report-only History collection (no sources)

- Fix test_index_research_creates_documents: expect 1 doc (report only)
- Replace test_converts_sources_with_sufficient_content with
  test_converts_report_only_no_sources (asserts only report created)

* fix: round 2 review — import path, rag defaults, double commit, JS bugs

- Fix wrong relative import in processor_v2 (.. → ...) that broke
  auto-conversion entirely
- Restore rag_service_factory fallbacks for local_search_* settings
  that have no JSON defaults (prevents int(None) crash on fresh install)
- Remove double commit in index_research existing-docs branch
- Fix N+1 SourceType query in convert_all_research loop
- Change inner JOIN to outerjoin on SourceType in search enrichment
- Add auto-conversion on GET /research-history/collection endpoint
- Reset pollErrorCount before new polling session
- Set isIndexing before await to prevent double-click race
- Use URLS config for index/start and index/status endpoints
- Add searchInput null guard in handleSearchInput
- Remove stale hybrid-loading-indicator before appending new one

* fix: round 3 review — null guards, auto-convert test coverage

- Add null guards for indexed-count/total-count DOM elements
- Add test that GET endpoint calls convert_all_research
- Add test that convert_all_research failure doesn't cause 500

* fix: round 4 review — stale sources_indexed, DOMPurify config, docs

- Remove stale sources_indexed reference from save_to_collection success msg
- Remove loadCollections() call from error handler (the call hid the error message)
- Add restrictive DOMPurify config to renderSnippet (limit allowed tags)
- Update SEMANTIC_SEARCH.md API table to match actual routes

* fix: round 5 — ID type mismatch, misleading force docstring

- Use String(h.id) for consistent ID comparisons (dataset values are
  always strings, API IDs may be numeric)
- Fix index_research docstring: force does NOT trigger FAISS indexing

* fix: polling response.ok guard, stale collection cache, hybrid UI cleanup

- Check response.ok before .json() in polling/resume to prevent infinite
  loop on HTTP errors (history_search.js)
- Clear cachedCollectionId on 404 so next search shows "needs indexing"
  instead of permanent failure (history_search.js)
- Remove stale hybrid-loading-indicator on mode switch (history.js)
- Re-trigger handleSearchInput after delete to preserve hybrid/semantic
  state instead of losing Tier 1 badges and Tier 3 results (history.js)
- Restore original button HTML on save error instead of showing stuck
  "Saving..." spinner (save_to_collection.js)

* fix: remove dead index_single_research endpoint, orphaned stub, escape document_count

- Remove unused index_single_research route (add-to-collection covers
  the same use case via collection infrastructure)
- Delete orphaned re-export stub at research_library/services/ (no
  consumers, module lives in search/services/)
- Escape document_count in innerHTML for consistency with all other
  fields in the same template
- Update docs and tests to match

* fix: session rollback on flush error, SourceType check, status guard, IntegrityError handling

- Add session.rollback() in convert_all_research per-entry except block
  to clear PendingRollbackError before next iteration
- Return error (not silent success) when _create_document_from_report
  returns None due to missing SourceType
- Guard index_research against non-COMPLETED research to prevent
  indexing partial content
- Handle IntegrityError on commit in index_research for concurrent
  auto-convert + manual indexing race condition

* refactor: simplify index_research by reusing _create_document_from_report

The 130-line index_research method duplicated existence checks, hash
dedup, and DocumentCollection linking that _create_document_from_report
already handles internally. Collapsed to ~35 lines that validate the
research, delegate to the helper, and commit.

- Remove redundant "existing docs" branch (60+ lines)
- Remove unused force parameter (no frontend caller)
- Remove dead get_collection() method (no callers)
- Update tests: remove TestForcePropagation, update idempotency assertion

* refactor: deduplicate _get_rag_service_for_thread by reusing rag_service_factory

_get_rag_service_for_thread duplicated ~95% of the settings resolution
logic from rag_service_factory.get_rag_service (default settings loading,
JSON parsing, collection settings lookup). Replace with a thin wrapper
that delegates to the factory and propagates db_password to sub-managers
via the property setter for thread-safe access.

Reduces 140 lines → 28 lines. Settings changes now only need to be made
in one place (rag_service_factory).

* fix: batch rollback bug in convert_all_research, guard nullable .value

- convert_all_research: commit per-entry instead of batching 100, so a
  single failure only rolls back that entry (not the whole batch)
- rag_service_factory: guard collection.embedding_model_type.value for
  nullable column to prevent AttributeError
- docs: fix endpoint count (4, not 5) in SEMANTIC_SEARCH.md
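
Committing per entry, as in the first item above, can be sketched with the standard-library sqlite3 module. The schema and function name are hypothetical; the real method works through an ORM session.

```python
import sqlite3


def convert_entries(conn: sqlite3.Connection, titles: list) -> dict:
    """Insert each title in its own transaction; a bad row rolls back alone."""
    result = {"converted": 0, "failed": 0}
    for title in titles:
        try:
            conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
            conn.commit()  # commit per entry, not per batch
            result["converted"] += 1
        except sqlite3.IntegrityError:
            conn.rollback()  # discards only the entry that just failed
            result["failed"] += 1
    return result


# Hypothetical table with a UNIQUE constraint so a duplicate title fails.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT NOT NULL UNIQUE)")
conn.commit()
```

With a batch commit, the rollback for a failing row would also discard every uncommitted row before it in the same batch. Committing after each entry keeps the loss to that one row, at the cost of more commits.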

* fix: response.ok check, URL encoding, doc diagram path

- history_search.js: add response.ok check before parsing JSON in
  triggerIndexing (consistent with other fetch calls in the file)
- history.js: use URLS.PAGES.LIBRARY + encodeURIComponent instead of
  hardcoded string for library navigation
- SEMANTIC_SEARCH.md: fix diagram path to include /library/api/ prefix
2026-03-22 22:54:29 +01:00
LearningCircuit
54b9dc2579 ci: remove OSSAR scan from release gate (#2911)
OSSAR's summary step hardcodes "192 ESLint Warnings" and specific file
names regardless of actual scan results, providing zero dynamic signal.
It also uses the deprecated `set-output` command.

CodeQL + Semgrep + Bearer already provide comprehensive SAST coverage.
ESLint checks are handled by pre-commit hooks.
2026-03-22 21:47:06 +01:00
LearningCircuit
05b96fbe3f refactor: move engine module paths from settings DB to hardcoded registry (#2843)
* refactor: move engine module paths from settings DB to hardcoded registry

Engine implementation details (module_path, class_name, full_search_module,
full_search_class) are internal wiring, not user configuration. Storing them
in the settings DB created a security attack surface requiring blocklist
validation and route blocking.

Changes:
- New engine_registry.py with frozen dataclass entries for all 24 engines
- search_engines_config.py injects registry data after loading DB settings
- search_engine_factory.py passes engine_config to full search wrapper
- Remove ~52 module/class entries from 9 JSON defaults files
- Remove BLOCKED_SETTING_PATTERNS, is_blocked_setting(), and 4 call sites
- Remove absolute→relative normalization from module_whitelist.py
- Update docs, tests, and golden master
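
A hardcoded registry with frozen dataclass entries, injected after the DB settings are loaded, might look roughly like this. The entry shown, the dictionary name, and the helper are assumptions for illustration; only the four field names come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineEntry:
    """Internal wiring for one engine; frozen so it cannot be mutated."""

    module_path: str
    class_name: str
    full_search_module: Optional[str] = None
    full_search_class: Optional[str] = None


# Hypothetical entry; the real registry's contents are not reproduced here.
REGISTRY = {
    "example": EngineEntry(
        module_path=".engines.example", class_name="ExampleEngine"
    ),
}


def inject_registry(engine_name: str, db_config: dict) -> dict:
    """Return a copy of db_config with wiring fields taken from the registry."""
    entry = REGISTRY[engine_name]
    merged = dict(db_config)
    # Registry values overwrite anything stored under the same keys, so a
    # module path written into the settings has no effect.
    merged["module_path"] = entry.module_path
    merged["class_name"] = entry.class_name
    if entry.full_search_module is not None:
        merged["full_search_module"] = entry.full_search_module
    if entry.full_search_class is not None:
        merged["full_search_class"] = entry.full_search_class
    return merged
```

User-facing settings such as API keys pass through untouched, while the import target always comes from code.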

* fix: remove TestGetBlockedSettingsError that references removed function

The get_blocked_settings_error() function was removed as part of the
engine registry refactor. This test class was added on main after the
PR was created and wasn't caught by conflict resolution.

* fix: remove TestSaveSettingsPostBlockedSetting that tests removed blocking logic

BLOCKED_SETTING_PATTERNS and is_blocked_setting() were removed as part of
the engine registry refactor. This test was added on main and references
the now-removed blocking behavior.

* fix: inject ENGINE_REGISTRY into parallel/meta engine _get_search_config()

Both ParallelSearchEngine and MetaSearchEngine manually extract config
from settings_snapshot without going through search_config(). Since
module_path/class_name are no longer in the settings DB (they live in
the hardcoded registry), these engines would silently fail to discover
sub-engines on fresh installations.

Fix: inject ENGINE_REGISTRY values after extraction, matching the
pattern used in search_config().

Also fixes MetaSearchEngine's stale check for
"search.engine.auto.class_name" in settings_snapshot — this key no
longer exists in settings DB, so auto engine config would be skipped.

* fix: update tests for engine registry refactor

- test_whitelist_config_consistency: check ENGINE_REGISTRY instead of
  JSON defaults (module_path/class_name no longer in defaults)
- test_meta_search_engine_high_value: expect registry-injected
  module_path/class_name in _get_search_config() output
- test_meta_search_engine_extended: registry overwrites snapshot values
- test_settings_routes_coverage: remove blocked setting tests (blocking
  logic removed — registry is now the security mechanism)
- test_settings_routes_deep_coverage2: same as above

* fix: add 5 missing engines to registry, strip module_path from their settings

Add gutenberg, openlibrary, pubchem, stackexchange, and zenodo to
ENGINE_REGISTRY (were added to main in #1540 after this branch diverged).

Remove module_path/class_name from their settings JSON files and golden
master, matching the pattern established for all other engines.

Expand test_engine_registry.py to scan per-engine settings_*.json files
and verify no settings files still contain module_path/class_name.

* fix: inject full_search_module/class in meta/parallel engine _get_search_config()

The registry injection in MetaSearchEngine and ParallelSearchEngine was
missing full_search_module and full_search_class fields, making it
inconsistent with the main search_config() injection. This would cause
full-search wrappers to fail when created through meta/parallel engines.

* fix: resolve pre-commit formatting issues and sync pdm.lock after merge with main
2026-03-20 20:22:10 +01:00
LearningCircuit
d89c96353d remove: dedicated vLLM provider (use openai_endpoint instead)
The in-process vLLM provider (requiring torch+transformers+vllm ~10GB) is
obsolete — vLLM is universally run as a server and accessed via its
OpenAI-compatible API, which the openai_endpoint provider already handles.

Removes vllm from: config, pricing, rate limiting, hardware warnings,
frontend dropdowns, pyproject.toml optional deps, docs, default_settings.json,
golden master, benchmark template, and all related tests (37 files, -436 lines).

Keeps vLLM mentions in openai_endpoint context (labels, docs) since that's
the correct usage path.
2026-03-20 18:49:54 +01:00
LearningCircuit
9988f70318 refactor: remove fallback LLM (FakeListChatModel) from all providers (#2717)
* cleanup: remove @pytest.mark.requires_llm decorators and fallback LLM doc references

Remove the `@pytest.mark.requires_llm` decorator from all test files since
the fallback LLM infrastructure is being removed. Update docs to remove
references to `LDR_TESTING_USE_FALLBACK_LLM` and `LDR_USE_FALLBACK_LLM`
environment variables from troubleshooting and CI configuration tables.

* test: remove fallback LLM references from test files

Remove all fallback-related test code: TestGetFallbackModel classes,
FakeListChatModel assertions, check_fallback_llm parameters, and
LDR_USE_FALLBACK_LLM skipif markers. Replace fallback-returning tests
with ValueError-expecting tests for missing API keys and unavailable
providers.

* cleanup: remove remaining use_fallback_llm references from source and tests

Remove use_fallback_llm() imports and calls from db_utils.py and
rate_limiting/tracker.py. Clean up test files that referenced
check_fallback_llm, get_llm_setting_from_snapshot, and
LDR_USE_FALLBACK_LLM env var.

* cleanup: remove remaining fallback LLM references from test files

Remove all use_fallback_llm mocks, LDR_USE_FALLBACK_LLM env var checks,
and related skip logic from test files since the fallback LLM feature
has been removed from source code.

- test_db_utils.py: Remove use_fallback_llm mock patches from 4 tests
- test_rate_limiter.py: Replace use_fallback_llm mock with is_ci_environment
- test_tracker.py: Replace fallback mode test with CI mode test
- test_tracker_quality_stats.py: Remove 8 use_fallback_llm decorators
- test_openai_api_key_usage.py: Remove LDR_USE_FALLBACK_LLM skipif
- test_llm_provider_integration.py: Remove LDR_USE_FALLBACK_LLM skipif
- test_ci_config.py: Remove LDR_USE_FALLBACK_LLM env var setting
- test_search_system.py: Remove LDR_USE_FALLBACK_LLM skipif
- run_all_tests.py: Remove LDR_USE_FALLBACK_LLM log line
- test_env_auto_generation.py: Remove testing.use_fallback_llm mapping
- test_lmstudio_provider.py: Fix docstring referencing removed function

* refactor: remove fallback LLM from providers, settings, CI, and tests

- Remove FakeListChatModel import and get_llm_setting_from_snapshot wrapper
- Update all provider imports to use get_setting_from_snapshot directly
- Remove LDR_USE_FALLBACK_LLM env var from CI workflows
- Remove use_fallback_llm setting and registry function
- Remove skip_if_using_fallback_llm fixture from conftest.py
- Update tests to expect ValueError instead of fallback model

* refactor: remove fallback model from llm_config and thread_settings

- Remove get_fallback_model() and all call sites in get_llm()
- Replace fallback returns with descriptive ValueError raises
- Remove LDR_USE_FALLBACK_LLM env check block from get_llm()
- Remove check_fallback_llm parameter from get_setting_from_snapshot
- Remove get_llm_setting_from_snapshot convenience wrapper
- Add ValueError re-raise in Ollama model-not-found path
- Regenerate golden master with ensure_ascii=False for proper Unicode

* fix: restore requires_llm skip mechanism and fix CI test failures

Three fixes for CI regressions from fallback LLM removal:

1. Restore @pytest.mark.requires_llm decorator and skip fixture
   (skip_if_no_real_llm) that checks LDR_TESTING_WITH_MOCKS env var.
   Re-add decorators to 17+ tests across 9 files that need real LLMs.

2. Fix type coercion in test_openai_api_key_usage.py by converting
   fixture from dict format to simplified raw-value format, bypassing
   get_typed_setting_value string coercion.

3. Fix golden master format mismatch by adding ensure_ascii=False to
   test serialization to match regeneration script. Narrow pre-commit
   hook trigger to only defaults/*.json files.
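
The format mismatch in item 3 follows from how `json.dumps` treats non-ASCII characters. A small sketch with a made-up settings value:

```python
import json

# Hypothetical setting containing a non-ASCII character.
settings = {"description": "Délai d'attente"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(settings)

# ensure_ascii=False: characters are written as-is.
literal = json.dumps(settings, ensure_ascii=False)
```

Both strings parse back to the same data, but they differ as text, so a golden-master comparison fails unless the test and the regeneration script use the same setting.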

* fix: remove remaining fallback LLM references from coverage tests

- Delete TestGetFallbackModel class from test_llm_config_coverage.py
  (5 tests that imported removed get_fallback_model)
- Update test_llm_config_missing_coverage.py: 6 tests that expected
  FakeListChatModel fallback now expect ValueError/exception raises
- Remove use_fallback_llm mocks from test_rate_limiting_tracker_coverage.py
  (delete 4 fallback-specific tests, fix 9 tests)
- Remove use_fallback_llm mocks from rate_limiting/test_tracker_coverage.py
  (fix _make_tracker helper and 25 tests)
- Add @pytest.mark.requires_llm to test_analyze_documents_minimal
- Merge upstream main to pick up new coverage test files

* fix: remove dead LDR_USE_FALLBACK_LLM env var from accessibility tests CI

This env var was added to the accessibility test server but has no
effect since the fallback LLM code was removed.

* fix: align pre-commit hook description and error listing with defaults-only trigger

The hook file pattern was narrowed to defaults/ only, but the description
and error-listing code still referenced config/. Remove dead config/ path
from the file listing and update messaging to match.

* fix: update test_llm_config_deep_coverage.py for fallback LLM removal

File was added on main after branch diverged. Remove TestGetLlmFallbackEnvVar
class (tests removed functionality) and update test_provider_lowercased to
expect ValueError instead of fallback model.

* fix: improve "none" provider error message and fix stale CI-mode test

- Add explicit handler for provider="none" with user-friendly message
  instead of misleading "this is a bug" error
- Fix test_load_estimates_skipped_in_ci_mode: _load_estimates no longer
  checks is_ci_environment, test now correctly verifies deferred loading
  behavior in non-programmatic mode
- Update 4 test assertions to match new "none" provider error message
2026-03-20 13:24:59 +01:00
LearningCircuit
add97b1793 docs: polish installation docs after migration (#2889)
* docs: move detailed installation instructions from README to dedicated pages

README Installation Options section (~200 lines) replaced with a compact
table linking to docs/installation.md (hub page), docs/install-pip.md
(dedicated pip guide), and existing docker-compose and Unraid guides.
No content lost — everything is now in focused doc files.

* docs: trim redundant pip section in installation hub page

The pip section in docs/installation.md duplicated nearly all of the
Quick Install content from docs/install-pip.md. Replace with a brief
summary + single install command + link to the dedicated guide,
consistent with the hub-and-spoke pattern used by the Unraid section.

Addresses review feedback from djpetti on PR #2819.

* docs: restore missing installation info from README migration

- Add NVIDIA Container Toolkit full install commands (Ubuntu/Debian) with
  distro note for RHEL/Fedora/Arch to docs/installation.md
- Add GPU docker-compose alias convenience tip
- Add DIY docker-compose configuration guidance (GPU driver, context
  length, keep alive, model selection)
- Add Windows PDF export warning (Pango/WeasyPrint) to docs/install-pip.md
- Fix SQLCipher wording: pre-built wheels available, not "requires
  system-level libraries"
- Restore ldr-web command instead of python -m invocation

* docs: follow-up polish for installation docs migration

- Restructure README Quick Start with clear Option 1/2/3 labels
- Update deprecated LDR_ALLOW_UNENCRYPTED to LDR_BOOTSTRAP_ALLOW_UNENCRYPTED
- Add "Open http://localhost:5000" to install-pip.md after ldr-web step
- Add back-link from install-pip.md to installation overview
- Add Docker/Docker Compose install prerequisite links to installation.md
- Cross-link NVIDIA toolkit commands from docker-compose-guide to installation.md
- Use double quotes for volume spec in Docker Run for cross-platform compat

* docs: restore original Quick Start ordering (Docker Run first)
2026-03-20 11:26:48 +01:00
LearningCircuit
abbd19584a docs: move detailed install instructions from README to dedicated pages (#2819)
* docs: move detailed installation instructions from README to dedicated pages

README Installation Options section (~200 lines) replaced with a compact
table linking to docs/installation.md (hub page), docs/install-pip.md
(dedicated pip guide), and existing docker-compose and Unraid guides.
No content lost — everything is now in focused doc files.

* docs: trim redundant pip section in installation hub page

The pip section in docs/installation.md duplicated nearly all of the
Quick Install content from docs/install-pip.md. Replace with a brief
summary + single install command + link to the dedicated guide,
consistent with the hub-and-spoke pattern used by the Unraid section.

Addresses review feedback from djpetti on PR #2819.

* docs: restore missing installation info from README migration

- Add NVIDIA Container Toolkit full install commands (Ubuntu/Debian) with
  distro note for RHEL/Fedora/Arch to docs/installation.md
- Add GPU docker-compose alias convenience tip
- Add DIY docker-compose configuration guidance (GPU driver, context
  length, keep alive, model selection)
- Add Windows PDF export warning (Pango/WeasyPrint) to docs/install-pip.md
- Fix SQLCipher wording: pre-built wheels available, not "requires
  system-level libraries"
- Restore ldr-web command instead of python -m invocation
2026-03-20 10:46:32 +01:00
LearningCircuit
76d8518a1b docs: pip install now works natively on Windows (#2766)
* docs: update Windows install docs — pip install now works natively

sqlcipher3 0.6.2+ ships self-contained Windows wheels (5.9MB .pyd with
SQLCipher + OpenSSL statically linked). No compilation, Visual Studio,
or system libraries needed. Update README and SQLCipher guide to reflect
this, removing the "for developers" framing and outdated warnings.

Refs #494

* docs: fix README consistency issues from review

- Align SQLCipher wording across quick start and Option 3 sections
- Replace "skip it" with "use standard SQLite instead"
- Replace duplicate pip snippet in Docker section with cross-reference link
- Use `ldr-web` consistently instead of `python -m local_deep_research.web.app`
- Add Windows PDF export note (WeasyPrint/Pango) to Option 3
- Replace SQLCipher guide link in quick start with WeasyPrint setup link
2026-03-18 19:18:08 +01:00
LearningCircuit
7d37d35b2f fix: normalize full_search_module paths and remove dead serpapi references (#2826)
* fix: remove dead serpapi full_search_module/class references

The serpapi engine pointed to `.engines.full_serp_search_results_old` with
class `FullSerpAPISearchResults`, but neither the module file nor the class
exists. All engines (including serpapi) use `.engines.full_search` /
`FullSearchResults`. Update defaults, golden master, docs, and remove the
stale whitelist entry.

* fix: normalize full_search_module in search_config()

search_config() normalized legacy absolute module_path values but skipped
full_search_module. Extend the normalization loop to cover both keys for
consistency with the defense-in-depth normalization in get_safe_module_class().

* fix: check full_search_module key in pre-commit hook

The pre-commit hook only validated module_path keys in JSON files. Extend it
to also check full_search_module, and add regression tests for both cases.

* fix: add debug logging for absolute module path normalization

When get_safe_module_class() normalizes an absolute path to relative form,
log the conversion at debug level for easier debugging of Docker user issues.
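
A minimal sketch of the normalization described in these commits. The helper names and the `LEGACY_PREFIX` value are assumptions for illustration; the log does not show the real package prefix or the body of `get_safe_module_class()`.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed absolute prefix used by legacy configs (hypothetical value).
LEGACY_PREFIX = "local_deep_research.web_search_engines"


def normalize_module_path(path: str) -> str:
    """Convert a legacy absolute module path to relative form."""
    if path.startswith(LEGACY_PREFIX + "."):
        relative = path[len(LEGACY_PREFIX):]  # keeps the leading dot
        logger.debug("Normalized module path %r -> %r", path, relative)
        return relative
    return path


def normalize_config(config: dict) -> dict:
    """Normalize both keys that may hold a module path."""
    for key in ("module_path", "full_search_module"):
        value = config.get(key)
        if isinstance(value, str):
            config[key] = normalize_module_path(value)
    return config
```

Looping over both keys in one place keeps `module_path` and `full_search_module` from drifting apart again.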
2026-03-18 19:03:28 +01:00
LearningCircuit
8ea4787626 fix: rename "Custom OpenAI Endpoint" to "OpenAI-Compatible Endpoint" (#2745) (#2818)
Users selecting Llama.cpp couldn't find the right provider for custom
endpoints because four different names were used across the codebase.
Standardize on "OpenAI-Compatible Endpoint" — the industry-standard
naming used by LM Studio, Ollama, Open WebUI, vLLM, and others.

Changes:
- Provider class: provider_name → "OpenAI-Compatible Endpoint"
- Legacy config, default_settings.json, golden master: consistent name
- JS fallbacks (settings.js, benchmark.html): updated dropdown labels
- Llama.cpp label clarified to "(Local GGUF files only)"
- Docs (faq.md, env_configuration.md): updated references
- Tests: updated assertions and docstrings

No breaking changes — internal keys (openai_endpoint, OPENAI_ENDPOINT),
setting paths, class/function names, and file names are unchanged.
2026-03-18 08:18:20 +00:00
github-actions[bot]
e6d45ab5bb chore: auto-bump version to 1.3.60 (#2709)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-03-13 23:56:51 +01:00
github-actions[bot]
d67438b239 chore: auto-bump version to 1.3.59 (#2527)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-03-10 00:00:35 +01:00
LearningCircuit
19c1777e97 docs: fix inaccurate credential sweep wording and inconsistent file paths (#2614)
- Change "credential sweep (every request)" to "dead-thread credential
  tracking cleanup (every request)" in teardown_appcontext table row,
  since cleanup_dead_threads() removes thread credential tracking entries
  rather than performing active credential security clearing
- Use full src/local_deep_research/ paths in Key Files table to match
  the convention used in the main Key Source Files table
2026-03-08 21:01:15 +01:00
LearningCircuit
0b23d58e85 docs: thread lifecycle, FD budget, and resource exhaustion (#2605)
* fix: prevent file descriptor exhaustion from dead thread engine accumulation

Three root causes addressed:

1. Dead thread engine accumulation (primary): _thread_engines grows
   unboundedly as crashed/terminated threads leave orphaned NullPool
   engines. Add cleanup_dead_thread_engines() that sweeps entries for
   threads no longer in threading.enumerate(). Integrate via throttled
   sweep in teardown_appcontext (every 60s) and periodic sweep in the
   queue processor loop (every 6 iterations).

2. Generic downloader stream=True leak (secondary): generic.py used
   stream=True but never read or closed the response body, holding
   connections open. Removed stream=True since only status_code and
   headers are inspected.

3. Docker default 1024 FD limit (contributing): Add nofile ulimit
   (65536) to docker-compose.yml so the container has headroom for
   WAL mode databases, thread pools, and connection pools.

* fix: address review findings — sweep lock, credential cleanup, flaky test

- Add _sweep_lock to prevent TOCTOU race on _last_sweep_time in
  maybe_sweep_dead_engines() (concurrent teardowns could all pass the
  interval check)
- Move alive_ids computation inside _thread_engine_lock to prevent
  race between snapshot and engine dict mutation
- Sweep dead _thread_credentials (plaintext passwords) alongside engines
  in processor_v2.py and app_factory.py teardown
- Fix flaky test_sweeps_after_interval: replace time.sleep(0.15) with
  _last_sweep_time backdating
- Add tests for credential sweep and module-level cleanup_dead_threads()
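
A sketch of a throttled sweep guarded by a lock, assuming the 60-second interval mentioned earlier. `maybe_sweep` and its `now` parameter are hypothetical; the injectable clock plays the same role as backdating `_last_sweep_time` in the test fix.

```python
import threading
import time

SWEEP_INTERVAL_SECONDS = 60.0
_sweep_lock = threading.Lock()
_last_sweep_time = float("-inf")  # guarantees the first call sweeps


def maybe_sweep(sweep, now=None) -> bool:
    """Run sweep() at most once per interval; return True if it ran."""
    global _last_sweep_time
    current = time.monotonic() if now is None else now
    with _sweep_lock:
        # Check and update together so concurrent callers cannot all pass.
        if current - _last_sweep_time < SWEEP_INTERVAL_SECONDS:
            return False
        _last_sweep_time = current
    sweep()
    return True
```

Holding the lock only around the check-and-update keeps the critical section short while still closing the window in which several teardowns read the same stale timestamp.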

* fix: close search engine sessions after research, fix stream=True leak properly

Three improvements to the FD exhaustion fix:

1. generic.py: Restore stream=True (removing it is unsafe — GenericDownloader
   handles ALL URLs and would download multi-GB files into memory). Use context
   manager instead to ensure the streamed connection is properly closed on all
   return paths, preventing socket FD leaks.

2. research_service.py: Add use_search.close() and system.close() in finally
   block of run_research_process(). Search engine HTTP sessions (e.g.
   SemanticScholar's SafeSession) were never explicitly closed after research,
   relying on non-deterministic GC for cleanup.

3. search_system.py + strategies: Add close() method to AdvancedSearchSystem
   and BaseSearchStrategy, with overrides in ConstraintParallelStrategy and
   ConcurrentDualConfidenceStrategy to shut down persistent ThreadPoolExecutors.

Also adds detailed design comments throughout the codebase documenting:
- Why NullPool engines don't leak FDs (memory leak only)
- Why stream=True must NOT be removed from the diagnostic block
- The dual sweep trigger architecture (request-driven + queue-driven)
- Thread ID recycling limitations
- Search engine lifecycle and cleanup responsibilities

Fixes flaky test_removes_dead_thread_entries by using threading.Barrier to
prevent thread ID recycling during test.

* fix: unregister user from news scheduler on logout

The logout handler never called scheduler.unregister_user(), causing:
- Passwords to persist in scheduler memory for up to 48 hours
- Orphaned APScheduler jobs to keep running after logout
- Orphaned jobs to re-create QueuePool engines (~10 FDs each) after
  close_user_database() disposed the original, contributing to FD leaks

Add scheduler unregistration before close_user_database() so running
jobs can finish gracefully while the DB engine is still available.
Add design comment documenting the logout cleanup order.

* test: remove ineffective patch in logout scheduler test

The `routes.get_news_scheduler` patch was ineffective because the logout
handler imports `get_news_scheduler` dynamically inside the function body,
so the name never enters the routes module namespace. The `create=True`
flag masked this by silently creating a new attribute. The real patch on
`subscription_manager.scheduler.get_news_scheduler` is sufficient.

* fix: remove nofile ulimit override from docker-compose.yml

Docker containers inherit ulimits from the Docker daemon, which typically
runs with LimitNOFILE=infinity (1073741816+). Setting nofile to 65536
could actually *lower* the limit for most users, hurting large
installations. The FD leak root causes are already fixed in this PR
(dead-thread engine sweep, session close, scheduler unregister), so the
safety net is unnecessary. Let users and their Docker daemon config
control this.

* fix: add try-except to strategy executor shutdown, elevate scheduler unregister log level

- Wrap executor.shutdown(wait=False) in try-except in strategy close()
  methods for consistency with parallel_search_engine.py pattern
- Change logger.debug → logger.warning for scheduler unregister failure
  on logout, since failure means password stays in scheduler memory
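
A minimal sketch of a strategy `close()` that shuts down its pool without blocking. The class name and the `closed` flag are hypothetical and are not taken from the strategies named above.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)


class ParallelStrategySketch:
    """Hypothetical strategy that owns a persistent thread pool."""

    def __init__(self, max_workers: int = 4):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
        self.closed = False

    def close(self) -> None:
        """Release the pool without blocking; safe to call repeatedly."""
        if self.closed:
            return
        try:
            # wait=False returns immediately; running tasks finish on their own.
            self.executor.shutdown(wait=False)
        except Exception:
            logger.warning("Executor shutdown failed", exc_info=True)
        finally:
            self.closed = True
```

Catching the exception keeps one failing cleanup step from skipping the remaining ones in a caller's `finally` block.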

* docs: add comments explaining non-obvious design decisions from deep review

- SQLCipher WAL FD cost (1-3 FDs per connection, multiplied by users)
- Logout cleanup ordering: why unregister before close, known race window
- shutdown(wait=False): why non-blocking, safety via double-cleanup pattern

* docs: add thread lifecycle, FD budget, and resource exhaustion documentation

Knowledge captured from PR #2591 deep review (5 rounds of verification):
- architecture.md: Thread & Resource Lifecycle section with cleanup layers,
  mermaid diagram, FD budget table, and key files reference
- troubleshooting.md: Resource Exhaustion section with diagnosis commands
  and solutions for FD exhaustion
- docker-compose-guide.md: Resource Limits note explaining nofile/memlock
- web/database/README.md: Thread Safety & Connection Model section
- Cross-references added between all 4 docs
- Updated Areas for Improvement (container optimization → resource observability)
- Added encrypted_db.py and thread_local_session.py to Key Source Files
2026-03-08 16:22:17 +01:00
LearningCircuit
8d32f5f9e3 refactor: eliminate server_config.json — env-var-only server settings (#2505)
* refactor: eliminate server_config.json, make server settings env-var-only

Remove the JSON file-based server configuration and sync mechanism.
All 8 server settings (host, port, debug, HTTPS, allow_registrations,
and 3 rate limits) are now read exclusively from environment variables
via get_typed_setting_value() with the existing LDR_* naming convention.

- Rewrite server_config.py: remove get_server_config_path(),
  save_server_config(), sync_from_settings(); simplify load_server_config()
  to use get_typed_setting_value(key, None, ...) for all settings
- Add rate_limit_settings to the config dict (was only via .get() fallback)
- Remove sync_from_settings calls from 3 sites in settings_routes.py
- Hide server settings from UI (visible: false, editable: false) in
  default_settings.json and settings_security.json
- Add security.rate_limit_settings entry to settings_security.json
- Fix swapped min/max on web.port (was min:65535, max:0)
- Update descriptions to reference env var names
- Rewrite test_server_config.py: remove 21 JSON-file tests, keep 13
  defaults/fail-closed tests, add 8 env var override tests (35 total)
- Regenerate golden master settings
- Remove server_config.py from check-file-writes.sh exemption list
- Update docstrings in rate_limiter.py and app.py

* fix: address review findings for server_config.json elimination

- Fix save_all_settings response: return dict (keyed by setting key)
  instead of list, matching GET /settings/api shape; include missing
  visible, min_value, max_value, step fields so visibility filter works
- Fix JS consumer: use dict key access instead of .find() on response
- Fix docs: LDR_WEB_PORT is the correct env var for server bind port,
  not LDR_APP_PORT; add clarifying note
- Remove stale KNOWN_NUMERIC_ISSUES entry for web.port (now fixed)
- Add tests: empty-string and whitespace env var edge cases for
  allow_registrations fail-closed, and env-var override coverage

* feat: add deprecation migration path for server_config.json (#2549)

Users who set `allow_registrations: false` via the UI (persisted in
server_config.json) would silently lose that setting on upgrade,
re-enabling open registration. Docker users are especially at risk
since named volumes persist the file across container upgrades.

Add read-only migration: if server_config.json exists, honor its
values as fallbacks (env var > legacy file > default) and log
deprecation warnings guiding users to migrate to env vars.

No write-back logic is re-added — save_server_config() and
sync_from_settings() remain removed per the PR's intent.
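
The fallback order (env var > legacy file > default) can be sketched as below. `resolve_setting` is a hypothetical helper; the `LDR_*` name derivation follows the examples in this log, and the assumption that the legacy file uses short keys such as `port` is inferred from the commit text.

```python
import os


def resolve_setting(key, default, legacy, env=None):
    """Return (value, source) using env var > legacy file > default."""
    env = os.environ if env is None else env
    env_name = "LDR_" + key.upper().replace(".", "_")  # web.port -> LDR_WEB_PORT
    if env_name in env:
        return env[env_name], "env"
    short_key = key.split(".")[-1]  # assumed legacy JSON key, e.g. "port"
    if short_key in legacy:
        return legacy[short_key], "legacy_file"
    return default, "default"
```

Returning the source alongside the value makes it easy to log a deprecation warning only when the legacy file was actually used.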

* feat: show web UI warning when legacy server_config.json is detected

Adds a dismissible warning banner in the web interface when the
deprecated server_config.json file exists, using the existing
warning_checks system. Addresses reviewer feedback from PR #2505.

* fix: address review findings for server_config.json elimination

- Change web.host, web.port, web.use_https type from SEARCH to APP
  in both default_settings.json and golden_master_settings.json
- Add 3 tests for check_legacy_server_config() covering dismissed,
  missing file, and file-exists branches
- Add autouse fixture to clear LDR_APP_ALLOW_REGISTRATIONS env var
  in test_server_config.py to prevent test pollution from dev shell

* fix: round 2 review findings for server_config.json elimination

- Fix flaky test_all_four_warnings_simultaneously by mocking
  get_server_config_path to prevent real server_config.json on disk
  from breaking exact set equality assertion
- Add dismiss_legacy_config to _make_settings_manager defaults and
  rename test_all_six_settings_read → test_all_seven_settings_read
- Add orchestrator-level tests for legacy_server_config warning
  (exists/absent/dismissed scenarios)
- Add fail-closed guard for legacy JSON allow_registrations string
  values (e.g. "disabled" → False) to match env var guard
- Log warning for unrecognized keys in legacy server_config.json
  to surface typos like "Port" instead of "port"
- Regenerate CONFIGURATION.md to remove stale server_config.json
  reference in app.debug description
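
A sketch of a fail-closed parser for `allow_registrations`, where anything unrecognized disables registration. The accepted truthy spellings and the function name are assumptions; the log only confirms that values like "disabled" resolve to False.

```python
# Assumed set of spellings that enable registration (hypothetical).
_TRUE_VALUES = {"true", "1", "yes", "on"}


def parse_allow_registrations(raw) -> bool:
    """Return True only for an explicit, recognized opt-in value."""
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str):
        # Empty, whitespace-only, and unknown strings all fall through to False.
        return raw.strip().lower() in _TRUE_VALUES
    return False
```

Defaulting to False on unknown input means a typo can lock registration down but cannot open it up.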

* fix: round 3 review findings — test quality and migration docs

- Replace vacuous `is not None` assertion with meaningful env-var-vs-legacy
  guard priority test using unrecognized values on both paths
- Add positive test for DEPRECATED banner when recognized keys present
- Rename misleading test name to reflect actual scope (hardware + context)
- Add migration section to env_configuration.md for server_config.json users
2026-03-08 16:09:02 +01:00
LearningCircuit
f246fa6044 docs: add comprehensive MCP server documentation (#2546)
* docs: add comprehensive MCP server documentation

- Create standalone docs/mcp-server.md with full MCP server docs covering
  installation, configuration, all 7 tools, research strategies guide,
  ReAct agentic strategy deep dive, MCP client setup, error handling,
  security model, Docker deployment, usage examples, and troubleshooting
- Add MCP Server section to docs/features.md under Advanced Features
- Add MCP Server CLI section to docs/cli-tools.md
- Fix search.search_strategy -> search.strategy in server.py and tests
  to match renamed setting from #2550

* fix(docs): correct 9 issues found in MCP server documentation review

- Revert search.strategy → search.search_strategy in server.py and tests (6 occurrences)
- Fix collection_name description: it's an engine ID, not a display name
- Fix invalid JSON in analyze_documents return example
- Add missing MCP Server CLI entry to cli-tools.md TOC
- Add unknown error type to error handling table
- Fix broken MCP security guide external link
- Clarify Docker section: MCP must run on host (STDIO can't bridge containers)
- Fix "7 research tools" → "7 tools (4 research, 3 discovery)" in features.md
- Add temperature valid range note (0.0-2.0)

* feat(mcp): add `search` tool for raw search results without LLM

Add a new MCP tool that calls a specific search engine and returns raw
results (title, link, snippet) without LLM processing. This enables
external AI agents to perform fast, cost-free searches and handle
result analysis themselves.

- Required `engine` parameter with validation against available engines
- API key presence check before engine creation
- Body-to-snippet normalization for consistent output
- 8 test cases covering success, errors, and edge cases
- Updated docs with tool count (7→8) and parameter reference
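
Body-to-snippet normalization might look like the following sketch. It assumes some engines return their text under `body` instead of `snippet`; the function name is hypothetical.

```python
def normalize_result(raw: dict) -> dict:
    """Return a result with the title, link, and snippet keys always present."""
    # Prefer an existing snippet; otherwise fall back to the engine's body text.
    snippet = raw.get("snippet") or raw.get("body") or ""
    return {
        "title": raw.get("title", ""),
        "link": raw.get("link", ""),
        "snippet": snippet,
    }
```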

* fix(mcp): set thread-local settings context in search tool

Some engine constructors (e.g., arxiv's JournalReputationFilter) call
get_llm() internally without passing settings_snapshot, falling through
to the thread-local settings context. Set and clean up the context so
these engines can resolve settings correctly.
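
A sketch of setting and cleaning up a thread-local settings context around a call. All names here are hypothetical; the real context helpers are not shown in this log.

```python
import threading

_context = threading.local()


def get_setting(key, default=None):
    """Read a setting from the current thread's snapshot, if any."""
    snapshot = getattr(_context, "settings", None) or {}
    return snapshot.get(key, default)


def run_with_settings(snapshot: dict, func):
    """Call func() with snapshot installed, then restore the previous state."""
    previous = getattr(_context, "settings", None)
    _context.settings = snapshot
    try:
        return func()
    finally:
        # Always restore, even if func() raises, so pooled threads stay clean.
        _context.settings = previous
```

Restoring in `finally` matters for worker threads that are reused, since a leftover snapshot would leak one request's settings into the next.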

* docs: add OpenClaw MCP client configuration (#2562)

Add OpenClaw configuration subsection alongside Claude Desktop in the
MCP server guide, as suggested in PR #2546 review.

* docs: add Claude Code config, individual search engine examples, and openclaw

- Add Claude Code MCP configuration (.mcp.json) to README and mcp-server.md
- Add search tool to README tools table with LLM Cost column
- Add individual search engine examples (arxiv, pubmed, wikipedia, openclaw)
- Highlight search tool usefulness for monitoring and subscriptions
- List common engines in mcp-server.md search tool section
2026-03-06 03:12:52 +01:00
LearningCircuit
df52e3ec3e feat: implement Reddit feedback improvements (#1909)
* feat: implement Reddit feedback improvements

Based on user feedback from r/LocalLLaMA, this commit addresses several
documentation and usability issues:

Documentation:
- Add macOS port 5000 conflict documentation (AirPlay Receiver conflict)
- Create comprehensive reverse proxy guide (Caddy, Nginx, Traefik)
- Add debug logging guidance with platform-specific paths

UI/UX:
- Add "Advanced" badge and tooltip to Detailed Report mode to set expectations

Feature:
- Add opt-in LLM prompt/response logging (LDR_LOG_LLM_CALLS=true) for debugging

Closes feedback from: reddit.com/r/LocalLLaMA/comments/1qdj2nn/

* fix: correct mypy type issues in llm_log_utils

* docs: remove reverse proxy guide, keep inline note instead

The reverse proxy configuration is generic infrastructure knowledge
not specific to LDR. Replaced the guide link with a one-liner noting
that LDR uses HTTP polling and works with any standard reverse proxy.

* perf: pre-compile regex patterns in llm_log_utils

Avoids recompiling 6 regex patterns on every sanitize() call.

* refactor: keep docs and CSS, remove LLM logging feature

- Keep macOS port 5000/AirPlay troubleshooting docs
- Keep debug logging documentation (fixed LDR_LOG_LEVEL → LDR_ENABLE_FILE_LOGGING)
- Keep .ldr-mode-badge CSS and Advanced badge UI change
- Restore correct Nginx WebSocket reverse proxy config
- Remove LLM logging feature (suggest as separate focused PR)

* docs: add security note at top of Debug Logging section

Move the log file security warning to a prominent blockquote at the
start of the section so it is not overlooked.
2026-03-06 01:32:31 +01:00
LearningCircuit
a466877273 security: gate global scheduler control behind setting (#2035)
* security: gate global scheduler control behind setting

The news scheduler is a global singleton — starting, stopping, or
triggering it affects all users. Add a setting to control whether
these operations are accessible via API.

- Add `news.scheduler.allow_api_control` setting (default: true)
  - Env var: LDR_NEWS_SCHEDULER_ALLOW_API_CONTROL
  - Also configurable via settings UI
- Add `@scheduler_control_required` decorator that checks the setting
- Apply to destructive endpoints: start, stop, check-now, cleanup-now
- Read-only endpoints (status, users, stats) remain accessible to
  any authenticated user

Multi-user deployments can set `LDR_NEWS_SCHEDULER_ALLOW_API_CONTROL=false`
to prevent any user from starting/stopping the global scheduler.
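
A framework-free sketch of the gate described above. The real `@scheduler_control_required` takes no arguments and presumably reads the setting itself; the hypothetical factory `make_scheduler_gate` takes a lookup function so the example stays self-contained.

```python
import functools

SETTING_KEY = "news.scheduler.allow_api_control"


def make_scheduler_gate(get_setting):
    """Build a decorator that blocks calls unless the setting is truthy."""

    def decorator(func):
        @functools.wraps(func)  # preserves the wrapped function's name
        def wrapper(*args, **kwargs):
            if not get_setting(SETTING_KEY):
                return {"error": "Scheduler API control is disabled"}, 403
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Applied to a mutating endpoint, the wrapper returns a 403-style response when the setting is off and otherwise passes the call through unchanged.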

* test: add tests for scheduler_control_required decorator

Tests cover:
- Decorator allows execution when setting is enabled
- Decorator returns 403 when setting is disabled
- Error response includes informative message
- Correct setting key is checked
- Function name is preserved (wraps)

* fix: make scheduler API control setting non-editable for security

The news.scheduler.allow_api_control setting controls a global security
boundary (the scheduler singleton affects all users). Following the
precedent set by app.allow_registrations, this setting should not be
editable from the UI — it must be configured via environment variable
LDR_NEWS_SCHEDULER_ALLOW_API_CONTROL only.

Also adds integration tests verifying that mutating scheduler endpoints
(start, stop, check-now, cleanup-now) return 403 when disabled, while
read-only endpoints (status, users, stats) remain accessible.

* fix: change allow_api_control default from true to false

Make scheduler API control secure-by-default. Since the setting is
editable: false (env-var-only), no existing UI state is affected.
Users who want API control must now explicitly opt in via
LDR_NEWS_SCHEDULER_ALLOW_API_CONTROL=true.

* fix: regenerate config docs and add audit logging for scheduler gate

Regenerate CONFIGURATION.md to include the new
news.scheduler.allow_api_control setting (fixes check-config-docs CI).
Add logger.warning when scheduler API control is blocked, for audit
trail in multi-user deployments.
2026-03-05 01:01:58 +01:00
github-actions[bot]
d896970152 chore: auto-bump version to 1.3.58 (#2458)
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-03-01 23:34:00 +00:00