mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
b1cbc6fbe085d56effd5dbf499cf8071759d60f6
63 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bc45bbf2e7 |
fix(ci): make LDR research workflow honestly fail on Python crash (#4226)
* fix(ci): make LDR research workflow honestly fail on Python crash
A real run (job 77511717371, PR #4225) crashed with glibc 'double free or corruption (!prev)' but the workflow reported success and the caller posted a hollow PR comment.
Two cooperating defects: the script's exit code was discarded inside set +e / set -e, and `jq .` exits 0 on a zero-byte response.json so the JSON-shape check passed on empty input.
Capture the exit code, harden the validation order (exit -> non-empty -> jq -e shape -> .error -> .research non-empty), tee stderr to a log surfaced in the ::error:: annotation, upload the artifact with if: always() so failed runs leave debuggable evidence, and flush stdout in a finally block in the script so a SIGABRT during interpreter shutdown after json.dumps can't drop the otherwise-completed output.
Matches the house pattern from dockle.yml and the jq -e idiom from release-gate.yml.
* chore(ci): enable faulthandler in ldr-research.py
Dumps a Python traceback to stderr on SIGABRT/SIGSEGV/SIGFPE/SIGBUS/SIGILL before the signal re-fires. Pairs with the stderr-capture plumbing earlier in this PR: on the next glibc abort the ::error:: annotation will show "Fatal Python error: Aborted" plus the actual Python stack frame, making the deps-level investigation possible without a re-run.
Verified locally: a deliberately aborted child process emits its frame through faulthandler before exiting on signal 6.
|
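A minimal Python sketch of the hardened validation order (exit -> non-empty -> shape -> .error -> .research non-empty). The workflow itself does this in shell with `jq -e`; the function name `validate_response` and the error strings here are hypothetical.

```python
import json


def validate_response(exit_code, raw):
    """Return None if the run looks healthy, else a short failure reason.

    Hypothetical helper: checks run in the order the commit lists, so the
    first failing check is the one that gets reported.
    """
    if exit_code != 0:
        return f"script exited with code {exit_code}"
    if not raw.strip():
        # A zero-byte file must fail here; a bare `jq .` would accept it.
        return "response is empty"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"response is not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return "response is not a JSON object"
    if data.get("error"):
        return f"script reported an error: {data['error']}"
    if not data.get("research"):
        return "research field is missing or empty"
    return None
```

Checking the exit code first means a crash that still left partial JSON on disk is reported as a crash, not as a shape problem.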
||
|
|
653707a556 |
fix(encoding): add encoding="utf-8" to bare open() / read_text / write_text in examples and scripts (#4118)
Cleanup follow-up to #3797. The check-open-encoding hook was originally scoped with exclude: ^(tests/|examples/|scripts/) because those directories had ~45 pre-existing bare open() calls and addressing them was out of scope for the core Windows bug fix.
This commit:
* adds encoding="utf-8" to 45 read/write call sites under examples/ and scripts/ — JSON benchmark results, config-doc generators, workflow status pages, and the datetime-timezone pre-commit hook
* narrows the hook exclude to ^tests/ only, so future regressions in examples/scripts/ are blocked at commit time
Windows users running the benchmark scripts and config-doc generator would previously hit silent failures or UnicodeDecodeErrors on non-ASCII content under cp1252. The package itself was already protected by #3797.
|
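A short sketch of the pattern the hook enforces. Without an explicit `encoding`, `open()`, `read_text` and `write_text` typically fall back to the locale's preferred encoding, which is how cp1252 gets involved on Windows. The helper names and the sample payload are illustrative, not taken from the repo.

```python
import json
import tempfile
from pathlib import Path


def save_results(path, results):
    # ensure_ascii=False keeps non-ASCII characters as-is, so the bytes on
    # disk depend on the encoding chosen here.
    path.write_text(json.dumps(results, ensure_ascii=False), encoding="utf-8")


def load_results(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "benchmark.json"
    save_results(target, {"query": "naïve café"})
    roundtrip = load_results(target)
```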
||
|
|
2723331f67 |
chore(ci): cut workflow-status.md regen diff noise (#4066)
The auto-regenerated workflow-status.md on every version-bump PR produced ~15 rows of churn that wasn't signal:
- Status emoji column flipped between ✅ / · / ⏳ depending on which event last ran (e.g. backwards-compatibility flipped ✅→· because the most recent run was a skipped workflow_call, not because it regressed). The live badge column to its right is the source of truth for current status anyway, and run history lives in GitHub Actions itself. Drop the column.
- Last activity buckets oscillated across this week / last week / 2 weeks ago for healthy daily/weekly workflows. Coarsen to last 30 days / 1-3 months ago / 3-6 months ago / long ago / never so a healthy workflow sits in one bucket indefinitely.
Net effect: regenerations in steady state produce zero diff. Real signal (new stale/disabled workflows, aging past the 30d bucket) still surfaces.
|
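The coarsened buckets could be computed roughly as follows. The labels are the ones listed in the commit; the exact day thresholds (30 / 90 / 180) and the function name are assumptions for illustration.

```python
from datetime import date


def activity_bucket(last_run, today):
    """Map a last-run date to a coarse label (day thresholds are assumed)."""
    if last_run is None:
        return "never"
    age_days = (today - last_run).days
    if age_days <= 30:
        return "last 30 days"
    if age_days <= 90:
        return "1-3 months ago"
    if age_days <= 180:
        return "3-6 months ago"
    return "long ago"
```

Because only dates are compared, a workflow that runs daily or weekly stays in `last 30 days` across regenerations.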
||
|
|
9755a900eb |
ci(research): extract reusable LDR-research workflow + add issue-trigger caller (#3987)
* ci(research): extract reusable LDR-research workflow + add issue-trigger caller
Three triggers will end up calling the same install-and-run-LDR
plumbing (PR diff today, issue body now, Reddit posts later). Factor
out the middle of the workflow into a reusable workflow so we don't
have to maintain the same logic in three places, and add the
issue-trigger caller on top of it.
Changes:
- .github/workflows/ldr-research-reusable.yml (new) — workflow_call
workflow that takes a fully-assembled query and returns a
comment-ready markdown blob via artifact. Inputs include
forward-compat knobs the future Reddit caller will need
(max-query-length, max-sources, comment-footer override,
include-sources-section, output-truncate-chars).
- .github/workflows/e2e-research-test.yml — refactored from a single
job to three jobs (build-query → research-via-reusable →
post-comment). Behaviour is preserved: same headers, same footer,
same diff truncation at MAX_DIFF_SIZE, same label-removal on
completion.
- .github/workflows/issue-research.yml (new) — triggers on
`issues: types: [labeled]` gated by the same `ldr_research` label
the PR workflow uses (GitHub event-type gating means they don't
conflict). Output has two sections: "For the reporter" (cautious
framing) and "For maintainers" (raw research context). Issue body
is sanitized (control-char strip, 4000-char truncation) and never
reaches a shell.
- scripts/ldr-research.py — renamed from ldr-diff-research.py
(`git mv`, history preserved). Drops --mode, --static-query,
--max-diff-size: query now comes from stdin only and the caller
workflow does prompt assembly. Output JSON shape: {research,
sources, findings, iterations}.
- .github/labels.yml — register ldr_research and ldr_research_static
so they exist canonically rather than via on-the-fly creation.
Reddit research is a follow-up PR; this PR ships the abstraction
shape it will need.
* docs(ci): regenerate workflow status dashboard for new LDR workflows
The check-structure CI gate requires every workflow file to have a row
in docs/ci/workflow-status.md. Regenerate to add rows for the two new
workflows added in this PR. The live-status flips on unrelated rows
(gitleaks, ossf-scorecard, responsive-ui-tests-enhanced, osv-scanner)
are accurate snapshots of current status — the auto-regen workflow
keeps them fresh on its own schedule.
* ci(research): address review feedback — label cleanup, delimiter, artifact
Three small follow-ups from the AI review on this PR:
1. Label cleanup on build-query failure. The post-comment job had
`if: always() && needs.research.result != 'skipped'`, which meant
that if build-query failed, research was skipped and the entire
post-comment job (including the label-removal step) was skipped
too — leaving a stuck `ldr_research` label on the PR/issue.
Switch to `if: always()`; the download and post steps already
self-guard with `needs.research.outputs.success == 'true'`, so
only the label-removal step runs in the failure path.
2. Randomized GHA output delimiter. `__LDR_QUERY_EOF__` was a fixed
string; a query containing that exact line could prematurely
terminate the multi-line output. Use $$/$RANDOM/nanosecond as the
delimiter base. Defense-in-depth — collision was already
astronomically unlikely.
3. Optional `artifact-suffix` input on the reusable workflow. Until
now the artifact name was
`ldr-research-{run_id}-{run_attempt}-{github.job}`, which
collides if a caller invokes the reusable multiple times in one
run. The Reddit follow-up will use a matrix call, so add a
caller-provided suffix now and sanitize it to artifact-safe
chars. Existing callers don't pass it; default empty preserves
today's name.
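A sketch of the randomized-delimiter idea from item 2, written in Python. The commit builds its delimiter in shell from $$/$RANDOM/nanoseconds; this version swaps in `secrets.token_hex` instead, and the function name is hypothetical. The `name<<DELIMITER` heredoc form is the one GitHub Actions uses for multi-line values in `$GITHUB_OUTPUT`.

```python
import secrets


def format_multiline_output(name, value):
    """Render `value` in the heredoc form used for multi-line step outputs."""
    delimiter = f"LDR_QUERY_EOF_{secrets.token_hex(16)}"
    # Regenerate in the (very unlikely) case the value contains the delimiter.
    while delimiter in value:
        delimiter = f"LDR_QUERY_EOF_{secrets.token_hex(16)}"
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"
```

A query that contains the old fixed string `__LDR_QUERY_EOF__` on its own line now passes through unchanged, because the terminator differs on every call.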
* ci(research): fix per-line truncation in reusable workflow
Two follow-ups from the second review pass:
1. The awk-based backstop truncation in `Write query to file` was
per-line (operating on $0 / length($0)), not total. A long
multi-line query with many short lines would silently bypass the
max-query-length cap. Swap for a wc -c + head -c approach that
truncates total bytes. Verified locally that a 114-byte
multi-line input with all-short-lines is now correctly truncated
to ~100 bytes.
2. Remove the unused EXIT_CODE capture in `Run LDR Research`. The
step relies on JSON validation for error detection; capturing
$? without using it was just dead code inherited from the
original workflow.
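The difference between the per-line cap and the total-bytes cap in item 1 can be shown in a few lines of Python. This is a stand-in for the awk and `wc -c` + `head -c` shell steps, with hypothetical function names; unlike `head -c`, the sketch also drops a multi-byte character that would be cut in half.

```python
def truncate_per_line(text, limit):
    """The flawed approach: caps each line, so total size is unbounded."""
    return "\n".join(line[:limit] for line in text.split("\n"))


def truncate_total(text, limit):
    """Cap the total encoded size in bytes."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a multi-byte character split at the boundary.
    return data[:limit].decode("utf-8", errors="ignore")
```

With many short lines, `truncate_per_line` returns the input untouched, which is exactly the bypass the commit describes.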
|
||
|
|
91b68acafd |
docs(ci): auto-generated workflow status dashboard (#3966)
* docs(ci): add auto-generated workflow status dashboard
Adds `docs/ci/workflow-status.md` — a single page that surfaces every GitHub Actions workflow in the repo, grouped by role, with action items (disabled / stale / manual-only) at the top. Live status badges link to each workflow's runs page. Auto-generated from the workflow YAML files + the GitHub API by `scripts/generate_workflow_status.py`.
Why: the GitHub Actions tab is chronological-mixed (poor "is anything red right now?" view), and the static workflow table in `CI_CD_INFRASTRUCTURE.md` drifts when workflows are added/renamed (PR #3963 fixed three factually wrong header claims for exactly this reason). A reference page that mechanically reflects current state + identifies dormant workflows answers both gaps.
What's surfaced today (verified live):
- **Disabled**: `nuclei.yml` (caller commented out in `release-gate.yml:177`).
- **Stale**: `update-precommit-hooks.yml` — its weekly Friday cron has been **failing for 10+ consecutive weeks** (since at least 2026-03-06). This was discovered by the dashboard, not previously tracked.
- **Manual-only**: `check-config-docs.yml`, `sync-main-to-dev.yml` (both intentionally manual; the dashboard shows them so they're not forgotten).
Generator design notes:
- Resolves reusable workflows correctly: `gh run list --workflow=X.yml` is empty for `workflow_call`-only workflows. The script walks the call graph (release.yml → release-gate.yml → semgrep.yml etc.), fetches the parent run's job list, and matches by **job key** parsed from the caller YAML (not by name heuristic — `gitleaks-scan` ↔ `gitleaks-main.yml` would otherwise collide with `gitleaks.yml`).
- Picks "primary trigger" per workflow so e.g. `codeql.yml` (PR + push + cron + workflow_call) gets its glyph from the gated daily run, not a stale PR run.
- Stale check walks the *recent* runs list to find last success — a workflow that ran red yesterday and green a week ago is not stale.
- Manual edits outside the `<!-- BEGIN/END GENERATED -->` markers are preserved on regeneration; the timestamp lives inside the markers so post-marker content is fully user-owned.
- Preflights `gh auth status` and rate limit before any per-workflow call — fails fast with actionable message instead of partial output.
CI integration:
- `.github/workflows/check-workflow-status.yml` runs `--check-structure` on PRs touching workflows, the dashboard, or the generator. Pure structural check (no API calls, no live data) — fast and deterministic. Live regeneration stays on demand.
Cost: ~340 GitHub API calls per regeneration, ~45 sec wall-clock, ~6.8% of the 5000/hr authenticated quota.
* fixup(ci): review-pass corrections to workflow status dashboard
Surfaced by three rounds of code-review + correctness + security agents on the original PR. Four small fixes; no behavioral change to the generated dashboard's content.
1. **Recognize commented job keys** — `JOB_KEY_RE` now accepts an optional `# ` prefix. Previously, when an entire job block was commented out (e.g. `release-gate.yml:175-181` for nuclei), the commented `uses:` line inherited the *previous* active job's key (`gitleaks-scan`) instead of the correct `nuclei-scan`. Latent — commented entries are filtered out before reaching gated-run lookup — but would misattribute status if someone partially uncommented a block (uncommented just the `uses:` line).
2. **Pin pyyaml to ==6.0.3** in the CI workflow. The repo convention is exact `==` pins (95% of `pip install` calls in workflows); the only floating range was the one introduced by this PR. Matches pdm.lock.
3. **Validate marker order** in `merge_with_existing`. If a manual edit leaves the BEGIN/END markers reversed (e.g. mid-merge-conflict), bail to a clean overwrite instead of splicing interleaved garbage.
4. **Remove `_coerce_jq_stream`** — unused helper left behind from an earlier iteration. Zero call sites; no behavior change.
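A hedged sketch of the marker handling described above: content outside the markers survives, and reversed or missing markers fall back to a clean overwrite. The function name `merge_with_existing` comes from the commit, but this body and the exact marker strings are assumptions, not the script's actual code.

```python
BEGIN = "<!-- BEGIN GENERATED -->"  # assumed marker text
END = "<!-- END GENERATED -->"  # assumed marker text


def merge_with_existing(existing, generated):
    """Splice `generated` between the markers, keeping text outside them."""
    block = f"{BEGIN}\n{generated}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END)
    # Missing or reversed markers: overwrite cleanly rather than splice.
    if start == -1 or end == -1 or end < start:
        return block + "\n"
    return existing[:start] + block + existing[end + len(END):]
```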
Verified by re-running the generator + `--check-structure`. The rendered dashboard's only diff vs prior commit is the regeneration timestamp and live "Last activity" cells (expected — those reflect recent runs since the previous regen).
* feat(ci): bucketed activity labels + auto-regen on version bump
Two changes that together make the dashboard's diffs meaningful instead of noisy.
1. **Coarse activity buckets.** Replace exact UTC timestamps in every "Last activity / Last manual run / Last successful run" cell with one of: `this week`, `last week`, `2 weeks ago`, `3 weeks ago`, `last month`, `2 months ago`, `3+ months ago`, `long ago`, `never`. Calendar-day boundaries (no time-of-day jitter) so two regenerations on the same date produce **zero diff** when nothing actually drifted. Verified: same-day re-runs after stable workflow state → empty diff. Also drop the redundant `Days idle` columns from Stale and Manual-only tables (the bucket label already says it), and round the "Last regenerated" footer to a date. Why: a daily-running healthy workflow used to bump its timestamp every regen (noise). Now it stays in `this week` indefinitely, and the only diffs that land in a version-bump PR are real bucket transitions — exactly the "this slipped from last week to last month — something might be wrong" signal the dashboard exists for.
2. **Auto-regenerate on version bump.** Add a step to `version_check.yml` right after the existing `generate_config_docs.py` regen. Same pattern as the config docs precedent — the dashboard refresh rides along with each version-bump PR and is reviewable in the same diff. Costs ~340 GitHub API calls per run (well under the GITHUB_TOKEN 1000/hr workflow-runs limit). Adds `actions: read` to the job permissions block; uses `pyyaml==6.0.3` matching pdm.lock.
* feat(ci): drop regen timestamp; add health banner; fix in-progress false-stale
Three follow-ups to keep version-bump diffs strictly meaningful, plus two correctness fixes uncovered by repeated stability testing.
1. **Drop the "Last regenerated" date.** Git history is authoritative for "when this snapshot was taken"; embedding a date here forced a single-line diff every regeneration even when nothing else drifted.
2. **Aggregated health banner** at the top of the generated region: `**63 workflows:** 1 disabled · 1 stale · 2 manual-only · 59 active` Counts only change when a workflow shifts between {disabled, stale, manual, active} — same level of diff-stability as the per-row buckets.
3. **`?event=schedule` for own-cron workflow badges.** Verified effective by SHA-comparing badge bodies for workflows with multi-event run history. Makes the badge for e.g. `gitleaks.yml`, `fuzz.yml`, `osv-scanner.yml` reflect cron health specifically, rather than whichever PR ran last. The runs-page link uses the matching `?query=event%3Aschedule` so a click lands on the filtered run list.
4. **Fix false-stale during in-flight release runs.** Previously, when release.yml was running, gates reachable via release.yml (puppeteer-e2e-tests, ci-gate, etc.) would briefly flip to "stale" because `fetch_last_gated_run` returned the in-progress run first and `last_success` couldn't see past it. Now the function walks all 5 caller runs and returns both the latest match (for activity) and the latest successful match (for staleness), avoiding the flip.
5. **Map all GitHub conclusion enum values.** A `gitleaks.yml` run completed with `action_required` between two test regens; the glyph table didn't have it and rendered `?`. Added every documented value (`neutral`, `timed_out`, `stale`, `action_required`) and changed the unknown-fallback from `?` to em-dash, so future GitHub-side enum additions don't introduce a false-positive diff.
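Item 4's fix, returning both the latest run and the latest successful run, can be sketched as below. The input shape (newest-first dicts with `status` and `conclusion`, as in the GitHub runs API) and the function name are assumptions; the real `fetch_last_gated_run` also does the job-key matching, which is omitted here.

```python
def latest_and_last_success(runs):
    """Return (latest run, latest successful run) from a newest-first list."""
    latest = runs[0] if runs else None
    # An in-progress run has conclusion None, so it is skipped here
    # instead of hiding older successful runs.
    last_success = next(
        (run for run in runs if run.get("conclusion") == "success"), None
    )
    return latest, last_success
```

Activity is read from the first value and staleness from the second, so an in-flight release run no longer makes a gate look stale.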
Verified: two same-day regens after workflow state has settled now produce **zero diff**.
* ci(version-bump): make workflow-status regen non-blocking
Add `continue-on-error: true` to the dashboard regeneration step in version_check.yml. The regen calls ~340 GitHub API endpoints and would otherwise block the entire version-bump PR if any of them transiently fail (rate-limit hit, GitHub Actions outage, etc.). The failure mode should be "dashboard stays at the previous snapshot until next successful regen", not "release pipeline is blocked".
The sibling `generate_config_docs.py` step doesn't need this — it's purely local with no external API dependency.
|
||
|
|
1315b679e0 |
ci(research): switch E2E research workflow to langgraph-agent strategy (#3965)
* ci(research): switch E2E research workflow to langgraph-agent strategy
The ldr_research label runs scripts/ldr-diff-research.py, which until
now didn't pass a search_strategy and so fell through to the
quick_summary default of source_based. Switch to the agentic
langgraph-agent strategy so the workflow exercises the autonomous
research path.
- Adds --strategy CLI arg and LDR_STRATEGY env var, default
langgraph-agent (consistent with the existing --provider /
--search-tool / --iterations pattern).
- Workflow exposes LDR_STRATEGY: vars.LDR_STRATEGY || 'langgraph-agent'
so the choice is overridable per-repo via Variables.
- Notes in the script docstring that LDR_ITERATIONS=1 is a no-op for
the langgraph strategy (which reads langgraph_agent.max_iterations
from settings instead).
* ci(research): consolidate model var to LDR_RESEARCH_MODEL
The workflow had two model variables — vars.LDR_MODEL for diff mode and
vars.LDR_STATIC_MODEL for static mode — selected by a small set-model
step. Collapse to a single LDR_RESEARCH_MODEL variable shared by both
labels, mirroring the AI reviewer's vars.AI_MODEL pattern.
- Default: google/gemini-2.0-flash-001 (the value the script was
already falling through to).
- Override via Settings → Variables → New repository variable
→ name: LDR_RESEARCH_MODEL.
- The set-model step is removed; the workflow now passes the env var
through directly.
- Script reads LDR_RESEARCH_MODEL instead of LDR_MODEL.
Note: existing repo variables LDR_MODEL and LDR_STATIC_MODEL become
orphaned by this rename and can be deleted from repo settings.
* ci(research): stop overriding strategy iterations from the workflow
Previously the workflow set LDR_ITERATIONS=1 and the script forwarded
that as iterations= in kwargs. For source_based that capped research at
one iteration; for langgraph-agent it was effectively a no-op (langgraph
reads max_iterations, not iterations) but the wiring was misleading.
- Drop LDR_ITERATIONS from the workflow env block.
- Make --iterations default to None in the script and only forward it
to quick_summary when explicitly set on the CLI.
- Each strategy now uses its own setting-driven default unless
overridden — for langgraph-agent that means langgraph_agent.max_iterations
(default 50) flows through unchanged.
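A minimal sketch of the CLI wiring described in these commits: `--strategy` defaults from `LDR_STRATEGY`, and `--iterations` is forwarded only when given explicitly. The kwarg names `search_strategy` and `iterations` follow the commit text; `build_kwargs` and the injectable `env` parameter are hypothetical.

```python
import argparse
import os


def build_kwargs(argv, env=None):
    """Parse CLI args and build the kwargs passed on to quick_summary."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--strategy", default=env.get("LDR_STRATEGY") or "langgraph-agent"
    )
    parser.add_argument("--iterations", type=int, default=None)
    args = parser.parse_args(argv)

    kwargs = {"search_strategy": args.strategy}
    # Only forward iterations when set on the CLI, so each strategy keeps
    # its own settings-driven default otherwise.
    if args.iterations is not None:
        kwargs["iterations"] = args.iterations
    return kwargs
```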
* ci(research): split research model into MAIN + CHEAP per label
Bring back per-label model selection with cleaner names:
- ldr_research → vars.LDR_RESEARCH_MODEL (deep PR analysis,
user-configurable)
- ldr_research_static → vars.LDR_RESEARCH_CHEAP_MODEL (regression
smoke, kept cheap)
Both default to google/gemini-2.0-flash-001 if unset, so existing
behaviour stays identical until you actually configure cheap-model.
The script and its env-var contract are unchanged — the workflow
just picks which value to feed into LDR_RESEARCH_MODEL based on the
applied label.
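The per-label selection amounts to a small lookup with a shared default. The workflow does this in YAML expressions; the Python below is only an illustration, with `pick_model` and the `repo_vars` dict as hypothetical stand-ins for the repository Variables.

```python
DEFAULT_MODEL = "google/gemini-2.0-flash-001"

LABEL_TO_VARIABLE = {
    "ldr_research": "LDR_RESEARCH_MODEL",
    "ldr_research_static": "LDR_RESEARCH_CHEAP_MODEL",
}


def pick_model(label, repo_vars):
    """Return the model for a label, falling back to the shared default."""
    variable = LABEL_TO_VARIABLE[label]
    # Unset or empty variables fall through to the default.
    return repo_vars.get(variable) or DEFAULT_MODEL
```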
|
||
|
|
903a2db8af |
ci(nuclei): authenticate DAST scan + seed URLs from Flask url_map (#3698)
* ci(nuclei): authenticate scan + seed URL list from Flask url_map
Previously the Nuclei DAST job ran against an unauthenticated single target (`http://localhost:5000`) with no URL list. Because Nuclei is template-driven (not a crawler) and the LDR app is auth-gated, the scanner only ever saw `/auth/login`, the index, and a couple of unauthenticated endpoints. The 2-minute scan over 10k templates produced only 5 info-level findings, all of which were intentional design choices (CSP `unsafe-inline`, SameSite=Lax, OPTIONS verb, form detection) — i.e. the gate was effectively a green-checkmark.
Now the workflow:
1. Pre-creates the standard CI `test_admin` user via the existing `init_test_database.py` helper (avoids slow registration + rate limits).
2. Logs in via the real /auth/login flow with CSRF token, captures the Flask session cookie, and verifies via /auth/check.
3. Dumps the Flask url_map (excluding parameterized routes, static, and POST-only endpoints) into urls.txt so Nuclei probes every blueprint route, not just `/`.
4. Runs Nuclei with `-list urls.txt` and the authenticated session cookie via `-H "Cookie: session=..."`.
5. Filters to severity >= low to drop the four info-level findings that are intentional design choices.
The session cookie is masked in logs via `::add-mask::` so it doesn't leak into the run output. Test credentials match the convention used by the playwright-webkit-tests and puppeteer-e2e-tests workflows.
Adds scripts/ci/dump_url_map.py as a small helper that imports `create_app()` and iterates `app.url_map.iter_rules()` — reusable from other DAST workflows (e.g. ZAP API scan) that benefit from URL seeding.
* ci(nuclei): address findings from review pass
Three differentiated review agents flagged five actionable items on the authenticated-Nuclei PR. This commit addresses all five:
  * dump_url_map.py: stop skipping parameterized routes. Substitute a Flask-converter-appropriate placeholder (int/float→1, uuid→all-zeros, default→"nuclei") so Nuclei still probes path-traversal / parameter-injection / SQLi templates against routes like /research/<research_id> and /api/research/<research_id>/status. Without this, the bulk of the authenticated app surface (history, research, API blueprints) was silently excluded — which defeats the PR's purpose.
  * nuclei.yml -etags intrusive,dos,fuzz: now that Nuclei holds a real session, default templates could mutate state or DoS the runner. This is the standard exclusion set for authenticated DAST.
  * nuclei.yml: replace `cat cookies.txt` in the missing-cookie error branch with a column-filtered `awk` that omits the value column. The cookie is masked via `::add-mask::` after this point, so the previous branch could leak the session token in CI logs if the extraction regex ever broke.
  * nuclei.yml: add `sleep 2` between auth/check and the Nuclei step so the post-login background thread (settings migration + library init, see web/auth/routes.py:_perform_post_login_tasks) finishes before probes start and 500 on settings-dependent routes.
  * nuclei.yml: drop `# pragma: allowlist secret` on TEST_PASSWORD. The repo uses gitleaks (.gitleaks.toml already allowlists `testpass123`), not detect-secrets — the pragma was dead weight.
Out of scope for this PR (recorded but not changed):
- 3-way credential drift (init_test_database.py / nuclei.yml / auth_helper.js all hardcode test_admin/testpass123)
- Nuclei binary version `latest` auto-updating (matches existing CI)
- create_app() side effects in dump_url_map.py (currently benign)
|
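The placeholder substitution for parameterized routes might look like the sketch below. It works on rule strings with a regex so it runs without Flask, whereas the real helper iterates `app.url_map.iter_rules()`. The function name, the regex, and the exact all-zeros UUID form are assumptions.

```python
import re

# Matches <name>, <converter:name> and <converter(args):name>.
_PARAM = re.compile(
    r"<(?:(?P<converter>[a-zA-Z_]+)(?:\([^)]*\))?:)?"
    r"(?P<name>[a-zA-Z_][a-zA-Z0-9_]*)>"
)

PLACEHOLDERS = {
    "int": "1",
    "float": "1",
    "uuid": "00000000-0000-0000-0000-000000000000",  # assumed zero form
}


def fill_placeholders(rule):
    """Replace each URL parameter with a converter-appropriate value."""

    def replace(match):
        converter = match.group("converter") or "default"
        # Any converter without a specific entry gets the default value.
        return PLACEHOLDERS.get(converter, "nuclei")

    return _PARAM.sub(replace, rule)
```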
||
|
|
3b1d6c6b2f |
feat: redesign journal quality system with data-driven scoring and predatory auto-removal (#3081)
* feat: redesign journal quality system with data-driven scoring and predatory auto-removal
Replace the expensive LLM-based journal scoring (SearXNG + AdvancedSearchSystem
per journal) with a tiered data-driven approach:
Tier 0: DB cache (instant, from previous runs)
Tier 1: Predatory check — auto-removes results from blacklisted journals/publishers
Tier 2: OpenAlex snapshot — h-index + DOAJ from ~217K sources (downloaded at runtime)
Tier 3: DOAJ check — quality floor for open access journals (downloaded at runtime)
Tier 4: LLM analysis — SearXNG fallback (now optional, not required)
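The tier order above is a first-hit-wins chain, which can be sketched generically. All names and the toy data below are invented for illustration; the real filter's lookups, scores, and predatory handling are more involved.

```python
def score_journal(name, tiers):
    """Try each (label, lookup) tier in order; first non-None result wins."""
    for label, lookup in tiers:
        result = lookup(name)
        if result is not None:
            return label, result
    return None, None


# Toy data, invented for illustration only.
cache = {"journal a": 8}
predatory = {"journal b"}
openalex = {"journal c": 6}

tiers = [
    ("tier0_cache", cache.get),
    ("tier1_predatory", lambda n: "remove" if n in predatory else None),
    ("tier2_openalex", openalex.get),
]
```

Ordering the cheap lookups first means the expensive fallback is only reached when every earlier tier returns nothing.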
Bundled data:
- Stop Predatory Journals: 6K predatory publishers/journals (MIT license)
Downloadable data (CC0, loaded if present):
- OpenAlex sources snapshot: 217K journals/conferences with h-index, impact factor
- DOAJ journals: 22K+ journals with DOAJ Seal status
Key changes:
- Extended Journal DB model with bibliometric fields (h-index, impact factor,
DOAJ, predatory status, provenance tracking) + Alembic migration
- JournalReputationFilter now uses tiered scoring with journal dedup
- SearXNG no longer required — filter works with bundled data alone
- Predatory journals auto-removed (with whitelist override for false positives)
- Added journal filter to Semantic Scholar (was the only scientific engine without it)
- OpenAlex results now include source_id and source_type for direct lookups
- Fixed score parsing (regex instead of strict int()), prompt truncation,
fail-fast on SearXNG failures, lru_cache on name cleaning
* fix: address code review findings from Round 1
- Remove dead __check_result method, update tests to use filter_results
- Fix predatory substring matching (min length guard prevents false positives)
- Add name parameter to is_whitelisted for journals without ISSN
- Fix migration: server_default for Booleans, correct index creation logic
- Improve safety net logging in filter_results
* fix: forward journal quality fields through _get_full_content (Round 2 review)
OpenAlex _get_full_content was constructing a new result dict without
forwarding journal_ref, openalex_source_id, and source_type from the
preview. This effectively disabled journal quality filtering for all
OpenAlex results since the content filters run after full content
retrieval and couldn't find the journal_ref key.
* fix: address Round 3 review findings — bugs, thread safety, tests
Critical bug fixes:
- Add missing quality_model column to migration 0005
- Fix dedup to use richest metadata (two-pass approach)
- Predatory cache entries no longer expire via normal TTL
Performance:
- Build indexed sets for predatory data at load time (O(1) exact match)
- Add threading.Lock for singleton and lazy property loading
Data quality:
- Deduplicate predatory.json (removed 21 dupes)
Test coverage (38 new tests):
- JournalDataManager: derive_quality_score, is_predatory, is_whitelisted,
lookup_openalex, lookup_doaj, _expand_openalex_record, singleton
* fix: address all review findings — critical bugs, security, performance
Critical bugs: NASA ADS journal_ref, empty string guard, regex name
cleaning with LLM fallback, DOAJ field overwrite protection, predatory
cache TTL re-evaluation.
Security: prompt injection sanitization, log injection prevention,
Unicode NFKC normalization for predatory lookups.
Important bugs: predatory publish-after-indexes race fix, Tier 0 DB
error handling.
Performance: regex-based name cleaning eliminates ~5 LLM calls/batch.
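Two of the ideas from these review rounds, NFKC normalization and an indexed set for O(1) exact matching, fit in a short sketch. The class name `PredatoryIndex`, the extra case-folding step, and the sample entry are hypothetical.

```python
import unicodedata


def normalize(name):
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones,
    # so look-alike spellings map to the same key.
    return unicodedata.normalize("NFKC", name).casefold().strip()


class PredatoryIndex:
    """Exact-match index built once at load time."""

    def __init__(self, names):
        self._exact = {normalize(n) for n in names}

    def is_predatory(self, name):
        return normalize(name) in self._exact
```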
* fix: .text() → .content for LangChain, improve regex name cleaning
Critical runtime fix:
- LangChain AIMessage has .content attribute, not .text() method.
Both LLM calls in the filter (name cleaning and Tier 4 scoring)
would crash with AttributeError at runtime. Fixed both occurrences
and updated all test mocks.
Regex improvements:
- Add bare trailing citation number stripping (", 95, 146802")
- Add volume(issue) pattern stripping ("141(5)")
- Fix month regex: require at least 1 digit after month name and
add word boundaries (prevents "May" in journal names being stripped)
- Only skip LLM when regex result has no residual numerics — complex
citation strings like "Phys. Rev. Lett. 95, 146802 (2005)" correctly
fall through to LLM instead of returning partially-cleaned name
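A rough sketch of the regex-first cleaning with the "residual numerics" guard. The patterns are simplified assumptions that cover only the volume(issue) and trailing-number cases, not the filter's actual regexes, and `clean_journal_name` is a hypothetical name.

```python
import re

_VOLUME_ISSUE = re.compile(r"\s*\b\d+\(\d+\)")  # e.g. "141(5)"
_TRAILING_NUMBERS = re.compile(r"(?:,\s*\d+)+\s*$")  # e.g. ", 95, 146802"


def clean_journal_name(raw):
    """Return (cleaned name, needs_llm) for a raw journal reference."""
    name = _VOLUME_ISSUE.sub("", raw)
    name = _TRAILING_NUMBERS.sub("", name).strip(" ,")
    # Leftover digits mean the regex pass was incomplete: defer to the LLM.
    needs_llm = bool(re.search(r"\d", name))
    return name, needs_llm
```

A string such as "Phys. Rev. Lett. 95, 146802 (2005)" keeps its digits after the regex pass, so it is flagged for the LLM instead of being returned half-cleaned.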
* feat: add journal quality dashboard at /metrics/journals
Dashboard with summary stats, quality distribution chart, score source
doughnut, sortable/filterable journal table with pagination, quality
badges, trust signal icons, empty state, help panel, mobile responsive.
API: GET /metrics/api/journals — all journals + summary in one call.
* fix: XSS prevention, missing API fields, sort null handling in dashboard
Security:
- Add escHtml() helper for HTML entity escaping in all innerHTML
injections (journal names, publishers, predatory_source, source badges)
- Prevents XSS via crafted journal names containing HTML/JS
API:
- Add works_count and cited_by_count to journal API response
(bibliometric fields useful for dashboard display)
UX:
- Fix sort comparison with null values: nulls pushed to end consistently
instead of unpredictable placement from mixed Infinity/string comparison
* fix: dashboard null-quality filter, avg h-index N/A, core label
- Fix null-quality journals appearing in predatory tier filter
(quality || 0 coerced null to 0, which passed predatory check)
- Fix avg h-index showing "0" when no journals have h-index data
(API now returns null, frontend shows "—")
- Rename "Scopus Indexed" to "Core Indexed" (OpenAlex is_core
is CWTS core status, not Scopus indexing)
* feat: SQLite reference DB for dashboard with server-side pagination
Replace client-side 212K journal array with a shared read-only SQLite
database built from bundled JSON on first access. Near-zero RAM usage.
* perf: split summary from pagination queries in journal dashboard
Summary stats + chart data (3 SQL queries, ~130ms) are now fetched
only on initial page load via include_summary=true param. Subsequent
pagination, sorting, and filter changes only fetch the journal page
(1 query, ~7ms), making navigation feel instant.
* fix: expose Chart.js globally, split summary from pagination queries
- Add window.Chart = Chart in app.js so inline scripts can use Chart.js
(was imported but never exposed on window — caused ReferenceError)
- Split summary from pagination: include_summary=true only on initial
load, page/filter/sort skip the 3 extra SQL queries
- NOTE: run `npm run build` to rebuild the Vite bundle
* fix: guard Chart.js usage and defer initial load for module script timing
The Vite bundle loads as type="module" (deferred), but the inline
script in journal_quality.html runs immediately. Chart is not yet on
window when the script executes, causing ReferenceError that kills
the entire script block including the data loading call.
Fix: guard Chart usage with typeof checks, defer loadJournalPage
to window.onload so module scripts have finished executing.
* feat: upgrade journal filter logs from debug to info level
Users can now see the tiered scoring process in their logs:
- Tier 0: cache hit with score
- Tier 1: predatory detection + whitelist override
- Tier 2: OpenAlex match with h-index
- Tier 3: DOAJ match with seal status
- Tier 4: LLM analysis result
- Summary: passed/below-threshold/predatory breakdown
* fix: add 'the' prefix fallback for journal name lookups, add lookup logs
Many OpenAlex journals start with 'The ' (e.g., 'The Astrophysical
Journal Letters') but ArXiv journal_ref omits it. Now tries with/without
'the ' prefix when exact match fails — fixes ~5K potential Tier 2 misses
that would unnecessarily fall through to expensive Tier 4 LLM analysis.
Applied to both JournalDataManager (in-memory) and JournalReferenceDB
(SQLite). Added debug-level logs for lookup hits/misses.
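The prefix fallback reduces to one extra dictionary probe. The sketch below assumes a plain dict keyed by lower-cased names; `lookup_source` is the name used in the tests mentioned later in this log, but this body and the sample IDs are illustrative only.

```python
def lookup_source(name, index):
    """Look up a journal, retrying with the leading 'the ' toggled."""
    key = name.casefold().strip()
    if key in index:
        return index[key]
    # Exact match failed: try the other spelling of the leading article.
    alternate = key[4:] if key.startswith("the ") else f"the {key}"
    return index.get(alternate)
```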
* feat: quality tags in sources, sidebar menu, documentation
- Attach journal quality score to each result in filter_results
- Display quality tags in research output source lists:
[Q1 ★★★★★] for elite, [Q2 ★★★] for moderate, etc.
- Add "Journals" item to sidebar under Analytics section
- Create docs/journal-quality.md with full system documentation
* fix: restore docstrings, increase DOAJ Seal score, fix truncated file
Address djpetti's review comments:
- Restore full Args/Returns docstrings on __init__, create_default,
__db_session, __make_search_system, __clean_journal_name,
__analyze_journal_reputation, __save_journal_to_db
- Remove "unlike the previous version" reference from create_default
- Add clarifying comment on regex vs LLM name cleaning tradeoff
- Increase DOAJ Seal score from 6 to 7 (2-point spread vs 1-point)
- Fix file truncation from disk-full error (line 763)
* refactor: move build logic into journal_reference_db module
Eliminate sys.path hack, make build logic importable. Script is now
a thin CLI wrapper. derive_quality_score imported from data_manager
(canonical copy) instead of duplicating.
* fix: review findings — docs, sidebar, dashboard, test gaps
Address final review round findings:
- Fix DOAJ Seal score in docs (6→7)
- Sidebar: use url_for() instead of hardcoded URL
- Template: set active_page='journal-quality' for sidebar highlight
- Rename stat-scopus to stat-seal with label "DOAJ Seal" (was mislabeled)
- Always use window.onload for initial load (readyState fast path unsafe)
- Add tests for _format_quality_tag (6 tests, all 5 tier branches + None)
- Add tests for "the" prefix fallback in lookup_source (2 tests)
* feat: add CORE conference rankings (795 CS conferences)
Bundle CORE Rankings (ICORE2026) for automatic conference scoring:
A*→9, A→7, B→5, C→4. Acronym + proceedings prefix matching.
Eliminates Tier 4 LLM calls for major CS conferences.
* feat: add data source attribution to journal quality dashboard
Credit the open academic data projects that make the dashboard possible:
OpenAlex (CC0), DOAJ (CC0), CORE Rankings, Stop Predatory Journals (MIT).
Displayed as an attribution section at the bottom of the page.
* fix: remove CORE conference data (no open license)
CORE Rankings are copyrighted (c) 2013 Computing Research & Education
with no published open license. Redistribution in an MIT project is
not permitted without explicit permission.
Removed core_conferences.json from bundled data. The build function
_load_core_conferences gracefully returns {} when the file is absent.
Conference matching still works via OpenAlex data + proceedings prefix
stripping.
Verified remaining data licenses:
- OpenAlex: CC0 Public Domain (confirmed)
- DOAJ metadata: CC0 (confirmed on doaj.org)
- Stop Predatory Journals: MIT License (confirmed in GitHub LICENSE)
* docs: add data source attribution to README, docs, code, and dashboard
Credit open academic data projects at multiple touchpoints:
- README.md: Journal Quality feature links to data sources
- docs/journal-quality.md: expanded attribution table with websites
- data/__init__.py: license details per bundled file
- journal_reference_db.py: data sources in module docstring
- Dashboard: attribution section with links (already added)
All bundled data verified: OpenAlex (CC0), DOAJ metadata (CC0),
Stop Predatory Journals (MIT).
* fix: DOAJ Seal score consistency across all tiers
Tier 2 (OpenAlex) now cross-references DOAJ for Seal status via
dm.has_doaj_seal(issn). Tier 3 now calls derive_quality_score
instead of hardcoding score=6. All tiers consistently score
DOAJ Seal at 7. Fixed docs inconsistency.
* feat: add CitationMetadata model for structured academic metadata
New citation_metadata table stores bibliographic data on academic
research sources using CSL-JSON vocabulary. 1:1 with ResearchResource.
- CitationMetadata model: doi, arxiv_id, pmid, authors, year,
volume, issue, pages, container_title, journal_id FK, csl_json
- Migration 0006: create table + indexes
- citation_normalizer.py: engine-specific → CSL-JSON normalization
- extract_links: preserve citation fields (was dropping 90% of data)
- research_sources_service: create CitationMetadata for academic sources
- Quality never stored — derived via journal_id at query time
* refactor: simplify Journal table to only cache Tier 4 LLM results
Tiers 1-3 use bundled data (instant, no caching needed). Only Tier 4
(LLM) results cached in DB. Wire up journal_id FK on CitationMetadata.
* feat: auto-download journal data from GitHub Releases
Replace bundled data files with on-demand download:
- journal_data_downloader.py: fetch from GitHub Releases on first use
- Data in user dir (not package dir, read-only in pip installs)
- Dashboard shows download banner when data missing
- API: GET/POST /metrics/api/journal-data/{status,download}
- predatory.json (307KB) stays bundled, large files never in git
* refactor: fetch journal data from APIs instead of GitHub Releases
Fetch directly from OpenAlex and DOAJ public APIs. No redistribution
concerns — data fetched fresh from CC0 sources (~3 min first run).
* fix: review findings — h_index=0 edge case, dead code, missing field
- derive_quality_score: h_index=0 no longer bypasses DOAJ Seal score
(0 means newly indexed, not low quality)
- citation_normalizer: remove dead arxiv check in detect_engine
- extract_links: add source_engine to preserved fields
- paths.py: fix stale docstring (GitHub Releases → APIs)
* fix: DB race condition and journal name normalization (Round 3 review)
- Wrap __save_journal_to_db commit in try/except to handle concurrent
inserts gracefully (rollback + warning) instead of incorrectly
incrementing the SearXNG failure counter
- Add geographic qualifier stripping to regex cleaner: "(London)",
"(New York)", "(US)" etc. are now stripped deterministically,
preventing duplicate scoring of the same journal under variant names
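The qualifier stripping can be sketched roughly as follows. The pattern and the list of place names are assumptions for illustration; the actual regex cleaner may cover more qualifiers or use a different rule.

```python
import re

# Hypothetical pattern: a trailing parenthesised place name or country code.
_GEO_QUALIFIER = re.compile(r"\s*\((?:London|New York|US|UK)\)\s*$", re.IGNORECASE)


def strip_geo_qualifier(name: str) -> str:
    """Remove a trailing geographic qualifier such as '(London)'."""
    return _GEO_QUALIFIER.sub("", name).strip()


cleaned = strip_geo_qualifier("Nature (London)")
untouched = strip_geo_qualifier("Physics Letters (Section B)")
```

Because the rule is deterministic, variant spellings of one journal collapse to the same key before any scoring happens.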
* fix: DB race condition and journal name normalization (Round 3 review)
- S2 close() now calls super().close() to properly clean up the
JournalReputationFilter (SearXNG engine + LLM). Before this fix,
adding content_filters to S2 created a resource leak since S2's
close() override didn't delegate to BaseSearchEngine.close().
* fix: DB race condition and journal name normalization (Round 3 review)
- Fix predatory substring matching: check both directions for renamed
publisher variants while keeping >= 10 char guard
- DB cache read: logger.exception for stack trace preservation
- Model Boolean columns: add server_default=sa_false()
- Migration downgrade: drop indexes before columns
* fix: correct url_to_quality type annotation after merge (Round 4 review)
Type was `dict[str, dict]` but values are `int` scores from the journal
quality filter. Changed to `dict[str, int]`.
* fix: CI failures — sensitive logging and file write allowlist
- journal_data_downloader: use logger.exception() instead of f-string
with exception variable (sensitive-logging check)
- Add journal_data_downloader.py to file-write security check allowlist
(writes public CC0/MIT journal metadata, not user data)
* fix: skip journal reference DB tests when DB not built (CI timeout fix)
The test fixture was calling db.available which triggers _get_conn()
which auto-downloads 200K+ sources from OpenAlex API. In CI this caused
60s timeouts on 26 tests. Now checks db_path.exists() directly.
* fix: renumber migration 0005 → 0007 to resolve multiple-heads conflict
Main already has 0005_add_resource_document_id and 0006_add_citation_metadata.
Our migration was also numbered 0005, causing Alembic to reject login with
"multiple heads" error. Renumbered to 0007 with down_revision=0006.
* fix: align test mock chains with real Tier 0 DB query pattern
Tests were mocking .filter_by().first() but real code does
.filter_by().filter(score_source=="llm").first(). Fixed mock chains
to match. Also fixed docs typo: reanalysis_period default 265 → 365.
* fix: journal dashboard showing "not installed" when reference DB exists
get_journal_data_status() only checked for raw JSON source files, not
the compiled journal_reference.db. If the DB existed without source
JSONs (e.g., after cleanup), the dashboard refused to load.
* feat: add DOI-based venue identification and conference detection
Adds a pre-enrichment layer that resolves paper DOIs to OpenAlex source
IDs via batch lookup (up to 50 DOIs per HTTP request). This gives the
journal quality filter a reliable ID-based lookup path instead of
fragile name matching.
Changes:
- New: openalex_enrichment.py — batch DOI → source_id resolution
- Integration hook in search_engine_base.py for scientific engines
- Conference detection heuristic as fallback for papers without DOI
- Year stripping in OpenAlex lookup: "NeurIPS 2023" → "NeurIPS"
- NASA ADS now extracts DOI to result dict
- Fix stale AdvancedSearchSystem mocks in tests
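The batching step ("up to 50 DOIs per HTTP request") can be sketched as a simple chunker. The helper name `chunked` and the sample DOIs are hypothetical, and the HTTP request itself is left out because its exact shape in openalex_enrichment.py is not shown here.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder DOIs, for illustration only.
dois = [f"10.1234/example.{i}" for i in range(120)]
batches = list(chunked(dois))
batch_sizes = [len(batch) for batch in batches]
```

Each batch would then be resolved with one request instead of one request per DOI.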
* fix: handle missing thread context in preview filter phase
The journal filter runs as a preview_filter (before LLM relevance) for
instant data lookups. But DB operations (Tier 0 cache, save) require
thread context which isn't available in the preview phase.
Fix: __db_session() returns None when no context available. Callers
skip DB operations gracefully — data-only tiers (1-3) still work.
* feat: disable Tier 4 LLM journal scoring by default (too slow)
* feat: institution scoring tier + DataSource refactor
- New DataSource ABC + registry under utilities/data_sources/ unifying
openalex, doaj, jabref, predatory, and institutions sources
- Add InstitutionSource (OpenAlex Institutions, ~123K records) for
affiliation-based scoring of preprints
- Add Tier 3.5 (institution lookup) to journal_reputation_filter
for the no-journal_ref salvage path and as a max() lift for
preprint repositories with weak Tier-2 scores
- Extract author affiliations in OpenAlex search engine
- Wire JournalReputationFilter into PubMed engine and fix journal_ref
field aliasing
- Tighten regex cleaner for journal_ref (year/month/volume debris)
- Delete bundled src/local_deep_research/data/ — all sources now
fetched at runtime with shared auto_download policy
- Dashboard banner shows all academic data sources with license + status
* refactor: consolidate journal-quality system into one package with SQLAlchemy
- New package src/local_deep_research/journal_quality/ groups all
journal-related modules (downloader, db, models, scoring, data_sources)
- Single source of truth: gz files compile into one journal_quality.db
via build_db(); JournalDataManager dict-based loader is deleted
- SQLAlchemy 2.0 ORM throughout (models.py + db.py); filter call sites
unchanged thanks to dict-shaped lookup return values
- Read-only enforcement at three layers: SQLite mode=ro&immutable=1,
POSIX chmod 0o444 after build, and a pre-commit hook that bans
cross-module writable opens of journal_quality.db
- Downloader rebuilds the DB synchronously after each successful fetch
- New tables: predatory_journals/_publishers/_hijacked, institutions,
abbreviations
- Tests migrated to tests/journal_quality/; 207 tests pass
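Two of the read-only layers mentioned above (the SQLite URI flags and the chmod) can be sketched with the standard library alone. This is an illustration under assumptions: the table name and temp path are hypothetical, and the project itself goes through SQLAlchemy rather than raw sqlite3.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

db_path = Path(tempfile.mkdtemp()) / "journal_quality.db"

# Build step: populate the file through a normal writable connection.
conn = sqlite3.connect(db_path)
with conn:
    conn.execute("CREATE TABLE sources (name TEXT)")
    conn.execute("INSERT INTO sources VALUES ('example')")
conn.close()

# Layer 1: drop write permission on the file after the build.
os.chmod(db_path, 0o444)

# Layer 2: open through a URI with mode=ro and immutable=1.
ro = sqlite3.connect(f"{db_path.as_uri()}?mode=ro&immutable=1", uri=True)
rows = ro.execute("SELECT name FROM sources").fetchall()
try:
    ro.execute("INSERT INTO sources VALUES ('x')")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses writes on a read-only connection.
    write_blocked = True
ro.close()
```

The `immutable=1` flag tells SQLite the file will not change underneath it, so it should only be used on a database that is rebuilt wholesale rather than edited in place.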
* fix: P0/P1 bugs from journal-quality code review
- P0: flag hijacked journals as predatory in _populate_sources
(loaded into predatory_hijacked but never checked against sources)
- P0: insert DOAJ-only journals (~8K rows) via second pass over
doaj_data; previously only OpenAlex venues entered the DB
- P0: replace `mod._ref_db = None` with `reset_db()` in metrics
rebuild route (the singleton attr is `_db`, not `_ref_db`)
- P0: change JournalQualityDB._lock to RLock to prevent first-run
deadlock (_ensure_engine → build_db → reset_db re-acquires lock)
- P1: dedup sources on (name_lower, issn) so print + electronic
ISSN variants both survive; drop unique=True on Source.name_lower
- tests: cover hijacked, DOAJ-only, and dual-ISSN cases
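The deadlock fixed by the RLock change follows a common pattern: a method holding the lock calls another method that takes the same lock. A minimal sketch, with a hypothetical `LazyDB` class standing in for JournalQualityDB:

```python
import threading


class LazyDB:
    """Hypothetical stand-in showing why a re-entrant lock is needed."""

    def __init__(self) -> None:
        # With threading.Lock() here, ensure_engine() would block forever.
        self._lock = threading.RLock()
        self._engine = None
        self.builds = 0

    def reset(self) -> None:
        with self._lock:  # re-acquired by the thread that already holds it
            self._engine = None

    def build(self) -> None:
        self.builds += 1
        self.reset()

    def ensure_engine(self) -> str:
        with self._lock:
            if self._engine is None:
                self.build()
                self._engine = "engine"
            return self._engine


db = LazyDB()
engine = db.ensure_engine()
```

An RLock lets the owning thread acquire it again, so the nested `reset()` call proceeds instead of waiting on itself.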
* fix: resolve CI failures on journal-quality refactor
- pre-commit: add missing .pre-commit-hooks/check-journal-quality-readonly.py
to git (file existed locally but was never committed, so CI couldn't
exec it)
- file-writes scan: extend allowlist to cover the new
journal_quality/downloader.py and journal_quality/data_sources/*.py
paths (the old `journal_data_downloader.py` entry no longer matches
after the package move)
- mypy: fix 12 errors in journal_quality/db.py
- explicit list[] annotation on `wheres`
- dict comprehension on Row sequence in get_source_distribution
- wrap loader returns in dict() so SQLAlchemy stub Any-types resolve
- type: ignore[arg-type] on bulk_insert_mappings (known stub gap;
SQLAlchemy 2.x types accept type[T] at runtime but stubs say Mapper)
- CodeQL py/incomplete-url-substring-sanitization: anchor doi.org URL
parsing on scheme prefixes instead of substring `in` check
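The difference between a substring check and prefix anchoring can be sketched as below. The helper name `doi_from_url` and the exact prefix list are assumptions, not the project's code.

```python
from typing import Optional

# Assumed set of accepted prefixes; the real list may differ.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_from_url(url: str) -> Optional[str]:
    """Return the DOI only when the URL starts with a known doi.org prefix."""
    stripped = url.strip()
    lowered = stripped.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            return stripped[len(prefix):] or None
    return None


good = doi_from_url("https://doi.org/10.1000/xyz")
# A substring test such as `"doi.org" in url` would accept both of these.
bad_query = doi_from_url("https://evil.example/?next=doi.org/10.1000/xyz")
bad_host = doi_from_url("https://doi.org.evil.example/10.1000/xyz")
```

Including the trailing slash in each prefix is what rejects look-alike hosts such as `doi.org.evil.example`.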
* refactor: address djpetti review comments on journal quality system
Tier 4 LLM scoring is now opt-in via the new
search.journal_reputation.enable_llm_scoring setting (default off) instead
of being unreachable behind a hardcoded flag. The redundant in-process
lru_cache on the LLM analyzer is gone - Tier 0 (DB cache) already covers
repeat lookups, and keeping the cache only masked DB write failures.
Trailing-year stripping for conference names ("NeurIPS 2023" -> "NeurIPS")
moves into __regex_clean_journal_name where it belongs, replacing the
post-hoc retry block in __score_journal.
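A rough sketch of trailing-year stripping, assuming a four-digit year at the end of the name. The pattern is an illustration and not the actual rule inside __regex_clean_journal_name.

```python
import re

# Assumed pattern: whitespace followed by a 19xx or 20xx year at end of string.
_TRAILING_YEAR = re.compile(r"\s+(?:19|20)\d{2}$")


def strip_trailing_year(name: str) -> str:
    """Drop a trailing year so 'NeurIPS 2023' and 'NeurIPS' share one key."""
    return _TRAILING_YEAR.sub("", name.strip())


venue = strip_trailing_year("NeurIPS 2023")
unchanged = strip_trailing_year("Nature")
```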
DOAJ Seal score bumped 7 -> 8 to reflect the certification meaning more
faithfully (top ~10% of DOAJ journals, curated against best OA practices).
The h-index >= 7 tier mapping is unchanged so no test fixtures break.
Adds /api/journals/research/<id> + a "View Journals" button on the research
details page so users can see the journals encountered in a single research
session, not just the cross-research aggregate. Joins through
CitationMetadata -> ResearchResource without schema changes.
Adds quartile (Q1-Q4) as a display-only signal on Source rows, derived at
build time from cited_by_count percentile within each source_type. Quality
scoring is unchanged - h-index remains the canonical bibliometric.
Magic numbers in scoring.py / db.py extracted into a Journal Quality
Scoring Thresholds section in constants.py. Institution scoring is now
consolidated to scoring.py::institution_score_from_h_index, fixing an
unreachable branch in db.py::score_from_affiliations along the way.
Misc:
- OPENALEX_ENRICHMENT_API_TIMEOUT lifted into constants.py (was hardcoded 15)
- Deleted scripts/build_journal_reference_db.py - auto-build on first
access plus the dashboard rebuild button cover all use cases
* perf(journal-quality): switch data sources to bulk dumps + release-gate test
Replace paginated REST API fetches with public bulk snapshots:
- OpenAlex Sources: S3 manifest + parts (~280K, ~270s vs 5-10min)
- OpenAlex Institutions: S3 manifest + parts (~120K, ~156s vs 5-10min)
- DOAJ: single CSV dump (~22K, ~2s)
Bulk paths are the OpenAlex/DOAJ-recommended way to pull the full
dataset and eliminate hundreds of rate-limited requests on every
"Download Data" click. Compact output formats are preserved so the
build pipeline and runtime accessors are unchanged.
Add a release-gate integration test + dedicated workflow that
downloads all 5 sources in parallel, builds the reference DB end
to end, and scores a real journal. Catches upstream schema breaks
(renamed fields, restructured dumps) before we cut a release.
* test(journal-quality): exercise dashboard query methods in release gate
* docs(journal-quality): credit upstream data providers on dashboard
* docs(journal-quality): add 'How It Works' tab explaining tiered scoring
* fix(journal-quality): score unknown journals as 3, log institution names
- Lower truly-unknown journals (no OpenAlex/DOAJ/Tier 3.5 hit) from
pass-through to score 3 so the default threshold (4) actually filters
them. Distinct from predatory (1) — these are merely unknown.
- Fix AttributeError in OpenAlex search engine when work has DOI key
with explicit None value: use `work.get('doi') or work_id` instead
of `work.get('doi', work_id)`. Was dropping ~14% of results per
search before they reached the filter.
- Include matching institution names in Tier 3.5 log lines so the
affiliation salvage path is debuggable.
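The `dict.get` pitfall behind the DOI fix is easy to reproduce. The sample record is hypothetical; only the two expressions come from the commit message.

```python
# A record where the key exists but its value is an explicit None.
work = {"id": "W123", "doi": None}
work_id = work["id"]

# The default is used only when the key is missing, so this stays None.
with_default = work.get("doi", work_id)

# `or` falls back on any falsy value, including an explicit None.
with_or = work.get("doi") or work_id
```

Note that `or` also replaces other falsy values such as an empty string, which is usually the desired behaviour for an identifier field.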
* refactor(journal-quality): demote per-journal scoring logs to DEBUG, log institutions on score-3
* fix(openalex): handle None values for display_name, id, source.id
OpenAlex routinely returns these keys with explicit null values, which
bypassed the dict.get default and crashed downstream string operations
(slicing, split). Same antipattern as the 'doi' fix in
|
||
|
|
bab0f61b66 |
chore(hooks): require UtcDateTime in migrations too (#3523)
Tighten check-datetime-timezone so the UtcDateTime rule applies to both
models and migrations. Supersedes the inverted approach in #3515, which
tried to accept sa.DateTime(timezone=True) inside migrations.
- Rewrite the AST walker: handle sa.Column / bare Column, positional
type arg at any index, bare Column(UtcDateTime) without parens (the
hook's own example), and ast.IfExp with both branches inspected
independently so a violation in either arm is still flagged.
- Anchor the path filter on src/local_deep_research/ to stop
false-positives on tests/database/models/ and partial-name matches
like database/models_backup/.
- Update .pre-commit-config.yaml name/description and the stale
CI_CD_INFRASTRUCTURE.md hook table entry.
- Add tests/hooks/test_check_datetime_timezone.py with 20 cases:
violations (models / migrations / conditional types / batch runs /
bare names), allows (UtcDateTime with import, combo import order,
empty / syntax-error files), and path-filter boundaries. |
||
|
|
12160e26e1 |
chore(lint): add ruff rules for logging, performance, exceptions, and print detection (#3211)
* chore(lint): add ruff rules for logging, performance, exceptions, and print detection
Add wave 2 lint rules: G, PERF, RET, TRY, T20, C4, ERA.
All existing violations are suppressed via ignore/per-file-ignores so
this config change is merge-safe. Follow-up PRs will fix violations
and remove the ignore entries incrementally.
* fix(lint): exempt pre-commit hooks from T201 print rule (#3270)
Pre-commit hooks are CLI scripts where print is the intended output
interface, same as scripts/ and cli/ directories already exempted.
* fix(lint): fix all low-count ruff violations instead of suppressing them (#3275)
* fix(lint): replace manual dict-building loops with dict comprehensions (PERF403)
* fix(lint): replace bare Exception raises with specific built-in types (TRY002)
Replace all `raise Exception(...)` in production code with appropriate
built-in exception types: RuntimeError for operational/state failures,
ValueError for invalid data, and ConnectionError for HTTP errors.
* fix(lint): resolve TRY004 and PERF402 ruff violations
Use TypeError instead of ValueError for isinstance/issubclass type
checks (TRY004), and replace manual for-loop list copies with
list.extend() (PERF402).
* fix(lint): fix all low-count ruff violations instead of suppressing them
Fix all violations for 15 ruff rules that had ≤10 occurrences each,
rather than suppressing them with ignore directives:
- TRY002: raise-vanilla-class → use specific built-in exceptions
- TRY004: type-check-without-type-error → use TypeError
- C408: unnecessary-collection-call → use dict/list literals
- C401: unnecessary-generator-set → use set comprehensions
- C416: unnecessary-comprehension → use list()/set()
- C414: unnecessary-double-cast-or-process → simplify
- PERF403: manual-dict-comprehension → use dict comprehensions
- PERF102: incorrect-dict-iterator → use .values()/.keys()
- PERF402: manual-list-copy → use list.extend()
- RET503/RET506/RET507/RET508: superfluous else after
return/raise/continue/break
- RET501/RET502: unnecessary/implicit return None
Adds per-file-ignores for tests/ and examples/ where these patterns
are acceptable (e.g. bare Exception in tests, dict() calls in fixtures).
* fix(lint): enforce E722, ERA001, RET505 and fix pre-commit RET503 gap (#3276)
Remove three rules from the global ignore list by fixing all violations:
E722 (bare except), 6 violations in tests:
Replace `except:` with `except Exception:` to avoid swallowing
KeyboardInterrupt and SystemExit.
ERA001 (commented-out code), 25 violations:
Delete 18 true positives (dead variables, disabled debug logs,
commented-out imports). Add `# noqa: ERA001` to 7 false positives
(template instructions, type annotations, documentation comments).
RET505 (superfluous else after return), 413 violations:
Auto-fix all occurrences. Also fixes 5 cascading RET506/RET507
violations exposed by the RET505 removals.
Pre-commit hooks gap:
Add RET503 to `.pre-commit-hooks/**` per-file-ignores alongside T201.
* fix(lint): enforce RET504 and TRY301 - fix all violations (#3279)
* fix(lint): enforce RET504 - collapse unnecessary assign-before-return
Auto-fix all 46 RET504 violations via ruff unsafe-fixes: collapse
`result = expr; return result` into `return expr`.
Remove RET504 from global ignore list. Add to tests/examples
per-file-ignores where intermediate variables aid test clarity.
Also removes TRY301 from global ignore (violations fixed in next commit).
* fix(lint): enforce TRY301 - fix raises inside broad try/except blocks
Structural fixes for 65 TRY301 violations:
Security-critical fixes:
- url_validator.py: move 6 validation raises before try block,
replace isinstance-based re-raise with specific except clause
- path_validator.py: move validation outside try block
- env_settings.py: separate parsing (try) from validation (outside)
Route/service fixes:
- research_routes.py: replace raise-then-catch with direct error return
- mcp/server.py: move all 7 tool validations before try blocks
- news/api.py: move validation before try, noqa for db-session raises
- notifications: move rate limit and URL validation before try blocks
- iterative_refinement_strategy.py: move JSON validation after try
Added noqa for intentional patterns: re-raise in except handlers,
nested function definitions, db-session-dependent checks, rate limit
re-raises for base class retry logic.
* merge: resolve conflicts between wave2 lint branch and main
Resolve 14 merge conflicts by always starting from main's version
and re-applying lint fixes on top:
- mcp_strategy.py, ollama.py, security_settings.py, delete_routes.py:
Take main's code, re-apply RET505 (remove else: after return)
- mcp/server.py (3 conflicts): Take main's ValidationError handlers
and set_settings_context, re-apply TRY301 fixes, fix sensitive
data logging
- research_routes.py: Take main, fix duplicate block (merge artifact)
- settings_routes.py: Take main's default-settings fallback feature
- meta_search_engine.py, parallel_search_engine.py: Take main's
get_available_engines delegation, delete unreachable code
- search_engine_ddg.py, search_engine_google_pse.py: Take main's
sanitization, re-apply RET506 (if not elif after raise)
- rag_routes.py: Accept main's deletion (route moved to delete_routes)
- encryption_check.py: Accept main's deletion (dead code)
- test_storage_coverage.py: Remove broken test classes referencing
undefined stubs
- pre-commit hooks: extend per-file-ignores for ERA001, RET504
* fix: revert ValueError→TypeError changes that break tests and API contracts
Revert TRY004 fixes in 3 files where changing ValueError to TypeError
would break existing tests and HTTP status code contracts:
- card_factory.py: 5 tests assert pytest.raises(ValueError)
- base_rater.py: flask_api.py catches ValueError for HTTP 400
responses; TypeError would fall through to HTTP 500
- full_search.py: test asserts pytest.raises(ValueError)
Add # noqa: TRY004 to suppress the lint rule on these lines.
* fix: move benchmark_data check back inside try block
The ValueError for missing benchmark_data must be inside the
try/except so the except handler can mark the run as FAILED in the
database. Without this, the exception propagates unhandled in a
daemon thread, leaving the benchmark run stuck in RUNNING state
permanently.
* chore(lint): remove ERA rule and suppress TRY004 globally
Remove ERA (eradicate, commented-out code detection) from ruff select:
- 28% false positive rate in our codebase (7 of 25 violations)
- No major Python project enables it (Django, FastAPI, Pydantic, Airflow)
- Ruff itself doesn't use it; autofix was demoted to manual-only
- 172 noqa suppressions provided zero enforcement value
Suppress TRY004 (type-check-without-type-error) globally:
- Ruff maintainer agreed the autofix "can change functionality"
- We already had to revert 3 TypeError changes that broke tests and
HTTP 400→500 API contracts
- Django, Flask, pandas all use isinstance + ValueError routinely
- Pylint has no equivalent rule; near-zero PyPI adoption
Remove all 173 # noqa: ERA001 and 49 # noqa: TRY004 comments from the
codebase; they are no longer needed with rules disabled/suppressed.
* fix: resolve mypy errors, failing MCP test, and TRY301 noqa
- search_engine_factory.py: restore typed intermediate variable to fix
mypy no-any-return (RET504 collapse lost the type annotation)
- search_engine_pubchem.py: add explicit list[str] type annotation
- test_edge_cases.py: fix assertion that expected engine name in
sanitized error message
- mcp/server.py: add noqa: TRY301 to validation raises inside try
blocks (from main's new merge code) |
||
|
|
b28c80466c |
refactor: cleanup remaining verified dead code across 5 areas (#3263)
* refactor: cleanup remaining verified dead code across 5 areas
Dead templates, functions, storage ABCs, eslint duplicate, dev scripts.
All verified by 40 agents (20 scanning + 20 verification).
* revert: keep 3 dev scripts that have active references
- regenerate_golden_master.py: called by pre-commit hook
.pre-commit-hooks/check-golden-master-settings.py
- restart_server.sh: documented in API testing guide, examples, and
multiple README files
- run_tests.py: referenced in CONTRIBUTING.md testing section
Added inline comments noting the references so future cleanup
attempts don't remove them without updating dependents.
* revert: keep restart_server_debug.sh dev script
* revert: keep debug_pytest.py and stop_server.sh dev scripts
Small utility scripts that cost nothing to keep and are useful for
developers debugging CI failures and managing the dev server.
* docs: add do-not-remove comments to all dev scripts
Each script now documents why it must be kept:
- regenerate_golden_master.py: pre-commit hook dependency
- restart_server.sh: documented in API guides and examples
- restart_server_debug.sh: companion to restart_server.sh
- run_tests.py: referenced in CONTRIBUTING.md
- debug_pytest.py: developer utility for CI failure reproduction
- stop_server.sh: companion to restart_server.sh |
||
|
|
9988f70318 |
refactor: remove fallback LLM (FakeListChatModel) from all providers (#2717)
* cleanup: remove @pytest.mark.requires_llm decorators and fallback LLM doc references
Remove the `@pytest.mark.requires_llm` decorator from all test files
since the fallback LLM infrastructure is being removed. Update docs to
remove references to `LDR_TESTING_USE_FALLBACK_LLM` and
`LDR_USE_FALLBACK_LLM` environment variables from troubleshooting and
CI configuration tables.
* test: remove fallback LLM references from test files
Remove all fallback-related test code: TestGetFallbackModel classes,
FakeListChatModel assertions, check_fallback_llm parameters, and
LDR_USE_FALLBACK_LLM skipif markers. Replace fallback-returning tests
with ValueError-expecting tests for missing API keys and unavailable
providers.
* cleanup: remove remaining use_fallback_llm references from source and tests
Remove use_fallback_llm() imports and calls from db_utils.py and
rate_limiting/tracker.py. Clean up test files that referenced
check_fallback_llm, get_llm_setting_from_snapshot, and
LDR_USE_FALLBACK_LLM env var.
* cleanup: remove remaining fallback LLM references from test files
Remove all use_fallback_llm mocks, LDR_USE_FALLBACK_LLM env var checks,
and related skip logic from test files since the fallback LLM feature
has been removed from source code.
- test_db_utils.py: Remove use_fallback_llm mock patches from 4 tests
- test_rate_limiter.py: Replace use_fallback_llm mock with is_ci_environment
- test_tracker.py: Replace fallback mode test with CI mode test
- test_tracker_quality_stats.py: Remove 8 use_fallback_llm decorators
- test_openai_api_key_usage.py: Remove LDR_USE_FALLBACK_LLM skipif
- test_llm_provider_integration.py: Remove LDR_USE_FALLBACK_LLM skipif
- test_ci_config.py: Remove LDR_USE_FALLBACK_LLM env var setting
- test_search_system.py: Remove LDR_USE_FALLBACK_LLM skipif
- run_all_tests.py: Remove LDR_USE_FALLBACK_LLM log line
- test_env_auto_generation.py: Remove testing.use_fallback_llm mapping
- test_lmstudio_provider.py: Fix docstring referencing removed function
* refactor: remove fallback LLM from providers, settings, CI, and tests
- Remove FakeListChatModel import and get_llm_setting_from_snapshot wrapper
- Update all provider imports to use get_setting_from_snapshot directly
- Remove LDR_USE_FALLBACK_LLM env var from CI workflows
- Remove use_fallback_llm setting and registry function
- Remove skip_if_using_fallback_llm fixture from conftest.py
- Update tests to expect ValueError instead of fallback model
* refactor: remove fallback model from llm_config and thread_settings
- Remove get_fallback_model() and all call sites in get_llm()
- Replace fallback returns with descriptive ValueError raises
- Remove LDR_USE_FALLBACK_LLM env check block from get_llm()
- Remove check_fallback_llm parameter from get_setting_from_snapshot
- Remove get_llm_setting_from_snapshot convenience wrapper
- Add ValueError re-raise in Ollama model-not-found path
- Regenerate golden master with ensure_ascii=False for proper Unicode
* fix: restore requires_llm skip mechanism and fix CI test failures
Three fixes for CI regressions from fallback LLM removal:
1. Restore @pytest.mark.requires_llm decorator and skip fixture
(skip_if_no_real_llm) that checks LDR_TESTING_WITH_MOCKS env var.
Re-add decorators to 17+ tests across 9 files that need real LLMs.
2. Fix type coercion in test_openai_api_key_usage.py by converting
fixture from dict format to simplified raw-value format, bypassing
get_typed_setting_value string coercion.
3. Fix golden master format mismatch by adding ensure_ascii=False to
test serialization to match regeneration script. Narrow pre-commit
hook trigger to only defaults/*.json files.
* fix: remove remaining fallback LLM references from coverage tests
- Delete TestGetFallbackModel class from test_llm_config_coverage.py
(5 tests that imported removed get_fallback_model)
- Update test_llm_config_missing_coverage.py: 6 tests that expected
FakeListChatModel fallback now expect ValueError/exception raises
- Remove use_fallback_llm mocks from
test_rate_limiting_tracker_coverage.py (delete 4 fallback-specific
tests, fix 9 tests)
- Remove use_fallback_llm mocks from
rate_limiting/test_tracker_coverage.py (fix _make_tracker helper and
25 tests)
- Add @pytest.mark.requires_llm to test_analyze_documents_minimal
- Merge upstream main to pick up new coverage test files
* fix: remove dead LDR_USE_FALLBACK_LLM env var from accessibility tests CI
This env var was added to the accessibility test server but has no
effect since the fallback LLM code was removed.
* fix: align pre-commit hook description and error listing with defaults-only trigger
The hook file pattern was narrowed to defaults/ only, but the
description and error-listing code still referenced config/. Remove
dead config/ path from the file listing and update messaging to match.
* fix: update test_llm_config_deep_coverage.py for fallback LLM removal
File was added on main after branch diverged. Remove
TestGetLlmFallbackEnvVar class (tests removed functionality) and update
test_provider_lowercased to expect ValueError instead of fallback model.
* fix: improve "none" provider error message and fix stale CI-mode test
- Add explicit handler for provider="none" with user-friendly message
instead of misleading "this is a bug" error
- Fix test_load_estimates_skipped_in_ci_mode: _load_estimates no longer
checks is_ci_environment, test now correctly verifies deferred loading
behavior in non-programmatic mode
- Update 4 test assertions to match new "none" provider error message |
||
|
|
b524bd9a45 |
fix: debug logging now visible on stderr when LDR_APP_DEBUG=true (#2761)
* fix: debug logging now visible on stderr when LDR_APP_DEBUG=true
config_logger() had stderr hardcoded to INFO level regardless of the
debug flag; only diagnose= was toggled, not the log level itself.
DEBUG entries went to the DB sink but never to the console, making
LDR_APP_DEBUG ineffective for local debugging via log files.
Also adds restart_server_debug.sh for convenient debug-mode startup
with LDR_APP_DEBUG=true and LDR_LOG_SETTINGS=summary.
* fix: log warning when debug mode is active
Emits a WARNING-level message on startup when LDR_APP_DEBUG=true so
it's immediately visible in the logs that sensitive data may be logged. |
||
|
|
09f306f0c1 |
fix: set HOME=/home/ldruser in entrypoint before dropping to non-root (#2520)
setpriv changes UID/GID but does not update HOME. Without this, HOME stays as /root/ and platformdirs resolves data paths to /root/.local/share/local-deep-research/ which ldruser cannot write to. This causes PermissionError on startup when LDR_DATA_DIR is not explicitly set (e.g. in the docker-multiarch-test workflow). The Dockerfile already uses this pattern during build (line 246) but the entrypoint was missing it. |
||
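The effect of a stale `HOME` described in the commit above can be sketched as follows. `resolve_data_dir` is a hypothetical helper that approximates the Linux lookup order (explicit override, then `XDG_DATA_HOME`, then `$HOME/.local/share`); treating `LDR_DATA_DIR` as a path used verbatim is an assumption:

```python
from pathlib import PurePosixPath


def resolve_data_dir(env):
    """Approximate the data directory lookup on Linux (hypothetical helper)."""
    if env.get("LDR_DATA_DIR"):
        # Assumption: an explicit override is used as-is.
        return PurePosixPath(env["LDR_DATA_DIR"])
    default_base = PurePosixPath(env["HOME"]) / ".local" / "share"
    base = env.get("XDG_DATA_HOME") or str(default_base)
    return PurePosixPath(base) / "local-deep-research"


# After privileges are dropped, HOME still points at root's directory.
before_fix = resolve_data_dir({"HOME": "/root"})

# Setting HOME in the entrypoint moves the default under ldruser's home.
after_fix = resolve_data_dir({"HOME": "/home/ldruser"})

# An explicit LDR_DATA_DIR sidesteps the problem entirely.
explicit = resolve_data_dir({"HOME": "/root", "LDR_DATA_DIR": "/data"})
```

This also shows why the failure only appeared when `LDR_DATA_DIR` was unset: the override branch never consults `HOME`.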
|
|
04a55f106f |
security: replace gosu with setpriv and suppress 8 unfixable CVEs (#2501)
Replace gosu (Go binary) with setpriv (util-linux, already in base image) for privilege dropping in the container entrypoint. This eliminates 7 Go stdlib CVEs (CVE-2025-4674, CVE-2025-61732, CVE-2025-61731, CVE-2025-47907, CVE-2025-61729, CVE-2025-58187, CVE-2025-58188) by removing the only Go binary from the image. For the remaining 8 CVEs that are unfixable in Debian Trixie (libtiff6, coreutils, libc6, Chrome DevTools), add documented suppressions to both .grype.yaml (new) and .trivyignore with review date 2026-09-01. Also updates the base image digest to pick up latest security patches, and bumps Playwright from 1.57.0 to 1.58.0 (matching pyproject.toml) with the corresponding chromium-1208 revision. |
||
|
|
d93adba1cd |
feat: add one-command golden master regeneration script (#2475)
Adds scripts/dev/regenerate_golden_master.py that regenerates the golden master settings snapshot in a single command, replacing the previous 3-step process (delete → pytest → stage). Updates the pre-commit hook message to reference the new script. |
||
|
|
951a97f375 |
fix(docker): add diagnostic error message when gosu fails in LXC (#2373)
If gosu can't switch users (e.g. LXC blocks CAP_SETUID/CAP_SETGID), print a clear error message with actionable fix instructions instead of gosu's cryptic "operation not permitted" error. |
||
|
|
20fedc67b1 |
docs: add config docs generator script (#2134)
* docs: add config docs generator script Add scripts/generate_config_docs.py that auto-generates docs/CONFIGURATION.md from default settings JSON files and env_definitions/ modules. Supports both database-managed settings and pre-database env-only settings. Extracted from PR #1393. Co-authored-by: daryltucker <daryltucker@users.noreply.github.com> * docs: improve config docs generator with auto-discovery, --check mode, and CI - Auto-discover env_definitions modules instead of hardcoding filenames - Extract additional AST fields: required, min/max_value, allowed_values, deprecated_env_var - Expand env-only settings table with Type, Required, Constraints, Deprecated Alias columns - Add --check mode (exit 1 when docs are stale) for CI validation - Add inline gitleaks:allow on key extraction line - Generate initial docs/CONFIGURATION.md covering all 18 JSON files and 5 env_definitions modules - Add check-config-docs.yml PR workflow (zero deps, stdlib only) - Add docs regeneration step to version_check.yml - Allowlist docs/CONFIGURATION.md in .gitleaks.toml (references env var names, not actual secrets) - Add comprehensive tests (27 tests: unit, integration, check mode, error handling) * docs: add CONFIGURATION.md references to README, env_configuration, and developing guides * docs: regenerate CONFIGURATION.md after merge with main Picks up db_config.cipher_memory_security default change (OFF -> ON). --------- Co-authored-by: daryltucker <daryltucker@users.noreply.github.com> |
||
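The `--check` mode described in the commit above (exit 1 when docs are stale) follows a common pattern: regenerate the text in memory and compare it with the file on disk. A minimal sketch, where `check_docs` is a hypothetical function and not the generator's actual code:

```python
import tempfile
from pathlib import Path


def check_docs(generated: str, doc_path: Path) -> int:
    """Return 0 when the file matches freshly generated text, 1 when stale."""
    if not doc_path.exists():
        return 1
    current = doc_path.read_text(encoding="utf-8")
    return 0 if current == generated else 1


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "CONFIGURATION.md"
    doc.write_text("# Configuration\n", encoding="utf-8")
    up_to_date = check_docs("# Configuration\n", doc)
    stale = check_docs("# Configuration\n\nnew setting\n", doc)
```

A CI job would pass the return value to `sys.exit`, so a non-zero result fails the workflow without modifying the checked-in file.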
|
|
07ff140c16 |
security: Docker hardening and session/debug setting tightening
Docker hardening:
- Add no-new-privileges and cap_drop ALL to main LDR service
- Add no-new-privileges to ollama service
- Mount local_collections volumes as read-only (:ro)
- Validate model name in ollama_entrypoint.sh to prevent injection
- Add security warning to elasticsearch example about disabled xpack
Application settings:
- Make app.debug non-editable via UI to prevent enabling debug mode in production (can still be set via environment variable)
- Reduce remember-me max from 90 to 30 days and default from 30 to 7 days to limit session persistence window |
||
|
|
f848e8b0c2 |
ci: Add MCP server tests workflow (#1506)
* ci: Add MCP server tests workflow Add dedicated CI workflow for testing the MCP (Model Context Protocol) server implementation. This workflow: - Runs on changes to MCP-related files - Verifies MCP module loads correctly - Tests discovery tools (list_strategies, list_search_engines, get_configuration) - Runs full MCP unit test suite with mocks - Tests MCP strategy (ReAct pattern) implementation - Verifies server startup behavior The tests use mocks to avoid requiring an LLM backend, making them fast and reliable in CI environments. Prepares CI infrastructure for PR #1366 (MCP server feature). * refactor: Move MCP smoke tests to external script Address review feedback from djpetti: - Extract MCP module loading test to scripts/mcp_smoke_test.sh - Extract MCP server startup test to the same script - Update workflow to call the external script - Add script path to workflow triggers * ci: skip MCP tests when server module not implemented The MCP server module (src/local_deep_research/mcp/server.py) is in a separate feature branch. This change makes the MCP test workflow skip gracefully when the module doesn't exist, with a clear notice. Tests will automatically run once the MCP feature branch is merged. |
||
|
|
7be59b4dc0 |
refactor: Address PR review feedback (#1570)
1. Move inline DB init script to external file (scripts/ci/init_test_database.py) for better maintainability per djpetti's suggestion. 2. Fail fast in CI when pre-created user login fails instead of falling back to slow registration. This makes debugging easier - if the CI user doesn't work, something is wrong with workflow setup and should be fixed there. Per djpetti's suggestion about developer experience. The external script is now shared between critical-ui-tests.yml and extended-ui-tests.yml, reducing duplication. Co-authored-by: Daniel Petti <djpetti@gmail.com> |
||
|
|
804cd923fd |
fix: Complete RAG Docker cache path fix and bump version to 1.3.24
Fixes remaining RAG cache path issues not addressed in #1563:
1. library_rag_service.py: Changed from Path.home() / ".cache" to get_cache_directory() / "rag_indices" to respect LDR_DATA_DIR
2. docker-compose.yml: Fixed volume mount from /root/.cache/... to /data/cache/rag_indices (app runs as ldruser, not root)
3. ldr_entrypoint.sh: Added rag_indices directory creation with proper permissions
The original fix (#1563) addressed search_engine_local.py but missed library_rag_service.py, which still used hardcoded Path.home()/.cache.
Fixes issue reported on Discord where RAG indexing failed with:
- PermissionError: [Errno 13] Permission denied: '.cache' |
||
|
|
acc5b585f0 |
Merge pull request #1191 from LearningCircuit/feat/e2e-research-test
feat: add E2E research test via PR label trigger |
||
|
|
57cda9babb | fix: remove debug output that was corrupting JSON response | ||
|
|
cf6d9f6b2c |
Merge dev into sync-main-to-dev
Resolved version conflict - keeping dev version 1.3.0 |
||
|
|
1fe134588d | debug: add logging and explicitly pass search_tool parameter | ||
|
|
c43941dda0 |
fix: use correct Serper API key settings path in E2E script
The script was setting the API key at 'search.serper.api_key' instead of the correct path 'search.engine.web.serper.api_key'. This caused the search engine factory to fail finding the key, falling back to other engines instead of using Google search via Serper. |
||
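The key-path mix-up in the commit above can be shown with a flat settings snapshot keyed by dotted paths. The snapshot shape and the `get_setting` helper are assumptions for illustration; only the two key strings come from the commit message:

```python
def get_setting(snapshot, key, default=None):
    """Look up a dotted key in a flat settings snapshot (hypothetical shape)."""
    return snapshot.get(key, default)


# The key the search engine reads, per the commit message.
SERPER_KEY = "search.engine.web.serper.api_key"

# The value was stored under the wrong key, so the lookup finds nothing.
wrong = {"search.serper.api_key": "dummy-key"}
missing = get_setting(wrong, SERPER_KEY)

# Storing it under the expected key makes the lookup succeed.
right = {SERPER_KEY: "dummy-key"}
found = get_setting(right, SERPER_KEY)
```

Because a missing key returns a default instead of raising, this kind of mistake tends to surface as a quiet fallback to another engine, as the commit describes.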
|
|
08f8a733a2 |
Fix matplotlib cache directory permissions
Matplotlib requires a writable cache directory at ~/.config/matplotlib. The entrypoint now creates this directory with proper ownership for ldruser before starting the application. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
76247ff048 |
Fix Docker volume permissions error for /data directory
Fixes PermissionError when container tries to create /data/logs and other subdirectories. Docker named volumes are created with root ownership, but the application runs as ldruser (UID 1000).
Changes:
- Add entrypoint script (ldr_entrypoint.sh) to handle volume setup
- Install gosu for safe privilege dropping
- Create required subdirectories with correct ownership
- Use 700 permissions for security (owner-only access)
- Remove USER directive (entrypoint handles user switching)
The entrypoint runs as root to fix permissions, then drops to ldruser before starting the application. This is the standard Docker pattern for handling volume permissions with non-root containers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
e2ab7d0f20 |
fix: use formatted_findings from API and proper source format
- Use formatted_findings when available (already includes sources)
- Remove custom extract_sources function
- Fix source URL extraction (API uses 'link' not 'url') |
||
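The source URL fix above depends on reading the `link` field rather than `url`. A minimal sketch of rendering sources for a PR comment, where `format_sources` and the `title` field are assumptions and only the `link` field name comes from the commit message:

```python
def format_sources(sources):
    """Render search sources as a numbered list, reading the 'link' field."""
    lines = []
    for index, source in enumerate(sources, start=1):
        title = source.get("title", "Untitled")  # 'title' is an assumed field
        link = source.get("link", "")  # the API uses 'link', not 'url'
        entry = f"{index}. {title} ({link})" if link else f"{index}. {title}"
        lines.append(entry)
    return "\n".join(lines)


example = [{"title": "Example page", "link": "https://example.com"}]
rendered = format_sources(example)
```

Reading `source.get("url")` here would silently produce entries without links, which matches the symptom the commit fixes.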
|
|
b9680e58c9 |
feat: add argparse, static mode, and search sources to E2E test
- Add argparse for easier local testing (per djpetti's review)
- Add --mode static for regression testing with fixed query
- Include search sources in JSON output and PR comments
- Support both ldr_research and ldr_research_static labels
- Static query: 'What is Local Deep Research and how does it work?' |
||
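The `--mode static` option from the commit above could look roughly like this. The sketch is not the script's actual code: the name `diff` for the default mode and the wording of the diff-based query are assumptions, while the static query string is quoted from the commit message:

```python
import argparse

STATIC_QUERY = "What is Local Deep Research and how does it work?"


def build_parser():
    parser = argparse.ArgumentParser(description="E2E research test (sketch)")
    # "diff" is a hypothetical name for the default, diff-driven mode.
    parser.add_argument("--mode", choices=["diff", "static"], default="diff")
    return parser


def pick_query(args, diff_text):
    """Static mode ignores the diff and uses the fixed regression query."""
    if args.mode == "static":
        return STATIC_QUERY
    return f"Research the following changes:\n{diff_text}"


static_args = build_parser().parse_args(["--mode", "static"])
default_args = build_parser().parse_args([])
```

A fixed query makes runs comparable over time, which is what makes the static mode useful for regression testing.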
|
|
d171eb6b1d | fix: use valid OpenRouter model (google/gemini-2.0-flash-001) as default | ||
|
|
b6ca3cec46 | fix: pass API keys through settings snapshot | ||
|
|
d28f269d10 | fix: set llm.provider in settings to use OpenRouter instead of Ollama | ||
|
|
d4327184a4 |
feat: add E2E research test via PR label trigger
Add a reusable script and GitHub Actions workflow that tests the complete LDR pipeline (OpenRouter + Serper) by researching PR diffs.
- scripts/ldr-diff-research.py: Standalone script that reads diff from stdin and outputs JSON with research results. Can be tested locally.
- .github/workflows/e2e-research-test.yml: Workflow triggered by 'ldr_research' label that runs the script and posts results as a PR comment.
Configurable via environment variables:
- LDR_PROVIDER: LLM provider (default: openrouter)
- LDR_SEARCH_TOOL: Search tool (default: serper)
- LDR_MODEL: Model name (optional)
- LDR_ITERATIONS: Research iterations (default: 1)
Required secrets: OPENROUTER_API_KEY, SERPER_API_KEY |
||
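The environment variables listed in the commit above map naturally onto a small configuration loader. A minimal sketch, where `load_config` and the dictionary keys are hypothetical and only the variable names and defaults come from the commit message:

```python
def load_config(env):
    """Read the documented variables, applying the documented defaults."""
    return {
        "provider": env.get("LDR_PROVIDER", "openrouter"),
        "search_tool": env.get("LDR_SEARCH_TOOL", "serper"),
        "model": env.get("LDR_MODEL") or None,  # optional, no default
        "iterations": int(env.get("LDR_ITERATIONS", "1")),
    }


defaults = load_config({})
overridden = load_config({"LDR_ITERATIONS": "3", "LDR_MODEL": "example/model"})
```

Passing `os.environ` as `env` would read the real environment; taking a mapping as a parameter keeps the function easy to exercise locally without exporting variables.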
|
|
0d26c46c8a |
Merge dev into sync-main-to-dev - resolve conflicts
Resolved conflicts:
- .gitleaks.toml: Combined regex patterns from both branches, added path allowlists
- pyproject.toml: Kept updated versions from dev + added hypothesis from main
- __version__.py: Keep 1.3.0 from dev
- news.js: Removed duplicate toggleExpanded function (already exists at line 1291)
- pdm.lock: Regenerated with pdm lock |
||
|
|
309b2a619e |
Fix shellcheck warnings in all shell scripts
- Quote variables to prevent word splitting (SC2086)
- Use 'read -r' to prevent backslash mangling (SC2162)
- Use 'cd ... || exit' for safe directory changes (SC2164)
- Use '-n' instead of '! -z' for string checks (SC2236)
- Use pgrep instead of ps | grep (SC2009)
- Check exit codes directly instead of using $? (SC2181)
- Declare and assign separately for exports (SC2155)
- Fix unused loop variables with underscore prefix (SC2034)
- Remove stray markdown backticks from ollama_entrypoint.sh |
||
|
|
cff33086ec |
fix: resolve CI test failures (actionlint and trivy-scan)
- Restore missing scripts/ollama_entrypoint.sh required by Dockerfile
- Update actions/setup-python from deprecated v4 to v5 in workflow files
- Fix security issue: move untrusted github.head_ref to environment variable
- Fix shellcheck warnings: quote variables and use block redirects
These changes address pre-commit actionlint failures and trivy-scan Docker build errors. |
||
|
|
ed0212ba53 | Delete scripts/dev/kill_servers.py | ||
|
|
ceb1526de6 | Delete scripts/ollama_entrypoint.sh | ||
|
|
6738ecf86c | Delete scripts/test_unified_indexing.py | ||
|
|
5c46491e9c | Delete scripts/create_unified_library_tables.py | ||
|
|
a0025bde7c | Delete scripts/create_integrity_tables.py | ||
|
|
590e0112c2 | Unify RAG and library collection; this will be more maintainable | ||
|
|
f8b1477042 | Merge remote-tracking branch 'origin/dev' into sync-main-to-dev | ||
|
|
f26bb4545a |
Merge pull request #849 from LearningCircuit/LearningCircuit-patch-1
Delete scripts/check_research_db.py |
||
|
|
de4354ccfe | Merge remote-tracking branch 'origin/dev' into sync-main-to-dev | ||
|
|
e6e21d4407 |
Delete scripts/check_benchmark_db.py
Not sure if this is really useful |
||
|
|
7087689690 | Delete scripts/check_research_db.py | ||
|
|
d0b900f8e3 | Delete scripts/check_metrics.py |