mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-13 18:44:51 +03:00
* ci(nuclei): authenticate scan + seed URL list from Flask url_map

Previously the Nuclei DAST job ran against a single unauthenticated target (`http://localhost:5000`) with no URL list. Because Nuclei is template-driven (not a crawler) and the LDR app is auth-gated, the scanner only ever saw `/auth/login`, the index, and a couple of unauthenticated endpoints. The 2-minute scan over 10k templates produced only 5 info-level findings, all of which were intentional design choices (CSP `unsafe-inline`, SameSite=Lax, OPTIONS verb, form detection). In other words, the gate was effectively just a green checkmark.

Now the workflow:

1. Pre-creates the standard CI `test_admin` user via the existing `init_test_database.py` helper (avoiding slow registration and rate limits).
2. Logs in through the real `/auth/login` flow with a CSRF token, captures the Flask session cookie, and verifies it via `/auth/check`.
3. Dumps the Flask url_map (excluding parameterized routes, static routes, and POST-only endpoints) into urls.txt so Nuclei probes every blueprint route, not just `/`.
4. Runs Nuclei with `-list urls.txt` and the authenticated session cookie via `-H "Cookie: session=..."`.
5. Filters to severity >= low to drop the four info-level findings that are intentional design choices.

The session cookie is masked in logs via `::add-mask::` so it doesn't leak into the run output. Test credentials match the convention used by the playwright-webkit-tests and puppeteer-e2e-tests workflows.

Adds scripts/ci/dump_url_map.py, a small helper that imports `create_app()` and iterates `app.url_map.iter_rules()`. It is reusable from other DAST workflows (e.g. ZAP API scan) that benefit from URL seeding.

* ci(nuclei): address findings from review pass

Three differentiated review agents flagged five actionable items on the authenticated-Nuclei PR. This commit addresses all five:

* dump_url_map.py: stop skipping parameterized routes.
  Substitute a Flask-converter-appropriate placeholder (int/float→1, uuid→all-zeros, default→"nuclei") so Nuclei still runs its path-traversal / parameter-injection / SQLi templates against routes like `/research/<research_id>` and `/api/research/<research_id>/status`. Without this, the bulk of the authenticated app surface (history, research, API blueprints) was silently excluded, which defeated the PR's purpose.

* nuclei.yml: add `-etags intrusive,dos,fuzz`. Now that Nuclei holds a real session, default templates could mutate state or DoS the runner; this is the standard exclusion set for authenticated DAST.

* nuclei.yml: replace `cat cookies.txt` in the missing-cookie error branch with a column-filtered `awk` that omits the value column. The cookie is only masked via `::add-mask::` after this point, so the previous branch could have leaked the session token into CI logs if the extraction regex ever broke.

* nuclei.yml: add `sleep 2` between the `/auth/check` call and the Nuclei step so the post-login background thread (settings migration + library init; see web/auth/routes.py:_perform_post_login_tasks) finishes before probes start. Otherwise, settings-dependent routes return 500.

* nuclei.yml: drop `# pragma: allowlist secret` on TEST_PASSWORD. The repo uses gitleaks (.gitleaks.toml already allowlists `testpass123`), not detect-secrets, so the pragma was dead weight.

Out of scope for this PR (recorded but not changed):

- 3-way credential drift (init_test_database.py / nuclei.yml / auth_helper.js all hardcode test_admin/testpass123)
- Nuclei binary version `latest` auto-updating (matches existing CI)
- create_app() side effects in dump_url_map.py (currently benign)
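The authenticated login step (fetch the login page, submit it with its CSRF token, keep the Flask session cookie, then hit `/auth/check`) can be approximated with the standard library alone. This is a sketch, not the workflow's actual shell logic: the form field names `username`, `password` and `csrf_token`, the cookie name `session`, and treating any non-2xx response from `/auth/check` as failure are all assumptions about the LDR app.

```python
"""Sketch of the CI login flow (assumed form field names; not the workflow's code)."""
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from typing import Optional


class _CsrfFinder(HTMLParser):
    """Records the value of the first <input name="csrf_token"> on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("name") == "csrf_token" and self.token is None:
            self.token = a.get("value")


def find_csrf_token(html: str) -> Optional[str]:
    finder = _CsrfFinder()
    finder.feed(html)
    return finder.token


def login(base: str, username: str, password: str) -> str:
    """Log in and return the session cookie value; raises on any failure."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    page = opener.open(f"{base}/auth/login").read().decode("utf-8", "replace")
    token = find_csrf_token(page)
    if token is None:
        raise RuntimeError("csrf_token not found on login page")
    # Hypothetical field names; adjust to the real login form.
    form = urllib.parse.urlencode(
        {"username": username, "password": password, "csrf_token": token}
    ).encode()
    opener.open(f"{base}/auth/login", data=form)  # POST; redirects are followed
    opener.open(f"{base}/auth/check")  # urllib raises HTTPError on 4xx/5xx
    session = next((c.value for c in jar if c.name == "session"), None)
    if session is None:
        raise RuntimeError("no session cookie after login")
    return session
```

In CI this would be called along the lines of `login("http://localhost:5000", "test_admin", os.environ["TEST_PASSWORD"])`; the returned value must never be printed unmasked.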
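The cookie-handling fix (extract the session value, and on failure print the jar without its value column instead of `cat cookies.txt`) can be illustrated with a small stdlib sketch. It assumes `cookies.txt` is a curl cookie jar in the Netscape format: seven tab-separated columns with the value last, and HttpOnly cookies written with a `#HttpOnly_` prefix. The function names are hypothetical; `describe_without_values` plays the role of the column-filtered `awk`, and `mask_command` builds the GitHub Actions `::add-mask::` command.

```python
"""Sketch of session-cookie extraction from a curl cookie jar (illustrative only)."""
from typing import List, Optional

HTTPONLY_PREFIX = "#HttpOnly_"  # curl writes HttpOnly cookies with this domain prefix


def _fields(line: str) -> Optional[List[str]]:
    """Split one Netscape cookie-jar line into its 7 columns; None for comments/blanks."""
    line = line.rstrip("\r\n")
    if line.startswith(HTTPONLY_PREFIX):
        line = line[len(HTTPONLY_PREFIX):]
    elif not line or line.startswith("#"):
        return None
    parts = line.split("\t")
    return parts if len(parts) == 7 else None


def extract_cookie(jar_text: str, name: str = "session") -> Optional[str]:
    """Return the value (7th column) of the named cookie, or None if absent."""
    for line in jar_text.splitlines():
        cols = _fields(line)
        if cols and cols[5] == name:
            return cols[6]
    return None


def describe_without_values(jar_text: str) -> str:
    """Every cookie row minus the value column: safe to print before any mask exists."""
    rows = []
    for line in jar_text.splitlines():
        cols = _fields(line)
        if cols:
            rows.append("\t".join(cols[:6]))
    return "\n".join(rows)


def mask_command(value: str) -> str:
    """GitHub Actions workflow command that hides `value` in subsequent log output."""
    return f"::add-mask::{value}"
```

On a missing cookie the step would print `describe_without_values(...)` and fail; otherwise it prints `mask_command(token)` before anything else that could echo the token.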
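The URL-seeding logic in `dump_url_map.py` after the review fix might look roughly like the sketch below. It works on anything exposing the `rule`, `endpoint` and `methods` attributes of a Werkzeug `Rule`, so the real helper would feed it `create_app().url_map.iter_rules()`; here `SimpleNamespace` stand-ins keep it runnable without Flask. `seed_urls`, `fill_placeholders`, the demo routes' endpoint names and `/history/<int:page>` are illustrative, not taken from the repo. Unlike the commit's `float→1` mapping, the sketch uses `1.0`, because Werkzeug's default `float` converter only matches numbers containing a decimal point.

```python
"""Sketch of URL-map seeding for Nuclei (illustrative; not the repo's dump_url_map.py)."""
import re
from types import SimpleNamespace
from typing import Iterable, List, Optional, Protocol, Set

# Flask/Werkzeug route variables: <name>, <converter:name>, <converter(args):name>
_VAR_RE = re.compile(
    r"<(?:(?P<conv>[A-Za-z_]\w*)(?:\((?P<args>.*?)\))?:)?(?P<name>[A-Za-z_]\w*)>"
)

# Per-converter placeholders; any other converter (string, path, custom) gets "nuclei".
PLACEHOLDERS = {
    "int": "1",
    "float": "1.0",  # Werkzeug's float converter requires a decimal point by default
    "uuid": "00000000-0000-0000-0000-000000000000",
}
DEFAULT_PLACEHOLDER = "nuclei"


class RuleLike(Protocol):
    """The subset of werkzeug.routing.Rule attributes this sketch reads."""

    rule: str
    endpoint: str
    methods: Optional[Set[str]]


def fill_placeholders(rule: str) -> str:
    """Replace every <...> variable in a route with a converter-appropriate value."""
    return _VAR_RE.sub(
        lambda m: PLACEHOLDERS.get(m.group("conv") or "", DEFAULT_PLACEHOLDER), rule
    )


def seed_urls(rules: Iterable[RuleLike], base_url: str = "http://localhost:5000") -> List[str]:
    urls = set()
    for r in rules:
        # Skip static-file routes: the app's "static" endpoint and "<blueprint>.static".
        if r.endpoint == "static" or r.endpoint.endswith(".static"):
            continue
        # Skip routes that declare methods but not GET (e.g. POST-only endpoints).
        if r.methods is not None and "GET" not in r.methods:
            continue
        urls.add(base_url.rstrip("/") + fill_placeholders(r.rule))
    return sorted(urls)


if __name__ == "__main__":
    # Stand-in rules with hypothetical endpoint names; CI would use iter_rules().
    demo = [
        SimpleNamespace(rule="/research/<research_id>", endpoint="research.view", methods={"GET", "HEAD"}),
        SimpleNamespace(rule="/api/research/<research_id>/status", endpoint="api.status", methods={"GET"}),
        SimpleNamespace(rule="/history/<int:page>", endpoint="history.page", methods={"GET"}),
        SimpleNamespace(rule="/api/start", endpoint="api.start", methods={"POST", "OPTIONS"}),
        SimpleNamespace(rule="/static/<path:filename>", endpoint="static", methods={"GET"}),
    ]
    print("\n".join(seed_urls(demo)))
```

Deduplicating through a set and sorting keeps urls.txt stable between runs, which makes diffs of the seeded surface easy to review.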
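Steps 4 and 5 together with the `-etags` exclusion amount to a single Nuclei invocation. The sketch below assembles it as an argv list, so the cookie needs no shell quoting. It assumes the "severity >= low" filter is expressed with Nuclei's `-severity` flag listing every level from `low` upward; the workflow may implement the filter differently. `nuclei_argv` and `redacted` are hypothetical helpers.

```python
"""Sketch of the authenticated Nuclei invocation (hypothetical helpers)."""
import shlex
from typing import List, Sequence

# Nuclei severity levels in ascending order.
SEVERITIES = ["info", "low", "medium", "high", "critical"]


def nuclei_argv(
    session_cookie: str,
    url_list: str = "urls.txt",
    min_severity: str = "low",
    exclude_tags: Sequence[str] = ("intrusive", "dos", "fuzz"),
) -> List[str]:
    if min_severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {min_severity!r}")
    # "severity >= low" as an explicit list, since -severity takes a set of levels.
    wanted = SEVERITIES[SEVERITIES.index(min_severity):]
    return [
        "nuclei",
        "-list", url_list,                          # one target URL per line
        "-H", f"Cookie: session={session_cookie}",  # authenticated session
        "-etags", ",".join(exclude_tags),           # skip templates with these tags
        "-severity", ",".join(wanted),
    ]


def redacted(argv: List[str]) -> str:
    """Shell-quoted command line with the cookie value hidden, safe to log."""
    shown = ["Cookie: session=***" if a.startswith("Cookie: ") else a for a in argv]
    return shlex.join(shown)
```

`subprocess.run(nuclei_argv(token), check=True)` would run the scan; only `redacted(...)` should ever be logged, because the real argv carries the session token.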