local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-13 18:44:51 +03:00


LearningCircuit 903a2db8af ci(nuclei): authenticate DAST scan + seed URLs from Flask url_map (#3698)

* ci(nuclei): authenticate scan + seed URL list from Flask url_map

Previously the Nuclei DAST job ran against an unauthenticated single
target (`http://localhost:5000`) with no URL list. Because Nuclei is
template-driven (not a crawler) and the LDR app is auth-gated, the
scanner only ever saw `/auth/login`, the index, and a couple of
unauthenticated endpoints. The 2-minute scan over 10k templates produced
only 5 info-level findings, all of which were intentional design
choices (CSP `unsafe-inline`, SameSite=Lax, OPTIONS verb, form
detection). In other words, the gate was effectively a rubber-stamp green checkmark.

Now the workflow:
  1. Pre-creates the standard CI `test_admin` user via the existing
     `init_test_database.py` helper (avoids slow registration + rate
     limits).
  2. Logs in via the real /auth/login flow with a CSRF token, captures
     the Flask session cookie, and verifies it via /auth/check.
  3. Dumps the Flask url_map (excluding parameterized routes, static,
     and POST-only endpoints) into urls.txt so Nuclei probes every
     blueprint route, not just `/`.
  4. Runs Nuclei with `-list urls.txt` and the authenticated session
     cookie via `-H "Cookie: session=..."`.
  5. Filters to severity >= low to drop the info-level findings
     that are intentional design choices.
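Step 2 hinges on scraping the CSRF token from the login page before POSTing credentials. The following is a minimal Python sketch of that extraction. It assumes the token sits in a hidden `<input name="csrf_token">`, which is Flask-WTF's default field name and is not confirmed by this commit. The class and function names are hypothetical, and the workflow's own implementation is not shown here.

```python
from html.parser import HTMLParser
from typing import Optional


class CsrfTokenParser(HTMLParser):
    """Remember the value of the first <input> whose name matches `field`."""

    def __init__(self, field: str = "csrf_token") -> None:
        super().__init__()
        self.field = field
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <input .../> also lands here via handle_startendtag.
        if tag != "input" or self.token is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == self.field:
            self.token = attr_map.get("value")


def extract_csrf_token(login_html: str, field: str = "csrf_token") -> str:
    """Pull the CSRF token out of the GET /auth/login response body.

    `field` defaults to Flask-WTF's name; this is an assumption, not
    something the workflow is known to use.
    """
    parser = CsrfTokenParser(field)
    parser.feed(login_html)
    parser.close()
    if not parser.token:
        raise ValueError(f"no <input name={field!r}> with a value in login page")
    return parser.token
```

With Flask-WTF-style protection the token is typically bound to the pre-login session, so the credential POST should reuse the cookie set by the initial GET.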

The session cookie is masked in logs via `::add-mask::` so it doesn't
leak into the run output. Test credentials match the convention used by
the playwright-webkit-tests and puppeteer-e2e-tests workflows.

Adds scripts/ci/dump_url_map.py as a small helper that imports
`create_app()` and iterates `app.url_map.iter_rules()` — reusable from
other DAST workflows (e.g. ZAP API scan) that benefit from URL seeding.
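The url_map filtering described in step 3 can be sketched in Python as below. This is not the contents of `dump_url_map.py`. It relies only on the public Werkzeug `Rule` attributes `.rule`, `.endpoint`, `.methods` and `.arguments`. In real use the argument would be `app.url_map.iter_rules()` on the Flask app object. The function name, the default base URL, and the demo endpoint names are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List


def seed_urls(rules: Iterable, base_url: str = "http://localhost:5000") -> List[str]:
    """Turn Werkzeug Rule-like objects into absolute URLs for `nuclei -list`."""
    urls = set()
    for rule in rules:
        if rule.endpoint == "static" or rule.endpoint.endswith(".static"):
            continue  # app or blueprint static files: noise, not attack surface
        if "GET" not in (rule.methods or set()):
            continue  # POST-only endpoints cannot be hit from a plain URL list
        if rule.arguments:
            continue  # parameterized routes such as /research/<research_id>
        urls.add(base_url.rstrip("/") + rule.rule)
    return sorted(urls)


if __name__ == "__main__":
    # Stand-ins for app.url_map.iter_rules(); endpoint names are made up.
    demo = [
        SimpleNamespace(rule="/", endpoint="index",
                        methods={"GET", "HEAD", "OPTIONS"}, arguments=set()),
        SimpleNamespace(rule="/auth/logout", endpoint="auth.logout",
                        methods={"POST", "OPTIONS"}, arguments=set()),
    ]
    print("\n".join(seed_urls(demo)))
```

Flask adds `HEAD` and `OPTIONS` to routes automatically, so checking for `GET` membership is what separates readable routes from POST-only ones.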

* ci(nuclei): address findings from review pass

Three differentiated review agents flagged five actionable items on the
authenticated-Nuclei PR. This commit addresses all five:

* dump_url_map.py: stop skipping parameterized routes. Substitute a
  Flask-converter-appropriate placeholder (int/float→1, uuid→all-zeros,
  default→"nuclei") so Nuclei still probes path-traversal / parameter-
  injection / SQLi templates against routes like /research/<research_id>
  and /api/research/<research_id>/status. Without this, the bulk of the
  authenticated app surface (history, research, API blueprints) was
  silently excluded — which defeats the PR's purpose.
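The placeholder substitution can be sketched in Python as below. The sketch rewrites Werkzeug rule strings (`<name>`, `<conv:name>`, `<conv(args):name>`) with a regex, and every name in it is hypothetical.

It deliberately departs from the bullet in one place. Werkzeug's `float` converter only matches values with a decimal point, so a bare `1` would 404. The sketch uses `1.0` for `float` instead.

```python
import re

# Matches <name>, <conv:name> and <conv(args):name> in a Werkzeug rule string.
_ARG_RE = re.compile(
    r"<(?:(?P<conv>[A-Za-z_]\w*)(?:\((?P<args>[^)]*)\))?:)?(?P<name>[A-Za-z_]\w*)>"
)

# Converter-appropriate values; anything else (string, path, ...) gets the default.
_PLACEHOLDERS = {
    "int": "1",
    "float": "1.0",  # Werkzeug's float converter requires a decimal point
    "uuid": "00000000-0000-0000-0000-000000000000",
}
DEFAULT_PLACEHOLDER = "nuclei"


def fill_placeholders(rule: str) -> str:
    """Replace each <...> variable in `rule` with a probe-able concrete value."""
    return _ARG_RE.sub(
        lambda m: _PLACEHOLDERS.get(m.group("conv") or "", DEFAULT_PLACEHOLDER),
        rule,
    )
```

For example, `/api/research/<research_id>/status` becomes `/api/research/nuclei/status`. That is a concrete path Nuclei's injection and traversal templates can mutate.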

* nuclei.yml: add `-etags intrusive,dos,fuzz`. Now that Nuclei holds a real
  session, default templates could mutate state or DoS the runner; this
  is the standard exclusion set for authenticated DAST.
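Taken together, the scan flags can be sketched as a Python argv builder. `-list`, `-H` and `-etags` appear in this commit. `-severity` is a real Nuclei flag, but the commit does not say it is how the "severity >= low" filter is applied, so treat that entry as an assumption. The function name is hypothetical.

```python
from typing import List


def nuclei_argv(url_list: str, session_cookie: str) -> List[str]:
    """Build the Nuclei command line for an authenticated, URL-seeded scan."""
    return [
        "nuclei",
        "-list", url_list,                          # one seeded URL per line
        "-H", f"Cookie: session={session_cookie}",  # reuse the logged-in session
        "-severity", "low,medium,high,critical",    # assumed: drops info findings
        "-etags", "intrusive,dos,fuzz",             # skip state-mutating/DoS templates
    ]
```

Passing this list to `subprocess.run(..., check=True)` avoids shell quoting of the cookie value. Nothing in the sketch logs the argv, since it contains the session token.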

* nuclei.yml: replace `cat cookies.txt` in the missing-cookie error
  branch with a column-filtered `awk` that omits the value column. The
  cookie is only masked via `::add-mask::` after extraction succeeds, so
  the old branch could print the raw session token to CI logs if the
  extraction regex ever broke.
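The value-omitting dump can be sketched in Python as follows, mirroring what the column-filtered `awk` is described as doing. It assumes the same curl/Netscape jar layout as the rest of this commit implies, and the function name is hypothetical. Lines that do not have exactly seven fields are replaced by a placeholder rather than echoed, because a malformed line could still contain the token.

```python
from typing import List


def redacted_jar_summary(jar_text: str) -> List[str]:
    """Describe each cookie in a curl jar as domain/path/name, never the value."""
    out = []
    for line in jar_text.splitlines():
        # Keep HttpOnly cookies (curl writes them as "#HttpOnly_<domain>...").
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        elif not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            out.append("<malformed line omitted>")  # never echo unparsed text
            continue
        domain, _flag, path, _secure, _expiry, name, _value = fields
        out.append(f"{domain}\t{path}\t{name}")
    return out
```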

* nuclei.yml: add `sleep 2` between auth/check and the Nuclei step so
  the post-login background thread (settings migration + library init,
  see web/auth/routes.py:_perform_post_login_tasks) finishes before
  probes start; otherwise settings-dependent routes can return 500.

* nuclei.yml: drop `# pragma: allowlist secret` on TEST_PASSWORD. The
  repo uses gitleaks (.gitleaks.toml already allowlists `testpass123`),
  not detect-secrets — the pragma was dead weight.

Out of scope for this PR (recorded but not changed):
- 3-way credential drift (init_test_database.py / nuclei.yml /
  auth_helper.js all hardcode test_admin/testpass123)
- Nuclei binary version `latest` auto-updating (matches existing CI)
- create_app() side effects in dump_url_map.py (currently benign)

2026-04-27 23:11:40 +02:00

dump_url_map.py

ci(nuclei): authenticate DAST scan + seed URLs from Flask url_map (#3698)

2026-04-27 23:11:40 +02:00

init_test_database.py

refactor: Address PR review feedback (#1570)

2026-01-18 16:14:24 -05:00