local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-15 19:46:56 +03:00

Files

LearningCircuit 9755a900eb ci(research): extract reusable LDR-research workflow + add issue-trigger caller (#3987 )

* ci(research): extract reusable LDR-research workflow + add issue-trigger caller

Three triggers will end up calling the same install-and-run-LDR
plumbing (PR diff today, issue body now, Reddit posts later). Factor
out the middle of the workflow into a reusable workflow so we don't
have to maintain the same logic in three places, and add the
issue-trigger caller on top of it.

Changes:

- .github/workflows/ldr-research-reusable.yml (new) — workflow_call
  workflow that takes a fully-assembled query and returns a
  comment-ready markdown blob via artifact. Inputs include
  forward-compat knobs the future Reddit caller will need
  (max-query-length, max-sources, comment-footer override,
  include-sources-section, output-truncate-chars).

- .github/workflows/e2e-research-test.yml — refactored from a single
  job to three jobs (build-query → research-via-reusable →
  post-comment). Behaviour is preserved: same headers, same footer,
  same diff truncation at MAX_DIFF_SIZE, same label-removal on
  completion.

- .github/workflows/issue-research.yml (new) — triggers on
  `issues: types: [labeled]` gated by the same `ldr_research` label
  the PR workflow uses (GitHub event-type gating means they don't
  conflict). Output has two sections: "For the reporter" (cautious
  framing) and "For maintainers" (raw research context). Issue body
  is sanitized (control-char strip, 4000-char truncation) and never
  reaches a shell.

- scripts/ldr-research.py — renamed from ldr-diff-research.py
  (`git mv`, history preserved). Drops --mode, --static-query,
  --max-diff-size: query now comes from stdin only and the caller
  workflow does prompt assembly. Output JSON shape: {research,
  sources, findings, iterations}.

- .github/labels.yml — register ldr_research and ldr_research_static
  so they exist canonically rather than via on-the-fly creation.

Reddit research is a follow-up PR; this PR ships the abstraction
shape it will need.

* docs(ci): regenerate workflow status dashboard for new LDR workflows

The check-structure CI gate requires every workflow file to have a row
in docs/ci/workflow-status.md. Regenerate to add rows for the two new
workflows added in this PR. The live-status flips on unrelated rows
(gitleaks, ossf-scorecard, responsive-ui-tests-enhanced, osv-scanner)
are accurate snapshots of current status — the auto-regen workflow
keeps them fresh on its own schedule.

* ci(research): address review feedback — label cleanup, delimiter, artifact

Three small follow-ups from the AI review on this PR:

1. Label cleanup on build-query failure. The post-comment job had
   `if: always() && needs.research.result != 'skipped'`, which meant
   that if build-query failed, research was skipped and the entire
   post-comment job (including the label-removal step) was skipped
   too — leaving a stuck `ldr_research` label on the PR/issue.
   Switch to `if: always()`; the download and post steps already
   self-guard with `needs.research.outputs.success == 'true'`, so
   only the label-removal step runs in the failure path.

2. Randomized GHA output delimiter. `__LDR_QUERY_EOF__` was a fixed
   string; a query containing that exact line could prematurely
   terminate the multi-line output. Use $$/$RANDOM/nanosecond as the
   delimiter base. Defense-in-depth — collision was already
   astronomically unlikely.

3. Optional `artifact-suffix` input on the reusable workflow. Until
   now the artifact name was
   `ldr-research-{run_id}-{run_attempt}-{github.job}`, which
   collides if a caller invokes the reusable multiple times in one
   run. The Reddit follow-up will use a matrix call, so add a
   caller-provided suffix now and sanitize it to artifact-safe
   chars. Existing callers don't pass it; default empty preserves
   today's name.

* ci(research): fix per-line truncation in reusable workflow

Two follow-ups from the second review pass:

1. The awk-based backstop truncation in `Write query to file` was
   per-line (operating on $0 / length($0)), not total. A long
   multi-line query with many short lines would silently bypass the
   max-query-length cap. Swap for a wc -c + head -c approach that
   truncates total bytes. Verified locally that a 114-byte
   multi-line input with all-short-lines is now correctly truncated
   to ~100 bytes.

2. Remove the unused EXIT_CODE capture in `Run LDR Research`. The
   step relies on JSON validation for error detection; capturing
   $? without using it was just dead code inherited from the
   original workflow.

2026-05-11 00:44:16 +02:00