mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
* ci(research): extract reusable LDR-research workflow + add issue-trigger caller
Three triggers will end up calling the same install-and-run-LDR
plumbing (PR diff today, issue body now, Reddit posts later). Factor
out the middle of the workflow into a reusable workflow so we don't
have to maintain the same logic in three places, and add the
issue-trigger caller on top of it.
Changes:
- .github/workflows/ldr-research-reusable.yml (new) — workflow_call
workflow that takes a fully-assembled query and returns a
comment-ready markdown blob via artifact. Inputs include
forward-compat knobs the future Reddit caller will need
(max-query-length, max-sources, comment-footer override,
include-sources-section, output-truncate-chars).
- .github/workflows/e2e-research-test.yml — refactored from a single
job to three jobs (build-query → research-via-reusable →
post-comment). Behaviour is preserved: same headers, same footer,
same diff truncation at MAX_DIFF_SIZE, same label-removal on
completion.
- .github/workflows/issue-research.yml (new) — triggers on
`issues: types: [labeled]` gated by the same `ldr_research` label
the PR workflow uses (GitHub event-type gating means they don't
conflict). Output has two sections: "For the reporter" (cautious
framing) and "For maintainers" (raw research context). Issue body
is sanitized (control-char strip, 4000-char truncation) and never
reaches a shell.
- scripts/ldr-research.py — renamed from ldr-diff-research.py
(`git mv`, history preserved). Drops --mode, --static-query,
--max-diff-size: query now comes from stdin only and the caller
workflow does prompt assembly. Output JSON shape: {research,
sources, findings, iterations}.
- .github/labels.yml — register ldr_research and ldr_research_static
so they exist canonically rather than via on-the-fly creation.
Reddit research is a follow-up PR; this PR ships the abstraction
shape it will need.
* docs(ci): regenerate workflow status dashboard for new LDR workflows
The check-structure CI gate requires every workflow file to have a row
in docs/ci/workflow-status.md. Regenerate to add rows for the two new
workflows added in this PR. The live-status flips on unrelated rows
(gitleaks, ossf-scorecard, responsive-ui-tests-enhanced, osv-scanner)
are accurate snapshots of current status — the auto-regen workflow
keeps them fresh on its own schedule.
* ci(research): address review feedback — label cleanup, delimiter, artifact
Three small follow-ups from the AI review on this PR:
1. Label cleanup on build-query failure. The post-comment job had
`if: always() && needs.research.result != 'skipped'`, which meant
that if build-query failed, research was skipped and the entire
post-comment job (including the label-removal step) was skipped
too — leaving a stuck `ldr_research` label on the PR/issue.
Switch to `if: always()`; the download and post steps already
self-guard with `needs.research.outputs.success == 'true'`, so
only the label-removal step runs in the failure path.
2. Randomized GHA output delimiter. `__LDR_QUERY_EOF__` was a fixed
string; a query containing that exact line could prematurely
terminate the multi-line output. Use $$/$RANDOM/nanosecond as the
delimiter base. Defense-in-depth — collision was already
astronomically unlikely.
3. Optional `artifact-suffix` input on the reusable workflow. Until
now the artifact name was
`ldr-research-{run_id}-{run_attempt}-{github.job}`, which
collides if a caller invokes the reusable multiple times in one
run. The Reddit follow-up will use a matrix call, so add a
caller-provided suffix now and sanitize it to artifact-safe
chars. Existing callers don't pass it; default empty preserves
today's name.
* ci(research): fix per-line truncation in reusable workflow
Two follow-ups from the second review pass:
1. The awk-based backstop truncation in `Write query to file` was
per-line (operating on $0 / length($0)), not total. A long
multi-line query with many short lines would silently bypass the
max-query-length cap. Swap for a wc -c + head -c approach that
truncates total bytes. Verified locally that a 114-byte
multi-line input with all-short-lines is now correctly truncated
to ~100 bytes.
2. Remove the unused EXIT_CODE capture in `Run LDR Research`. The
step relies on JSON validation for error detection; capturing
$? without using it was just dead code inherited from the
original workflow.