local-deep-research/.github/workflows/responsive-ui-tests-enhanced.yml
LearningCircuit 8597e429cc Improve UI tests + CI: artifact uploads, WebKit skip narrowing, settle-wait migrations (#4061)
* ci(responsive): restore artifact uploads and fix dead post-results gate

The Responsive UI workflow lost its per-viewport artifact uploads (see the
explanatory comment around lines 206-209), so PR/release failures were
un-debuggable: no screenshots, no test output. The downstream
`post-results` job was also gated on `github.event_name == 'pull_request'`,
which can never be true because the workflow has no `pull_request` trigger;
the combined-report aggregator therefore never ran.

Restore the upload step using `if: always()` + `if-no-files-found: ignore`
(so server-startup failures still upload logs and quiet runs don't fail
the step) and rewrite the `post-results` gate to `if: always()`. Artifact
name matches the existing `ui-test-results-*` pattern expected by the
combined-report glob.

* test(playwright): narrow WebKit closed-context skip to webkit only (#4060)

The catch at all-pages-mobile.spec.js:372 was previously calling
`test.skip(true, ...)`, which skipped the test for every browser - so any
non-WebKit error path also silently bailed out of the mobile-nav overlap
assertion. Only Mobile Safari / WebKit is known to hit the
`Target page, context or browser has been closed` race, so gate the skip
on `browserName === 'webkit'`. Other browsers now re-throw and surface the
regression.

Also broaden the matched error message to include
`Execution context was destroyed`, the alternate wording the same upstream
race uses in newer Playwright versions.

Skip annotation references issue #4060 so the skip is grep-able and can be
removed when the underlying race is fixed or the DOM walk is restructured.
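
The narrowed gate can be sketched as a small predicate. This is a minimal
sketch, not the code in all-pages-mobile.spec.js: the helper name
`shouldSkipClosedContextRace` and the `CLOSED_CONTEXT_PATTERNS` constant are
hypothetical. Only the two error strings and the `browserName === 'webkit'`
check come from the description above.

```javascript
// Hypothetical helper: decides whether a caught error is the known
// WebKit-only closed-context race (tracked in #4060).
const CLOSED_CONTEXT_PATTERNS = [
  'Target page, context or browser has been closed',
  'Execution context was destroyed',
];

function shouldSkipClosedContextRace(browserName, error) {
  const message = String(error && error.message ? error.message : error);
  const isKnownRace = CLOSED_CONTEXT_PATTERNS.some((p) => message.includes(p));
  // Skip only when BOTH conditions hold; every other case must re-throw.
  return browserName === 'webkit' && isKnownRace;
}

// Usage inside a Playwright test's catch block (sketch):
//   } catch (err) {
//     if (shouldSkipClosedContextRace(browserName, err)) {
//       test.skip(true, 'WebKit closed-context race, see #4060');
//     }
//     throw err; // non-WebKit browsers surface the regression
//   }
```

Keeping the decision in a pure function makes the gate checkable without
launching a browser.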

* test(ui): add waitForStable helper to auth_helper.js

Replaces ad-hoc `await delay(N)` sleeps used to "let the UI settle" after
an action. The helper waits for a selector to be visible, then waits for
its bounding box to stop changing across requestAnimationFrame ticks
(bounded to 3s in-page). The final `idleMs` pause is configurable.

JSDoc explicitly notes when NOT to use it: don't replace `delay()` calls
that exercise wall-clock behavior (e.g. a 10s timer the app is supposed to
respect). Those tests need real elapsed time, not a settle wait.

Exported as a sibling of `safeClick` to keep Puppeteer test imports tidy.
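
A minimal sketch of how such a helper might look, assuming Puppeteer's
`page.waitForSelector` and `page.evaluate`. The signature, the default
`timeout` and `idleMs` values, and the "two identical consecutive frames"
criterion are assumptions for illustration; only the overall shape (visible,
then bounding box stable across requestAnimationFrame ticks with a 3s in-page
bound, then an `idleMs` pause) comes from the description above.

```javascript
// Sketch only: not the helper shipped in auth_helper.js.
async function waitForStable(page, selector, { timeout = 5000, idleMs = 100 } = {}) {
  // 1. Wait until the element is visible.
  await page.waitForSelector(selector, { visible: true, timeout });

  // 2. In the page, sample the bounding box once per animation frame until it
  //    stops changing, or until the 3s in-page bound expires.
  await page.evaluate((sel) => new Promise((resolve) => {
    const el = document.querySelector(sel);
    if (!el) return resolve();
    const deadline = performance.now() + 3000;
    let last = null;
    let stableFrames = 0;
    const tick = () => {
      const r = el.getBoundingClientRect();
      const key = `${r.x},${r.y},${r.width},${r.height}`;
      stableFrames = key === last ? stableFrames + 1 : 0;
      last = key;
      if (stableFrames >= 2 || performance.now() > deadline) return resolve();
      requestAnimationFrame(tick);
    };
    requestAnimationFrame(tick);
  }), selector);

  // 3. Final configurable pause.
  await new Promise((r) => setTimeout(r, idleMs));
}
```

A call such as `await waitForStable(page, '#query')` would stand in for a
settle sleep. It is not a substitute for delays that test elapsed time.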

* test(ui): replace settle-delays with state-based waits in two puppeteer tests

`test_research_cancellation.js` had 7 hardcoded `await delay(...)` calls
and `test_form_validation_aria_ci.js` had 19. The vast majority were
"give the UI a moment to settle" pauses with no real signal attached, so
they slowed CI and quietly hid races whenever the runner was a beat slower
than the chosen delay.

For each call:
- post-`navigateTo` 500ms sleeps -> `waitForSelector('#query', { visible: true })`
- post-validation-trigger sleeps -> `waitForFunction` polling the
  `ldr-field-invalid` class to appear (or clear, when the test expects
  validation to pass)
- post-focus 100ms -> `waitForFunction(() => document.activeElement?.id === 'query')`
- post-cancel-click sleeps -> `waitForFunction` polling for `cancel|stop|suspend`
  to appear in the status text
- post-typing 200ms -> `waitForFunction` polling for the typed value to land

The one delay we kept: the explicit 10-second wait in the mid-stage
cancellation test (`test_research_cancellation.js`), which deliberately
exercises elapsed-time behavior of the research progress flow. That is
not a settle wait and must stay wall-clock.

Polling waits all use `.catch(() => {})` to preserve existing
behavior when a selector or state never appears (the assertions further
down handle the failure case more informatively than a hung wait would).
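
Two of the replacements above can be sketched as follows, assuming
Puppeteer's `page.waitForFunction(fn, options, ...args)`. The helper names,
the timeouts, and the assumption that `ldr-field-invalid` is set on the
`#query` element itself are illustrative, not taken from the test files.

```javascript
// Sketch only: helper names and timeouts are hypothetical.
const swallow = () => {};

// Replaces a post-validation-trigger sleep: poll until #query gains the
// invalid class (expectInvalid = true) or loses it (expectInvalid = false).
function waitForValidationState(page, expectInvalid, timeout = 2000) {
  return page
    .waitForFunction(
      (want) => {
        const field = document.querySelector('#query');
        return !!field && field.classList.contains('ldr-field-invalid') === want;
      },
      { timeout },
      expectInvalid,
    )
    .catch(swallow); // a timeout falls through to the later assertions
}

// Replaces a post-focus 100ms sleep.
function waitForQueryFocus(page, timeout = 1000) {
  return page
    .waitForFunction(() => document.activeElement?.id === 'query', { timeout })
    .catch(swallow);
}
```

Swallowing the timeout keeps the old control flow: the wait returns as soon
as the state appears, and a state that never appears is reported by the
assertion that follows rather than by the wait.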

* docs(pr-template): document label-gated CI workflows

Several heavy E2E workflows are label-gated and silently no-op on PRs
without the right label - new contributors had no way to know. Add a "CI
test coverage" section to the PR template enumerating each gated workflow
and the label that triggers it.

No CI behavior change; documentation only.

* test(form-validation): make waitForQueryReady detect validator attachment

Local smoke-test (9 tests, run against `scripts/dev/restart_server.sh`)
exposed two latent races that the prior `await delay(500)` had been
quietly hiding:

1. `waitForQueryReady` returned as soon as `#query` was visible, but the
   FormValidator class is registered against the field a tick later
   (research.js setupEventListeners). Waiting for the `.ldr-field-error`
   sibling that addValidation() inserts is the actual signal that the
   validator is wired and the submit handler will take the early-return
   path on an empty query.

2. `noLoadingUiOnEmptySubmit` ran after `errorClearsOnValidSubmit`, which
   typed a real query and triggered a real submit (the fetch fails but
   creates `.ldr-loading-overlay` first). `navigateTo` skipped the
   re-navigation because we were already on `/`, so the stale overlay
   carried over. Force a real `page.goto` for this test so it asserts
   about a fresh page, not the leftover state of the previous test.

After the fix the suite passes 9/9 in ~1s (vs ~4.5s with the old delays).
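
Both fixes can be sketched together, assuming Puppeteer's `page.goto`,
`page.waitForSelector`, and `page.waitForFunction`. The timeout, the
`openFreshPage` name, and the reading of "sibling" as "another child of
`#query`'s parent element" are assumptions; the real helper may differ,
including in whether it swallows timeouts (this sketch lets them propagate).

```javascript
// Sketch only: readiness means "validator attached", not just "field visible".
async function waitForQueryReady(page, timeout = 5000) {
  await page.waitForSelector('#query', { visible: true, timeout });
  await page.waitForFunction(
    () => {
      const field = document.querySelector('#query');
      const parent = field && field.parentElement;
      // The .ldr-field-error sibling only exists once validation is wired.
      return !!parent && Array.from(parent.children).some(
        (el) => el !== field && el.classList.contains('ldr-field-error'),
      );
    },
    { timeout },
  );
}

// Hypothetical wrapper: always performs a real navigation so no overlay or
// other state from a previous test survives.
async function openFreshPage(page, baseUrl) {
  await page.goto(baseUrl, { waitUntil: 'domcontentloaded' });
  await waitForQueryReady(page);
}
```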

* chore(labels): rewrite test-trigger label descriptions for AI reviewer auto-apply

The Friendly AI code reviewer (.github/workflows/ai-code-reviewer.yml)
auto-applies labels based on the labels' descriptions in the repo. The
existing test:puppeteer / test:e2e / ldr_research / ldr_research_static
descriptions were passive ("Triggers Puppeteer E2E tests on this PR"),
which doesn't guide the reviewer on *when* to apply them.

Rewrite them in the same imperative, bias-toward-action style used by
benchmark-needed ("Apply if a change risks degrading performance — when
in doubt, add it. Run compare_configurations()"):

- test:puppeteer + test:e2e — apply for any PR touching the web stack
- ldr_research / ldr_research_static — apply for substantive code/arch
  changes, with the static variant biased even more toward "run it"
  since it uses the cheaper model

Also add the test:* labels to labels.yml so they become version-controlled
(previously they existed only on GitHub, created out-of-band). label-sync
is additive and will overwrite the GitHub descriptions on next main push.
2026-05-16 13:17:28 +02:00


name: Responsive UI Tests
on:
  workflow_call:
  workflow_dispatch:
# No concurrency group; intentionally omitted.
# This workflow runs only via workflow_call (from release.yml's
# responsive-test-gate) and workflow_dispatch. The pull_request trigger
# was deliberately removed in #2248 to reduce PR CI load on what is a
# heavy ~20-minute matrix build (mobile + desktop). A shared concurrency
# key here previously caused workflow_call invocations to cancel each
# other mid-flight; see #3554 (reverted in #3599) for that history.
permissions:
  contents: read
jobs:
  ui-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    strategy:
      fail-fast: false
      matrix:
        viewport: [mobile, desktop]  # Test mobile and desktop viewports
    services:
      postgres:
        image: postgres:14@sha256:ca25035f7e6f74552655a1c5e4a9eb21f85e9d316f1f70371f790ef70095dd58 # v14
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: ldr_test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@a5ad31d6a139d249332a2605b85202e8c0b78450 # v2.19.1
        with:
          egress-policy: audit
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
      - name: Set up PDM
        uses: pdm-project/setup-pdm@973541a5febeafcfdadf8a51211435be6ecfd90f # v4.5
        with:
          python-version: '3.12'
      - name: Set up Node.js
        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
        with:
          node-version: '24'
          cache: 'npm'
          cache-dependency-path: tests/ui_tests/package-lock.json
      - name: Free up disk space
        run: |
          # Remove unnecessary large packages to prevent disk space issues during cache save
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /usr/local/lib/android
          sudo rm -rf /opt/ghc
          sudo rm -rf /opt/hostedtoolcache/CodeQL
          df -h
      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            wget \
            gnupg \
            ca-certificates \
            fonts-liberation \
            libasound2t64 \
            libatk-bridge2.0-0 \
            libatk1.0-0 \
            libcups2 \
            libdbus-1-3 \
            libdrm2 \
            libgbm1 \
            libgtk-3-0 \
            libnspr4 \
            libnss3 \
            libx11-xcb1 \
            libxcomposite1 \
            libxdamage1 \
            libxrandr2 \
            xdg-utils \
            imagemagick
      - name: Install Python dependencies
        run: pdm install
      - name: Install root frontend dependencies
        run: npm ci
      - name: Build Vite frontend bundle
        # Generates src/local_deep_research/web/static/dist/, which the
        # Flask app loads via vite_helper. Without this the responsive UI
        # tests run against an unstyled page (no styles.css), which means
        # any CSS source changes between PRs are invisible to the test
        # baseline. See follow-up to PR #3985 for context.
        run: npm run build
      - name: Install Node test dependencies
        working-directory: tests/ui_tests
        run: |
          npm ci
          npx puppeteer@24.35.0 browsers install chrome
      - name: Set up test directories
        run: |
          mkdir -p ${{ github.workspace }}/data/encrypted_databases
          mkdir -p tests/ui_tests/screenshots
          echo "Created data and screenshots directories for tests"
      - name: Start test server
        env:
          CI: true
          FLASK_ENV: testing
          TEST_ENV: true
          SECRET_KEY: test-secret-key-for-ci  # Security: CI test credential, not production secret
          LDR_DISABLE_RATE_LIMITING: true
          LDR_DATA_DIR: ${{ github.workspace }}/data
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/ldr_test  # Security: CI test database credentials, not production secrets
        run: |
          cd src
          # Start server and get its PID
          pdm run python -m local_deep_research.web.app > server.log 2>&1 &
          SERVER_PID=$!
          echo "Server PID: $SERVER_PID"
          # Wait for server to start
          for i in {1..30}; do
            if curl -fsS --connect-timeout 2 --max-time 5 http://127.0.0.1:5000 2>/dev/null; then
              echo "Server is ready after $i seconds"
              break
            fi
            # Check if process is still running
            if ! kill -0 $SERVER_PID 2>/dev/null; then
              echo "Server process died!"
              echo "Server log:"
              cat server.log
              exit 1
            fi
            echo "Waiting for server... ($i/30)"
            sleep 1
          done
      - name: Register CI test user
        working-directory: tests/ui_tests
        run: node register_ci_user.js http://127.0.0.1:5000
      - name: Run responsive UI tests - ${{ matrix.viewport }}
        id: run-tests
        working-directory: tests/ui_tests
        env:
          VIEWPORT: ${{ matrix.viewport }}
        run: |
          set +e  # Don't exit on test failure
          # Run tests and capture output
          HEADLESS=true node test_responsive_ui_comprehensive.js "$VIEWPORT" 2>&1 | tee test-output.log
          TEST_EXIT_CODE="${PIPESTATUS[0]}"
          # Extract summary for the combined report
          echo "### 📱 $VIEWPORT Test Results" > test-summary.md
          echo "" >> test-summary.md
          # Extract pass/fail counts. The pipeline's exit status is tail's, so an
          # "|| echo 0" fallback never fires; default empty results explicitly.
          PASSED=$(grep -oP '\d+(?= passed)' test-output.log | tail -1)
          FAILED=$(grep -oP '\d+(?= failed)' test-output.log | tail -1)
          WARNINGS=$(grep -oP '\d+(?= warnings)' test-output.log | tail -1)
          PASSED=${PASSED:-0}
          FAILED=${FAILED:-0}
          WARNINGS=${WARNINGS:-0}
          if [ "$FAILED" -eq "0" ]; then
            echo "✅ **All tests passed!**" >> test-summary.md
          else
            echo "❌ **$FAILED critical issues found**" >> test-summary.md
          fi
          {
            echo ""
            echo "- ✅ Passed: $PASSED"
            echo "- ❌ Failed: $FAILED"
            echo "- ⚠️ Warnings: $WARNINGS"
          } >> test-summary.md
          {
            echo "test_exit_code=$TEST_EXIT_CODE"
            echo "test_failed=$FAILED"
          } >> "$GITHUB_OUTPUT"
          # Only fail if critical failures
          exit "$TEST_EXIT_CODE"
      - name: Upload viewport test results
        if: always()
        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
        with:
          name: ui-test-results-${{ matrix.viewport }}
          path: |
            tests/ui_tests/test-output.log
            tests/ui_tests/test-summary.md
            tests/ui_tests/screenshots/
            tests/ui_tests/responsive/
            src/server.log
          retention-days: 7
          if-no-files-found: ignore
      # NOTE: Earlier screenshot generation/upload steps were removed to fix actionlint warnings
      # caused by "if: false" disablement. Screenshots are now captured into the artifact above
      # when they exist. Visual regression workflows can re-enable dedicated screenshot steps later.
      - name: Upload to GitHub Pages (if available)
        if: always() && github.event_name == 'pull_request'
        continue-on-error: true
        env:
          PR_NUMBER: ${{ github.event.pull_request.number }}
          VIEWPORT: ${{ matrix.viewport }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GH_REPOSITORY: ${{ github.repository }}
          GH_REPO_OWNER: ${{ github.repository_owner }}
          GH_REPO_NAME: ${{ github.event.repository.name }}
        run: |
          # This requires GitHub Pages to be enabled for the repo
          # Create a branch for the screenshots
          BRANCH_NAME="pr-screenshots-$PR_NUMBER-$VIEWPORT"
          # Check if screenshots directory exists
          if [ ! -d "tests/ui_tests/responsive" ]; then
            echo "No screenshots directory found, skipping upload"
            exit 0
          fi
          cd tests/ui_tests/responsive
          git init
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          # Add all screenshots and gallery
          git add responsive-ui-tests screenshot-gallery.html
          git commit -m "Screenshots for PR #$PR_NUMBER"
          # Push to a dedicated branch (requires write permissions)
          git push --force "https://x-access-token:$GH_TOKEN@github.com/$GH_REPOSITORY.git" "HEAD:$BRANCH_NAME" 2>/dev/null || true
          # The URL would be: https://[owner].github.io/[repo]/pr-screenshots-[number]-[viewport]/screenshot-gallery.html
          echo "Screenshots available at: https://$GH_REPO_OWNER.github.io/$GH_REPO_NAME/$BRANCH_NAME/screenshot-gallery.html" || true
      - name: Stop application server
        if: always()
        run: |
          if [ -f server.pid ]; then
            kill "$(cat server.pid)" || true
            rm server.pid
          fi
  post-results:
    needs: ui-tests
    runs-on: ubuntu-latest
    # Workflow has no pull_request trigger; gating on event_name == 'pull_request' kept this job
    # from ever running. Always run so the combined report is built for workflow_dispatch / release.
    if: always()
    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@a5ad31d6a139d249332a2605b85202e8c0b78450 # v2.19.1
        with:
          egress-policy: audit
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
      - name: Download all artifacts
        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
        with:
          path: test-artifacts/
      - name: Generate combined report
        run: |
          # Create a combined markdown report with screenshot links
          {
            echo "# 📊 Responsive UI Test Report"
            echo ""
            echo "## Test Summary"
            echo ""
          } > combined-report.md
          # Process each viewport's results
          for dir in test-artifacts/ui-test-results-*/; do
            if [ -d "$dir" ]; then
              viewport=$(basename "$dir" | sed 's/ui-test-results-//')
              echo "### $viewport" >> combined-report.md
              if [ -f "$dir/test-summary.md" ]; then
                cat "$dir/test-summary.md" >> combined-report.md
              fi
              # Count screenshots
              screenshot_count=$(find "$dir" -name "*.png" 2>/dev/null | wc -l)
              if [ "$screenshot_count" -gt "0" ]; then
                {
                  echo ""
                  echo "📸 **$screenshot_count screenshots captured**"
                } >> combined-report.md
              fi
              echo "" >> combined-report.md
            fi
          done
          {
            echo "## 📥 Download Options"
            echo ""
            echo "- [Download all test artifacts](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})"
            echo "- View artifacts in the 'Artifacts' section below the workflow summary"
          } >> combined-report.md
      - name: Upload combined report
        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
        with:
          name: combined-test-report
          path: combined-report.md
          retention-days: 30