The gitleaks action was still flagging placeholder API key examples
even though they were allowlisted. This fix addresses the root causes:
1. Add explicit GITLEAKS_CONFIG environment variable to workflow to
ensure the config file is loaded by gitleaks-action v2
2. Add GITLEAKS_BASELINE_PATH to use the baseline ignore file
3. Add secretGroup = 2 to the generic-secret rule to extract just the
secret value (not the full match including KEY=), allowing the
existing allowlist regexes like 'your-.*-key-here' to work properly
4. Create .gitleaksignore baseline file with specific fingerprints for
known false positives in historical commits
5. Update .gitignore to track .gitleaksignore file
6. Add .gitleaksignore to file-whitelist-check scripts in both
.github/scripts/ and .pre-commit-hooks/
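
The effect of item 3 can be sketched in Python. This is only an
illustration: RULE is a hypothetical stand-in for the generic-secret
regex (the real one lives in .gitleaks.toml and is not shown here), and
the anchored allowlist comparison is an assumption that may not match
how gitleaks applies allowlists internally.

```python
import re

# Hypothetical stand-in for the generic-secret rule (assumption, not the
# regex from .gitleaks.toml). Group 1 is the key name, group 2 the value.
RULE = re.compile(r"(?i)(api[_-]?key|secret|token)\s*=\s*['\"]?([^'\"\s]+)")

# Allowlist pattern quoted in the commit message.
ALLOWLIST = re.compile(r"your-.*-key-here")


def extract(line: str, secret_group: int) -> str:
    """Return the text that the given capture group selects from the line."""
    match = RULE.search(line)
    return match.group(secret_group) if match else ""


def is_allowlisted(line: str, secret_group: int) -> bool:
    """Assumed anchored comparison of the extracted text with the allowlist."""
    return ALLOWLIST.fullmatch(extract(line, secret_group)) is not None


line = "OPENAI_API_KEY=your-openai-key-here"
print(extract(line, 0))         # full match, still carries the "API_KEY=" prefix
print(extract(line, 2))         # only the secret value
print(is_allowlisted(line, 0))  # prefix keeps the anchored allowlist from matching
print(is_allowlisted(line, 2))  # value alone matches the allowlist
```

Under that assumption, the placeholder is only recognized once the
comparison runs against the captured value rather than the full match.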
The path_validator.py file legitimately contains system directory paths
such as /var/log and /etc as part of its security policy for defining
which directories should be restricted from user access. These are not
environment-specific hardcoded paths but rather security policy definitions.
The SECRET_VIOLATIONS check was causing false positives by flagging
legitimate code that references keywords like 'api_key', 'password',
'token' (e.g., class attributes like `requires_api_key = False`).
Gitleaks already runs as a separate workflow and handles secret detection
with context-aware rules that don't produce these false positives.
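
A rough Python illustration of why a keyword-only check misfires. The
KEYWORDS pattern and naive_flags helper are hypothetical and are not
taken from the removed SECRET_VIOLATIONS check.

```python
import re

# Hypothetical keyword pattern in the spirit of a naive secret check.
KEYWORDS = re.compile(r"api_key|password|token", re.IGNORECASE)


def naive_flags(line: str) -> bool:
    """Flag any line that merely mentions a sensitive-sounding keyword."""
    return KEYWORDS.search(line) is not None


# A class attribute with no secret in it is still flagged.
print(naive_flags("requires_api_key = False"))
# A line without any keyword passes.
print(naive_flags("timeout = 30"))
```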
- Remove pipe before while loop to fix subshell issue where violation
arrays were always empty (violations detected but never reported)
- Replace per-file "Checking:" output with progress dots every 10 files
- Add summary showing total files checked
Add defaults/llm_providers/*.json and defaults/research_library/*.json
to SAFE_FILE_PATTERNS. These configuration files contain 'api_key' as
field names (not actual secrets), which triggers false positive secret
pattern detection.
Add comprehensive validation to enforce SHA256 digest pinning across all
Docker image references (Dockerfiles, docker-compose, and workflow files).
New Files:
- .github/scripts/validate-docker-compose-images.sh
Bash script that validates docker-compose.yml files for unpinned images.
Allows documented exceptions for own images and templates.
- .github/scripts/validate-workflow-images.py
Python script with proper YAML parsing to validate GitHub Actions
workflow service containers and container images.
- .github/workflows/validate-image-pinning.yml
CI workflow that runs both validators on PR changes. Provides clear
error messages and fix instructions when violations are found.
Why This Matters:
Image tags are mutable and can be reassigned to malicious images in supply
chain attacks. SHA256 digests are immutable cryptographic identifiers that
guarantee the exact same image bytes on every deployment.
This validation:
- Blocks PRs with unpinned images
- Shows violations directly in PR checks (not just Security tab)
- Provides clear fix instructions
- Runs efficiently (only on relevant file changes)
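
The core check can be sketched in a few lines of Python. This is not
the code of the validators listed above (which also parse YAML and
honor documented exceptions); is_pinned and find_unpinned are
hypothetical names, and the digests below are placeholders.

```python
import re

# An image reference counts as pinned when it ends in "@sha256:" followed
# by exactly 64 lowercase hex digits.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image: str) -> bool:
    """Return True if the image reference carries a SHA256 digest."""
    return DIGEST_RE.search(image.strip()) is not None


def find_unpinned(images):
    """Return the image references that rely on a mutable tag only."""
    return [image for image in images if not is_pinned(image)]


images = [
    "python:3.12-slim",                   # tag only: unpinned
    "redis@sha256:" + "a" * 64,           # placeholder digest: pinned
    "redis:7-alpine@sha256:" + "0" * 64,  # tag plus placeholder digest: pinned
]
print(find_unpinned(images))
```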
Complements:
- PR #1184 (pins Dockerfile and workflow images)
- PR #1218 (pins docker-compose images)
Resolved conflicts:
- .gitleaks.toml: Combined regex patterns from both branches, added path allowlists
- pyproject.toml: Kept updated versions from dev + added hypothesis from main
- __version__.py: Kept 1.3.0 from dev
- news.js: Removed duplicate toggleExpanded function (already exists at line 1291)
- pdm.lock: Regenerated with pdm lock
Resolve merge conflicts:
- .github/scripts/file-whitelist-check.sh: Keep both .tsv and .jinja2 patterns
- .pre-commit-hooks/file-whitelist-check.sh: Keep both .tsv and .jinja2 patterns
- docker-compose.yml: Take dev version
- package.json: Take dev version with security scripts
- pdm.lock: Take dev version
- pyproject.toml: Take dev version
- research_service.py: Keep usedforsecurity=False for security scanners
- ui.js: Take dev version with ldr-alert-close class
- search_engine_local.py: Keep usedforsecurity=False for security scanners
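
For context, a minimal sketch of what the usedforsecurity=False flag
looks like in Python 3.9+. The cache_key helper and the choice of MD5
are assumptions for illustration; the actual call sites in
research_service.py and search_engine_local.py are not shown here.

```python
import hashlib


def cache_key(text: str) -> str:
    """Hypothetical helper: derive a non-cryptographic identifier from text.

    usedforsecurity=False declares that the digest is not used for any
    security purpose, which is what security scanners look for.
    """
    digest = hashlib.md5(text.encode("utf-8"), usedforsecurity=False)
    return digest.hexdigest()


print(cache_key("a"))
```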
Update CI workflows and test files after memory_cache module removal:
- Update api-tests.yml to import search_cache instead of memory_cache
- Remove memory_cache from file-whitelist-check.sh
- Remove memory_cache from test_env_var_usage.py allowlist
- Quote variables to prevent word splitting (SC2086)
- Use 'read -r' to prevent backslash mangling (SC2162)
- Use 'cd ... || exit' for safe directory changes (SC2164)
- Use '-n' instead of '! -z' for string checks (SC2236)
- Use pgrep instead of ps | grep (SC2009)
- Check exit codes directly instead of using $? (SC2181)
- Declare and assign separately for exports (SC2155)
- Fix unused loop variables with underscore prefix (SC2034)
- Remove stray markdown backticks from ollama_entrypoint.sh
* feat: Add favicon (PNG only, no duplication)
Adds the neon microscope icon as website favicon to fix 404 error.
Changes:
- Added favicon.png (256x256) in static/ directory
- Updated base.html to reference favicon using url_for() for proper path resolution
- Updated file whitelist to allow favicon
The icon was recovered from git history (commit 495a905d) and is metadata-clean.
Improvements based on AI review:
- Removed file duplication (single favicon.png instead of two copies)
- Used url_for() instead of hardcoded paths for better Flask compatibility
* chore: auto-bump version to 1.2.16
* fix: Use direct path for favicon instead of url_for()
The Flask app has static_folder=None and uses a custom static route
with endpoint 'app_serve_static', so the 'static' endpoint doesn't
exist and url_for('static', ...) raised BuildError on all pages.
Use the hardcoded /static/favicon.png path instead, which matches the
custom @app.route('/static/<path:path>') route in app_factory.py.
This fixes all UI test failures.
---------
Co-authored-by: GitHub Action <action@github.com>
- Replace DEBUG output with informative result summaries
- Fix file processing loop to handle filenames with spaces/special chars using printf
- Improve readability of security scan results with better formatting
- Maintain helpful output while removing debug terminology
These improvements make the script more robust and user-friendly while
maintaining all security checking functionality.
- Extract massive inline script to separate bash file (.github/scripts/file-whitelist-check.sh)
- Reduce workflow file from 680 lines to 25 lines
- Fix script logic issues (while loop for file processing)
- Make script executable and properly handle GitHub environment variables
- Maintain all original security checking functionality
This resolves the "Exceeded max expression length 21000" error that was preventing
the workflow from running.
Changes:
1. Remove deprecated get_report_as_temp_file() method
- Removed from base.py, file.py, database_with_file_backup.py
- Method had zero callers and created persistent unencrypted files
- Users should use export_report_to_memory() for in-memory exports
2. Add **kwargs support to write_json_verified()
- Allows passing any json.dumps() parameters (ensure_ascii, sort_keys, etc.)
- Maintains backward compatibility with indent=2 default
- Increases flexibility without breaking existing code
3. Update security check script
- Remove get_report_as_temp_file patterns from safe usage list
- Security check still passes after removal
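
The kwargs forwarding in item 2 can be sketched as follows. The real
write_json_verified() also writes to disk and performs verification,
and its full signature is not shown in this message, so dumps_sketch is
a hypothetical, serialization-only stand-in.

```python
import json


def dumps_sketch(data, **kwargs) -> str:
    """Serialize data, forwarding any json.dumps() parameters.

    indent=2 is applied only when the caller does not pass indent, which
    keeps existing callers' output unchanged.
    """
    kwargs.setdefault("indent", 2)
    return json.dumps(data, **kwargs)


print(dumps_sketch({"b": 1, "a": 2}))                               # default indent=2
print(dumps_sketch({"b": 1, "a": 2}, sort_keys=True, indent=None))  # compact, sorted
print(dumps_sketch("é", ensure_ascii=False))                        # non-ASCII kept
```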
- Updated _save_pdf() to use write_file_verified with research_library.enable_pdf_storage setting
- Updated _save_text() to use write_file_verified with research_library.enable_txt_storage setting
- Refactored _extract_text_from_pdf() to use in-memory pypdf processing instead of writing temp files to disk
- Removed research_library from security check exclusions to enable verification
All file writes in research_library now go through security verification with proper setting checks.
System config files (Flask secret key, server config, search index metadata)
are not user data - they're system configuration.
Safe temp files in database/encrypted_db.py use proper cleanup and are
not a security concern.
Research library downloads are user-controlled features where users
explicitly download academic PDFs/text to their library - not a
security issue.
The fix also corrects grep command option ordering: --exclude-dir
options must come BEFORE -- to be effective. Previously, EXCLUDE_ARGS
was placed after --, causing all exclusions to be ignored.
- Store error reports in encrypted database instead of unencrypted files
- Create static benchmark template file instead of generating at runtime
- Update security check to search only src/ directory (faster, avoids .venv)
- Exclude examples/, scripts/, .github/, cookiecutter-docker/ from checks
- Delete unused web/routes/benchmark_routes.py (dead code)
Error reports containing user queries are now stored in the encrypted
database using the same storage.save_report() mechanism as successful
research, preventing sensitive data from being written to unencrypted
disk files.
Critical bug fixes:
- Fix broken pipe chain logic (grep exit status issue)
- Add missing Python context manager patterns (with open)
- Fix export_matches to respect exclude arguments
Improvements:
- Extract 175+ line script from YAML to separate file
- Move logic to .github/scripts/check-file-writes.sh
- Makes script testable and maintainable
- Cleaner workflow file (now just 28 lines)
The script is now:
- Easier to test independently
- More maintainable as a standalone file
- Runnable locally for development