Commit Graph

39 Commits

Author SHA1 Message Date
LearningCircuit
0fa151a4eb fix: resolve gitleaks false positives with explicit config and baseline
The gitleaks action was still flagging placeholder API key examples
despite having them in the allowlist. This fix addresses the root causes:

1. Add explicit GITLEAKS_CONFIG environment variable to workflow to
   ensure the config file is loaded by gitleaks-action v2

2. Add GITLEAKS_BASELINE_PATH to use the baseline ignore file

3. Add secretGroup = 2 to the generic-secret rule to extract just the
   secret value (not the full match including KEY=), allowing the
   existing allowlist regexes like 'your-.*-key-here' to work properly

4. Create .gitleaksignore baseline file with specific fingerprints for
   known false positives in historical commits

5. Update .gitignore to track .gitleaksignore file

6. Add .gitleaksignore to file-whitelist-check scripts in both
   .github/scripts/ and .pre-commit-hooks/
2026-01-25 12:24:23 +01:00
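The effect of item 3 in commit 0fa151a4eb can be illustrated outside gitleaks. The sketch below is not gitleaks itself and does not reproduce the project's .gitleaks.toml: the rule regex is hypothetical, and anchoring the quoted allowlist pattern with ^ and $ is an assumption made only to show why the match target matters. It compares testing the allowlist against the full match (including KEY=) with testing it against capture group 2 alone.

```python
import re

# Hypothetical rule regex: group 1 is the key name, group 2 is the secret value.
RULE = re.compile(r"""(?i)(api[_-]?key|secret|token)\s*[=:]\s*['"]?([^'"\s]+)""")

# Allowlist pattern quoted in the commit message. Anchoring it is an
# assumption made here for illustration.
ALLOW = re.compile(r"^your-.*-key-here$")


def is_allowlisted(line: str, secret_group: int) -> bool:
    """Return True if the text selected by secret_group matches the allowlist."""
    match = RULE.search(line)
    if match is None:
        return False
    return ALLOW.search(match.group(secret_group)) is not None


line = "OPENAI_API_KEY=your-openai-key-here"
full_match_allowed = is_allowlisted(line, 0)    # target: "API_KEY=your-openai-key-here"
secret_only_allowed = is_allowlisted(line, 2)   # target: "your-openai-key-here"
print(full_match_allowed, secret_only_allowed)
```

With group 0 the candidate text starts with the key name, so an anchored placeholder pattern cannot match; with group 2 only the value is tested.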
LearningCircuit
b5b56d1b60 fix: address CI failures for form validation tests
- Add tests/ui_tests/*password*.js to whitelist safe filename patterns
- Add blur() call before checking pattern validity to ensure browser
  updates validation state properly
2025-12-24 11:05:11 +01:00
LearningCircuit
ab88b95d14 fix: add .trivyignore to whitelist
2025-12-08 02:25:26 +01:00
LearningCircuit
51fbcf3dda fix: exclude security/path_validator.py from hardcoded path check (#1254)
The path_validator.py file legitimately contains system directory paths
such as /var/log and /etc as part of its security policy, which defines
which directories are restricted from user access. These are not
environment-specific hardcoded paths but security policy definitions.
2025-12-05 09:09:58 -05:00
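To show why a validator of this kind has to contain literal system paths (commit 51fbcf3dda), here is a minimal sketch. It is not the contents of security/path_validator.py: the names RESTRICTED_DIRS and is_restricted are hypothetical, and the list holds only the two directories named in the commit message.

```python
from pathlib import PurePosixPath

# Hypothetical policy list. The commit message names only /var/log and /etc.
RESTRICTED_DIRS = (PurePosixPath("/etc"), PurePosixPath("/var/log"))


def is_restricted(path: str) -> bool:
    """Return True if path is a restricted directory or lies inside one."""
    candidate = PurePosixPath(path)
    # Compare whole path components, so "/home/user/etc" is not confused with "/etc".
    return any(candidate == d or d in candidate.parents for d in RESTRICTED_DIRS)


print(is_restricted("/etc/passwd"))
print(is_restricted("/home/user/etc/notes.txt"))
```

The sketch compares paths purely lexically. A real validator would presumably also resolve symlinks and ".." segments before comparing.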
LearningCircuit
730a17446b fix: remove naive secret detection from whitelist-check (gitleaks handles this)
The SECRET_VIOLATIONS check was causing false positives by flagging
legitimate code that references keywords like 'api_key', 'password',
'token' (e.g., class attributes like `requires_api_key = False`).

Gitleaks already runs as a separate workflow and handles secret detection
with context-aware rules that don't produce these false positives.
2025-12-05 00:12:14 +01:00
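The false positive described in commit 730a17446b is easy to reproduce. The sketch below is an illustration only: NAIVE approximates a keyword check, and CONTEXTUAL is a hypothetical stricter pattern that requires a quoted literal of at least 16 characters. Neither is taken from the removed script or from gitleaks' rule set.

```python
import re

# Keyword-only check: flags any line that mentions a sensitive-sounding name.
NAIVE = re.compile(r"(?i)(api_key|password|token)")

# Hypothetical context-aware check: the keyword must be assigned a quoted
# literal of 16 or more characters.
CONTEXTUAL = re.compile(
    r"""(?i)(api_key|password|token)\w*\s*=\s*['"][A-Za-z0-9_\-]{16,}['"]"""
)

harmless = "requires_api_key = False"
suspicious = 'api_key = "abcdefghijklmnop1234"'

for line in (harmless, suspicious):
    print(line, bool(NAIVE.search(line)), bool(CONTEXTUAL.search(line)))
```

The keyword-only pattern flags both lines, while the stricter pattern flags only the assignment of a long literal.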
github-actions[bot]
34dcffcd13 Merge remote-tracking branch 'origin/dev' into sync-main-to-dev
2025-12-04 22:23:53 +00:00
LearningCircuit
04246669c3 fix: resolve subshell bug in whitelist-check and reduce output noise (#1231)
- Remove pipe before while loop to fix subshell issue where violation
  arrays were always empty (violations detected but never reported)
- Replace per-file "Checking:" output with progress dots every 10 files
- Add summary showing total files checked
2025-12-04 22:23:06 +00:00
LearningCircuit
be8311e061 fix: whitelist llm_providers and research_library JSON configs
Add defaults/llm_providers/*.json and defaults/research_library/*.json
to SAFE_FILE_PATTERNS. These configuration files contain 'api_key' as
field names (not actual secrets), which triggers false positive secret
pattern detection.
2025-12-04 20:27:30 +01:00
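The whitelisting in commit be8311e061 amounts to matching changed file paths against a list of safe patterns. The actual check is a bash script whose matching rules are not shown here, so the following is only a sketch under assumptions: it treats the two patterns from the commit message as shell-style globs and uses Python's fnmatch, and the function name is_whitelisted is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns as quoted in the commit message.
SAFE_FILE_PATTERNS = [
    "defaults/llm_providers/*.json",
    "defaults/research_library/*.json",
]


def is_whitelisted(path: str) -> bool:
    """Return True if path matches any safe pattern (case-sensitive glob match)."""
    return any(fnmatchcase(path, pattern) for pattern in SAFE_FILE_PATTERNS)


print(is_whitelisted("defaults/llm_providers/openai.json"))
print(is_whitelisted("src/settings.json"))
```

Note that in fnmatch the * wildcard also matches path separators, so these patterns would accept JSON files in nested subdirectories as well.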
LearningCircuit
d869d4bf78 Merge branch 'dev' into feature/comprehensive-security-enhancements
2025-12-03 16:26:11 +01:00
LearningCircuit
40b26cfc60 Merge dev into feature/comprehensive-security-enhancements
2025-12-03 16:24:57 +01:00
LearningCircuit
ea73116db6 feat: add CI validation for Docker image SHA digest pinning
Add comprehensive validation to enforce SHA256 digest pinning across all
Docker image references (Dockerfiles, docker-compose, and workflow files).

New Files:
- .github/scripts/validate-docker-compose-images.sh
  Bash script that validates docker-compose.yml files for unpinned images.
  Allows documented exceptions for own images and templates.

- .github/scripts/validate-workflow-images.py
  Python script with proper YAML parsing to validate GitHub Actions
  workflow service containers and container images.

- .github/workflows/validate-image-pinning.yml
  CI workflow that runs both validators on PR changes. Provides clear
  error messages and fix instructions when violations are found.

Why This Matters:
Image tags are mutable and can be reassigned to malicious images in supply
chain attacks. SHA256 digests are immutable cryptographic identifiers that
guarantee the exact same image bytes every deployment.

This validation:
- Blocks PRs with unpinned images
- Shows violations directly in PR checks (not just Security tab)
- Provides clear fix instructions
- Runs efficiently (only on relevant file changes)

Complements:
- PR #1184 (pins Dockerfile and workflow images)
- PR #1218 (pins docker-compose images)
2025-12-03 12:35:09 +01:00
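The core check behind commit ea73116db6 is whether an image reference ends in an immutable @sha256 digest. The sketch below is not the project's validator: the commit says the real Python script uses proper YAML parsing, whereas this version scans "image:" lines as plain text to stay dependency-free, and the name unpinned_images is hypothetical. It also omits the documented exceptions for own images and templates.

```python
import re

# A pinned reference ends with "@sha256:" followed by 64 lowercase hex digits.
PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(lines):
    """Return image references from 'image:' lines that lack a sha256 digest."""
    violations = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#") or not stripped.startswith("image:"):
            continue
        # Drop the key, any trailing comment, and surrounding quotes.
        ref = stripped[len("image:"):].split("#", 1)[0].strip().strip("'\"")
        if not PINNED.search(ref):
            violations.append(ref)
    return violations


compose = [
    "services:",
    "  cache:",
    "    image: redis:7",
    "  db:",
    "    image: postgres@sha256:" + "a" * 64,
]
print(unpinned_images(compose))
```

A tag such as redis:7 is reported because it can be reassigned later, while the digest-pinned reference passes.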
LearningCircuit
9332256489 Merge branch 'dev' into refactor/remove-dogpile-cache-add-stampede-protection
Resolve merge conflicts:
- pyproject.toml: Keep flask-limiter from dev, remove dogpile-cache/redis/msgpack as intended
- test_env_var_usage.py: Keep rate_limiter.py from dev, remove memory_cache/ as intended
2025-12-01 21:20:16 +01:00
LearningCircuit
94d9f08dd5 fix: resolve CI check failures in GitHub Actions workflows (#1197)
- Add required permissions blocks to workflow files (Checkov CKV2_GHA_1)
  - check-css-classes.yml: contents: read, pull-requests: write
  - notification-tests.yml: contents: read
  - security-file-write-check.yml: contents: read
  - mobile-ui-tests.yml: contents: read, checks: write
  - responsive-ui-tests-enhanced.yml: contents: read

- Fix shellcheck SC2086 warnings in check-file-writes.sh
  - Add disable comments for intentional word splitting

- Fix zizmor template-injection vulnerabilities in release.yml
  - Move template expansions to env blocks
  - Use environment variables in shell commands and scripts
  - Use process.env in github-script blocks

- Remove workflow_dispatch inputs from responsive-ui-tests-enhanced.yml
  (fixes Checkov CKV_GHA_7)
2025-12-01 08:21:36 -05:00
LearningCircuit
0d26c46c8a Merge dev into sync-main-to-dev - resolve conflicts
Resolved conflicts:
- .gitleaks.toml: Combined regex patterns from both branches, added path allowlists
- pyproject.toml: Kept updated versions from dev + added hypothesis from main
- __version__.py: Keep 1.3.0 from dev
- news.js: Removed duplicate toggleExpanded function (already exists at line 1291)
- pdm.lock: Regenerated with pdm lock
2025-11-29 19:36:36 +01:00
LearningCircuit
0d0f8e8cf6 Merge origin/dev into feature/comprehensive-security-enhancements
Resolve merge conflicts:
- .github/scripts/file-whitelist-check.sh: Keep both .tsv and .jinja2 patterns
- .pre-commit-hooks/file-whitelist-check.sh: Keep both .tsv and .jinja2 patterns
- docker-compose.yml: Take dev version
- package.json: Take dev version with security scripts
- pdm.lock: Take dev version
- pyproject.toml: Take dev version
- research_service.py: Keep usedforsecurity=False for security scanners
- ui.js: Take dev version with ldr-alert-close class
- search_engine_local.py: Keep usedforsecurity=False for security scanners
2025-11-28 01:11:20 +01:00
LearningCircuit
5989566fb1 fix: remove remaining memory_cache references
Update CI workflows and test files after memory_cache module removal:
- Update api-tests.yml to import search_cache instead of memory_cache
- Remove memory_cache from file-whitelist-check.sh
- Remove memory_cache from test_env_var_usage.py allowlist
2025-11-28 00:12:26 +01:00
LearningCircuit
309b2a619e Fix shellcheck warnings in all shell scripts
- Quote variables to prevent word splitting (SC2086)
- Use 'read -r' to prevent backslash mangling (SC2162)
- Use 'cd ... || exit' for safe directory changes (SC2164)
- Use '-n' instead of '! -z' for string checks (SC2236)
- Use pgrep instead of ps | grep (SC2009)
- Check exit codes directly instead of using $? (SC2181)
- Declare and assign separately for exports (SC2155)
- Fix unused loop variables with underscore prefix (SC2034)
- Remove stray markdown backticks from ollama_entrypoint.sh
2025-11-27 19:18:10 +01:00
LearningCircuit
556786707c Fix shellcheck warnings in file-whitelist-check.sh
- Quote $GITHUB_BASE_REF to prevent word splitting (SC2086)
- Quote $FILE_SIZE in echo (SC2086)
- Use read -r to prevent backslash mangling (SC2162)
2025-11-27 01:35:27 +01:00
LearningCircuit
1dd64ebc76 Merge main into feature/comprehensive-security-enhancements
Resolved conflicts:
- docker-publish.yml: Combined Cosign/Syft install with version determination
- update-npm-dependencies.yml: Use pinned setup-node SHA
- docker-compose.yml: Keep RAG cache volume and Unraid documentation
- package.json: Include dompurify for XSS prevention, keep marked ^17.0.0
- pdm.lock: Accept main's version
- __version__.py: Keep 1.3.0 for comprehensive security release
- ui.js: Use safer textContent for close button (XSS prevention)
2025-11-27 00:33:02 +01:00
LearningCircuit
0685f311f6 Merge branch 'dev' into feat/notifications
2025-11-23 18:40:28 +01:00
LearningCircuit
c0bc156189 Merge branch 'dev' into sync-main-to-dev
2025-11-23 18:35:38 +01:00
LearningCircuit
9343c0b5e4 feat: Add favicon and project icon (#1106)
* feat: Add favicon (PNG only, no duplication)

Adds the neon microscope icon as website favicon to fix 404 error.

Changes:
- Added favicon.png (256x256) in static/ directory
- Updated base.html to reference favicon using url_for() for proper path resolution
- Updated file whitelist to allow favicon

The icon was recovered from git history (commit 495a905d) and is metadata-clean.

Improvements based on AI review:
- Removed file duplication (single favicon.png instead of two copies)
- Used url_for() instead of hardcoded paths for better Flask compatibility

* chore: auto-bump version to 1.2.16

* fix: Use direct path for favicon instead of url_for()

The Flask app has static_folder=None and uses a custom static route
with endpoint 'app_serve_static', so url_for('static', ...) doesn't
exist and was causing BuildError on all pages.

Using hardcoded /static/favicon.png path which matches the custom
@app.route('/static/<path:path>') route in app_factory.py.

This fixes all UI test failures.

---------

Co-authored-by: GitHub Action <action@github.com>
2025-11-22 14:28:50 -05:00
LearningCircuit
65519a7e66 Merge branch 'dev' into feat/notifications
2025-11-22 12:39:11 +01:00
LearningCircuit
83a88d260a Merge branch 'dev' into sync-main-to-dev
2025-11-21 23:20:30 +01:00
LearningCircuit
e5c8d5afcf Merge pull request #1084 from LearningCircuit/fix-ossf-scorecard-workflow
Fix OSSF Scorecard workflow
2025-11-20 01:34:55 +01:00
LearningCircuit
7391f36066 Merge branch 'fix/security-headers-zap-scan-1041' into feature/comprehensive-security-enhancements
Merges comprehensive security headers implementation from security-headers branch:
- SecurityHeaders middleware for HTTP security headers
- CORS handling with origin reflection
- CSP, X-Frame-Options, HSTS, and other security headers
- Removes inline security header code from app_factory
- Removes ZAP workflow (replaced by security headers)

Conflict resolutions:
- Kept our SESSION_COOKIE_SECURE CI detection logic (more secure than always False)
- Replaced inline security headers with SecurityHeaders middleware
- Updated version to 1.3.0
- Kept our search_engine_github implementation
2025-11-13 00:42:10 +01:00
LearningCircuit
eee317165f Add comprehensive security testing and supply chain security
This PR implements a comprehensive security enhancement plan addressing
identified gaps in the security testing infrastructure.

Phase 0: Fix Broken Security Foundation
- Create missing tests/security/ directory with 6 test files:
  * test_sql_injection.py - SQL injection prevention tests
  * test_xss_prevention.py - XSS sanitization tests
  * test_csrf_protection.py - CSRF token validation tests
  * test_auth_security.py - Authentication security tests
  * test_api_security.py - OWASP API Security Top 10 tests
  * test_input_validation.py - Input validation tests

- Add custom Semgrep security rules:
  * .semgrep/rules/ldr-security.yaml - 16 LDR-specific rules
  * Covers: hardcoded secrets, SQL injection, command injection,
    path traversal, SSRF, unsafe deserialization, and more

- Fix security-tests.yml workflow:
  * Remove || true to make tests actually fail when they should
  * Add conditional checks for legacy test files
  * Safety check uses continue-on-error (expected behavior)

Phase 1: Software Supply Chain Security
- Enhance docker-publish.yml with:
  * Cosign keyless signing with GitHub OIDC
  * SLSA provenance attestation for build integrity
  * SBOM generation with Syft
  * Automated signature verification
  * Required permissions for id-token and packages

Phase 2: Dynamic Application Security Testing (DAST)
- Add OWASP ZAP scanning workflow:
  * Baseline scan on PR/push (15-20 min)
  * Full scan nightly (30+ min)
  * API-focused scanning
  * Custom rules configuration (.zap/rules.tsv)

Security posture improved from 8/10 to 9/10 by addressing:
- Broken test references (tests that didn't exist)
- Docker image supply chain security
- Runtime vulnerability detection via DAST
- LDR-specific security patterns via Semgrep
2025-11-09 22:20:16 +01:00
tombii
9701760f3f Add .jinja2 to file whitelist in script
2025-11-04 15:46:21 +01:00
LearningCircuit
fee12aa6dc Resolve merge conflicts in file-whitelist-check.yml
2025-11-03 18:27:14 +01:00
LearningCircuit
057420bd0c improve: address AI code review suggestions for security script
- Replace DEBUG output with informative result summaries
- Fix file processing loop to handle filenames with spaces/special chars using printf
- Improve readability of security scan results with better formatting
- Maintain helpful output while removing debug terminology

These improvements make the script more robust and user-friendly while
maintaining all security checking functionality.
2025-11-03 17:50:27 +01:00
LearningCircuit
780eb973dc fix: resolve GitHub Actions expression length limit in file whitelist check
- Extract massive inline script to separate bash file (.github/scripts/file-whitelist-check.sh)
- Reduce workflow file from 680 lines to 25 lines
- Fix script logic issues (while loop for file processing)
- Make script executable and properly handle GitHub environment variables
- Maintain all original security checking functionality

This resolves the "Exceeded max expression length 21000" error that was preventing
the workflow from running.
2025-11-03 17:37:56 +01:00
LearningCircuit
47204c4cb1 refactor: address PR review feedback from djpetti
Changes:
1. Remove deprecated get_report_as_temp_file() method
   - Removed from base.py, file.py, database_with_file_backup.py
   - Method had zero callers and created persistent unencrypted files
   - Users should use export_report_to_memory() for in-memory exports

2. Add **kwargs support to write_json_verified()
   - Allows passing any json.dumps() parameters (ensure_ascii, sort_keys, etc.)
   - Maintains backward compatibility with indent=2 default
   - Increases flexibility without breaking existing code

3. Update security check script
   - Remove get_report_as_temp_file patterns from safe usage list
   - Security check still passes after removal
2025-10-07 01:15:57 +02:00
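Item 2 of commit 47204c4cb1 describes a common pattern: forward arbitrary keyword arguments to json.dumps() while keeping indent=2 as the default. The sketch below shows only that pattern. The name dumps_with_defaults is hypothetical, and the real write_json_verified() also performs a settings check and a file write, both omitted here.

```python
import json


def dumps_with_defaults(data, **kwargs):
    """Serialize data, forwarding kwargs to json.dumps with indent=2 by default."""
    # setdefault keeps an explicit caller value and fills in 2 only when absent.
    kwargs.setdefault("indent", 2)
    return json.dumps(data, **kwargs)


print(dumps_with_defaults({"b": 1, "a": 2}, sort_keys=True))
print(dumps_with_defaults({"a": 1}, indent=None))
```

Existing callers that pass no extra arguments keep the indented output, which is what preserves backward compatibility.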
LearningCircuit
f987f293d3 fix: apply security verification to research_library file writes
- Updated _save_pdf() to use write_file_verified with research_library.enable_pdf_storage setting
- Updated _save_text() to use write_file_verified with research_library.enable_txt_storage setting
- Refactored _extract_text_from_pdf() to use in-memory pypdf processing instead of writing temp files to disk
- Removed research_library from security check exclusions to enable verification

All file writes in research_library now go through security verification with proper setting checks.
2025-10-06 00:51:24 +02:00
LearningCircuit
0b5c010880 security: add verified file write system with setting-based controls
- Create file_write_verifier.py with write_file_verified() and write_json_verified()
- All file writes now require explicit security settings to be enabled
- Settings control: benchmark.allow_file_output, api.allow_file_output,
  system.allow_config_write, storage.allow_file_backup, storage.allow_temp_file_export
- Update security check script to recognize verified write patterns
- Default deny policy: file writes fail if setting not explicitly set to true
- Clear error messages guide users to enable required settings
2025-10-06 00:03:39 +02:00
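The default-deny policy from commit 0b5c010880 can be sketched as follows. This is not the code in file_write_verifier.py: the function signature, the FileWriteDenied exception, and the plain dict used as a settings store are all assumptions made for illustration. Only the setting name api.allow_file_output comes from the commit message.

```python
import tempfile
from pathlib import Path


class FileWriteDenied(PermissionError):
    """Raised when the controlling setting is not explicitly true (hypothetical)."""


def write_file_verified_sketch(path, content, setting_name, settings):
    """Write content to path only if settings[setting_name] is exactly True."""
    # Default deny: a missing, false, or non-boolean value blocks the write.
    if settings.get(setting_name) is not True:
        raise FileWriteDenied(
            f"File write blocked: set '{setting_name}' to true to allow it"
        )
    Path(path).write_text(content, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "report.txt"
    try:
        write_file_verified_sketch(target, "data", "api.allow_file_output", {})
        denied = False
    except FileWriteDenied:
        denied = True
    write_file_verified_sketch(
        target, "data", "api.allow_file_output", {"api.allow_file_output": True}
    )
    written = target.read_text(encoding="utf-8")

print(denied, written)
```

Checking for the value True by identity, rather than for truthiness, means that a string such as "false" stored by mistake does not enable writes.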
LearningCircuit
0d34d58502 fix: exclude system config and safe temp files from security check
System config files (Flask secret key, server config, search index metadata)
are not user data - they're system configuration.

Safe temp files in database/encrypted_db.py use proper cleanup and are
not a security concern.
2025-10-05 00:58:05 +02:00
LearningCircuit
f1be5bf417 fix: exclude research_library from security check
Research library downloads are user-controlled features where users
explicitly download academic PDFs/text to their library - not a
security issue.

The fix also corrects grep command option ordering: --exclude-dir
options must come BEFORE -- to be effective. Previously, EXCLUDE_ARGS
was placed after --, causing all exclusions to be ignored.
2025-10-05 00:48:35 +02:00
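The ordering bug fixed in commit f1be5bf417 follows from how "--" works: it marks the end of options, so anything after it is read as the pattern or a file operand. The sketch below builds a grep argument list with the exclusions in the correct position. The function name build_grep_args and the chosen flags are illustrative and not taken from check-file-writes.sh.

```python
def build_grep_args(pattern, paths, exclude_dirs):
    """Build a grep argv with every --exclude-dir placed before the '--' marker."""
    args = ["grep", "-rn"]
    # Options must precede "--"; after it, grep treats arguments as operands.
    args += [f"--exclude-dir={d}" for d in exclude_dirs]
    args += ["--", pattern, *paths]
    return args


args = build_grep_args("open(", ["src"], [".venv", "node_modules"])
print(args)
```

The resulting list can be passed to subprocess.run() as-is, which also avoids the word-splitting concerns of building the command as a shell string.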
LearningCircuit
7fee0b24ed fix: remove unencrypted file writes for user data
- Store error reports in encrypted database instead of unencrypted files
- Create static benchmark template file instead of generating at runtime
- Update security check to search only src/ directory (faster, avoids .venv)
- Exclude examples/, scripts/, .github/, cookiecutter-docker/ from checks
- Delete unused web/routes/benchmark_routes.py (dead code)

Error reports containing user queries are now stored in the encrypted
database using the same storage.save_report() mechanism as successful
research, preventing sensitive data from being written to unencrypted
disk files.
2025-10-04 23:30:26 +02:00
LearningCircuit
9e3453892b fix: resolve binary file and performance issues in security check
Bug fixes:
- Add -I flag to grep to ignore binary files (fixes null byte warnings)
- Exclude static, dist, build directories (minified JS files)
- Add exclusions for *.min.js, *.bundle.js files
- Add tr command to filter null bytes from output
- Limit results with head to prevent memory issues

Performance improvements:
- Exclude more non-source directories
- Limit each grep to 500-1000 results
- Skip binary and minified files entirely

The script now:
- Runs without null byte warnings
- Completes successfully on src directory
- Properly filters binary/minified content
- Returns correct exit codes (1 when issues found)
2025-10-03 13:02:58 +02:00
LearningCircuit
1dbc97b415 fix: address additional review feedback for security check
Critical bug fixes:
- Fix broken pipe chain logic (grep exit status issue)
- Add missing Python context manager patterns (with open)
- Fix export_matches to respect exclude arguments

Improvements:
- Extract 175+ line script from YAML to separate file
- Move logic to .github/scripts/check-file-writes.sh
- Makes script testable and maintainable
- Cleaner workflow file (now just 28 lines)

The script is now:
- Easier to test independently
- More maintainable as a standalone file
- Can be run locally for development
2025-10-03 11:51:59 +02:00