local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-16 03:51:07 +03:00

Author	SHA1	Message	Date
LearningCircuit	0fa151a4eb	fix: resolve gitleaks false positives with explicit config and baseline The gitleaks action was still flagging placeholder API key examples despite having them in the allowlist. This fix addresses the root causes: 1. Add explicit GITLEAKS_CONFIG environment variable to workflow to ensure the config file is loaded by gitleaks-action v2 2. Add GITLEAKS_BASELINE_PATH to use the baseline ignore file 3. Add secretGroup = 2 to the generic-secret rule to extract just the secret value (not the full match including KEY=), allowing the existing allowlist regexes like 'your-.*-key-here' to work properly 4. Create .gitleaksignore baseline file with specific fingerprints for known false positives in historical commits 5. Update .gitignore to track .gitleaksignore file 6. Add .gitleaksignore to file-whitelist-check scripts in both .github/scripts/ and .pre-commit-hooks/	2026-01-25 12:24:23 +01:00
LearningCircuit	b5b56d1b60	fix: address CI failures for form validation tests - Add tests/ui_tests/password.js to whitelist safe filename patterns - Add blur() call before checking pattern validity to ensure browser updates validation state properly	2025-12-24 11:05:11 +01:00
LearningCircuit	ab88b95d14	fix: add .trivyignore to whitelist	2025-12-08 02:25:26 +01:00
LearningCircuit	51fbcf3dda	fix: exclude security/path_validator.py from hardcoded path check (#1254) The path_validator.py file legitimately contains system directory paths like /var/log, /etc, etc. as part of its security policy for defining which directories should be restricted from user access. These are not environment-specific hardcoded paths but rather security policy definitions.	2025-12-05 09:09:58 -05:00
LearningCircuit	730a17446b	fix: remove naive secret detection from whitelist-check (gitleaks handles this) The SECRET_VIOLATIONS check was causing false positives by flagging legitimate code that references keywords like 'api_key', 'password', 'token' (e.g., class attributes like `requires_api_key = False`). Gitleaks already runs as a separate workflow and handles secret detection with context-aware rules that don't produce these false positives.	2025-12-05 00:12:14 +01:00
github-actions[bot]	34dcffcd13	Merge remote-tracking branch 'origin/dev' into sync-main-to-dev	2025-12-04 22:23:53 +00:00
LearningCircuit	04246669c3	fix: resolve subshell bug in whitelist-check and reduce output noise (#1231) - Remove pipe before while loop to fix subshell issue where violation arrays were always empty (violations detected but never reported) - Replace per-file "Checking:" output with progress dots every 10 files - Add summary showing total files checked	2025-12-04 22:23:06 +00:00
LearningCircuit	be8311e061	fix: whitelist llm_providers and research_library JSON configs Add defaults/llm_providers/.json and defaults/research_library/.json to SAFE_FILE_PATTERNS. These configuration files contain 'api_key' as field names (not actual secrets), which triggers false positive secret pattern detection.	2025-12-04 20:27:30 +01:00
LearningCircuit	d869d4bf78	Merge branch 'dev' into feature/comprehensive-security-enhancements	2025-12-03 16:26:11 +01:00
LearningCircuit	40b26cfc60	Merge dev into feature/comprehensive-security-enhancements	2025-12-03 16:24:57 +01:00
LearningCircuit	ea73116db6	feat: add CI validation for Docker image SHA digest pinning Add comprehensive validation to enforce SHA256 digest pinning across all Docker image references (Dockerfiles, docker-compose, and workflow files). New Files: - .github/scripts/validate-docker-compose-images.sh Bash script that validates docker-compose.yml files for unpinned images. Allows documented exceptions for own images and templates. - .github/scripts/validate-workflow-images.py Python script with proper YAML parsing to validate GitHub Actions workflow service containers and container images. - .github/workflows/validate-image-pinning.yml CI workflow that runs both validators on PR changes. Provides clear error messages and fix instructions when violations are found. Why This Matters: Image tags are mutable and can be reassigned to malicious images in supply chain attacks. SHA256 digests are immutable cryptographic identifiers that guarantee the exact same image bytes every deployment. This validation: - Blocks PRs with unpinned images - Shows violations directly in PR checks (not just Security tab) - Provides clear fix instructions - Runs efficiently (only on relevant file changes) Complements: - PR #1184 (pins Dockerfile and workflow images) - PR #1218 (pins docker-compose images)	2025-12-03 12:35:09 +01:00
LearningCircuit	9332256489	Merge branch 'dev' into refactor/remove-dogpile-cache-add-stampede-protection Resolve merge conflicts: - pyproject.toml: Keep flask-limiter from dev, remove dogpile-cache/redis/msgpack as intended - test_env_var_usage.py: Keep rate_limiter.py from dev, remove memory_cache/ as intended	2025-12-01 21:20:16 +01:00
LearningCircuit	94d9f08dd5	fix: resolve CI check failures in GitHub Actions workflows (#1197) - Add required permissions blocks to workflow files (Checkov CKV2_GHA_1) - check-css-classes.yml: contents: read, pull-requests: write - notification-tests.yml: contents: read - security-file-write-check.yml: contents: read - mobile-ui-tests.yml: contents: read, checks: write - responsive-ui-tests-enhanced.yml: contents: read - Fix shellcheck SC2086 warnings in check-file-writes.sh - Add disable comments for intentional word splitting - Fix zizmor template-injection vulnerabilities in release.yml - Move template expansions to env blocks - Use environment variables in shell commands and scripts - Use process.env in github-script blocks - Remove workflow_dispatch inputs from responsive-ui-tests-enhanced.yml (fixes Checkov CKV_GHA_7)	2025-12-01 08:21:36 -05:00
LearningCircuit	0d26c46c8a	Merge dev into sync-main-to-dev - resolve conflicts Resolved conflicts: - .gitleaks.toml: Combined regex patterns from both branches, added path allowlists - pyproject.toml: Kept updated versions from dev + added hypothesis from main - __version__.py: Keep 1.3.0 from dev - news.js: Removed duplicate toggleExpanded function (already exists at line 1291) - pdm.lock: Regenerated with pdm lock	2025-11-29 19:36:36 +01:00
LearningCircuit	0d0f8e8cf6	Merge origin/dev into feature/comprehensive-security-enhancements Resolve merge conflicts: - .github/scripts/file-whitelist-check.sh: Keep both .tsv and .jinja2 patterns - .pre-commit-hooks/file-whitelist-check.sh: Keep both .tsv and .jinja2 patterns - docker-compose.yml: Take dev version - package.json: Take dev version with security scripts - pdm.lock: Take dev version - pyproject.toml: Take dev version - research_service.py: Keep usedforsecurity=False for security scanners - ui.js: Take dev version with ldr-alert-close class - search_engine_local.py: Keep usedforsecurity=False for security scanners	2025-11-28 01:11:20 +01:00
LearningCircuit	5989566fb1	fix: remove remaining memory_cache references Update CI workflows and test files after memory_cache module removal: - Update api-tests.yml to import search_cache instead of memory_cache - Remove memory_cache from file-whitelist-check.sh - Remove memory_cache from test_env_var_usage.py allowlist	2025-11-28 00:12:26 +01:00
LearningCircuit	309b2a619e	Fix shellcheck warnings in all shell scripts - Quote variables to prevent word splitting (SC2086) - Use 'read -r' to prevent backslash mangling (SC2162) - Use 'cd ... || exit' for safe directory changes (SC2164) - Use '-n' instead of '! -z' for string checks (SC2236) - Use pgrep instead of ps | grep (SC2009) - Check exit codes directly instead of using $? (SC2181) - Declare and assign separately for exports (SC2155) - Fix unused loop variables with underscore prefix (SC2034) - Remove stray markdown backticks from ollama_entrypoint.sh	2025-11-27 19:18:10 +01:00
LearningCircuit	556786707c	Fix shellcheck warnings in file-whitelist-check.sh - Quote $GITHUB_BASE_REF to prevent word splitting (SC2086) - Quote $FILE_SIZE in echo (SC2086) - Use read -r to prevent backslash mangling (SC2162)	2025-11-27 01:35:27 +01:00
LearningCircuit	1dd64ebc76	Merge main into feature/comprehensive-security-enhancements Resolved conflicts: - docker-publish.yml: Combined Cosign/Syft install with version determination - update-npm-dependencies.yml: Use pinned setup-node SHA - docker-compose.yml: Keep RAG cache volume and Unraid documentation - package.json: Include dompurify for XSS prevention, keep marked ^17.0.0 - pdm.lock: Accept main's version - __version__.py: Keep 1.3.0 for comprehensive security release - ui.js: Use safer textContent for close button (XSS prevention)	2025-11-27 00:33:02 +01:00
LearningCircuit	0685f311f6	Merge branch 'dev' into feat/notifications	2025-11-23 18:40:28 +01:00
LearningCircuit	c0bc156189	Merge branch 'dev' into sync-main-to-dev	2025-11-23 18:35:38 +01:00
LearningCircuit	9343c0b5e4	feat: Add favicon and project icon (#1106) * feat: Add favicon (PNG only, no duplication) Adds the neon microscope icon as website favicon to fix 404 error. Changes: - Added favicon.png (256x256) in static/ directory - Updated base.html to reference favicon using url_for() for proper path resolution - Updated file whitelist to allow favicon The icon was recovered from git history (commit `495a905d`) and is metadata-clean. Improvements based on AI review: - Removed file duplication (single favicon.png instead of two copies) - Used url_for() instead of hardcoded paths for better Flask compatibility * chore: auto-bump version to 1.2.16 * fix: Use direct path for favicon instead of url_for() The Flask app has static_folder=None and uses a custom static route with endpoint 'app_serve_static', so url_for('static', ...) doesn't exist and was causing BuildError on all pages. Using hardcoded /static/favicon.png path which matches the custom @app.route('/static/<path:path>') route in app_factory.py. This fixes all UI test failures. --------- Co-authored-by: GitHub Action <action@github.com>	2025-11-22 14:28:50 -05:00
LearningCircuit	65519a7e66	Merge branch 'dev' into feat/notifications	2025-11-22 12:39:11 +01:00
LearningCircuit	83a88d260a	Merge branch 'dev' into sync-main-to-dev	2025-11-21 23:20:30 +01:00
LearningCircuit	e5c8d5afcf	Merge pull request #1084 from LearningCircuit/fix-ossf-scorecard-workflow Fix OSSF Scorecard workflow	2025-11-20 01:34:55 +01:00
LearningCircuit	7391f36066	Merge branch 'fix/security-headers-zap-scan-1041' into feature/comprehensive-security-enhancements Merges comprehensive security headers implementation from security-headers branch: - SecurityHeaders middleware for HTTP security headers - CORS handling with origin reflection - CSP, X-Frame-Options, HSTS, and other security headers - Removes inline security header code from app_factory - Removes ZAP workflow (replaced by security headers) Conflict resolutions: - Kept our SESSION_COOKIE_SECURE CI detection logic (more secure than always False) - Replaced inline security headers with SecurityHeaders middleware - Updated version to 1.3.0 - Kept our search_engine_github implementation	2025-11-13 00:42:10 +01:00
LearningCircuit	eee317165f	Add comprehensive security testing and supply chain security This PR implements a comprehensive security enhancement plan addressing identified gaps in the security testing infrastructure. Phase 0: Fix Broken Security Foundation - Create missing tests/security/ directory with 6 test files: * test_sql_injection.py - SQL injection prevention tests * test_xss_prevention.py - XSS sanitization tests * test_csrf_protection.py - CSRF token validation tests * test_auth_security.py - Authentication security tests * test_api_security.py - OWASP API Security Top 10 tests * test_input_validation.py - Input validation tests - Add custom Semgrep security rules: * .semgrep/rules/ldr-security.yaml - 16 LDR-specific rules * Covers: hardcoded secrets, SQL injection, command injection, path traversal, SSRF, unsafe deserialization, and more - Fix security-tests.yml workflow: * Remove || true to make tests actually fail when they should * Add conditional checks for legacy test files * Safety check uses continue-on-error (expected behavior) Phase 1: Software Supply Chain Security - Enhance docker-publish.yml with: * Cosign keyless signing with GitHub OIDC * SLSA provenance attestation for build integrity * SBOM generation with Syft * Automated signature verification * Required permissions for id-token and packages Phase 2: Dynamic Application Security Testing (DAST) - Add OWASP ZAP scanning workflow: * Baseline scan on PR/push (15-20 min) * Full scan nightly (30+ min) * API-focused scanning * Custom rules configuration (.zap/rules.tsv) Security posture improved from 8/10 to 9/10 by addressing: - Broken test references (tests that didn't exist) - Docker image supply chain security - Runtime vulnerability detection via DAST - LDR-specific security patterns via Semgrep	2025-11-09 22:20:16 +01:00
tombii	9701760f3f	Add .jinja2 to file whitelist in script	2025-11-04 15:46:21 +01:00
LearningCircuit	fee12aa6dc	Resolve merge conflicts in file-whitelist-check.yml	2025-11-03 18:27:14 +01:00
LearningCircuit	057420bd0c	improve: address AI code review suggestions for security script - Replace DEBUG output with informative result summaries - Fix file processing loop to handle filenames with spaces/special chars using printf - Improve readability of security scan results with better formatting - Maintain helpful output while removing debug terminology These improvements make the script more robust and user-friendly while maintaining all security checking functionality.	2025-11-03 17:50:27 +01:00
LearningCircuit	780eb973dc	fix: resolve GitHub Actions expression length limit in file whitelist check - Extract massive inline script to separate bash file (.github/scripts/file-whitelist-check.sh) - Reduce workflow file from 680 lines to 25 lines - Fix script logic issues (while loop for file processing) - Make script executable and properly handle GitHub environment variables - Maintain all original security checking functionality This resolves the "Exceeded max expression length 21000" error that was preventing the workflow from running.	2025-11-03 17:37:56 +01:00
LearningCircuit	47204c4cb1	refactor: address PR review feedback from djpetti Changes: 1. Remove deprecated get_report_as_temp_file() method - Removed from base.py, file.py, database_with_file_backup.py - Method had zero callers and created persistent unencrypted files - Users should use export_report_to_memory() for in-memory exports 2. Add **kwargs support to write_json_verified() - Allows passing any json.dumps() parameters (ensure_ascii, sort_keys, etc.) - Maintains backward compatibility with indent=2 default - Increases flexibility without breaking existing code 3. Update security check script - Remove get_report_as_temp_file patterns from safe usage list - Security check still passes after removal	2025-10-07 01:15:57 +02:00
LearningCircuit	f987f293d3	fix: apply security verification to research_library file writes - Updated _save_pdf() to use write_file_verified with research_library.enable_pdf_storage setting - Updated _save_text() to use write_file_verified with research_library.enable_txt_storage setting - Refactored _extract_text_from_pdf() to use in-memory pypdf processing instead of writing temp files to disk - Removed research_library from security check exclusions to enable verification All file writes in research_library now go through security verification with proper setting checks.	2025-10-06 00:51:24 +02:00
LearningCircuit	0b5c010880	security: add verified file write system with setting-based controls - Create file_write_verifier.py with write_file_verified() and write_json_verified() - All file writes now require explicit security settings to be enabled - Settings control: benchmark.allow_file_output, api.allow_file_output, system.allow_config_write, storage.allow_file_backup, storage.allow_temp_file_export - Update security check script to recognize verified write patterns - Default deny policy: file writes fail if setting not explicitly set to true - Clear error messages guide users to enable required settings	2025-10-06 00:03:39 +02:00
LearningCircuit	0d34d58502	fix: exclude system config and safe temp files from security check System config files (Flask secret key, server config, search index metadata) are not user data - they're system configuration. Safe temp files in database/encrypted_db.py use proper cleanup and are not a security concern.	2025-10-05 00:58:05 +02:00
LearningCircuit	f1be5bf417	fix: exclude research_library from security check Research library downloads are user-controlled features where users explicitly download academic PDFs/text to their library - not a security issue. The fix also corrects grep command option ordering: --exclude-dir options must come BEFORE -- to be effective. Previously, EXCLUDE_ARGS was placed after --, causing all exclusions to be ignored.	2025-10-05 00:48:35 +02:00
LearningCircuit	7fee0b24ed	fix: remove unencrypted file writes for user data - Store error reports in encrypted database instead of unencrypted files - Create static benchmark template file instead of generating at runtime - Update security check to search only src/ directory (faster, avoids .venv) - Exclude examples/, scripts/, .github/, cookiecutter-docker/ from checks - Delete unused web/routes/benchmark_routes.py (dead code) Error reports containing user queries are now stored in the encrypted database using the same storage.save_report() mechanism as successful research, preventing sensitive data from being written to unencrypted disk files.	2025-10-04 23:30:26 +02:00
LearningCircuit	9e3453892b	fix: resolve binary file and performance issues in security check Bug fixes: - Add -I flag to grep to ignore binary files (fixes null byte warnings) - Exclude static, dist, build directories (minified JS files) - Add exclusions for .min.js, .bundle.js files - Add tr command to filter null bytes from output - Limit results with head to prevent memory issues Performance improvements: - Exclude more non-source directories - Limit each grep to 500-1000 results - Skip binary and minified files entirely The script now: - Runs without null byte warnings - Completes successfully on src directory - Properly filters binary/minified content - Returns correct exit codes (1 when issues found)	2025-10-03 13:02:58 +02:00
LearningCircuit	1dbc97b415	fix: address additional review feedback for security check Critical bug fixes: - Fix broken pipe chain logic (grep exit status issue) - Add missing Python context manager patterns (with open) - Fix export_matches to respect exclude arguments Improvements: - Extract 175+ line script from YAML to separate file - Move logic to .github/scripts/check-file-writes.sh - Makes script testable and maintainable - Cleaner workflow file (now just 28 lines) The script is now: - Easier to test independently - More maintainable as a standalone file - Can be run locally for development	2025-10-03 11:51:59 +02:00

39 Commits
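Commit ea73116db6 adds a Python validator that parses workflow YAML and rejects container or service images that are not pinned by SHA256 digest. The sketch below illustrates that check under stated assumptions; it is not the project's validate-workflow-images.py. The helper names `is_pinned`, `find_unpinned` and `workflow_images` are hypothetical, and a plain dict stands in for the parsed YAML (loading the file, e.g. with PyYAML, is omitted). It walks the standard GitHub Actions locations `jobs.<job_id>.container` (a string or a mapping with `image`) and `jobs.<job_id>.services.<service_id>.image`.

```python
import re

# Hypothetical rule: a reference is "pinned" only if it ends in an immutable
# sha256 digest, i.e. "@sha256:" followed by exactly 64 lowercase hex chars.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref: str) -> bool:
    """Return True if the image reference is pinned by SHA256 digest."""
    return bool(DIGEST_RE.search(image_ref.strip()))


def find_unpinned(image_refs: list) -> list:
    """Return the references that rely on a mutable tag (or no tag at all)."""
    return [ref for ref in image_refs if not is_pinned(ref)]


def workflow_images(workflow: dict) -> list:
    """Collect job container and service images from a parsed workflow mapping."""
    images = []
    for job in (workflow.get("jobs") or {}).values():
        container = job.get("container")
        if isinstance(container, str):
            images.append(container)
        elif isinstance(container, dict) and "image" in container:
            images.append(container["image"])
        for service in (job.get("services") or {}).values():
            if isinstance(service, dict) and "image" in service:
                images.append(service["image"])
    return images


if __name__ == "__main__":
    example = {
        "jobs": {
            "test": {
                "container": "python:3.12-slim",
                "services": {"db": {"image": "postgres:16@sha256:" + "a" * 64}},
            }
        }
    }
    print(find_unpinned(workflow_images(example)))  # ['python:3.12-slim']
```

Matching the digest rather than the tag reflects the reasoning in the commit message: a tag can be reassigned, while a digest identifies exactly one set of image bytes.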
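Commit 0fa151a4eb sets `secretGroup = 2` so that the allowlist (for example `your-.*-key-here`) is checked against the secret value rather than the key name. A minimal Python sketch of why the chosen capture group matters follows. It assumes a hypothetical rule regex in which group 1 captures the key name and group 2 the assigned value; this is not the project's actual .gitleaks.toml pattern, and it only mimics gitleaks (which is written in Go) in simplified form.

```python
import re
from typing import Optional

# Hypothetical rule regex, for illustration only:
# group 1 = key name, group 2 = value assigned to it.
RULE = re.compile(r"(api_key|secret|token)\s*[=:]\s*['\"]?([^'\"\s]+)", re.IGNORECASE)

# Allowlist regex quoted in the commit message for placeholder keys.
ALLOWLIST = [re.compile(r"your-.*-key-here")]


def extract_secret(line: str, secret_group: int) -> Optional[str]:
    """Return the capture group treated as the 'secret', or None if no match."""
    match = RULE.search(line)
    return match.group(secret_group) if match else None


def is_allowlisted(secret: str) -> bool:
    """True if any allowlist regex matches the extracted secret."""
    return any(pattern.search(secret) for pattern in ALLOWLIST)


if __name__ == "__main__":
    line = 'API_KEY="your-api-key-here"'
    for group in (1, 2):
        secret = extract_secret(line, group)
        print(group, secret, is_allowlisted(secret))
    # 1 API_KEY False
    # 2 your-api-key-here True
```

When group 1 is treated as the secret, the placeholder line is still reported. Selecting group 2 lets the existing allowlist regex suppress it, which matches the outcome the commit message describes.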
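Commits 0b5c010880 and 47204c4cb1 describe a default-deny file write layer: `write_file_verified()` and `write_json_verified()` refuse to write unless a named setting is explicitly true, and the JSON variant keeps `indent=2` as a default while forwarding other `json.dumps()` options through `**kwargs`. The following is a hypothetical sketch of that pattern, not the project's code. The names `verified_write`, `verified_write_json` and `FileWriteDisabledError`, and the plain mapping that stands in for the settings store, are all assumptions.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Union


class FileWriteDisabledError(PermissionError):
    """Hypothetical error raised when the guarding setting is not enabled."""


def verified_write(
    path: Union[str, Path], content: str, setting_key: str, settings: Mapping[str, Any]
) -> None:
    """Write text only if settings[setting_key] is explicitly True (default deny)."""
    # Missing keys, False, and truthy non-bool values such as "true" all block the write.
    if settings.get(setting_key) is not True:
        raise FileWriteDisabledError(
            f"Refusing to write {path}: set '{setting_key}' to true to allow this write."
        )
    Path(path).write_text(content, encoding="utf-8")


def verified_write_json(
    path: Union[str, Path],
    obj: Any,
    setting_key: str,
    settings: Mapping[str, Any],
    **json_kwargs: Any,
) -> None:
    """JSON variant: indent=2 unless overridden; other json.dumps options pass through."""
    json_kwargs.setdefault("indent", 2)
    verified_write(path, json.dumps(obj, **json_kwargs), setting_key, settings)
```

For example, `verified_write_json("results.json", data, "benchmark.allow_file_output", {"benchmark.allow_file_output": True}, sort_keys=True)` writes sorted, two-space-indented JSON. The same call with an empty settings mapping raises before anything touches the disk.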