docs: add comprehensive CodeQL guide with JavaScript analysis

2026-06-15 19:46:56 +03:00 · 2025-04-02 15:14:46 +03:00
parent 80b1e62515
commit e83ea69f83
1 changed files with 193 additions and 0 deletions
--- a/docs/security/CODEQL_GUIDE.md
+++ b/docs/security/CODEQL_GUIDE.md
@@ -0,0 +1,193 @@
+# Local CodeQL Analysis Guide
+
+This guide explains how to set up and use CodeQL for local security analysis of the Local Deep Research project, covering both Python backend and JavaScript frontend code.
+
+## Prerequisites
+
+1. **CodeQL CLI**
+   - Download the latest CodeQL bundle from [GitHub](https://github.com/github/codeql-action/releases)
+   - Extract it to a permanent location (e.g., `C:\codeql` on Windows or `/opt/codeql` on Linux)
+   - Add the CodeQL executable to your system PATH
+
+2. **Ollama**
+   - Install Ollama from [ollama.ai](https://ollama.ai)
+   - Pull the required model: `ollama pull deepseek-r1:32b`
+
+3. **Analysis Scripts**
+   - Copy the appropriate analysis script from `docs/security/` to your project root:
+     - Windows: `analyze_sarif.ps1`
+     - Linux/Mac: `analyze_sarif.sh`
+
+## Setup Instructions
+
+### Windows
+
+1. **Install CodeQL**
+   ```powershell
+   # Download and extract CodeQL bundle
+   Invoke-WebRequest -Uri "https://github.com/github/codeql-action/releases/latest/download/codeql-bundle-win64.tar.gz" -OutFile "codeql.zip"
+   tar -xvzf codeql.zip -C C:\codeql
+
+   # Add to PATH (run as administrator)
+   [Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\codeql\codeql", "Machine")
+   ```
+
+2. **Create CodeQL Databases**
+   ```powershell
+   # Navigate to project root/src (src is recommended for codeql analysis, to avoid analyzing venv folder)
+   cd path/to/local-deep-research/src
+
+   # Create Python database
+   codeql database create --language=python --source-root . ./python-db
+
+   # Create JavaScript database (for frontend code)
+   codeql database create --language=javascript --source-root . ./js-db
+   ```
+
+3. **Run Analysis**
+   ```powershell
+   # Run Python CodeQL analysis
+   codeql database analyze ./python-db python-security-and-quality.qls --format=sarif-latest --output=python-results.sarif
+
+   # Run JavaScript CodeQL analysis
+   codeql database analyze ./js-db javascript-security-extended.qls --format=sarif-latest --output=js-results.sarif
+
+   # Merge results (optional)
+   codeql dataset merge --output=combined-db --source=python-db --source=js-db
+   codeql database analyze ./combined-db --format=sarif-latest --output=combined-results.sarif
+
+   # Analyze results with Ollama
+   .\analyze_sarif.ps1
+   ```
+
+### Linux/Mac
+
+1. **Install CodeQL**
+   ```bash
+   # Download and extract CodeQL bundle
+   wget https://github.com/github/codeql-action/releases/latest/download/codeql-bundle-linux64.tar.gz
+   tar -xvzf codeql-bundle-linux64.tar.gz -C /opt/codeql
+
+   # Add to PATH (add to ~/.bashrc or ~/.zshrc)
+   export PATH=$PATH:/opt/codeql/codeql
+   ```
+
+2. **Create CodeQL Databases**
+   ```bash
+   # Navigate to project root/src
+   cd path/to/local-deep-research/src
+
+   # Create Python database
+   codeql database create --language=python --source-root . ./python-db
+
+   # Create JavaScript database
+   codeql database create --language=javascript --source-root . ./js-db
+   ```
+
+3. **Run Analysis**
+   ```bash
+   # Make script executable
+   chmod +x analyze_sarif.sh
+
+   # Run Python CodeQL analysis
+   codeql database analyze ./python-db python-security-and-quality.qls --format=sarif-latest --output=python-results.sarif
+
+   # Run JavaScript CodeQL analysis
+   codeql database analyze ./js-db javascript-security-extended.qls --format=sarif-latest --output=js-results.sarif
+
+   # Merge results (optional)
+   codeql dataset merge --output=combined-db --source=python-db --source=js-db
+   codeql database analyze ./combined-db --format=sarif-latest --output=combined-results.sarif
+
+   # Analyze results with Ollama
+   ./analyze_sarif.sh
+   ```
+
+## Important Queries
+
+The following CodeQL queries are particularly relevant for our project:
+
+1. **Python Security**
+   - `python-security-and-quality.qls`: General security and code quality
+   - `python/sql-injection`: SQL injection vulnerabilities
+   - `python/hardcoded-credentials`: Hardcoded secrets
+   - `python/unsafe-deserialization`: Insecure deserialization
+
+2. **JavaScript Security**
+   - `javascript-security-extended.qls`: Comprehensive security checks
+   - `javascript/xss`: Cross-site scripting vulnerabilities
+   - `javascript/prototype-pollution`: Prototype pollution issues
+   - `javascript/express-misconfigured-cors`: CORS misconfigurations
+   - `javascript/unsafe-dynamic-import`: Unsafe dynamic imports
+   - `javascript/unsafe-eval`: Unsafe eval() usage
+
+3. **Custom Queries**
+   - Logging injection vulnerabilities
+   - Uninitialized variables
+   - API security issues
+   - Frontend security best practices
+
+## Analysis Script Features
+
+The analysis scripts (`analyze_sarif.ps1` and `analyze_sarif.sh`) provide:
+
+1. **Input Validation**
+   - Checks if Ollama is running
+   - Validates SARIF file format
+   - Verifies required dependencies
+
+2. **Error Handling**
+   - Graceful error messages
+   - Color-coded output
+   - Detailed error reporting
+
+3. **Output**
+   - Human-readable analysis
+   - Prioritized findings
+   - Recommended fixes
+   - Language-specific recommendations
+
+## Troubleshooting
+
+1. **Ollama Connection Issues**
+   - Ensure Ollama is running: `ollama serve`
+   - Check if the model is pulled: `ollama list`
+   - Verify the endpoint in the script matches your setup
+
+2. **CodeQL Database Creation**
+   - Clean previous databases: `rm -rf ./python-db ./js-db`
+   - Ensure dependencies are installed (Python and Node.js)
+   - Check for sufficient disk space
+   - For JavaScript: Ensure node_modules is present
+
+3. **Analysis Script Issues**
+   - Verify script permissions (Linux/Mac)
+   - Check PowerShell execution policy (Windows)
+   - Ensure all required tools are installed
+
+## Best Practices
+
+1. **Regular Analysis**
+   - Run analysis before major commits
+   - Schedule regular security scans
+   - Review and address findings promptly
+   - Run both frontend and backend scans
+
+2. **Database Management**
+   - Clean old databases regularly
+   - Keep CodeQL CLI updated
+   - Use appropriate query suites
+   - Consider using RAM disk for large projects
+
+3. **Results Handling**
+   - Document resolved issues
+   - Track false positives
+   - Share findings with the team
+   - Prioritize critical frontend and backend issues
+
+## Additional Resources
+
+- [CodeQL Documentation](https://codeql.github.com/docs/)
+- [Python CodeQL Queries](https://github.com/github/codeql/tree/main/python/ql/src)
+- [JavaScript CodeQL Queries](https://github.com/github/codeql/tree/main/javascript/ql/src)
+- [Ollama Documentation](https://ollama.ai/docs)