Local CodeQL Analysis Guide

This guide explains how to set up and use CodeQL for local security analysis of the Local Deep Research project, covering both Python backend and JavaScript frontend code.

Prerequisites

CodeQL CLI
- Download the latest CodeQL bundle from GitHub
- Extract it to a permanent location (e.g., C:\codeql on Windows or /opt/codeql on Linux)
- Add the CodeQL executable to your system PATH
Ollama
- Install Ollama from ollama.ai
- Pull the required model: ollama pull deepseek-r1:32b
Analysis Scripts
- Copy the appropriate analysis script from docs/security/ to your project root:
  - Windows: analyze_sarif.ps1
  - Linux/Mac: analyze_sarif.sh

Setup Instructions

Windows

Install CodeQL

# Download and extract CodeQL bundle
Invoke-WebRequest -Uri "https://github.com/github/codeql-action/releases/latest/download/codeql-bundle-win64.tar.gz" -OutFile "codeql.zip"
tar -xvzf codeql.zip -C C:\codeql

# Add to PATH (run as administrator)
[Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\codeql\codeql", "Machine")

Create CodeQL Databases

# Navigate to project root/src (src is recommended for codeql analysis, to avoid analyzing venv folder)
cd path/to/local-deep-research/src

# Create Python database
codeql database create --language=python --source-root . ./python-db

# Create JavaScript database (for frontend code)
codeql database create --language=javascript --source-root . ./js-db

Run Analysis

# Run Python CodeQL analysis
codeql database analyze ./python-db python-security-and-quality.qls --format=sarif-latest --output=python-results.sarif

# Run JavaScript CodeQL analysis
codeql database analyze ./js-db javascript-security-extended.qls --format=sarif-latest --output=js-results.sarif

# Merge results (optional)
codeql dataset merge --output=combined-db --source=python-db --source=js-db
codeql database analyze ./combined-db --format=sarif-latest --output=combined-results.sarif

# Analyze results with Ollama
.\analyze_sarif.ps1

Linux/Mac

Install CodeQL

# Download and extract CodeQL bundle
wget https://github.com/github/codeql-action/releases/latest/download/codeql-bundle-linux64.tar.gz
tar -xvzf codeql-bundle-linux64.tar.gz -C /opt/codeql

# Add to PATH (add to ~/.bashrc or ~/.zshrc)
export PATH=$PATH:/opt/codeql/codeql

Create CodeQL Databases

# Navigate to project root/src
cd path/to/local-deep-research/src

# Create Python database
codeql database create --language=python --source-root . ./python-db

# Create JavaScript database
codeql database create --language=javascript --source-root . ./js-db

Run Analysis

# Make script executable
chmod +x analyze_sarif.sh

# Run Python CodeQL analysis
codeql database analyze ./python-db python-security-and-quality.qls --format=sarif-latest --output=python-results.sarif

# Run JavaScript CodeQL analysis
codeql database analyze ./js-db javascript-security-extended.qls --format=sarif-latest --output=js-results.sarif

# Merge results (optional)
codeql dataset merge --output=combined-db --source=python-db --source=js-db
codeql database analyze ./combined-db --format=sarif-latest --output=combined-results.sarif

# Analyze results with Ollama
./analyze_sarif.sh

Important Queries

The following CodeQL queries are particularly relevant for our project:

Python Security
- python-security-and-quality.qls: General security and code quality
- python/sql-injection: SQL injection vulnerabilities
- python/hardcoded-credentials: Hardcoded secrets
- python/unsafe-deserialization: Insecure deserialization
JavaScript Security
- javascript-security-extended.qls: Comprehensive security checks
- javascript/xss: Cross-site scripting vulnerabilities
- javascript/prototype-pollution: Prototype pollution issues
- javascript/express-misconfigured-cors: CORS misconfigurations
- javascript/unsafe-dynamic-import: Unsafe dynamic imports
- javascript/unsafe-eval: Unsafe eval() usage
Custom Queries
- Logging injection vulnerabilities
- Uninitialized variables
- API security issues
- Frontend security best practices

Analysis Script Features

The analysis scripts (analyze_sarif.ps1 and analyze_sarif.sh) provide:

Input Validation
- Checks if Ollama is running
- Validates SARIF file format
- Verifies required dependencies
Error Handling
- Graceful error messages
- Color-coded output
- Detailed error reporting
Output
- Human-readable analysis
- Prioritized findings
- Recommended fixes
- Language-specific recommendations

Troubleshooting

Ollama Connection Issues
- Ensure Ollama is running: ollama serve
- Check if the model is pulled: ollama list
- Verify the endpoint in the script matches your setup
CodeQL Database Creation
- Clean previous databases: rm -rf ./python-db ./js-db
- Ensure dependencies are installed (Python and Node.js)
- Check for sufficient disk space
- For JavaScript: Ensure node_modules is present
Analysis Script Issues
- Verify script permissions (Linux/Mac)
- Check PowerShell execution policy (Windows)
- Ensure all required tools are installed

Best Practices

Regular Analysis
- Run analysis before major commits
- Schedule regular security scans
- Review and address findings promptly
- Run both frontend and backend scans
Database Management
- Clean old databases regularly
- Keep CodeQL CLI updated
- Use appropriate query suites
- Consider using RAM disk for large projects
Results Handling
- Document resolved issues
- Track false positives
- Share findings with the team
- Prioritize critical frontend and backend issues

6.3 KiB Raw Blame History