SearXNG Integration for Local Deep Research
This document explains how to configure and use the SearXNG integration with Local Deep Research.
Configuring SearXNG Access
The SearXNG search engine is disabled by default until you provide an instance URL. This ensures the system doesn't attempt to use public instances without explicit configuration.
Setting Up Access
You have two ways to enable the SearXNG search engine:
- Environment Variable (Recommended):

  ```bash
  # Add to your .env file or set in your environment
  SEARXNG_INSTANCE=http://localhost:8080
  # Optional: Set custom delay between requests (in seconds)
  SEARXNG_DELAY=2.0
  ```

- Configuration Parameter: Add to your `config.py`:

  ```python
  # In config.py
  SEARXNG_CONFIG = {
      "instance_url": "http://localhost:8080",
      "delay_between_requests": 2.0,
  }
  ```
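The "disabled until you provide an instance URL" rule can be pictured as a loader that returns nothing when `SEARXNG_INSTANCE` is unset. The sketch below covers only the two environment variables named above. `SearxngSettings` and `load_searxng_settings` are hypothetical names, and the 2.0-second fallback is borrowed from the example values in this guide rather than taken from a documented default. It does not model `config.py` or how Local Deep Research ranks the two configuration methods.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearxngSettings:
    instance_url: str
    delay_between_requests: float


def load_searxng_settings(
    env: Mapping[str, str] = os.environ,
    default_delay: float = 2.0,  # assumption: example value from this guide, not a documented default
) -> Optional[SearxngSettings]:
    """Hypothetical loader: None means the SearXNG engine stays disabled."""
    url = env.get("SEARXNG_INSTANCE", "").strip()
    if not url:
        # No instance configured: never fall back to a public instance.
        return None
    delay = float(env.get("SEARXNG_DELAY", default_delay))
    if delay < 0:
        raise ValueError("SEARXNG_DELAY must be non-negative")
    return SearxngSettings(instance_url=url, delay_between_requests=delay)
```

Passing a plain dict as `env` makes the loader easy to test without touching the real environment.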
Self-Hosting SearXNG (Recommended)
For the most ethical usage, we strongly recommend self-hosting your own SearXNG instance:
Using Docker (easiest method)
```bash
# Pull the SearXNG Docker image
docker pull searxng/searxng

# Run SearXNG (will be available at http://localhost:8080)
docker run -d -p 8080:8080 --name searxng searxng/searxng
```
Using Docker Compose (recommended for production)
- Create a file named `docker-compose.yml` with the following content:

  ```yaml
  version: '3'
  services:
    searxng:
      container_name: searxng
      image: searxng/searxng
      ports:
        - "8080:8080"
      volumes:
        - ./searxng:/etc/searxng
      environment:
        - SEARXNG_BASE_URL=http://localhost:8080/
      restart: unless-stopped
  ```

- Run with Docker Compose:

  ```bash
  # Compose v2 plugin; with the legacy standalone binary use: docker-compose up -d
  docker compose up -d
  ```
Using Public Instances
If you must use a public instance:
- Get Permission: Always contact the administrator of any public instance
- Respect Resources: Use a longer delay (4-5 seconds minimum) between requests
- Limited Usage: Keep your research volume reasonable
Example configuration for a public instance:
```bash
SEARXNG_INSTANCE=https://instance.example.com
SEARXNG_DELAY=5.0
```
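In practice, "delay between requests" means no request leaves sooner than `SEARXNG_DELAY` seconds after the previous one. The sketch below illustrates that idea. `RequestThrottle` is a hypothetical helper, not Local Deep Research's own rate limiter. The clock and sleep functions are injectable so the timing can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class RequestThrottle:
    """Hypothetical helper: spaces successive calls at least `delay` seconds apart."""

    def __init__(
        self,
        delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

For a public instance you would create `RequestThrottle(5.0)` and call `wait()` before each search request.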
Checking Configuration
To verify that SearXNG is properly configured:

```python
from web_search_engines.search_engine_factory import create_search_engine

# Create the engine
engine = create_search_engine("searxng")

# Check that the engine exists and reports itself as available
if engine and getattr(engine, "is_available", False):
    print(f"SearXNG configured with instance: {engine.instance_url}")
    print(f"Delay between requests: {engine.delay_between_requests} seconds")
else:
    print("SearXNG is not properly configured or is disabled")
```
Network Security
Because SearXNG is designed for self-hosting, Local Deep Research allows the SearXNG instance URL to point at private network addresses by default. This means you can run SearXNG on:
- Localhost: `http://127.0.0.1:8080` or `http://localhost:8080`
- LAN IPs: `http://192.168.1.100:8080`, `http://10.0.0.5:8080`, `http://172.16.0.2:8080`
- Docker networks: `http://172.17.0.2:8080`
- Local hostnames: `http://searxng.local:8080` (if configured in DNS/hosts)
This is intentional and secure because:
- The SearXNG URL is admin-configured, not user input
- Private IP ranges are not routable on the public internet, so they are only reachable from your own network
- Cloud metadata endpoints (AWS IMDS / ECS, Azure, OCI, DigitalOcean, AlibabaCloud, Tencent Cloud; see `ssrf_validator.ALWAYS_BLOCKED_METADATA_IPS`) are always blocked, even when private IPs are allowed, to prevent credential theft in cloud environments
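The policy above says private addresses are allowed for SearXNG, but metadata endpoints are never allowed. Below is a minimal sketch of that rule, using only the standard library. `host_is_blocked` is a hypothetical function and does not reproduce the project's `ssrf_validator` API. The five addresses are assumed to be the contents of `ALWAYS_BLOCKED_METADATA_IPS`. Several design choices are also assumptions: the sketch checks every address a host resolves to, refuses hosts it cannot resolve, and treats the unspecified addresses `0.0.0.0` / `::` as always blocked.

```python
import ipaddress
import socket

# Assumed contents of ALWAYS_BLOCKED_METADATA_IPS.
ALWAYS_BLOCKED_METADATA_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in (
        "169.254.169.254",  # AWS IMDS (same address used by several other clouds)
        "169.254.170.2",    # AWS ECS task metadata v3
        "169.254.170.23",   # AWS ECS task metadata v4
        "169.254.0.23",     # Tencent Cloud
        "100.100.100.200",  # AlibabaCloud
    )
)


def host_is_blocked(host: str, allow_private_ips: bool = True) -> bool:
    """Hypothetical check: True if any address `host` resolves to is off-limits."""
    try:
        infos = socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        return True  # fail closed when the name cannot be resolved
    for info in infos:
        # Strip an IPv6 zone id such as "%eth0" before parsing.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it gets the IPv4 rules.
        if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped:
            addr = addr.ipv4_mapped
        # Metadata endpoints and unspecified addresses are blocked under every flag.
        if addr in ALWAYS_BLOCKED_METADATA_IPS or addr.is_unspecified:
            return True
        if not allow_private_ips and (
            addr.is_private or addr.is_loopback or addr.is_link_local
        ):
            return True
    return False
```

Because the check runs on resolved addresses, a hostname like `searxng.local` is judged by the IPs it maps to, not by its name.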
Troubleshooting
If you encounter errors:
- Check that your instance is running
- Verify the URL is correct in your environment variables
- Ensure you can access the instance in your browser
- Check firewall settings and network connectivity
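You can script the first and third checks from the same machine that runs Local Deep Research. The sketch below uses only the standard library. `instance_reachable` is a hypothetical helper that requests the instance's base URL and reports whether it answered successfully. It does not exercise SearXNG's search endpoints.

```python
import urllib.request


def instance_reachable(instance_url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on the instance URL succeeds with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(instance_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (refused, DNS failure, 4xx/5xx) and timeouts;
        # ValueError covers malformed URLs such as a missing scheme.
        return False
```

For example, `instance_reachable("http://localhost:8080")` returns `False` if the container is not running or the port mapping is wrong.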