docs: add config docs generator script (#2134)

* docs: add config docs generator script

Add scripts/generate_config_docs.py that auto-generates
docs/CONFIGURATION.md from default settings JSON files and
env_definitions/ modules. Supports both database-managed settings
and pre-database env-only settings.

Extracted from PR #1393.

Co-authored-by: daryltucker <daryltucker@users.noreply.github.com>

* docs: improve config docs generator with auto-discovery, --check mode, and CI

- Auto-discover env_definitions modules instead of hardcoding filenames
- Extract additional AST fields: required, min/max_value, allowed_values,
  deprecated_env_var
- Expand env-only settings table with Type, Required, Constraints,
  Deprecated Alias columns
- Add --check mode (exit 1 when docs are stale) for CI validation
- Add inline gitleaks:allow on key extraction line
- Generate initial docs/CONFIGURATION.md covering all 18 JSON files and
  5 env_definitions modules
- Add check-config-docs.yml PR workflow (zero deps, stdlib only)
- Add docs regeneration step to version_check.yml
- Allowlist docs/CONFIGURATION.md in .gitleaks.toml (references env var
  names, not actual secrets)
- Add comprehensive tests (27 tests: unit, integration, check mode,
  error handling)

* docs: add CONFIGURATION.md references to README, env_configuration, and developing guides

* docs: regenerate CONFIGURATION.md after merge with main

Picks up db_config.cipher_memory_security default change (OFF -> ON).
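
The `--check` mode described above (exit 1 when docs are stale) can be sketched roughly as follows. This is only an illustration of the idea, not the code in scripts/generate_config_docs.py: the function name `check_docs` and its arguments are hypothetical, and the real script may compare or report differently.

```python
import sys
from pathlib import Path


def check_docs(generated: str, docs_path: Path) -> int:
    """Return 0 if docs_path already holds `generated`, otherwise 1.

    Hypothetical helper: the caller is assumed to have rendered the
    documentation into the `generated` string beforehand.
    """
    # A missing file counts as stale.
    current = docs_path.read_text(encoding="utf-8") if docs_path.exists() else None
    if current == generated:
        return 0
    print(f"{docs_path} is stale; regenerate it.", file=sys.stderr)
    return 1
```

A CI job would pass the return value to `sys.exit()`, so a stale file fails the workflow step.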

---------

Co-authored-by: daryltucker <daryltucker@users.noreply.github.com>
LearningCircuit
2026-02-22 21:06:27 +01:00
committed by GitHub
parent 76a2c77d3e
commit 20fedc67b1
10 changed files with 1376 additions and 1 deletions

.github/workflows/check-config-docs.yml (new file, 35 lines)

@@ -0,0 +1,35 @@
name: Check Configuration Docs
on:
  pull_request:
    paths:
      - 'src/local_deep_research/defaults/**/*.json'
      - 'src/local_deep_research/settings/env_definitions/**'
      - 'scripts/generate_config_docs.py'
      - 'docs/CONFIGURATION.md'
  workflow_dispatch:
permissions:
  contents: read
jobs:
  check-config-docs:
    runs-on: ubuntu-latest
    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@5ef0c079ce82195b2a36a210272d6b661572d83e # v2.14.2
        with:
          egress-policy: audit
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
        with:
          python-version: '3.11'
      - name: Check configuration docs are up to date
        run: python scripts/generate_config_docs.py --check


@@ -121,6 +121,10 @@ jobs:
          mv package.json.tmp package.json
          echo "Updated package.json version to $new_version"
      - name: Regenerate configuration docs
        if: steps.check.outputs.needs_bump != 'false'
        run: python scripts/generate_config_docs.py
      - name: Generate PR body
        if: steps.check.outputs.needs_bump != 'false'
        id: pr_body
@@ -152,6 +156,8 @@ jobs:
echo "This PR was manually triggered via the Actions tab."
fi
echo "Approve and merge to trigger a new release."
echo ""
echo "Configuration docs (docs/CONFIGURATION.md) have been regenerated."
} > "$body_file"
echo "body_file=$body_file" >> "$GITHUB_OUTPUT"


@@ -130,7 +130,9 @@ paths = [
# Encrypted DB manager uses key derivation and password parameters (not actual secrets)
'''src/local_deep_research/database/encrypted_db\.py$''',
# Metrics search tracker uses password variable names for DB credentials (not actual secrets)
'''src/local_deep_research/metrics/search_tracker\.py$'''
'''src/local_deep_research/metrics/search_tracker\.py$''',
# Auto-generated configuration docs reference env var names like ENCRYPTION_KEY (not actual secrets)
'''docs/CONFIGURATION\.md$'''
]
files = [
"\\.env\\.example$",


@@ -568,3 +568,6 @@ dc366dab85c1a02f8ffadc38c8db210cf2a61b95:src/local_deep_research/defaults/llm_co
# README.md: Documentation example showing settings.get() usage
fed86a7c9481006cb963b1039bf9dc818a5af063:README.md:generic-secret:115
# generate_config_docs.py: AST node value extraction (key = key_node.value), not a secret
scripts/generate_config_docs.py:generic-secret:92


@@ -542,6 +542,7 @@ Track costs, performance, and usage with detailed metrics. [Learn more →](docs
- [Frequently Asked Questions](docs/faq.md)
- [API Quickstart](docs/api-quickstart.md)
- [Configuration Guide](docs/env_configuration.md)
- [Full Configuration Reference](docs/CONFIGURATION.md)
### Core Features
- [All Features Guide](docs/features.md)

docs/CONFIGURATION.md (new file, 605 lines)

@@ -0,0 +1,605 @@
# Configuration Reference
This document is automatically generated from the application's default settings.
All settings can be configured via the Web UI (Settings page), or overridden via Environment Variables.
## Environment Variables
To override a setting using an environment variable, convert the key to uppercase, replace dots with underscores, and prefix with `LDR_`.
For example, `app.debug` becomes `LDR_APP_DEBUG`.
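
The naming rule above can be sketched in a few lines of Python. The helper name `setting_key_to_env_var` is hypothetical and only illustrates the default convention; it is not taken from the application's code.

```python
def setting_key_to_env_var(key: str, prefix: str = "LDR_") -> str:
    """Apply the default rule: uppercase, dots to underscores, add the prefix."""
    return prefix + key.upper().replace(".", "_")


print(setting_key_to_env_var("app.debug"))            # LDR_APP_DEBUG
print(setting_key_to_env_var("llm.openai.api_key"))   # LDR_LLM_OPENAI_API_KEY
```

Some settings in the tables below list an environment variable that does not follow this rule (for example, `app.timezone` uses `TZ`), so the tables take precedence over the derived name.
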
Configuration Priority: Web UI Config > Environment Variables > Default Values
> Environment variables override default values, which eases installation, while the Web UI can still be used to adjust the configuration afterwards.
### System Locking
There is a special environment variable `LDR_LOCKED_SETTINGS` that allows administrators to strictly enforce specific settings.
* **Variable**: `LDR_LOCKED_SETTINGS`
* **Format**: Comma-separated list of setting keys (e.g., `llm.model,app.port`)
* **Behavior**:
1. Any setting listed here **MUST** have a corresponding value defined in the environment variables (e.g., `LDR_LLM_MODEL`). If not, the application will fail to start.
2. The setting becomes **read-only** in the Web UI.
3. The **Environment Variable** value takes absolute precedence, ignoring any value in the database.
**Priority for Locked Settings**: Environment Variable > Database (Ignored) > Default (Ignored)
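
The locking rules above, combined with the normal priority order, can be sketched as follows. This is a minimal illustration under stated assumptions: every function name here is hypothetical, the database and defaults are modelled as plain dictionaries, and environment values are returned as raw strings without type conversion. The application's actual implementation may differ.

```python
from typing import Any, Mapping


def env_var_for(key: str) -> str:
    # Default naming rule: uppercase, dots to underscores, LDR_ prefix.
    return "LDR_" + key.upper().replace(".", "_")


def parse_locked_settings(env: Mapping[str, str]) -> list[str]:
    # LDR_LOCKED_SETTINGS is a comma-separated list of setting keys.
    raw = env.get("LDR_LOCKED_SETTINGS", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


def check_locked_settings(env: Mapping[str, str]) -> None:
    # Rule 1: every locked setting must have an environment value.
    missing = [k for k in parse_locked_settings(env) if env_var_for(k) not in env]
    if missing:
        raise RuntimeError(f"Locked settings without environment values: {missing}")


def resolve_setting(
    key: str,
    env: Mapping[str, str],
    database: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    env_name = env_var_for(key)
    if key in parse_locked_settings(env):
        # Rule 3: the environment value wins; database and default are ignored.
        return env[env_name]
    if key in database:   # value saved through the Web UI
        return database[key]
    if env_name in env:   # environment override of the default
        return env[env_name]
    return defaults[key]
```

Calling `check_locked_settings` once at startup mirrors the "fail to start" behavior, and `resolve_setting` then never hits a missing environment value for a locked key.
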
## Pre-Database (Env-Only) Settings
These settings are **required before database initialization** and can only be set via environment variables.
They are not available in the Web UI because they are needed to start the application.
| Environment Variable | Type | Default | Required | Constraints | Description | Category | Deprecated Alias |
|----------------------|------|---------|----------|-------------|-------------|----------|------------------|
| `LDR_BOOTSTRAP_ALLOW_UNENCRYPTED` | Boolean | `False` | No | | Allow unencrypted database (for development) | Bootstrap | LDR_ALLOW_UNENCRYPTED |
| `LDR_BOOTSTRAP_CONFIG_DIR` | Path | `None` | No | | Configuration directory path | Bootstrap | |
| `LDR_BOOTSTRAP_DATABASE_URL` | String | `None` | No | | Database connection URL | Bootstrap | |
| `LDR_BOOTSTRAP_DATA_DIR` | Path | `None` | No | | Data directory path | Bootstrap | |
| `LDR_BOOTSTRAP_ENABLE_FILE_LOGGING` | Boolean | `False` | No | | Enable logging to file | Bootstrap | |
| `LDR_BOOTSTRAP_ENCRYPTION_KEY` | Secret | `None` | No | | Database encryption key | Bootstrap | |
| `LDR_BOOTSTRAP_LOG_DIR` | Path | `None` | No | | Log directory path | Bootstrap | |
| `LDR_BOOTSTRAP_SECRET_KEY` | Secret | `None` | No | | Application secret key for session encryption | Bootstrap | |
| `LDR_DB_CONFIG_CACHE_SIZE_MB` | Integer | `64` | No | 1..10000 | SQLite cache size in MB | Db Config | LDR_DB_CACHE_SIZE_MB |
| `LDR_DB_CONFIG_CIPHER_MEMORY_SECURITY` | Enum | `'OFF'` | No | OFF, ON | SQLCipher memory security (ON=clear memory after use + mlock, OFF=faster). ON requires IPC_LOCK in Docker. | Db Config | |
| `LDR_DB_CONFIG_HMAC_ALGORITHM` | Enum | `'HMAC_SHA512'` | No | HMAC_SHA1, HMAC_SHA256, HMAC_SHA512 | HMAC algorithm for database integrity | Db Config | LDR_DB_HMAC_ALGORITHM |
| `LDR_DB_CONFIG_JOURNAL_MODE` | Enum | `'WAL'` | No | DELETE, MEMORY, OFF, PERSIST, TRUNCATE, WAL | SQLite journal mode | Db Config | LDR_DB_JOURNAL_MODE |
| `LDR_DB_CONFIG_KDF_ALGORITHM` | Enum | `'PBKDF2_HMAC_SHA512'` | No | PBKDF2_HMAC_SHA1, PBKDF2_HMAC_SHA256, PBKDF2_HMAC_SHA512 | Key derivation function algorithm | Db Config | LDR_DB_KDF_ALGORITHM |
| `LDR_DB_CONFIG_KDF_ITERATIONS` | Integer | `256000` | No | 1000..1000000 | Number of KDF iterations for key derivation | Db Config | LDR_DB_KDF_ITERATIONS |
| `LDR_DB_CONFIG_PAGE_SIZE` | Integer | `16384` | No | 512..65536 | SQLite page size (must be power of 2) | Db Config | LDR_DB_PAGE_SIZE |
| `LDR_DB_CONFIG_SYNCHRONOUS` | Enum | `'NORMAL'` | No | EXTRA, FULL, NORMAL, OFF | SQLite synchronous mode | Db Config | LDR_DB_SYNCHRONOUS |
| `LDR_NEWS_SCHEDULER_ENABLED` | Boolean | `True` | No | | Enable or disable the news subscription scheduler | News Scheduler | |
| `LDR_SECURITY_CORS_ALLOWED_ORIGINS` | String | `None` | No | | Allowed CORS origins for API routes (comma-separated). Use '*' for all origins, empty for same-origin only. Example: 'https://example.com,https://app.example.com' | Security | |
| `LDR_SECURITY_SSRF_DISABLE_VALIDATION` | Boolean | `False` | No | | Disable SSRF validation (test/dev only - NEVER in production) | Security | |
| `LDR_SECURITY_WEBSOCKET_ALLOWED_ORIGINS` | String | `None` | No | | Allowed origins for WebSocket/Socket.IO connections (comma-separated). Use '*' for all origins (default), empty for same-origin only. Example: 'https://example.com,https://app.example.com' | Security | |
| `LDR_TESTING_TEST_MODE` | Boolean | `False` | No | | Enable test mode (adds delays for testing concurrency) | Testing | |
| `LDR_TESTING_USE_FALLBACK_LLM` | Boolean | `False` | No | | Use mock LLM for testing (skips API calls and DB operations) | Testing | |
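
The Constraints column above lists integer ranges (such as `1..10000`) and allowed enum values (such as `OFF, ON`). A minimal sketch of how such values could be validated is shown below. The function names are hypothetical, and exact-match enum comparison is an assumption; the application may normalize case or validate differently.

```python
def validate_int_range(name: str, raw: str, minimum: int, maximum: int) -> int:
    """Parse an integer setting and enforce an inclusive min..max range."""
    value = int(raw)
    if not minimum <= value <= maximum:
        raise ValueError(f"{name}={value} is outside {minimum}..{maximum}")
    return value


def validate_enum(name: str, raw: str, allowed: set[str]) -> str:
    """Accept only values from the allowed set (exact match assumed)."""
    if raw not in allowed:
        raise ValueError(f"{name}={raw!r} is not one of {sorted(allowed)}")
    return raw


def is_power_of_two(n: int) -> bool:
    # A power of two has exactly one bit set.
    return n > 0 and n & (n - 1) == 0


page_size = validate_int_range("LDR_DB_CONFIG_PAGE_SIZE", "16384", 512, 65536)
print(page_size, is_power_of_two(page_size))
print(validate_enum("LDR_DB_CONFIG_JOURNAL_MODE", "WAL",
                    {"DELETE", "MEMORY", "OFF", "PERSIST", "TRUNCATE", "WAL"}))
```
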
## Settings List
| Key | Environment Variable | Default Value | Description | Type |
|-----|----------------------|---------------|-------------|------|
| `app.allow_registrations` | `LDR_APP_ALLOW_REGISTRATIONS` | `true` | Allow new user registrations through the web interface. This setting cannot be changed from the UI for security reasons. To disable registration on public-facing deployments, set the environment variable LDR_APP_ALLOW_REGISTRATIONS=false. Enabled by default so initial account creation works out of the box. | APP |
| `app.debug` | `LDR_APP_DEBUG` | `false` | Produces more informative errors when something goes wrong, but may expose sensitive data. | APP |
| `app.enable_file_logging` | `LDR_APP_ENABLE_FILE_LOGGING` | `false` | Enable logging to files (WARNING: Log files are unencrypted and may contain sensitive data) | APP |
| `app.enable_notifications` | `LDR_APP_ENABLE_NOTIFICATIONS` | `true` | Enable browser notifications for research events | APP |
| `app.enable_web` | `LDR_APP_ENABLE_WEB` | `true` | Enable the web server | APP |
| `app.external_url` | `LDR_APP_EXTERNAL_URL` | `` | Public URL where this application is accessible (e.g., http://localhost:5000, https://myapp.example.com). Used for generating links in notifications. Leave empty to auto-detect from request context. | APP |
| `app.host` | `LDR_APP_HOST` | `0.0.0.0` | Host address to bind the web server | APP |
| `app.lock_settings` | `LDR_APP_LOCK_SETTINGS` | `false` | If true, disables editing for all settings | APP |
| `app.max_concurrent_researches` | `LDR_MAX_CONCURRENT` | `3` | Maximum number of concurrent research processes allowed per user | APP |
| `app.max_user_query_length` | `LDR_APP_MAX_USER_QUERY_LENGTH` | `300` | Maximum character length for user queries before disabling direct search. Longer queries will use LLM-generated questions only. | SEARCH |
| `app.port` | `LDR_APP_PORT` | `5000` | Port for the web server | APP |
| `app.queue_mode` | `LDR_QUEUE_MODE` | `direct` | Queue processing mode: 'direct' for immediate execution, 'queue' for background processing | APP |
| `app.theme` | `LDR_APP_THEME` | `dark` | User interface color theme. Choose from 20 themes including popular developer themes. | APP |
| `app.timezone` | `TZ` | `UTC` | Timezone for date calculations (e.g., news subscription YYYY-MM-DD placeholder). Use 'UTC' for server time or specify your local timezone. | APP |
| `app.warnings.dismiss_high_context` | `LDR_APP_WARNINGS_DISMISS_HIGH_CONTEXT` | `false` | Dismiss warnings about high context window sizes that may cause memory issues | APP |
| `app.warnings.dismiss_low_context_focused` | `LDR_APP_WARNINGS_DISMISS_LOW_CONTEXT_FOCUSED` | `false` | Dismiss warnings about using focused iteration with low context window sizes | APP |
| `app.warnings.dismiss_model_mismatch` | `LDR_APP_WARNINGS_DISMISS_MODEL_MISMATCH` | `false` | Dismiss warnings about context size vs model size mismatches | APP |
| `app.warnings.dismiss_searxng_recommendation` | `LDR_APP_WARNINGS_DISMISS_SEARXNG_RECOMMENDATION` | `false` | Dismiss recommendations about using more questions instead of iterations with SearXNG | APP |
| `app.web_interface` | `LDR_APP_WEB_INTERFACE` | `true` | Enable the web interface | APP |
| `benchmark.evaluation.endpoint_url` | `LDR_BENCHMARK_EVALUATION_ENDPOINT_URL` | `https://openrouter.ai/api/v1` | Endpoint URL for evaluation model (when using OpenAI-compatible APIs) | SEARCH |
| `benchmark.evaluation.model` | `LDR_BENCHMARK_EVALUATION_MODEL` | `anthropic/claude-3.7-sonnet` | Model for evaluating benchmark results | SEARCH |
| `benchmark.evaluation.provider` | `LDR_BENCHMARK_EVALUATION_PROVIDER` | `openai_endpoint` | Provider for benchmark evaluation model | SEARCH |
| `benchmark.evaluation.temperature` | `LDR_BENCHMARK_EVALUATION_TEMPERATURE` | `0` | Temperature for evaluation (0 recommended for consistency) | SEARCH |
| `document_scheduler.download_pdfs` | `LDR_DOCUMENT_SCHEDULER_DOWNLOAD_PDFS` | `false` | Automatically download PDF files during scheduled document processing. Files will be stored on disk (requires storage space). WARNING: Downloaded files are stored unencrypted on disk. | APP |
| `document_scheduler.enabled` | `LDR_DOCUMENT_SCHEDULER_ENABLED` | `true` | Enable automatic document processing from research history | APP |
| `document_scheduler.extract_text` | `LDR_DOCUMENT_SCHEDULER_EXTRACT_TEXT` | `true` | Extract text content to database (minimal space usage) | APP |
| `document_scheduler.generate_rag` | `LDR_DOCUMENT_SCHEDULER_GENERATE_RAG` | `false` | Generate RAG embeddings for semantic search | APP |
| `document_scheduler.interval_seconds` | `LDR_DOCUMENT_SCHEDULER_INTERVAL_SECONDS` | `1800` | How often to process new research (seconds) | APP |
| `embeddings.ollama.url` | `LDR_EMBEDDINGS_OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama endpoint for embedding models. This setting allows you to use a different Ollama server for embeddings than for LLM operations. If not set, the system will fall back to the LLM Ollama URL. | APP |
| `focused_iteration.adaptive_questions` | `LDR_FOCUSED_ITERATION_ADAPTIVE_QUESTIONS` | `0` | Enables intelligent adaptation of search queries based on previous results. 1=ON: If 3+ recent searches return 0 results, the LLM is warned that queries are 'too narrow'. 0=OFF: Stable main-branch behavior. | SEARCH |
| `focused_iteration.knowledge_summary_limit` | `LDR_FOCUSED_ITERATION_KNOWLEDGE_SUMMARY_LIMIT` | `10` | Controls how many previous search results the LLM sees when generating follow-up questions. Lower values (e.g., 10) reduce context size and focus the LLM on top results. Set to 0 for unlimited (LLM sees all results). NOTE: This only affects question generation - the final answer synthesis always sees ALL results. | SEARCH |
| `focused_iteration.previous_searches_limit` | `LDR_FOCUSED_ITERATION_PREVIOUS_SEARCHES_LIMIT` | `10` | Maximum number of previous searches to show the LLM when generating follow-up questions. This helps the LLM avoid duplicate searches. Set to 0 for unlimited (show all previous searches). Higher values provide more context but use more tokens. | SEARCH |
| `focused_iteration.prompt_knowledge_truncate` | `LDR_FOCUSED_ITERATION_PROMPT_KNOWLEDGE_TRUNCATE` | `1500` | Maximum characters of knowledge context to include in the LLM prompt when generating follow-up questions. This truncates the 'Current Knowledge Summary' section in the prompt. Set to 0 for unlimited. Lower values reduce token usage but may miss relevant context. | SEARCH |
| `focused_iteration.question_generator` | `LDR_FOCUSED_ITERATION_QUESTION_GENERATOR` | `browsecomp` | Algorithm for generating follow-up search queries. 'browsecomp': Extracts entities (names, dates, locations) from the query and systematically combines them - more structured and predictable. 'flexible': Gives the LLM more freedom to explore different search strategies - less rigid but may be less consistent. | SEARCH |
| `focused_iteration.snippet_truncate` | `LDR_FOCUSED_ITERATION_SNIPPET_TRUNCATE` | `200` | Maximum characters per snippet when showing search results to LLM for question generation. Shorter snippets (e.g., 200) reduce token usage and keep context focused. Set to 0 for full snippets. NOTE: This only affects question generation - the final answer synthesis sees full snippets. | SEARCH |
| `general.enable_fact_checking` | `LDR_GENERAL_ENABLE_FACT_CHECKING` | `false` | Enable LLM-based fact verification for research findings. Adds additional validation pass to check claims against sources. | APP |
| `general.knowledge_accumulation` | `LDR_GENERAL_KNOWLEDGE_ACCUMULATION` | `ITERATION` | How to accumulate knowledge across iterations: 'ITERATION' (carry forward per iteration), 'SECTION' (per section), 'FULL' (all content), 'NONE' (fresh each time). | APP |
| `general.knowledge_accumulation_context_limit` | `LDR_GENERAL_KNOWLEDGE_ACCUMULATION_CONTEXT_LIMIT` | `2000000` | Maximum token count for accumulated knowledge context. Higher values preserve more history but increase LLM costs/latency. | APP |
| `general.output_dir` | `LDR_GENERAL_OUTPUT_DIR` | `research_outputs` | Directory where research report files are saved. Relative paths are relative to the application root. | APP |
| `general.output_instructions` | `LDR_GENERAL_OUTPUT_INSTRUCTIONS` | `` | Customize how research outputs are formatted: language, tone, style, formatting preferences, audience level, etc. Examples: 'Respond in Spanish with formal tone' \| 'Use simple language for beginners' \| 'Be concise with bullet points'. Leave empty for default English output. | APP |
| `llm.anthropic.api_key` | `LDR_LLM_ANTHROPIC_API_KEY` | `` | Your Anthropic API key for Claude models. Get one at console.anthropic.com. Required when using the Anthropic provider. | SEARCH |
| `llm.context_window_size` | `LDR_LLM_CONTEXT_WINDOW_SIZE` | `128000` | Maximum context window size in tokens for cloud LLMs. Only used when unrestricted context is disabled. | LLM |
| `llm.context_window_unrestricted` | `LDR_LLM_CONTEXT_WINDOW_UNRESTRICTED` | `true` | Let cloud providers automatically handle context sizing (recommended). Uncheck to set a specific limit. | LLM |
| `llm.google.api_key` | `LDR_LLM_GOOGLE_API_KEY` | `null` | API key to use for the Google Gemini provider. | SEARCH |
| `llm.ionos.api_key` | `LDR_LLM_IONOS_API_KEY` | `null` | API key to use for the IONOS AI Model Hub provider. | SEARCH |
| `llm.llamacpp_f16_kv` | `LDR_LLM_LLAMACPP_F16_KV` | `true` | Use 16-bit floating point for LlamaCpp's key-value cache. Reduces memory usage compared to 32-bit; disable only if you need higher precision and have enough VRAM. | LLM |
| `llm.llamacpp_model_path` | `LDR_LLM_LLAMACPP_MODEL_PATH` | `` | Path to GGUF model file relative to models directory (~/.local/share/llm_models/). For llama.cpp server connections, use 'openai_endpoint' provider instead with the server's /v1 URL. | LLM |
| `llm.llamacpp_n_batch` | `LDR_LLM_LLAMACPP_N_BATCH` | `512` | Batch size for LlamaCpp token processing. Higher values improve inference speed but increase memory usage. Reduce if experiencing out-of-memory errors. | LLM |
| `llm.llamacpp_n_gpu_layers` | `LDR_LLM_LLAMACPP_N_GPU_LAYERS` | `1` | Number of model layers to offload to GPU for acceleration in LlamaCpp. 0 = CPU only, higher values = more GPU usage and faster inference. Set based on your available VRAM. | LLM |
| `llm.lmstudio.url` | `LDR_LLM_LMSTUDIO_URL` | `http://localhost:1234/v1` | HTTP endpoint URL where LM Studio is running locally. Include the full path (e.g., http://localhost:1234/v1 for OpenAI-compatible API format). | LLM |
| `llm.local_context_window_size` | `LDR_LLM_LOCAL_CONTEXT_WINDOW_SIZE` | `4096` | Context window size in tokens for local LLMs (Ollama, LlamaCpp). Smaller values prevent memory issues. | LLM |
| `llm.max_tokens` | `LDR_LLM_MAX_TOKENS` | `30000` | Maximum tokens in model responses. Automatically capped at 80% of the context window size to leave room for prompt tokens. | LLM |
| `llm.model` | `LDR_LLM_MODEL` | `gemma3:12b` | The language model to use. Available models depend on the selected provider (e.g., 'gpt-4o' for OpenAI, 'gemma:7b' for Ollama). You can type any model name supported by your provider. | LLM |
| `llm.ollama.enable_thinking` | `LDR_LLM_OLLAMA_ENABLE_THINKING` | `true` | Enable thinking/reasoning mode for models like deepseek-r1 and qwen2.5. When enabled (recommended), the model performs internal reasoning for more accurate answers, but the reasoning content is automatically separated and excluded from the final response. When disabled, the model gives faster but potentially less accurate direct answers without reasoning. | LLM |
| `llm.ollama.url` | `LDR_LLM_OLLAMA_URL` | `http://localhost:11434` | HTTP endpoint URL where Ollama is running. Used to connect to local or remote Ollama instances for model inference. | LLM |
| `llm.openai.api_key` | `LDR_LLM_OPENAI_API_KEY` | `` | Your OpenAI API key for GPT models. Get one at platform.openai.com. Required when using the OpenAI provider. | SEARCH |
| `llm.openai_endpoint.api_key` | `LDR_LLM_OPENAI_ENDPOINT_API_KEY` | `` | API key for your custom OpenAI-compatible endpoint. For cloud services (OpenRouter, Groq), use your actual API key. For local servers (llama.cpp, vLLM), use any non-empty value like 'not-needed'. | SEARCH |
| `llm.openai_endpoint.url` | `LDR_LLM_OPENAI_ENDPOINT_URL` | `https://openrouter.ai/api/v1` | URL of a custom OpenAI-compatible API endpoint. Use for llama.cpp server, vLLM, Ollama, OpenRouter, Groq, Together, or any service that implements the OpenAI API format. Example for local llama.cpp: http://localhost:8000/v1 | LLM |
| `llm.openrouter.api_key` | `LDR_LLM_OPENROUTER_API_KEY` | `null` | API key to use for the OpenRouter provider. | SEARCH |
| `llm.provider` | `LDR_LLM_PROVIDER` | `OLLAMA` | The LLM service provider. Choose local providers (Ollama, LM Studio, LlamaCpp, vLLM) for free on-device inference, or cloud APIs (OpenAI, Anthropic, Google, OpenRouter) for hosted models. | LLM |
| `llm.supports_max_tokens` | `LDR_LLM_SUPPORTS_MAX_TOKENS` | `true` | Feature flag for max_tokens parameter. Disable if your LLM API returns errors when max_tokens is set. When disabled, the parameter is not passed to the API. | LLM |
| `llm.temperature` | `LDR_LLM_TEMPERATURE` | `0.7` | Controls randomness in model outputs. Lower values (0.0-0.3) produce deterministic, factual responses; higher values (0.7-1.0) produce more creative and varied outputs. | LLM |
| `llm.xai.api_key` | `LDR_LLM_XAI_API_KEY` | `null` | API key to use for the xAI Grok provider. | LLM |
| `news.display.default_headline_max_length` | `LDR_NEWS_DISPLAY_DEFAULT_HEADLINE_MAX_LENGTH` | `100` | Default maximum headline length | APP |
| `news.display.max_query_length` | `LDR_NEWS_DISPLAY_MAX_QUERY_LENGTH` | `50` | Maximum query display length in UI | APP |
| `news.feed.default_limit` | `LDR_NEWS_FEED_DEFAULT_LIMIT` | `20` | Default number of news items to show in feed | APP |
| `news.preferences.max_stored` | `LDR_NEWS_PREFERENCES_MAX_STORED` | `100` | Maximum number of stored preferences per user | APP |
| `news.progress.complete` | `LDR_NEWS_PROGRESS_COMPLETE` | `100` | Progress percentage when complete | APP |
| `news.progress.generating_searches` | `LDR_NEWS_PROGRESS_GENERATING_SEARCHES` | `50` | Progress percentage when generating searches | APP |
| `news.refresh.default_hours` | `LDR_NEWS_REFRESH_DEFAULT_HOURS` | `4` | Default refresh interval in hours | APP |
| `news.scheduler.activity_check_interval` | `LDR_NEWS_SCHEDULER_ACTIVITY_CHECK_INTERVAL` | `5` | How often to check for due subscriptions (minutes) | APP |
| `news.scheduler.batch_size` | `LDR_NEWS_SCHEDULER_BATCH_SIZE` | `5` | Number of subscriptions to process in each batch | APP |
| `news.scheduler.cleanup_interval_hours` | `LDR_NEWS_SCHEDULER_CLEANUP_INTERVAL_HOURS` | `1` | How often to run cleanup job for inactive users | APP |
| `news.scheduler.enabled` | `LDR_NEWS_SCHEDULER_ENABLED` | `true` | Enable automatic news subscription updates for active users | APP |
| `news.scheduler.max_concurrent_jobs` | `LDR_NEWS_SCHEDULER_MAX_CONCURRENT_JOBS` | `10` | Maximum concurrent subscription checks across all users | APP |
| `news.scheduler.max_jitter_seconds` | `LDR_NEWS_SCHEDULER_MAX_JITTER_SECONDS` | `300` | Maximum random delay added to subscription checks to spread load | APP |
| `news.scheduler.retention_hours` | `LDR_NEWS_SCHEDULER_RETENTION_HOURS` | `48` | Hours to keep user credentials in memory after last activity | APP |
| `news.storage.default_limit` | `LDR_NEWS_STORAGE_DEFAULT_LIMIT` | `100` | Default limit for storing news items | APP |
| `news.subscription.default_type` | `LDR_NEWS_SUBSCRIPTION_DEFAULT_TYPE` | `search` | Default subscription type | APP |
| `news.subscription.refresh_minutes` | `LDR_NEWS_SUBSCRIPTION_REFRESH_MINUTES` | `360` | Default subscription refresh interval in minutes | APP |
| `news.trending.lookback_hours` | `LDR_NEWS_TRENDING_LOOKBACK_HOURS` | `24` | Hours to look back for trending analysis | APP |
| `notifications.enabled` | `LDR_NOTIFICATIONS_ENABLED` | `false` | Enable external notifications (email, messaging, etc.) | APP |
| `notifications.on_api_quota_warning` | `LDR_NOTIFICATIONS_ON_API_QUOTA_WARNING` | `false` | Send notification when API quota/rate limits are exceeded | APP |
| `notifications.on_auth_issue` | `LDR_NOTIFICATIONS_ON_AUTH_ISSUE` | `false` | Send notification when authentication fails for API services | APP |
| `notifications.on_research_completed` | `LDR_NOTIFICATIONS_ON_RESEARCH_COMPLETED` | `true` | Send notification when research completes | APP |
| `notifications.on_research_failed` | `LDR_NOTIFICATIONS_ON_RESEARCH_FAILED` | `true` | Send notification when research fails | APP |
| `notifications.on_research_queued` | `LDR_NOTIFICATIONS_ON_RESEARCH_QUEUED` | `false` | Send notification when research is added to queue | APP |
| `notifications.on_subscription_error` | `LDR_NOTIFICATIONS_ON_SUBSCRIPTION_ERROR` | `false` | Send notification when subscription encounters an error | APP |
| `notifications.on_subscription_update` | `LDR_NOTIFICATIONS_ON_SUBSCRIPTION_UPDATE` | `true` | Send notification when subscriptions have new content | APP |
| `notifications.rate_limit_per_day` | `LDR_NOTIFICATIONS_RATE_LIMIT_PER_DAY` | `50` | Maximum number of notifications per day (prevents spam) | APP |
| `notifications.rate_limit_per_hour` | `LDR_NOTIFICATIONS_RATE_LIMIT_PER_HOUR` | `10` | Maximum number of notifications per hour (prevents spam) | APP |
| `notifications.service_url` | `LDR_NOTIFICATIONS_SERVICE_URL` | `` | Notification service URL(s). Supports multiple comma-separated URLs to send notifications to multiple destinations (e.g., discord://YOUR_WEBHOOK_ID/YOUR_TOKEN, mailto://YOUR_USER:YOUR_PASS@smtp.example.com?to=recipient@example.com). See <a href="https://github.com/LearningCircuit/local-deep-research/blob/main/docs/NOTIFICATIONS.md" target="_blank">documentation</a> for supported services. | APP |
| `rag.indexing_batch_size` | `LDR_RAG_INDEXING_BATCH_SIZE` | `15` | Number of documents to process in parallel during RAG indexing. Higher values are faster but use more memory. | APP |
| `rate_limiting.decay_per_day` | `LDR_RATE_LIMITING_DECAY_PER_DAY` | `0.95` | Confidence decay factor per day for old rate limit estimates (0.5-0.99, lower = faster decay) | APP |
| `rate_limiting.enabled` | `LDR_RATE_LIMITING_ENABLED` | `true` | Enable adaptive rate limiting system that learns optimal wait times for each search engine | APP |
| `rate_limiting.exploration_rate` | `LDR_RATE_LIMITING_EXPLORATION_RATE` | `0.1` | Percentage of attempts that will explore faster rates to discover improvements (0.0-1.0) | APP |
| `rate_limiting.learning_rate` | `LDR_RATE_LIMITING_LEARNING_RATE` | `0.45` | How quickly to adapt to new rate limit information (higher = faster adaptation) | APP |
| `rate_limiting.llm_enabled` | `LDR_RATE_LIMITING_LLM_ENABLED` | `false` | Enable adaptive rate limiting for LLM API calls (e.g., for free tier APIs with request limits) | APP |
| `rate_limiting.memory_window` | `LDR_RATE_LIMITING_MEMORY_WINDOW` | `100` | Number of recent attempts to keep in memory for learning | APP |
| `rate_limiting.profile` | `LDR_RATE_LIMITING_PROFILE` | `balanced` | Rate limiting aggressiveness profile | APP |
| `report.citation_format` | `LDR_REPORT_CITATION_FORMAT` | `number_hyperlinks` | Citation format style for reports | REPORT |
| `report.detailed_citations` | `LDR_REPORT_DETAILED_CITATIONS` | `true` | Include detailed citations in reports | REPORT |
| `report.enable_fact_checking` | `LDR_REPORT_ENABLE_FACT_CHECKING` | `true` | Enable fact checking for report contents | REPORT |
| `report.enable_file_backup` | `LDR_REPORT_ENABLE_FILE_BACKUP` | `false` | Also save reports to file system for external access (reports are always stored in database) | REPORT |
| `report.export_formats` | `LDR_REPORT_EXPORT_FORMATS` | ``["markdown", "latex", "quarto", "ris"]`` | Available export formats for reports | REPORT |
| `report.max_context_chars` | `LDR_REPORT_MAX_CONTEXT_CHARS` | `4000` | Maximum characters for context when generating sections (lower values are safer for smaller local models) | REPORT |
| `report.max_context_sections` | `LDR_REPORT_MAX_CONTEXT_SECTIONS` | `3` | Number of previous sections to include as context when generating new sections (helps reduce repetition in detailed reports) | REPORT |
| `report.searches_per_section` | `LDR_REPORT_SEARCHES_PER_SECTION` | `2` | Number of searches to run per report section | REPORT |
| `research_library.auto_index_enabled` | `LDR_RESEARCH_LIBRARY_AUTO_INDEX_ENABLED` | `false` | Automatically index documents for RAG search when they are added to collections (via upload or download). Indexing happens in the background without blocking uploads. | APP |
| `research_library.confirm_deletions` | `LDR_RESEARCH_LIBRARY_CONFIRM_DELETIONS` | `true` | Show confirmation dialogs before delete operations. Recommended to keep enabled to prevent accidental data loss. | APP |
| `research_library.enable_pdf_storage` | `RESEARCH_LIBRARY_ENABLE_PDF_STORAGE` | `false` | DEPRECATED: Use pdf_storage_mode instead. Kept for backward compatibility. | APP |
| `research_library.enable_txt_storage` | `LDR_RESEARCH_LIBRARY_ENABLE_TXT_STORAGE` | `false` | ⚠️ PRIVACY WARNING: Text files will be stored unencrypted on disk for RAG/ChromaDB integration. This exposes document contents to anyone with filesystem access. | APP |
| `research_library.max_pdf_size_mb` | `LDR_RESEARCH_LIBRARY_MAX_PDF_SIZE_MB` | `100` | Maximum allowed PDF file size in megabytes. PDFs larger than this will not be stored. | APP |
| `research_library.pdf_storage_mode` | `RESEARCH_LIBRARY_PDF_STORAGE_MODE` | `database` | Choose how PDFs are stored. Each user has their own encrypted database. Database mode stores PDFs encrypted within it (secure, portable, isolated per-user). Filesystem mode stores unencrypted files on disk (faster for large files). None stores only extracted text (smallest footprint). | APP |
| `research_library.shared_library` | `RESEARCH_LIBRARY_SHARED_LIBRARY` | `false` | When enabled, all users share the same library directory. Documents from all users will be visible to each other. | APP |
| `research_library.storage_path` | `RESEARCH_LIBRARY_STORAGE_PATH` | `~/Documents/LocalDeepResearch/Library` | Base path for storing research library documents. Each user will have their own subdirectory unless shared library is enabled. | APP |
| `research_library.upload_pdf_storage` | `RESEARCH_LIBRARY_UPLOAD_PDF_STORAGE` | `none` | Choose how user-uploaded PDFs are stored. Database mode stores PDFs encrypted in your personal database (increases database size but allows viewing/downloading later). Text Only extracts and stores only the text content (smallest footprint). Note: Filesystem storage is not available for uploads for security reasons. | APP |
| `search.cross_engine_max_results` | `LDR_SEARCH_CROSS_ENGINE_MAX_RESULTS` | `100` | Maximum number of search results to keep after cross-engine filtering. When results from multiple search engines are combined, this limits how many total results are displayed. Higher values show more comprehensive results. | SEARCH |
| `search.engine.DEFAULT_SEARCH_ENGINE` | `LDR_SEARCH_ENGINE_DEFAULT_SEARCH_ENGINE` | `wikipedia` | Fallback search engine used when the configured engine is unavailable or has errors. | SEARCH |
| `search.engine.auto.class_name` | `LDR_SEARCH_ENGINE_AUTO_CLASS_NAME` | `MetaSearchEngine` | Internal: Python class implementing the automatic search engine selector. | SEARCH |
| `search.engine.auto.default_params.max_engines_to_try` | `LDR_SEARCH_ENGINE_AUTO_DEFAULT_PARAMS_MAX_ENGINES_TO_TRY` | `3` | Maximum number of search engines to try before returning results. Auto mode selects the best engine(s) based on your query type. | SEARCH |
| `search.engine.auto.default_params.use_api_key_services` | `LDR_SEARCH_ENGINE_AUTO_DEFAULT_PARAMS_USE_API_KEY_SERVICES` | `true` | Whether to include paid API services (Brave, SerpAPI, etc.) in auto-selection. Disable to only use free search engines. | SEARCH |
| `search.engine.auto.description` | `LDR_SEARCH_ENGINE_AUTO_DESCRIPTION` | `Attempt to choose the best combination of search engines automatically.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.auto.display_name` | `LDR_SEARCH_ENGINE_AUTO_DISPLAY_NAME` | `Auto` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.auto.module_path` | `LDR_SEARCH_ENGINE_AUTO_MODULE_PATH` | `.engines.meta_search_engine` | Internal: Python module path for the auto search engine selector. | SEARCH |
| `search.engine.auto.reliability` | `LDR_SEARCH_ENGINE_AUTO_RELIABILITY` | `0.85` | Reliability rating for auto mode. Varies based on which underlying engine is selected. | SEARCH |
| `search.engine.auto.requires_api_key` | `LDR_SEARCH_ENGINE_AUTO_REQUIRES_API_KEY` | `false` | Whether auto mode requires API keys. False because it can fall back to free engines like Wikipedia. | SEARCH |
| `search.engine.auto.requires_llm` | `LDR_SEARCH_ENGINE_AUTO_REQUIRES_LLM` | `true` | Whether auto mode uses an LLM to select the best engine. When disabled, falls back to rule-based selection. | SEARCH |
| `search.engine.auto.strengths` | `LDR_SEARCH_ENGINE_AUTO_STRENGTHS` | ``["intelligent engine selection", "adaptable to query type", "fallback capabilities"]`` | Key advantages of auto mode: Intelligently selects the best engine for each query type, tries multiple sources if needed. | SEARCH |
| `search.engine.auto.weaknesses` | `LDR_SEARCH_ENGINE_AUTO_WEAKNESSES` | ``["slightly slower due to LLM analysis"]`` | Limitations of auto mode: May use more API calls trying multiple engines, selection heuristics may not always be optimal. | SEARCH |
| `search.engine.local.local_all.class_name` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_CLASS_NAME` | `LocalAllSearchEngine` | Internal: Python class implementing local document search. Do not modify. | SEARCH |
| `search.engine.local.local_all.description` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_DESCRIPTION` | `Search only local documents using RAG.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.local.local_all.display_name` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_DISPLAY_NAME` | `Local Documents` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.local.local_all.module_path` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_MODULE_PATH` | `.engines.search_engine_local_all` | Internal: Python module path for local search implementation. Do not modify. | SEARCH |
| `search.engine.local.local_all.reliability` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_RELIABILITY` | `0.85` | Reliability score (0-1) for local search. Quality depends on your document collection and indexing. | SEARCH |
| `search.engine.local.local_all.requires_api_key` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_REQUIRES_API_KEY` | `false` | Local document search does not require any external API keys. | SEARCH |
| `search.engine.local.local_all.requires_llm` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_REQUIRES_LLM` | `true` | Indicates this engine uses the LLM to rerank and filter results for relevance. | SEARCH |
| `search.engine.local.local_all.strengths` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_STRENGTHS` | ``["searches all local collections", "personal documents", "offline access"]`` | Advantages: Searches all local document collections at once, works offline, uses your private documents. | SEARCH |
| `search.engine.local.local_all.use_in_auto_search` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_USE_IN_AUTO_SEARCH` | `true` | Include local documents in auto search mode | SEARCH |
| `search.engine.local.local_all.weaknesses` | `LDR_SEARCH_ENGINE_LOCAL_LOCAL_ALL_WEAKNESSES` | ``["may return too many results", "requires indexing"]`` | Limitations: May return too many results from mixed collections, requires documents to be indexed first. | SEARCH |
| `search.engine.local.personal_notes.cache_dir` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_CACHE_DIR` | `null` | Directory for storing indexed embeddings and cache files. Uses system cache location by default. | SEARCH |
| `search.engine.local.personal_notes.chunk_overlap` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_CHUNK_OVERLAP` | `100` | Number of characters to overlap between chunks for context continuity during RAG indexing. | SEARCH |
| `search.engine.local.personal_notes.chunk_size` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_CHUNK_SIZE` | `500` | Maximum characters per chunk when splitting documents for RAG indexing. Smaller = more precise, larger = more context. | SEARCH |
| `search.engine.local.personal_notes.description` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_DESCRIPTION` | `Personal notes and documents` | Human-readable description of this document collection shown in the UI. | SEARCH |
| `search.engine.local.personal_notes.embedding_device` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_EMBEDDING_DEVICE` | `cpu` | Device for computing embeddings. 'cpu' works everywhere; 'cuda' is faster but requires NVIDIA GPU. | SEARCH |
| `search.engine.local.personal_notes.embedding_model` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Model for generating text embeddings. Default 'all-MiniLM-L6-v2' is fast and works well; larger models may improve accuracy. | SEARCH |
| `search.engine.local.personal_notes.embedding_model_type` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_EMBEDDING_MODEL_TYPE` | `sentence_transformers` | Model provider to use for generating document embeddings. | SEARCH |
| `search.engine.local.personal_notes.enabled` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_ENABLED` | `true` | Enable this document collection for searching. Disable if you don't want to index these documents. | SEARCH |
| `search.engine.local.personal_notes.max_filtered_results` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_MAX_FILTERED_RESULTS` | `10` | Maximum results to return after LLM relevance filtering. These are the final results used in research. | SEARCH |
| `search.engine.local.personal_notes.max_results` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_MAX_RESULTS` | `30` | Maximum results from initial vector similarity search, before LLM filters for relevance. | SEARCH |
| `search.engine.local.personal_notes.name` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_NAME` | `Personal Notes` | Internal identifier for this collection. Used in logs and configuration. | SEARCH |
| `search.engine.local.personal_notes.paths` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_PATHS` | ``["/local_collections/personal_notes"]`` | File paths to include in this collection. Supports directories (recursively indexed) and individual files. | SEARCH |
| `search.engine.local.personal_notes.reliability` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_RELIABILITY` | `0.75` | Reliability score (0-1). Personal notes are rated lower (0.75) as they may contain informal or subjective content. | SEARCH |
| `search.engine.local.personal_notes.strengths` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_STRENGTHS` | ``["personal knowledge", "notes", "private documents"]`` | Advantages: Access to your personal knowledge, notes, and private documents not available elsewhere. | SEARCH |
| `search.engine.local.personal_notes.use_in_auto_search` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_USE_IN_AUTO_SEARCH` | `false` | Include personal notes in auto search mode | SEARCH |
| `search.engine.local.personal_notes.weaknesses` | `LDR_SEARCH_ENGINE_LOCAL_PERSONAL_NOTES_WEAKNESSES` | ``["subjective content", "informal information"]`` | Limitations: Content may be subjective, informal, or incomplete compared to published sources. | SEARCH |
| `search.engine.local.project_docs.cache_dir` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_CACHE_DIR` | `null` | Directory for storing indexed embeddings and cache files. Uses system cache location by default. | SEARCH |
| `search.engine.local.project_docs.chunk_overlap` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_CHUNK_OVERLAP` | `200` | Number of characters to overlap between chunks for context continuity during RAG indexing. | SEARCH |
| `search.engine.local.project_docs.chunk_size` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_CHUNK_SIZE` | `1000` | Maximum characters per chunk when splitting documents for RAG indexing. Larger default (1000) suits technical documentation. | SEARCH |
| `search.engine.local.project_docs.description` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_DESCRIPTION` | `Project documentation and specifications` | Human-readable description of this document collection shown in the UI. | SEARCH |
| `search.engine.local.project_docs.embedding_device` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_EMBEDDING_DEVICE` | `cpu` | Device for computing embeddings. 'cpu' works everywhere; 'cuda' is faster but requires NVIDIA GPU. | SEARCH |
| `search.engine.local.project_docs.embedding_model` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Model for generating text embeddings. Default 'all-MiniLM-L6-v2' is fast and works well; larger models may improve accuracy. | SEARCH |
| `search.engine.local.project_docs.embedding_model_type` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_EMBEDDING_MODEL_TYPE` | `sentence_transformers` | Model provider to use for generating document embeddings. | SEARCH |
| `search.engine.local.project_docs.enabled` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_ENABLED` | `true` | Enable this document collection for searching. Disable if you don't want to index these documents. | SEARCH |
| `search.engine.local.project_docs.max_filtered_results` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_MAX_FILTERED_RESULTS` | `5` | Maximum results to return after LLM relevance filtering. These are the final results used in research. | SEARCH |
| `search.engine.local.project_docs.max_results` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_MAX_RESULTS` | `20` | Maximum results from initial vector similarity search, before LLM filters for relevance. | SEARCH |
| `search.engine.local.project_docs.name` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_NAME` | `Project Documents` | Internal identifier for this collection. Used in logs and configuration. | SEARCH |
| `search.engine.local.project_docs.paths` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_PATHS` | ``["/local_collections/project_docs/"]`` | File paths to include in this collection. Supports directories (recursively indexed) and individual files. | SEARCH |
| `search.engine.local.project_docs.reliability` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_RELIABILITY` | `0.9` | Reliability score (0-1). Project docs are rated 0.9 by default as they are semi-formal technical content. | SEARCH |
| `search.engine.local.project_docs.strengths` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_STRENGTHS` | ``["project documentation", "specifications", "internal documents"]`` | Advantages: Access to project-specific technical docs, READMEs, and internal documentation not available online. | SEARCH |
| `search.engine.local.project_docs.use_in_auto_search` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_USE_IN_AUTO_SEARCH` | `false` | Include project documents in auto search mode | SEARCH |
| `search.engine.local.project_docs.weaknesses` | `LDR_SEARCH_ENGINE_LOCAL_PROJECT_DOCS_WEAKNESSES` | ``["no external information", "limited to organizational knowledge"]`` | Limitations: May be outdated if docs are not maintained, limited in scope to specific projects. | SEARCH |
| `search.engine.local.research_papers.cache_dir` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_CACHE_DIR` | `null` | Directory for storing indexed embeddings and cache files. Uses system cache location by default. | SEARCH |
| `search.engine.local.research_papers.chunk_overlap` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_CHUNK_OVERLAP` | `150` | Number of characters to overlap between chunks for context continuity during RAG indexing. | SEARCH |
| `search.engine.local.research_papers.chunk_size` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_CHUNK_SIZE` | `800` | Maximum characters per chunk when splitting papers for RAG indexing. Default (800) balances context and precision. | SEARCH |
| `search.engine.local.research_papers.description` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_DESCRIPTION` | `Academic research papers and articles` | Human-readable description of this document collection shown in the UI. | SEARCH |
| `search.engine.local.research_papers.embedding_device` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_EMBEDDING_DEVICE` | `cpu` | Device for computing embeddings. 'cpu' works everywhere; 'cuda' is faster but requires NVIDIA GPU. | SEARCH |
| `search.engine.local.research_papers.embedding_model` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Model for generating text embeddings. Consider 'allenai/specter' for academic papers if available. | SEARCH |
| `search.engine.local.research_papers.embedding_model_type` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_EMBEDDING_MODEL_TYPE` | `sentence_transformers` | Model provider to use for generating document embeddings. | SEARCH |
| `search.engine.local.research_papers.enabled` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_ENABLED` | `true` | Enable this document collection for searching. Disable if you don't have local research papers. | SEARCH |
| `search.engine.local.research_papers.max_filtered_results` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_MAX_FILTERED_RESULTS` | `5` | Maximum results to return after LLM relevance filtering. These are the final results used in research. | SEARCH |
| `search.engine.local.research_papers.max_results` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_MAX_RESULTS` | `20` | Maximum results from initial vector similarity search, before LLM filters for relevance. | SEARCH |
| `search.engine.local.research_papers.name` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_NAME` | `Research Papers` | Internal identifier for this collection. Used in logs and configuration. | SEARCH |
| `search.engine.local.research_papers.paths` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_PATHS` | ``["/local_collections/research_papers/"]`` | File paths containing academic papers. Supports PDFs and text formats; directories are indexed recursively. | SEARCH |
| `search.engine.local.research_papers.reliability` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_RELIABILITY` | `0.85` | Reliability score (0-1). Research papers are rated 0.85 by default as they are peer-reviewed academic content. | SEARCH |
| `search.engine.local.research_papers.strengths` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_STRENGTHS` | ``["academic research", "scientific papers", "scholarly content"]`` | Advantages: Access to peer-reviewed academic content, scientific papers, and scholarly research in your collection. | SEARCH |
| `search.engine.local.research_papers.use_in_auto_search` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_USE_IN_AUTO_SEARCH` | `false` | Include research papers in auto search mode | SEARCH |
| `search.engine.local.research_papers.weaknesses` | `LDR_SEARCH_ENGINE_LOCAL_RESEARCH_PAPERS_WEAKNESSES` | ``["potentially outdated", "limited to collected papers"]`` | Limitations: Limited to papers in your collection, may be outdated if not regularly updated. | SEARCH |
| `search.engine.web.arxiv.class_name` | `LDR_SEARCH_ENGINE_WEB_ARXIV_CLASS_NAME` | `ArXivSearchEngine` | Internal: Python class implementing the arXiv search engine. | SEARCH |
| `search.engine.web.arxiv.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_ARXIV_DEFAULT_PARAMS_MAX_RESULTS` | `20` | Maximum number of papers to retrieve from arXiv per search query. | SEARCH |
| `search.engine.web.arxiv.default_params.sort_by` | `LDR_SEARCH_ENGINE_WEB_ARXIV_DEFAULT_PARAMS_SORT_BY` | `relevance` | How to sort arXiv results: 'relevance' for best match, 'lastUpdatedDate' for recently updated, 'submittedDate' for newest submissions. | SEARCH |
| `search.engine.web.arxiv.default_params.sort_order` | `LDR_SEARCH_ENGINE_WEB_ARXIV_DEFAULT_PARAMS_SORT_ORDER` | `descending` | Sort order for arXiv results: 'descending' (newest/most relevant first) or 'ascending' (oldest/least relevant first). | SEARCH |
| `search.engine.web.arxiv.description` | `LDR_SEARCH_ENGINE_WEB_ARXIV_DESCRIPTION` | `Search papers uploaded to ArXiv.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.arxiv.display_name` | `LDR_SEARCH_ENGINE_WEB_ARXIV_DISPLAY_NAME` | `ArXiv` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.arxiv.journal_reputation.enabled` | `LDR_SEARCH_ENGINE_WEB_ARXIV_JOURNAL_REPUTATION_ENABLED` | `true` | Enable journal quality filtering for this search engine. | SEARCH |
| `search.engine.web.arxiv.module_path` | `LDR_SEARCH_ENGINE_WEB_ARXIV_MODULE_PATH` | `.engines.search_engine_arxiv` | Internal: Python module path for the arXiv search engine implementation. | SEARCH |
| `search.engine.web.arxiv.reliability` | `LDR_SEARCH_ENGINE_WEB_ARXIV_RELIABILITY` | `0.9` | Reliability rating (0-1) for arXiv as a source. Higher values indicate more trustworthy academic content. | SEARCH |
| `search.engine.web.arxiv.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_ARXIV_REQUIRES_API_KEY` | `false` | Whether arXiv requires an API key. arXiv is free and open access, so this is false. | SEARCH |
| `search.engine.web.arxiv.strengths` | `LDR_SEARCH_ENGINE_WEB_ARXIV_STRENGTHS` | ``["scientific papers", "academic research", "physics", "computer science", "mathematics", "statistics", "machine learning", "preprints"]`` | Key advantages of arXiv: Free access to preprints, covers physics/math/CS/biology, fast publication of cutting-edge research. | SEARCH |
| `search.engine.web.arxiv.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_ARXIV_USE_IN_AUTO_SEARCH` | `true` | Include ArXiv in auto search mode | SEARCH |
| `search.engine.web.arxiv.weaknesses` | `LDR_SEARCH_ENGINE_WEB_ARXIV_WEAKNESSES` | ``["non-academic topics", "consumer products", "news", "general information"]`` | Limitations of arXiv: Preprints aren't peer-reviewed, limited to STEM fields, keyword-based search may miss semantic matches. | SEARCH |
| `search.engine.web.brave.api_key` | `LDR_SEARCH_ENGINE_WEB_BRAVE_API_KEY` | `` | The Brave API key to use. | SEARCH |
| `search.engine.web.brave.class_name` | `LDR_SEARCH_ENGINE_WEB_BRAVE_CLASS_NAME` | `BraveSearchEngine` | Internal: Python class implementing the Brave search engine. | SEARCH |
| `search.engine.web.brave.default_params.region` | `LDR_SEARCH_ENGINE_WEB_BRAVE_DEFAULT_PARAMS_REGION` | `US` | Geographic region code for Brave search results (e.g., 'US', 'UK', 'DE'). | SEARCH |
| `search.engine.web.brave.default_params.safe_search` | `LDR_SEARCH_ENGINE_WEB_BRAVE_DEFAULT_PARAMS_SAFE_SEARCH` | `true` | Enable safe search filtering for Brave to exclude adult content. | SEARCH |
| `search.engine.web.brave.default_params.search_language` | `LDR_SEARCH_ENGINE_WEB_BRAVE_DEFAULT_PARAMS_SEARCH_LANGUAGE` | `English` | Preferred language for Brave search results. | SEARCH |
| `search.engine.web.brave.default_params.time_period` | `LDR_SEARCH_ENGINE_WEB_BRAVE_DEFAULT_PARAMS_TIME_PERIOD` | `y` | Time range filter for Brave results (e.g., 'd' for day, 'w' for week, 'm' for month, 'y' for year). | SEARCH |
| `search.engine.web.brave.description` | `LDR_SEARCH_ENGINE_WEB_BRAVE_DESCRIPTION` | `Search the web using the Brave search engine.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.brave.display_name` | `LDR_SEARCH_ENGINE_WEB_BRAVE_DISPLAY_NAME` | `Brave` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.brave.full_search_class` | `LDR_SEARCH_ENGINE_WEB_BRAVE_FULL_SEARCH_CLASS` | `FullSearchResults` | Internal: Class for fetching full webpage content from Brave results. | SEARCH |
| `search.engine.web.brave.full_search_module` | `LDR_SEARCH_ENGINE_WEB_BRAVE_FULL_SEARCH_MODULE` | `.engines.full_search` | Internal: Module for fetching full webpage content from Brave results. | SEARCH |
| `search.engine.web.brave.module_path` | `LDR_SEARCH_ENGINE_WEB_BRAVE_MODULE_PATH` | `.engines.search_engine_brave` | Internal: Python module path for the Brave search engine implementation. | SEARCH |
| `search.engine.web.brave.reliability` | `LDR_SEARCH_ENGINE_WEB_BRAVE_RELIABILITY` | `0.7` | Reliability rating (0-1) for Brave Search. Good general-purpose search with privacy focus. | SEARCH |
| `search.engine.web.brave.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_BRAVE_REQUIRES_API_KEY` | `true` | Whether Brave Search requires an API key. Yes - get one at brave.com/search/api. | SEARCH |
| `search.engine.web.brave.strengths` | `LDR_SEARCH_ENGINE_WEB_BRAVE_STRENGTHS` | ``["privacy-focused web search", "product information", "reviews", "recent content", "news", "broad coverage"]`` | Key advantages of Brave: Privacy-focused, good semantic search, reasonable pricing, full-text retrieval available. | SEARCH |
| `search.engine.web.brave.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_BRAVE_SUPPORTS_FULL_SEARCH` | `true` | Whether Brave supports fetching full webpage content beyond snippets. | SEARCH |
| `search.engine.web.brave.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_BRAVE_USE_IN_AUTO_SEARCH` | `false` | Include Brave search in auto search mode | SEARCH |
| `search.engine.web.brave.weaknesses` | `LDR_SEARCH_ENGINE_WEB_BRAVE_WEAKNESSES` | ``["requires API key with usage limits", "smaller index than Google"]`` | Limitations of Brave: Smaller index than Google, requires paid API for production use. | SEARCH |
| `search.engine.web.elasticsearch.class_name` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_CLASS_NAME` | `ElasticsearchSearchEngine` | Internal: Python class implementing the Elasticsearch search engine. | SEARCH |
| `search.engine.web.elasticsearch.default_params.api_key` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_API_KEY` | `` | API key for Elasticsearch authentication (optional) | SEARCH |
| `search.engine.web.elasticsearch.default_params.cloud_id` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_CLOUD_ID` | `` | Elastic Cloud ID for cloud-hosted Elasticsearch (optional) | SEARCH |
| `search.engine.web.elasticsearch.default_params.highlight_fields` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_HIGHLIGHT_FIELDS` | ``["content", "title"]`` | Fields to highlight in search results (JSON array) | SEARCH |
| `search.engine.web.elasticsearch.default_params.hosts` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_HOSTS` | ``["http://localhost:9200"]`` | Elasticsearch server URLs (JSON array format) | SEARCH |
| `search.engine.web.elasticsearch.default_params.index_name` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_INDEX_NAME` | `documents` | Name of the Elasticsearch index to search | SEARCH |
| `search.engine.web.elasticsearch.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_MAX_RESULTS` | `10` | Maximum number of search results to return | SEARCH |
| `search.engine.web.elasticsearch.default_params.password` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_PASSWORD` | `` | Password for Elasticsearch authentication (optional) | SEARCH |
| `search.engine.web.elasticsearch.default_params.search_fields` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_SEARCH_FIELDS` | ``["content", "title", "description", "text"]`` | Fields to search in (JSON array) | SEARCH |
| `search.engine.web.elasticsearch.default_params.username` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DEFAULT_PARAMS_USERNAME` | `` | Username for Elasticsearch authentication (optional) | SEARCH |
| `search.engine.web.elasticsearch.description` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DESCRIPTION` | `Search engine for Elasticsearch databases. Efficient for searching document collections and structured data.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.elasticsearch.display_name` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_DISPLAY_NAME` | `Elasticsearch` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.elasticsearch.module_path` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_MODULE_PATH` | `.engines.search_engine_elasticsearch` | Internal: Python module path for the Elasticsearch search engine implementation. | SEARCH |
| `search.engine.web.elasticsearch.reliability` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_RELIABILITY` | `0.95` | Reliability of the Elasticsearch search engine | SEARCH |
| `search.engine.web.elasticsearch.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_REQUIRES_API_KEY` | `false` | Whether this search engine requires an API key | SEARCH |
| `search.engine.web.elasticsearch.requires_llm` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_REQUIRES_LLM` | `true` | Whether this search engine requires an LLM for relevance filtering | SEARCH |
| `search.engine.web.elasticsearch.strengths` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_STRENGTHS` | ``["Fast full-text search", "Scalable and distributed", "Rich query DSL", "Supports aggregations", "Real-time search"]`` | Strengths of this search engine | SEARCH |
| `search.engine.web.elasticsearch.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_SUPPORTS_FULL_SEARCH` | `true` | Whether this search engine supports full document search | SEARCH |
| `search.engine.web.elasticsearch.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_USE_IN_AUTO_SEARCH` | `false` | Include Elasticsearch in auto search mode | SEARCH |
| `search.engine.web.elasticsearch.weaknesses` | `LDR_SEARCH_ENGINE_WEB_ELASTICSEARCH_WEAKNESSES` | ``["Requires running Elasticsearch instance", "Needs proper index configuration", "Data must be pre-indexed"]`` | Weaknesses of this search engine | SEARCH |
| `search.engine.web.github.api_key` | `LDR_SEARCH_ENGINE_WEB_GITHUB_API_KEY` | `` | GitHub Personal Access Token for API authentication. Without this, you're limited to 10 requests/minute for search API. With authentication, you get 30 requests/minute for search API. | SEARCH |
| `search.engine.web.github.class_name` | `LDR_SEARCH_ENGINE_WEB_GITHUB_CLASS_NAME` | `GitHubSearchEngine` | Internal: Python class implementing the GitHub search engine. | SEARCH |
| `search.engine.web.github.default_params.include_issues` | `LDR_SEARCH_ENGINE_WEB_GITHUB_DEFAULT_PARAMS_INCLUDE_ISSUES` | `false` | Include recent GitHub issues in repository search results. | SEARCH |
| `search.engine.web.github.default_params.include_readme` | `LDR_SEARCH_ENGINE_WEB_GITHUB_DEFAULT_PARAMS_INCLUDE_README` | `true` | Fetch and include README content when searching repositories. | SEARCH |
| `search.engine.web.github.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_GITHUB_DEFAULT_PARAMS_MAX_RESULTS` | `15` | Maximum number of GitHub results to retrieve per search. | SEARCH |
| `search.engine.web.github.default_params.search_type` | `LDR_SEARCH_ENGINE_WEB_GITHUB_DEFAULT_PARAMS_SEARCH_TYPE` | `repositories` | What to search on GitHub: 'repositories', 'code', 'issues', or 'users'. | SEARCH |
| `search.engine.web.github.description` | `LDR_SEARCH_ENGINE_WEB_GITHUB_DESCRIPTION` | `Search projects on Github.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.github.display_name` | `LDR_SEARCH_ENGINE_WEB_GITHUB_DISPLAY_NAME` | `Github` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.github.module_path` | `LDR_SEARCH_ENGINE_WEB_GITHUB_MODULE_PATH` | `.engines.search_engine_github` | Internal: Python module path for the GitHub search engine implementation. | SEARCH |
| `search.engine.web.github.reliability` | `LDR_SEARCH_ENGINE_WEB_GITHUB_RELIABILITY` | `0.99` | Reliability rating (0-1) for GitHub as a source. Great for code and technical projects. | SEARCH |
| `search.engine.web.github.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_GITHUB_REQUIRES_API_KEY` | `false` | Whether GitHub requires a personal access token. Optional but recommended for higher rate limits (10→30 req/min). | SEARCH |
| `search.engine.web.github.strengths` | `LDR_SEARCH_ENGINE_WEB_GITHUB_STRENGTHS` | ``["code repositories", "software documentation", "open source projects", "programming issues", "developer information", "technical documentation"]`` | Key advantages of GitHub: Excellent for code/technical queries, includes README content, issue discussions, works without API key. | SEARCH |
| `search.engine.web.github.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_GITHUB_SUPPORTS_FULL_SEARCH` | `true` | Whether GitHub supports fetching full content (README, code files) beyond basic search results. | SEARCH |
| `search.engine.web.github.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_GITHUB_USE_IN_AUTO_SEARCH` | `true` | Include GitHub in auto search mode | SEARCH |
| `search.engine.web.github.weaknesses` | `LDR_SEARCH_ENGINE_WEB_GITHUB_WEAKNESSES` | ``["non-technical content", "content outside GitHub", "rate limits without API key"]`` | Limitations of GitHub: Limited to code/technical content, strict rate limits without authentication. | SEARCH |
| `search.engine.web.google_pse.api_key` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_API_KEY` | `` | The Google PSE API key to use. | SEARCH |
| `search.engine.web.google_pse.class_name` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_CLASS_NAME` | `GooglePSESearchEngine` | Internal: Python class implementing this search engine. Do not modify unless extending functionality. | SEARCH |
| `search.engine.web.google_pse.default_params.region` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_DEFAULT_PARAMS_REGION` | `us` | Geographic region for search results (e.g., 'us', 'uk', 'de'). Results prioritized from this region. | SEARCH |
| `search.engine.web.google_pse.default_params.safe_search` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_DEFAULT_PARAMS_SAFE_SEARCH` | `true` | Enable Google SafeSearch to filter explicit content from results. | SEARCH |
| `search.engine.web.google_pse.default_params.search_language` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_DEFAULT_PARAMS_SEARCH_LANGUAGE` | `English` | Language for search queries and results (e.g., 'English', 'German'). Filters results by language. | SEARCH |
| `search.engine.web.google_pse.description` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_DESCRIPTION` | `Search the web using Google's Programmable Search Engine API.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.google_pse.display_name` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_DISPLAY_NAME` | `Google PSE` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.google_pse.full_search_class` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_FULL_SEARCH_CLASS` | `FullSearchResults` | Internal: Class used for full-text content retrieval after initial search. Do not modify. | SEARCH |
| `search.engine.web.google_pse.full_search_module` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_FULL_SEARCH_MODULE` | `.engines.full_search` | Internal: Python module path for full search functionality. Do not modify. | SEARCH |
| `search.engine.web.google_pse.module_path` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_MODULE_PATH` | `.engines.search_engine_google_pse` | Internal: Python module path for this search engine implementation. Do not modify. | SEARCH |
| `search.engine.web.google_pse.reliability` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_RELIABILITY` | `0.9` | Reliability score (0-1) indicating result quality. Google PSE is highly reliable (0.9) due to Google's indexing quality. | SEARCH |
| `search.engine.web.google_pse.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_REQUIRES_API_KEY` | `true` | Indicates this engine requires a Google API key and Custom Search Engine ID. Get credentials from Google Cloud Console. | SEARCH |
| `search.engine.web.google_pse.strengths` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_STRENGTHS` | ``["custom search scope", "high-quality results", "domain-specific search", "configurable search experience", "control over search index"]`` | Advantages of Google PSE: Custom search scope, high-quality results, domain-specific filtering, configurable search experience. | SEARCH |
| `search.engine.web.google_pse.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_SUPPORTS_FULL_SEARCH` | `true` | Indicates this engine can fetch full page content after initial search, not just snippets. | SEARCH |
| `search.engine.web.google_pse.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_USE_IN_AUTO_SEARCH` | `false` | Include Google PSE in auto search mode | SEARCH |
| `search.engine.web.google_pse.weaknesses` | `LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_WEAKNESSES` | ``["requires API key with usage limits", "limited to 10,000 queries/day on free tier", "requires search engine configuration in Google Control Panel"]`` | Limitations: Requires API key, limited to 10,000 queries/day on free tier, requires search engine configuration in Google Control Panel. | SEARCH |
| `search.engine.web.mojeek.api_key` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_API_KEY` | `` | The Mojeek API key to use. Get one at mojeek.com/services/api. | SEARCH |
| `search.engine.web.mojeek.class_name` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_CLASS_NAME` | `MojeekSearchEngine` | Internal: Python class implementing the Mojeek search engine. | SEARCH |
| `search.engine.web.mojeek.default_params.language` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_DEFAULT_PARAMS_LANGUAGE` | `en` | Language code in ISO 639-1 format for Mojeek language boost (e.g., 'en', 'fr'). | SEARCH |
| `search.engine.web.mojeek.default_params.region` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_DEFAULT_PARAMS_REGION` | `` | Country code in ISO 3166-1 alpha-2 format for Mojeek region boost (e.g., 'GB', 'US'). | SEARCH |
| `search.engine.web.mojeek.default_params.safe_search` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_DEFAULT_PARAMS_SAFE_SEARCH` | `false` | Enable safe search filtering for Mojeek to exclude adult content. | SEARCH |
| `search.engine.web.mojeek.description` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_DESCRIPTION` | `Privacy-focused search engine with its own independent web index.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.mojeek.display_name` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_DISPLAY_NAME` | `Mojeek` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.mojeek.full_search_class` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_FULL_SEARCH_CLASS` | `FullSearchResults` | Internal: Class for fetching full webpage content from Mojeek results. | SEARCH |
| `search.engine.web.mojeek.full_search_module` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_FULL_SEARCH_MODULE` | `.engines.full_search` | Internal: Module for fetching full webpage content from Mojeek results. | SEARCH |
| `search.engine.web.mojeek.module_path` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_MODULE_PATH` | `.engines.search_engine_mojeek` | Internal: Python module path for the Mojeek search engine implementation. | SEARCH |
| `search.engine.web.mojeek.reliability` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_RELIABILITY` | `0.65` | Reliability rating (0-1) for Mojeek. Independent index, smaller than Google/Bing. | SEARCH |
| `search.engine.web.mojeek.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_REQUIRES_API_KEY` | `true` | Whether Mojeek requires an API key. Yes - get one at mojeek.com/services/api. | SEARCH |
| `search.engine.web.mojeek.strengths` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_STRENGTHS` | ``["privacy-focused with own index", "no tracking or profiling", "independent results", "European-based"]`` | Key advantages of Mojeek: Privacy-focused with own independent web index, no tracking. | SEARCH |
| `search.engine.web.mojeek.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_SUPPORTS_FULL_SEARCH` | `true` | Whether Mojeek supports fetching full webpage content beyond snippets. | SEARCH |
| `search.engine.web.mojeek.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_USE_IN_AUTO_SEARCH` | `false` | Include Mojeek in auto search mode | SEARCH |
| `search.engine.web.mojeek.weaknesses` | `LDR_SEARCH_ENGINE_WEB_MOJEEK_WEAKNESSES` | ``["smaller index than Google/Bing", "requires paid API key"]`` | Limitations of Mojeek: Smaller index than Google/Bing, requires paid API key. | SEARCH |
| `search.engine.web.nasa_ads` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS` | ``{"module_path": ".engines.search_engine_nasa_ads", "class_name": "NasaAdsSearchEngine", "requires_api_key": true, "requires_llm": true, "description": "Search millions of astronomy and physics papers with natural language queries", "reliability": 0.95}`` | NASA Astrophysics Data System search engine configuration | SEARCH |
| `search.engine.web.nasa_ads.api_key` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_API_KEY` | `` | NASA ADS API key (required). Get one at https://ui.adsabs.harvard.edu/user/settings/token | SEARCH |
| `search.engine.web.nasa_ads.class_name` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_CLASS_NAME` | `NasaAdsSearchEngine` | Python class name for the NASA ADS search engine implementation | SEARCH |
| `search.engine.web.nasa_ads.default_params.from_publication_date` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_DEFAULT_PARAMS_FROM_PUBLICATION_DATE` | `` | Only return papers published after this date (YYYY-MM-DD format, leave empty for all dates) | SEARCH |
| `search.engine.web.nasa_ads.default_params.include_arxiv` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_DEFAULT_PARAMS_INCLUDE_ARXIV` | `true` | Include ArXiv preprints in search results | SEARCH |
| `search.engine.web.nasa_ads.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_DEFAULT_PARAMS_MAX_RESULTS` | `25` | Maximum number of results to retrieve from NASA ADS | SEARCH |
| `search.engine.web.nasa_ads.default_params.min_citations` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_DEFAULT_PARAMS_MIN_CITATIONS` | `0` | Minimum number of citations required (0 for no filter) | SEARCH |
| `search.engine.web.nasa_ads.default_params.sort_by` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_DEFAULT_PARAMS_SORT_BY` | `relevance` | How to sort the search results | SEARCH |
| `search.engine.web.nasa_ads.description` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_DESCRIPTION` | `NASA Astrophysics Data System - comprehensive database of astronomy, astrophysics, physics, and geophysics papers. Includes both ArXiv preprints and published papers.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.nasa_ads.display_name` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_DISPLAY_NAME` | `NASA ADS` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.nasa_ads.journal_reputation.enabled` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_JOURNAL_REPUTATION_ENABLED` | `true` | Enable journal quality filtering for this search engine. | SEARCH |
| `search.engine.web.nasa_ads.module_path` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_MODULE_PATH` | `.engines.search_engine_nasa_ads` | Python module path for importing the NASA ADS search engine class | SEARCH |
| `search.engine.web.nasa_ads.reliability` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_RELIABILITY` | `0.95` | Reliability score for NASA ADS search results (0.0-1.0, higher is better) | SEARCH |
| `search.engine.web.nasa_ads.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_REQUIRES_API_KEY` | `true` | Whether NASA ADS requires an API key (always true for NASA ADS) | SEARCH |
| `search.engine.web.nasa_ads.strengths` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_STRENGTHS` | ``["astronomy papers", "astrophysics research", "physics papers", "space science", "planetary science", "cosmology", "high energy physics", "historical papers", "citation tracking", "natural language queries"]`` | List of research areas where NASA ADS excels | SEARCH |
| `search.engine.web.nasa_ads.weaknesses` | `LDR_SEARCH_ENGINE_WEB_NASA_ADS_WEAKNESSES` | ``["biomedical content", "computer science papers", "social sciences", "humanities", "requires API key for good performance"]`` | List of research areas where NASA ADS has limited coverage | SEARCH |
| `search.engine.web.openalex` | `LDR_SEARCH_ENGINE_WEB_OPENALEX` | ``{"module_path": ".engines.search_engine_openalex", "class_name": "OpenAlexSearchEngine", "requires_api_key": false, "requires_llm": true, "description": "Search 250+ million academic papers with natural language queries", "reliability": 0.95}`` | OpenAlex academic search engine configuration | SEARCH |
| `search.engine.web.openalex.class_name` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_CLASS_NAME` | `OpenAlexSearchEngine` | Python class name for the OpenAlex search engine implementation | SEARCH |
| `search.engine.web.openalex.default_params.filter_open_access` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_DEFAULT_PARAMS_FILTER_OPEN_ACCESS` | `false` | Only return open access papers that are freely available | SEARCH |
| `search.engine.web.openalex.default_params.from_publication_date` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_DEFAULT_PARAMS_FROM_PUBLICATION_DATE` | `` | Only return papers published after this date (YYYY-MM-DD format, leave empty for all dates) | SEARCH |
| `search.engine.web.openalex.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_DEFAULT_PARAMS_MAX_RESULTS` | `25` | Maximum number of results to retrieve from OpenAlex | SEARCH |
| `search.engine.web.openalex.default_params.min_citations` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_DEFAULT_PARAMS_MIN_CITATIONS` | `0` | Minimum number of citations required (0 for no filter) | SEARCH |
| `search.engine.web.openalex.default_params.sort_by` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_DEFAULT_PARAMS_SORT_BY` | `relevance` | How to sort the search results | SEARCH |
| `search.engine.web.openalex.description` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_DESCRIPTION` | `Search 250+ million academic papers with natural language queries. Comprehensive scholarly database covering all disciplines.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.openalex.display_name` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_DISPLAY_NAME` | `OpenAlex` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.openalex.email` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_EMAIL` | `` | Email address for polite pool access (optional but recommended). Provides faster response times. | SEARCH |
| `search.engine.web.openalex.journal_reputation.enabled` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_JOURNAL_REPUTATION_ENABLED` | `true` | Enable journal quality filtering for this search engine. | SEARCH |
| `search.engine.web.openalex.module_path` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_MODULE_PATH` | `.engines.search_engine_openalex` | Python module path for importing the OpenAlex search engine class | SEARCH |
| `search.engine.web.openalex.reliability` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_RELIABILITY` | `0.95` | Reliability score for OpenAlex search results (0.0-1.0, higher is better) | SEARCH |
| `search.engine.web.openalex.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_REQUIRES_API_KEY` | `false` | Whether OpenAlex requires an API key (always false for OpenAlex) | SEARCH |
| `search.engine.web.openalex.strengths` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_STRENGTHS` | ``["academic papers", "scholarly research", "natural language queries", "all disciplines", "citation analysis", "open access content", "comprehensive coverage", "no API key required", "high rate limits"]`` | List of research areas where OpenAlex excels | SEARCH |
| `search.engine.web.openalex.weaknesses` | `LDR_SEARCH_ENGINE_WEB_OPENALEX_WEAKNESSES` | ``["non-academic content", "news articles", "general web content", "real-time information"]`` | List of content types where OpenAlex has limited coverage | SEARCH |
| `search.engine.web.paperless.api_key` | `PAPERLESS_API_TOKEN` | `` | API key for authentication (get from Paperless admin panel) | SEARCH |
| `search.engine.web.paperless.class_name` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_CLASS_NAME` | `PaperlessSearchEngine` | Class name for the Paperless search engine | SEARCH |
| `search.engine.web.paperless.default_params.api_url` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_DEFAULT_PARAMS_API_URL` | `http://localhost:8000` | URL of your Paperless-ngx instance (e.g., http://localhost:8000) | SEARCH |
| `search.engine.web.paperless.default_params.include_content` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_DEFAULT_PARAMS_INCLUDE_CONTENT` | `true` | Include full document content (not just snippets) | SEARCH |
| `search.engine.web.paperless.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_DEFAULT_PARAMS_MAX_RESULTS` | `10` | Maximum number of documents to return | SEARCH |
| `search.engine.web.paperless.default_params.timeout` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_DEFAULT_PARAMS_TIMEOUT` | `30` | Request timeout in seconds | SEARCH |
| `search.engine.web.paperless.default_params.verify_ssl` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_DEFAULT_PARAMS_VERIFY_SSL` | `true` | Verify SSL certificates (disable for self-signed certs) | SEARCH |
| `search.engine.web.paperless.description` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_DESCRIPTION` | `Search your personal Paperless-ngx document archive.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.paperless.display_name` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_DISPLAY_NAME` | `Paperless` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.paperless.enabled` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_ENABLED` | `false` | Enable or disable this search engine | SEARCH |
| `search.engine.web.paperless.module_path` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_MODULE_PATH` | `.engines.search_engine_paperless` | Python module path for the search engine | SEARCH |
| `search.engine.web.paperless.reliability` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_RELIABILITY` | `0.95` | Reliability score for this search engine | SEARCH |
| `search.engine.web.paperless.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_REQUIRES_API_KEY` | `true` | Whether this engine requires an API key | SEARCH |
| `search.engine.web.paperless.requires_llm` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_REQUIRES_LLM` | `true` | Whether this engine requires an LLM for operation | SEARCH |
| `search.engine.web.paperless.strengths` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_STRENGTHS` | ``["personal documents", "OCR-processed content", "tagged and organized", "full-text search", "no rate limits", "private data"]`` | Strengths of this search engine | SEARCH |
| `search.engine.web.paperless.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_SUPPORTS_FULL_SEARCH` | `true` | Whether this engine supports full document search | SEARCH |
| `search.engine.web.paperless.weaknesses` | `LDR_SEARCH_ENGINE_WEB_PAPERLESS_WEAKNESSES` | ``["keyword-based search only", "no semantic understanding", "limited query operators"]`` | Weaknesses of this search engine | SEARCH |
| `search.engine.web.parallel.class_name` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_CLASS_NAME` | `ParallelSearchEngine` | Class name for the parallel search engine | SEARCH |
| `search.engine.web.parallel.default_params.allow_local_engines` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_DEFAULT_PARAMS_ALLOW_LOCAL_ENGINES` | `false` | ⚠️ PRIVACY WARNING: Allow local/private document engines in parallel search. This may expose personal data from your documents to web search queries and LLMs. Only enable if you understand the privacy implications. | SEARCH |
| `search.engine.web.parallel.default_params.enable_llm_relevance_filter` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_DEFAULT_PARAMS_ENABLE_LLM_RELEVANCE_FILTER` | `true` | Enable LLM-based relevance filtering to improve result quality by filtering out irrelevant results. This adds an extra LLM call but significantly improves result relevance. | SEARCH |
| `search.engine.web.parallel.default_params.max_engines_to_select` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_DEFAULT_PARAMS_MAX_ENGINES_TO_SELECT` | `5` | Maximum number of search engines to select and run in parallel | SEARCH |
| `search.engine.web.parallel.default_params.use_api_key_services` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_DEFAULT_PARAMS_USE_API_KEY_SERVICES` | `true` | Include search engines that require API keys | SEARCH |
| `search.engine.web.parallel.description` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_DESCRIPTION` | `Executes multiple search engines in parallel and combines their results. Uses LLM to select the most appropriate engines for each query.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.parallel.display_name` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_DISPLAY_NAME` | `Parallel Search` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.parallel.module_path` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_MODULE_PATH` | `.engines.parallel_search_engine` | Module path for the parallel search engine | SEARCH |
| `search.engine.web.parallel.reliability` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_RELIABILITY` | `0.9` | Reliability of the parallel search engine | SEARCH |
| `search.engine.web.parallel.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_REQUIRES_API_KEY` | `false` | Whether this search engine requires an API key | SEARCH |
| `search.engine.web.parallel.requires_llm` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_REQUIRES_LLM` | `true` | Whether this search engine requires an LLM for engine selection | SEARCH |
| `search.engine.web.parallel.strengths` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_STRENGTHS` | ``["Comprehensive results from multiple sources", "Parallel execution for faster results", "Intelligent engine selection", "Automatic deduplication", "Combines strengths of all engines"]`` | Strengths of this search engine | SEARCH |
| `search.engine.web.parallel.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_SUPPORTS_FULL_SEARCH` | `false` | Whether this search engine supports full document search | SEARCH |
| `search.engine.web.parallel.weaknesses` | `LDR_SEARCH_ENGINE_WEB_PARALLEL_WEAKNESSES` | ``["Higher resource usage", "May hit rate limits on multiple engines", "Results may vary in quality", "Requires LLM for optimal selection"]`` | Weaknesses of this search engine | SEARCH |
| `search.engine.web.pubmed.api_key` | `LDR_SEARCH_ENGINE_WEB_PUBMED_API_KEY` | `` | The PubMed API key to use. | SEARCH |
| `search.engine.web.pubmed.class_name` | `LDR_SEARCH_ENGINE_WEB_PUBMED_CLASS_NAME` | `PubMedSearchEngine` | Internal: Python class implementing the PubMed search engine. | SEARCH |
| `search.engine.web.pubmed.default_params.days_limit` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_DAYS_LIMIT` | `0` | Restrict results to articles published within the last N days. Leave empty for no time restriction. | SEARCH |
| `search.engine.web.pubmed.default_params.full_text_limit` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_FULL_TEXT_LIMIT` | `3` | Maximum number of full-text articles to download from PubMed Central when get_full_text is enabled. | SEARCH |
| `search.engine.web.pubmed.default_params.get_abstracts` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_GET_ABSTRACTS` | `true` | Fetch article abstracts for search results. Recommended for understanding paper content. | SEARCH |
| `search.engine.web.pubmed.default_params.get_full_text` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_GET_FULL_TEXT` | `false` | Attempt to retrieve full-text articles from PubMed Central (PMC). Only works for open-access papers. | SEARCH |
| `search.engine.web.pubmed.default_params.include_authors_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_AUTHORS_IN_CONTEXT` | `false` | Include author names in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_citation_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_CITATION_IN_CONTEXT` | `false` | Include volume, issue, and pages in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_doi_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_DOI_IN_CONTEXT` | `false` | Include DOI (Digital Object Identifier) in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_full_date_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_FULL_DATE_IN_CONTEXT` | `false` | Include full publication date in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_journal_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_JOURNAL_IN_CONTEXT` | `true` | Include journal name in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_keywords_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_KEYWORDS_IN_CONTEXT` | `true` | Include article keywords in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_language_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_LANGUAGE_IN_CONTEXT` | `false` | Include language indicator in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_mesh_terms_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_MESH_TERMS_IN_CONTEXT` | `true` | Include MeSH terms (medical subject headings) in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_pmc_availability_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_PMC_AVAILABILITY_IN_CONTEXT` | `false` | Include PMC free full text availability in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_pmid_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_PMID_IN_CONTEXT` | `false` | Include PubMed ID in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_publication_type_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_PUBLICATION_TYPE_IN_CONTEXT` | `true` | Include publication type (e.g., Clinical Trial, Meta-Analysis) in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.include_year_in_context` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_INCLUDE_YEAR_IN_CONTEXT` | `true` | Include publication year in context for analysis | SEARCH |
| `search.engine.web.pubmed.default_params.max_keywords` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_MAX_KEYWORDS` | `3` | Maximum number of keywords to show (0 = all) | SEARCH |
| `search.engine.web.pubmed.default_params.max_mesh_terms` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_MAX_MESH_TERMS` | `3` | Maximum number of MeSH terms to show (0 = all) | SEARCH |
| `search.engine.web.pubmed.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_MAX_RESULTS` | `20` | Maximum number of PubMed articles to retrieve per search query. | SEARCH |
| `search.engine.web.pubmed.default_params.optimize_queries` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DEFAULT_PARAMS_OPTIMIZE_QUERIES` | `true` | Automatically optimize natural language queries for PubMed's specialized syntax (MeSH terms, Boolean operators). | SEARCH |
| `search.engine.web.pubmed.description` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DESCRIPTION` | `Search papers indexed by PubMed.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.pubmed.display_name` | `LDR_SEARCH_ENGINE_WEB_PUBMED_DISPLAY_NAME` | `PubMed` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.pubmed.module_path` | `LDR_SEARCH_ENGINE_WEB_PUBMED_MODULE_PATH` | `.engines.search_engine_pubmed` | Internal: Python module path for the PubMed search engine implementation. | SEARCH |
| `search.engine.web.pubmed.reliability` | `LDR_SEARCH_ENGINE_WEB_PUBMED_RELIABILITY` | `0.98` | Reliability rating (0-1) for PubMed. Peer-reviewed biomedical literature is highly reliable. | SEARCH |
| `search.engine.web.pubmed.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_PUBMED_REQUIRES_API_KEY` | `false` | Whether PubMed requires an NCBI API key. Optional but recommended for higher rate limits. | SEARCH |
| `search.engine.web.pubmed.requires_llm` | `LDR_SEARCH_ENGINE_WEB_PUBMED_REQUIRES_LLM` | `true` | Whether PubMed query optimization uses LLM. Helps convert natural language to effective PubMed queries. | SEARCH |
| `search.engine.web.pubmed.strengths` | `LDR_SEARCH_ENGINE_WEB_PUBMED_STRENGTHS` | ``["biomedical literature", "medical research", "clinical studies", "life sciences", "health information", "scientific papers"]`` | Key advantages of PubMed: Peer-reviewed biomedical literature, MeSH indexing, abstracts, citation data, free access. | SEARCH |
| `search.engine.web.pubmed.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_PUBMED_USE_IN_AUTO_SEARCH` | `true` | Include PubMed in auto search mode | SEARCH |
| `search.engine.web.pubmed.weaknesses` | `LDR_SEARCH_ENGINE_WEB_PUBMED_WEAKNESSES` | ``["non-medical topics", "very recent papers may be missing", "limited to published research"]`` | Limitations of PubMed: Limited to biomedical literature, keyword-based search, full-text only for open-access papers. | SEARCH |
| `search.engine.web.scaleserp.api_key` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_API_KEY` | `` | The ScaleSerp API key to use. Get yours at https://scaleserp.com | SEARCH |
| `search.engine.web.scaleserp.class_name` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_CLASS_NAME` | `ScaleSerpSearchEngine` | Class name for the ScaleSerp search engine | SEARCH |
| `search.engine.web.scaleserp.default_params.device` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DEFAULT_PARAMS_DEVICE` | `desktop` | Device type for search results | SEARCH |
| `search.engine.web.scaleserp.default_params.enable_cache` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DEFAULT_PARAMS_ENABLE_CACHE` | `true` | Enable 1-hour caching to save API credits (cached searches are FREE within 1 hour) | SEARCH |
| `search.engine.web.scaleserp.default_params.include_full_content` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DEFAULT_PARAMS_INCLUDE_FULL_CONTENT` | `false` | Include full webpage content in results (may slow down search) | SEARCH |
| `search.engine.web.scaleserp.default_params.language` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DEFAULT_PARAMS_LANGUAGE` | `en` | Language code for search results (e.g., 'en', 'es', 'fr') | SEARCH |
| `search.engine.web.scaleserp.default_params.location` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DEFAULT_PARAMS_LOCATION` | `United States` | Location for localized results (e.g., 'United States', 'London,England,United Kingdom') | SEARCH |
| `search.engine.web.scaleserp.default_params.safe_search` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DEFAULT_PARAMS_SAFE_SEARCH` | `true` | Enable or disable safe search filtering | SEARCH |
| `search.engine.web.scaleserp.description` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DESCRIPTION` | `Cost-effective Google Search API with 1-hour free caching that can reduce costs by 40-70% for repeated searches.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.scaleserp.display_name` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_DISPLAY_NAME` | `ScaleSerp` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.scaleserp.module_path` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_MODULE_PATH` | `.engines.search_engine_scaleserp` | Module path for the ScaleSerp search engine | SEARCH |
| `search.engine.web.scaleserp.reliability` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_RELIABILITY` | `0.95` | Reliability of the ScaleSerp search engine | SEARCH |
| `search.engine.web.scaleserp.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_REQUIRES_API_KEY` | `true` | Whether this search engine requires an API key | SEARCH |
| `search.engine.web.scaleserp.requires_llm` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_REQUIRES_LLM` | `false` | Whether this search engine requires an LLM for relevance filtering | SEARCH |
| `search.engine.web.scaleserp.strengths` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_STRENGTHS` | ``["1-hour free caching for repeated searches", "High-quality Google search results", "Real-time results with knowledge graph", "Cached searches don't count against quota", "Rich snippets and related questions"]`` | Strengths of this search engine | SEARCH |
| `search.engine.web.scaleserp.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_SUPPORTS_FULL_SEARCH` | `true` | Whether this search engine supports full document search | SEARCH |
| `search.engine.web.scaleserp.weaknesses` | `LDR_SEARCH_ENGINE_WEB_SCALESERP_WEAKNESSES` | ``["Requires API key with usage limits", "Not specialized for academic content", "Cache only lasts 1 hour"]`` | Weaknesses of this search engine | SEARCH |
| `search.engine.web.searxng.class_name` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_CLASS_NAME` | `SearXNGSearchEngine` | Internal: Python class implementing the SearXNG search engine. | SEARCH |
| `search.engine.web.searxng.default_params.categories` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DEFAULT_PARAMS_CATEGORIES` | ``["general"]`` | SearXNG search categories to use (general, images, videos, news, music, files, etc.). | SEARCH |
| `search.engine.web.searxng.default_params.delay_between_requests` | `SEARXNG_DELAY` | `0` | Seconds to wait between SearXNG requests. Set higher values to avoid rate limiting on public instances. | SEARCH |
| `search.engine.web.searxng.default_params.include_full_content` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DEFAULT_PARAMS_INCLUDE_FULL_CONTENT` | `true` | Fetch full webpage content for SearXNG results, not just snippets. | SEARCH |
| `search.engine.web.searxng.default_params.instance_url` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DEFAULT_PARAMS_INSTANCE_URL` | `http://localhost:8080` | The SearXNG API endpoint URL. | SEARCH |
| `search.engine.web.searxng.default_params.language` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DEFAULT_PARAMS_LANGUAGE` | `en` | Language code for SearXNG results (e.g., 'en', 'de', 'fr'). | SEARCH |
| `search.engine.web.searxng.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DEFAULT_PARAMS_MAX_RESULTS` | `15` | Maximum number of results to retrieve from SearXNG per search. | SEARCH |
| `search.engine.web.searxng.default_params.safe_search` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DEFAULT_PARAMS_SAFE_SEARCH` | `OFF` | Configure the safe search level | SEARCH |
| `search.engine.web.searxng.description` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DESCRIPTION` | `A locally-hosted meta-search engine.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.searxng.display_name` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_DISPLAY_NAME` | `SearXNG (Locally-hosted)` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.searxng.full_search_class` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_FULL_SEARCH_CLASS` | `FullSearchResults` | Internal: Class for fetching full webpage content from SearXNG results. | SEARCH |
| `search.engine.web.searxng.full_search_module` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_FULL_SEARCH_MODULE` | `.engines.full_search` | Internal: Module for fetching full webpage content from SearXNG results. | SEARCH |
| `search.engine.web.searxng.module_path` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_MODULE_PATH` | `.engines.search_engine_searxng` | Internal: Python module path for the SearXNG search engine implementation. | SEARCH |
| `search.engine.web.searxng.reliability` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_RELIABILITY` | `1.0` | Reliability rating (0-1) for SearXNG. Aggregates multiple search engines for broad coverage. | SEARCH |
| `search.engine.web.searxng.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_REQUIRES_API_KEY` | `false` | Whether SearXNG requires an API key. No - SearXNG is self-hosted and free. | SEARCH |
| `search.engine.web.searxng.strengths` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_STRENGTHS` | ``["comprehensive general information", "current events and news", "technical documentation", "factual queries", "historical information", "consumer products", "educational content", "multi-source aggregation", "real-time results", "combined results from major search engines"]`` | Key advantages of SearXNG: Free, self-hosted, privacy-respecting, aggregates multiple engines, no API limits. | SEARCH |
| `search.engine.web.searxng.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_SUPPORTS_FULL_SEARCH` | `true` | Whether SearXNG supports fetching full webpage content beyond snippets. | SEARCH |
| `search.engine.web.searxng.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_USE_IN_AUTO_SEARCH` | `true` | Include SearXNG in auto search mode | SEARCH |
| `search.engine.web.searxng.weaknesses` | `LDR_SEARCH_ENGINE_WEB_SEARXNG_WEAKNESSES` | ``["requires self-hosting", "depends on other search engines", "may be rate limited by underlying engines"]`` | Limitations of SearXNG: Requires self-hosting, public instances may have rate limits, result quality varies. | SEARCH |
| `search.engine.web.semantic_scholar` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR` | ``{"module_path": ".engines.search_engine_semantic_scholar", "class_name": "SemanticScholarSearchEngine", "requires_api_key": false, "requires_llm": true, "description": "Search millions of academic papers across all scientific fields with AI-powered features", "reliability": 0.9}`` | Semantic Scholar search engine configuration | SEARCH |
| `search.engine.web.semantic_scholar.api_key` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_API_KEY` | `` | Semantic Scholar API key (optional but recommended for higher rate limits). Get one at https://www.semanticscholar.org/product/api | SEARCH |
| `search.engine.web.semantic_scholar.class_name` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_CLASS_NAME` | `SemanticScholarSearchEngine` | Class name for the search engine implementation | SEARCH |
| `search.engine.web.semantic_scholar.default_params.citation_limit` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_CITATION_LIMIT` | `10` | Maximum number of citations to fetch per paper | SEARCH |
| `search.engine.web.semantic_scholar.default_params.get_abstracts` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_GET_ABSTRACTS` | `true` | Fetch full abstracts for papers | SEARCH |
| `search.engine.web.semantic_scholar.default_params.get_citations` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_GET_CITATIONS` | `false` | Fetch citation information for papers | SEARCH |
| `search.engine.web.semantic_scholar.default_params.get_references` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_GET_REFERENCES` | `false` | Fetch reference information for papers | SEARCH |
| `search.engine.web.semantic_scholar.default_params.get_tldr` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_GET_TLDR` | `true` | Fetch AI-generated TLDR summaries for papers | SEARCH |
| `search.engine.web.semantic_scholar.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_MAX_RESULTS` | `20` | Maximum number of results to retrieve | SEARCH |
| `search.engine.web.semantic_scholar.default_params.optimize_queries` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_OPTIMIZE_QUERIES` | `true` | Use LLM to optimize natural language queries for better search results | SEARCH |
| `search.engine.web.semantic_scholar.default_params.reference_limit` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DEFAULT_PARAMS_REFERENCE_LIMIT` | `10` | Maximum number of references to fetch per paper | SEARCH |
| `search.engine.web.semantic_scholar.description` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DESCRIPTION` | `AI-powered search for scientific literature. Features include TLDR summaries, citation graphs, and semantic search across millions of papers.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.semantic_scholar.display_name` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_DISPLAY_NAME` | `Semantic Scholar` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.semantic_scholar.module_path` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_MODULE_PATH` | `.engines.search_engine_semantic_scholar` | Module path for the search engine implementation | SEARCH |
| `search.engine.web.semantic_scholar.reliability` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_RELIABILITY` | `0.9` | Reliability rating for the search engine | SEARCH |
| `search.engine.web.semantic_scholar.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_REQUIRES_API_KEY` | `false` | Whether this search engine requires an API key | SEARCH |
| `search.engine.web.semantic_scholar.strengths` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_STRENGTHS` | ``["AI-powered features", "TLDR summaries", "citation graphs", "semantic search", "all scientific fields", "no API key required", "computer science focus", "paper influence metrics"]`` | Strengths of this search engine | SEARCH |
| `search.engine.web.semantic_scholar.weaknesses` | `LDR_SEARCH_ENGINE_WEB_SEMANTIC_SCHOLAR_WEAKNESSES` | ``["rate limits without API key", "not all papers have TLDR", "coverage varies by field", "newer papers may be missing"]`` | Weaknesses of this search engine | SEARCH |
| `search.engine.web.serpapi.api_key` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_API_KEY` | `` | The Serp API key to use. | SEARCH |
| `search.engine.web.serpapi.class_name` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_CLASS_NAME` | `SerpAPISearchEngine` | Internal: Python class implementing the SerpAPI search engine. Do not modify. | SEARCH |
| `search.engine.web.serpapi.default_params.region` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_DEFAULT_PARAMS_REGION` | `us` | Geographic region for search results (e.g., 'us', 'uk'). Results prioritized from this region. | SEARCH |
| `search.engine.web.serpapi.default_params.safe_search` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_DEFAULT_PARAMS_SAFE_SEARCH` | `true` | Enable safe search to filter explicit content from results. | SEARCH |
| `search.engine.web.serpapi.default_params.search_language` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_DEFAULT_PARAMS_SEARCH_LANGUAGE` | `English` | Language for search queries and results. Filters results by language. | SEARCH |
| `search.engine.web.serpapi.default_params.time_period` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_DEFAULT_PARAMS_TIME_PERIOD` | `y` | Time filter for results: 'd' (day), 'w' (week), 'm' (month), 'y' (year), or empty for all time. | SEARCH |
| `search.engine.web.serpapi.description` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_DESCRIPTION` | `Search the web with Google's search API.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.serpapi.display_name` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_DISPLAY_NAME` | `SerpApi` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.serpapi.full_search_class` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_FULL_SEARCH_CLASS` | `FullSerpAPISearchResults` | Internal: Class used for full-text content retrieval after initial search. Do not modify. | SEARCH |
| `search.engine.web.serpapi.full_search_module` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_FULL_SEARCH_MODULE` | `.engines.full_serp_search_results_old` | Internal: Python module path for full search functionality. Do not modify. | SEARCH |
| `search.engine.web.serpapi.module_path` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_MODULE_PATH` | `.engines.search_engine_serpapi` | Internal: Python module path for this search engine implementation. Do not modify. | SEARCH |
| `search.engine.web.serpapi.reliability` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_RELIABILITY` | `0.6` | Reliability score (0-1). SerpAPI rated moderate (0.6) - good for general search but less authoritative than specialized sources. | SEARCH |
| `search.engine.web.serpapi.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_REQUIRES_API_KEY` | `true` | Indicates this engine requires a SerpAPI key. Get one at serpapi.com. | SEARCH |
| `search.engine.web.serpapi.strengths` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_STRENGTHS` | ``["comprehensive web search", "product information", "reviews", "recent content", "news", "broad coverage"]`` | Advantages: Comprehensive web search via Google SERP, good for products, reviews, news, and broad coverage. | SEARCH |
| `search.engine.web.serpapi.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_SUPPORTS_FULL_SEARCH` | `true` | Indicates this engine can fetch full page content after initial search, not just snippets. | SEARCH |
| `search.engine.web.serpapi.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_USE_IN_AUTO_SEARCH` | `false` | Include SerpAPI in auto search mode | SEARCH |
| `search.engine.web.serpapi.weaknesses` | `LDR_SEARCH_ENGINE_WEB_SERPAPI_WEAKNESSES` | ``["requires API key with usage limits", "not specialized for academic content"]`` | Limitations: Requires paid API key with usage limits, not specialized for academic or technical content. | SEARCH |
| `search.engine.web.serper.api_key` | `LDR_SEARCH_ENGINE_WEB_SERPER_API_KEY` | `` | The Serper API key to use. Get yours at https://serper.dev | SEARCH |
| `search.engine.web.serper.class_name` | `LDR_SEARCH_ENGINE_WEB_SERPER_CLASS_NAME` | `SerperSearchEngine` | Class name for the Serper search engine | SEARCH |
| `search.engine.web.serper.default_params.include_full_content` | `LDR_SEARCH_ENGINE_WEB_SERPER_DEFAULT_PARAMS_INCLUDE_FULL_CONTENT` | `false` | Include full webpage content in results (may slow down search) | SEARCH |
| `search.engine.web.serper.default_params.region` | `LDR_SEARCH_ENGINE_WEB_SERPER_DEFAULT_PARAMS_REGION` | `us` | Country code for localized results (e.g., 'us', 'gb', 'fr') | SEARCH |
| `search.engine.web.serper.default_params.safe_search` | `LDR_SEARCH_ENGINE_WEB_SERPER_DEFAULT_PARAMS_SAFE_SEARCH` | `true` | Enable or disable safe search filtering | SEARCH |
| `search.engine.web.serper.default_params.search_language` | `LDR_SEARCH_ENGINE_WEB_SERPER_DEFAULT_PARAMS_SEARCH_LANGUAGE` | `en` | Language code for search results (e.g., 'en', 'es', 'fr') | SEARCH |
| `search.engine.web.serper.default_params.time_period` | `LDR_SEARCH_ENGINE_WEB_SERPER_DEFAULT_PARAMS_TIME_PERIOD` | `null` | Filter results by time period ('day', 'week', 'month', 'year', or null for all time) | SEARCH |
| `search.engine.web.serper.description` | `LDR_SEARCH_ENGINE_WEB_SERPER_DESCRIPTION` | `Fast and affordable Google Search API with real-time results.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.serper.display_name` | `LDR_SEARCH_ENGINE_WEB_SERPER_DISPLAY_NAME` | `Serper` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.serper.module_path` | `LDR_SEARCH_ENGINE_WEB_SERPER_MODULE_PATH` | `.engines.search_engine_serper` | Module path for the Serper search engine | SEARCH |
| `search.engine.web.serper.reliability` | `LDR_SEARCH_ENGINE_WEB_SERPER_RELIABILITY` | `0.95` | Reliability of the Serper search engine | SEARCH |
| `search.engine.web.serper.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_SERPER_REQUIRES_API_KEY` | `true` | Whether this search engine requires an API key | SEARCH |
| `search.engine.web.serper.requires_llm` | `LDR_SEARCH_ENGINE_WEB_SERPER_REQUIRES_LLM` | `false` | Whether this search engine requires an LLM for relevance filtering | SEARCH |
| `search.engine.web.serper.strengths` | `LDR_SEARCH_ENGINE_WEB_SERPER_STRENGTHS` | ``["Lightning-fast response times (1-2 seconds)", "Very affordable pricing", "High-quality Google search results", "Real-time results", "Knowledge graph and related searches", "People Also Ask data", "Simple pay-per-use pricing"]`` | Strengths of this search engine | SEARCH |
| `search.engine.web.serper.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_SERPER_SUPPORTS_FULL_SEARCH` | `true` | Whether this search engine supports full document search | SEARCH |
| `search.engine.web.serper.weaknesses` | `LDR_SEARCH_ENGINE_WEB_SERPER_WEAKNESSES` | ``["Requires API key with usage limits", "Not specialized for academic content"]`` | Weaknesses of this search engine | SEARCH |
| `search.engine.web.tavily.api_key` | `LDR_SEARCH_ENGINE_WEB_TAVILY_API_KEY` | `` | The Tavily API key to use. | SEARCH |
| `search.engine.web.tavily.class_name` | `LDR_SEARCH_ENGINE_WEB_TAVILY_CLASS_NAME` | `TavilySearchEngine` | Internal: Python class implementing Tavily AI search. Do not modify. | SEARCH |
| `search.engine.web.tavily.default_params.include_full_content` | `LDR_SEARCH_ENGINE_WEB_TAVILY_DEFAULT_PARAMS_INCLUDE_FULL_CONTENT` | `true` | Include full webpage content in results | SEARCH |
| `search.engine.web.tavily.default_params.search_depth` | `LDR_SEARCH_ENGINE_WEB_TAVILY_DEFAULT_PARAMS_SEARCH_DEPTH` | `basic` | Search depth - basic for speed, advanced for quality | SEARCH |
| `search.engine.web.tavily.description` | `LDR_SEARCH_ENGINE_WEB_TAVILY_DESCRIPTION` | `AI-powered search engine optimized for research with built-in answer extraction.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.tavily.display_name` | `LDR_SEARCH_ENGINE_WEB_TAVILY_DISPLAY_NAME` | `Tavily` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.tavily.module_path` | `LDR_SEARCH_ENGINE_WEB_TAVILY_MODULE_PATH` | `.engines.search_engine_tavily` | Internal: Python module path for this search engine implementation. Do not modify. | SEARCH |
| `search.engine.web.tavily.reliability` | `LDR_SEARCH_ENGINE_WEB_TAVILY_RELIABILITY` | `0.8` | Reliability score (0-1). Tavily rated high (0.8) - AI-optimized search designed for research applications. | SEARCH |
| `search.engine.web.tavily.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_TAVILY_REQUIRES_API_KEY` | `true` | Indicates this engine requires a Tavily API key. Get one at tavily.com. | SEARCH |
| `search.engine.web.tavily.strengths` | `LDR_SEARCH_ENGINE_WEB_TAVILY_STRENGTHS` | ``["AI-powered search optimization", "built-in answer extraction", "research-focused results", "high-quality content filtering", "fast response times"]`` | Advantages: AI-powered search optimization, built-in answer extraction, research-focused results, fast responses. | SEARCH |
| `search.engine.web.tavily.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_TAVILY_SUPPORTS_FULL_SEARCH` | `true` | Indicates this engine can fetch full page content after initial search, not just snippets. | SEARCH |
| `search.engine.web.tavily.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_TAVILY_USE_IN_AUTO_SEARCH` | `false` | Include Tavily in auto search mode | SEARCH |
| `search.engine.web.tavily.weaknesses` | `LDR_SEARCH_ENGINE_WEB_TAVILY_WEAKNESSES` | ``["requires API key with usage limits", "newer service with smaller historical data"]`` | Limitations: Requires paid API key with usage limits, newer service with less historical data coverage. | SEARCH |
| `search.engine.web.wayback.class_name` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_CLASS_NAME` | `WaybackSearchEngine` | Internal: Python class implementing the Wayback Machine search. Do not modify. | SEARCH |
| `search.engine.web.wayback.default_params.closest_only` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_DEFAULT_PARAMS_CLOSEST_ONLY` | `false` | When enabled, only return the snapshot closest to the target date instead of all available snapshots. | SEARCH |
| `search.engine.web.wayback.default_params.language` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_DEFAULT_PARAMS_LANGUAGE` | `English` | Preferred language for archived content when multiple versions are available. | SEARCH |
| `search.engine.web.wayback.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_DEFAULT_PARAMS_MAX_RESULTS` | `15` | Maximum number of archived snapshots to return per search query. | SEARCH |
| `search.engine.web.wayback.default_params.max_snapshots_per_url` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_DEFAULT_PARAMS_MAX_SNAPSHOTS_PER_URL` | `3` | Maximum historical snapshots to retrieve for each URL. Lower values are faster; higher shows more history. | SEARCH |
| `search.engine.web.wayback.description` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_DESCRIPTION` | `Search the Internet Archive's Wayback Machine.` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.wayback.display_name` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_DISPLAY_NAME` | `Wayback` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.wayback.module_path` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_MODULE_PATH` | `.engines.search_engine_wayback` | Internal: Python module path for this search engine implementation. Do not modify. | SEARCH |
| `search.engine.web.wayback.reliability` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_RELIABILITY` | `0.5` | Reliability score (0-1). Wayback rated moderate (0.5) - great for historical content but archives may be incomplete. | SEARCH |
| `search.engine.web.wayback.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_REQUIRES_API_KEY` | `false` | Wayback Machine is free and does not require an API key. | SEARCH |
| `search.engine.web.wayback.strengths` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_STRENGTHS` | ``["historical web content", "archived websites", "content verification", "deleted or changed web pages", "website evolution tracking"]`` | Advantages: Access to historical web content, archived/deleted pages, content verification, and website evolution. | SEARCH |
| `search.engine.web.wayback.supports_full_search` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_SUPPORTS_FULL_SEARCH` | `true` | Indicates this engine can fetch full archived page content, not just metadata. | SEARCH |
| `search.engine.web.wayback.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_USE_IN_AUTO_SEARCH` | `false` | Include Wayback in auto search mode | SEARCH |
| `search.engine.web.wayback.weaknesses` | `LDR_SEARCH_ENGINE_WEB_WAYBACK_WEAKNESSES` | ``["limited to previously archived content", "may miss recent changes", "archiving quality varies"]`` | Limitations: Only finds previously archived content, may miss recent changes, archive quality varies by site. | SEARCH |
| `search.engine.web.wikinews.adaptive_search` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_ADAPTIVE_SEARCH` | `true` | Automatically adjust the search period based on the query type (recent or historical) to improve news search relevance | SEARCH |
| `search.engine.web.wikinews.class_name` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_CLASS_NAME` | `WikinewsSearchEngine` | Internal: Python class implementing Wikinews search. Do not modify. | SEARCH |
| `search.engine.web.wikinews.description` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_DESCRIPTION` | `Search news articles written by volunteers with verified sources` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.wikinews.display_name` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_DISPLAY_NAME` | `Wikinews` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.wikinews.module_path` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_MODULE_PATH` | `.engines.search_engine_wikinews` | Internal: Python module path for this search engine implementation. Do not modify. | SEARCH |
| `search.engine.web.wikinews.reliability` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_RELIABILITY` | `0.85` | Reliability score (0-1). Wikinews rated moderately high (0.85) - volunteer-written but with verified sources. | SEARCH |
| `search.engine.web.wikinews.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_REQUIRES_API_KEY` | `false` | Wikinews is free and does not require an API key. | SEARCH |
| `search.engine.web.wikinews.strengths` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_STRENGTHS` | ``["recent and historical news", "verified sources", "general coverage", "quick overviews"]`` | Advantages: Recent and historical news coverage, verified sources, general topic coverage, quick overviews. | SEARCH |
| `search.engine.web.wikinews.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_USE_IN_AUTO_SEARCH` | `true` | Include Wikinews in auto search mode | SEARCH |
| `search.engine.web.wikinews.weaknesses` | `LDR_SEARCH_ENGINE_WEB_WIKINEWS_WEAKNESSES` | ``["less coverage of specialized topics", "limited editorial oversight", "variable article quality", "not ideal for breaking news"]`` | Limitations: Less coverage of specialized topics, variable article quality, not ideal for breaking news. | SEARCH |
| `search.engine.web.wikipedia.class_name` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_CLASS_NAME` | `WikipediaSearchEngine` | Internal: Python class implementing Wikipedia search. Do not modify. | SEARCH |
| `search.engine.web.wikipedia.default_params.include_content` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_DEFAULT_PARAMS_INCLUDE_CONTENT` | `true` | Include full article content in results, not just titles and summaries. | SEARCH |
| `search.engine.web.wikipedia.default_params.max_results` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_DEFAULT_PARAMS_MAX_RESULTS` | `20` | Maximum number of Wikipedia articles to return per search query. | SEARCH |
| `search.engine.web.wikipedia.description` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_DESCRIPTION` | `Search Wikipedia articles` | Human-readable description of the search engine. | SEARCH |
| `search.engine.web.wikipedia.display_name` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_DISPLAY_NAME` | `Wikipedia` | Display name to use in the U.I. for this search engine. | SEARCH |
| `search.engine.web.wikipedia.module_path` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_MODULE_PATH` | `.engines.search_engine_wikipedia` | Internal: Python module path for this search engine implementation. Do not modify. | SEARCH |
| `search.engine.web.wikipedia.reliability` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_RELIABILITY` | `0.95` | Reliability score (0-1). Wikipedia rated very high (0.95) - well-curated, cited content with editorial review. | SEARCH |
| `search.engine.web.wikipedia.requires_api_key` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_REQUIRES_API_KEY` | `false` | Wikipedia is free and does not require an API key. | SEARCH |
| `search.engine.web.wikipedia.strengths` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_STRENGTHS` | ``["factual information", "general knowledge", "definitions", "historical facts", "biographies", "overview information"]`` | Advantages: Factual information, general knowledge, definitions, historical facts, biographies, and overviews. | SEARCH |
| `search.engine.web.wikipedia.use_in_auto_search` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_USE_IN_AUTO_SEARCH` | `true` | Include Wikipedia in auto search mode | SEARCH |
| `search.engine.web.wikipedia.weaknesses` | `LDR_SEARCH_ENGINE_WEB_WIKIPEDIA_WEAKNESSES` | ``["recent events", "specialized academic topics", "product comparisons"]`` | Limitations: May lag on recent events, less depth on specialized academic topics, no product comparisons. | SEARCH |
| `search.favorites` | `LDR_SEARCH_FAVORITES` | ``["arxiv", "searxng", "library", "openalex"]`` | List of favorite search engine IDs that appear at the top of the dropdown. Edit as JSON array, e.g. ["arxiv", "brave"] | SEARCH |
| `search.final_max_results` | `LDR_SEARCH_FINAL_MAX_RESULTS` | `100` | Maximum unique sources to include in the final research report after cross-engine deduplication and filtering. | SEARCH |
| `search.iterations` | `LDR_SEARCH_ITERATIONS` | `3` | Maximum number of sequential search cycles to perform. Each iteration generates new follow-up questions based on previous results to progressively expand and refine the research. | SEARCH |
| `search.journal_reputation.exclude_non_published` | `LDR_SEARCH_JOURNAL_REPUTATION_EXCLUDE_NON_PUBLISHED` | `false` | When enabled, excludes results without a formal journal publication reference. Use this to require only peer-reviewed academic sources. | SEARCH |
| `search.journal_reputation.max_context` | `LDR_SEARCH_JOURNAL_REPUTATION_MAX_CONTEXT` | `3000` | Maximum characters of source content to include when the LLM evaluates journal quality. Larger values provide more context but use more tokens. | SEARCH |
| `search.journal_reputation.reanalysis_period` | `LDR_SEARCH_JOURNAL_REPUTATION_REANALYSIS_PERIOD` | `265` | How often to refresh cached journal quality assessments. After this period, the LLM will re-evaluate the journal's reputation. | SEARCH |
| `search.journal_reputation.threshold` | `LDR_SEARCH_JOURNAL_REPUTATION_THRESHOLD` | `4` | Minimum journal quality score (1-10 scale) required for academic results. Sources from journals below this threshold are excluded when the filter is enabled. | SEARCH |
| `search.max_filtered_results` | `LDR_SEARCH_MAX_FILTERED_RESULTS` | `20` | Maximum results to keep after LLM-based relevance filtering. Results above this limit are discarded to focus on the most relevant sources. | SEARCH |
| `search.max_results` | `LDR_SEARCH_MAX_RESULTS` | `50` | Maximum raw search results to retrieve from the search engine per query, before any relevance filtering is applied. | SEARCH |
| `search.quality_check_urls` | `LDR_SEARCH_QUALITY_CHECK_URLS` | `true` | Validate that fetched webpage content matches the search result's title and snippet. Helps filter out broken links or mismatched content. | SEARCH |
| `search.questions_per_iteration` | `LDR_SEARCH_QUESTIONS_PER_ITERATION` | `1` | Number of search queries to generate and execute per iteration. More questions increase coverage but also increase API calls and processing time. | SEARCH |
| `search.region` | `LDR_SEARCH_REGION` | `us` | Geographic region for search results localization. Affects which results are prioritized based on regional relevance. | SEARCH |
| `search.safe_search` | `LDR_SEARCH_SAFE_SEARCH` | `true` | Filter out adult and inappropriate content from search results. Passed to search engines that support content filtering. | SEARCH |
| `search.search_language` | `LDR_SEARCH_SEARCH_LANGUAGE` | `English` | Preferred language for search results. Search engines will attempt to return and prioritize results in this language when available. | SEARCH |
| `search.search_strategy` | `LDR_SEARCH_SEARCH_STRATEGY` | `source-based` | Research methodology. 'source-based': Finds and extracts from sources (good for comprehensive research). 'focused-iteration': Uses entity-based progressive exploration (optimized for factual Q&A, ~95% SimpleQA accuracy). 'iterative-refinement': LLM-guided progressive refinement with gap analysis. | SEARCH |
| `search.searches_per_section` | `LDR_SEARCH_SEARCHES_PER_SECTION` | `2` | Number of independent searches per report section when generating structured reports. Increases coverage but also increases API usage. | SEARCH |
| `search.skip_relevance_filter` | `LDR_SEARCH_SKIP_RELEVANCE_FILTER` | `true` | Global override to skip LLM relevance filtering for ALL search engines. When false, filtering is auto-enabled for academic engines (arXiv, Semantic Scholar, etc.) that use keyword matching, and auto-disabled for generic engines (Google, Brave, etc.) that already use semantic search. | SEARCH |
| `search.snippets_only` | `LDR_SEARCH_SNIPPETS_ONLY` | `true` | Only retrieve search snippets instead of full webpage content. Reduces API usage and processing time, but provides less context for analysis. | SEARCH |
| `search.time_period` | `LDR_SEARCH_TIME_PERIOD` | `y` | Time range filter for search results. Limits results to content published within the specified period (day, week, month, year, or all time). | SEARCH |
| `search.tool` | `LDR_SEARCH_TOOL` | `searxng` | Web search engine to use for research | SEARCH |
| `security.rate_limit_default` | `LDR_SECURITY_RATE_LIMIT_DEFAULT` | `5000 per hour;50000 per day` | Default rate limit for all HTTP endpoints. Multiple limits can be separated by semicolons (e.g., '5000 per hour;50000 per day'). Format: 'N per hour/minute/day'. | APP |
| `security.rate_limit_login` | `LDR_SECURITY_RATE_LIMIT_LOGIN` | `5 per 15 minutes` | Rate limit for login attempts to prevent brute force attacks (e.g., '5 per 15 minutes'). | APP |
| `security.rate_limit_registration` | `LDR_SECURITY_RATE_LIMIT_REGISTRATION` | `3 per hour` | Rate limit for registration attempts to prevent spam (e.g., '3 per hour'). | APP |
| `security.session_remember_me_days` | `LDR_SECURITY_SESSION_REMEMBER_ME_DAYS` | `30` | Number of days a 'Remember Me' session remains valid before requiring re-login. | APP |
| `security.session_timeout_hours` | `LDR_SECURITY_SESSION_TIMEOUT_HOURS` | `2` | Session timeout in hours for sessions without 'Remember Me' checked. Only applies to internal session validation (browser sessions expire on close). | APP |
| `web.host` | `LDR_WEB_HOST` | `0.0.0.0` | The host address to listen on. | SEARCH |
| `web.port` | `LDR_WEB_PORT` | `5000` | The port to listen on. | SEARCH |
| `web.use_https` | `LDR_WEB_USE_HTTPS` | `true` | Whether to enable HTTPS for the web interface. | SEARCH |
*Generated by scripts/generate_config_docs.py*
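
The environment variable names in the table above follow one rule: prefix the setting key with `LDR_`, replace dots with underscores, and upper-case the result. The sketch below illustrates that rule. `env_var_for` is a hypothetical helper name that mirrors the generator's `get_env_var_name`, and it does not cover env-only settings that declare an explicit `env_var`.

```python
def env_var_for(key: str) -> str:
    """Map a dotted setting key to its default environment variable name.

    Hypothetical helper for illustration; it mirrors the naming rule used by
    scripts/generate_config_docs.py and ignores explicit env_var overrides.
    """
    return f"LDR_{key.replace('.', '_').upper()}"


if __name__ == "__main__":
    # Keys taken from the table above.
    for key in ("search.iterations", "web.port"):
        print(f"{key} -> {env_var_for(key)}")
    # prints:
    # search.iterations -> LDR_SEARCH_ITERATIONS
    # web.port -> LDR_WEB_PORT
```

This is handy for checking the expected variable name of a key before exporting it in a Docker or CI environment.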

@@ -73,6 +73,8 @@
### Development Environment Variables
For the full list of all settings and environment variables, see [CONFIGURATION.md](CONFIGURATION.md).
For local development and testing, you may want to configure these environment variables:
| Variable | Default | Description |

@@ -1,6 +1,8 @@
# Configuring Local Deep Research with Environment Variables
> **Note:** For most users, the **Web UI Settings** is the recommended way to configure Local Deep Research. Environment variables are primarily useful for Docker deployments, CI/CD pipelines, and server configurations where the web UI is not accessible during startup.
>
> For a complete auto-generated reference of **all** settings, defaults, and environment variables, see [CONFIGURATION.md](CONFIGURATION.md).
You can override any configuration setting in Local Deep Research using environment variables. This is useful for:

380
scripts/generate_config_docs.py Executable file
@@ -0,0 +1,380 @@
#!/usr/bin/env python3
"""
Generate CONFIGURATION.md from default settings JSON files and env_definitions.

Usage:
    python scripts/generate_config_docs.py                    # Write to docs/CONFIGURATION.md
    python scripts/generate_config_docs.py --output /tmp/out  # Write to custom location
    python scripts/generate_config_docs.py --check            # Exit 1 if docs are stale
"""

import argparse
import ast
import json
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional


def get_project_root() -> Path:
    """Return the project root directory."""
    return Path(__file__).resolve().parent.parent


def get_env_var_name(key: str) -> str:
    """Convert setting key to environment variable name."""
    return f"LDR_{key.replace('.', '_').upper()}"


def format_value(value: Any) -> str:
    """Format default value for markdown."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return str(value).lower()
    if isinstance(value, (dict, list)):
        return f"`{json.dumps(value)}`"
    return str(value)


def _discover_env_definition_files(env_defs_dir: Path) -> List[Path]:
    """Auto-discover env_definitions modules, excluding __init__.py and env_settings.py."""
    if not env_defs_dir.is_dir():
        return []
    return sorted(
        p
        for p in env_defs_dir.glob("*.py")
        if p.name not in ("__init__.py", "env_settings.py")
    )


def _category_from_filename(filename: str) -> str:
    """Derive a human-readable category name from a filename.

    Example: 'db_config.py' -> 'Db Config'
    """
    stem = filename.removesuffix(".py")
    return stem.replace("_", " ").title()


def _extract_setting_from_call(node: ast.Call) -> Optional[Dict[str, Any]]:
    """Extract a setting dict from a *Setting() AST call node."""
    if not (
        isinstance(node.func, ast.Name) and node.func.id.endswith("Setting")
    ):
        return None

    keywords = {k.arg: k.value for k in node.keywords if k.arg}
    if "key" not in keywords:
        return None

    key_node = keywords["key"]
    if not isinstance(key_node, ast.Constant):
        return None
    key = key_node.value  # gitleaks:allow

    # Description: may be a simple string or a parenthesised concatenation
    description = ""
    if "description" in keywords:
        desc_node = keywords["description"]
        if isinstance(desc_node, ast.Constant):
            description = desc_node.value
        else:
            description = ast.unparse(desc_node)

    # Default
    default_val = "None"
    if "default" in keywords:
        default_val = ast.unparse(keywords["default"])

    # Env var (auto-generated unless explicitly overridden)
    if "env_var" in keywords and isinstance(keywords["env_var"], ast.Constant):
        env_var = keywords["env_var"].value
    else:
        env_var = get_env_var_name(key)

    # Type from the class name (e.g. BooleanSetting -> Boolean)
    setting_type = node.func.id.replace("Setting", "")

    # Required
    required = False
    if "required" in keywords and isinstance(
        keywords["required"], ast.Constant
    ):
        required = bool(keywords["required"].value)

    # Min/max value
    min_value = None
    if "min_value" in keywords and isinstance(
        keywords["min_value"], ast.Constant
    ):
        min_value = keywords["min_value"].value

    max_value = None
    if "max_value" in keywords and isinstance(
        keywords["max_value"], ast.Constant
    ):
        max_value = keywords["max_value"].value

    # Allowed values (ast.Set of ast.Constant)
    allowed_values = None
    if "allowed_values" in keywords and isinstance(
        keywords["allowed_values"], ast.Set
    ):
        allowed_values = sorted(
            elt.value
            for elt in keywords["allowed_values"].elts
            if isinstance(elt, ast.Constant)
        )

    # Deprecated env var
    deprecated_env_var = None
    if "deprecated_env_var" in keywords and isinstance(
        keywords["deprecated_env_var"], ast.Constant
    ):
        deprecated_env_var = keywords["deprecated_env_var"].value

    return {
        "key": key,
        "env_var": env_var,
        "description": description,
        "default": default_val,
        "type": setting_type,
        "required": required,
        "min_value": min_value,
        "max_value": max_value,
        "allowed_values": allowed_values,
        "deprecated_env_var": deprecated_env_var,
    }


def get_env_only_settings(
    root_dir: Optional[Path] = None,
) -> List[Dict[str, Any]]:
    """
    Extract env-only settings from env_definitions/ by auto-discovering modules.

    These are settings required before database initialization.
    """
    root_dir = root_dir or get_project_root()
    env_defs_dir = (
        root_dir
        / "src"
        / "local_deep_research"
        / "settings"
        / "env_definitions"
    )

    env_only: List[Dict[str, Any]] = []

    for filepath in _discover_env_definition_files(env_defs_dir):
        category = _category_from_filename(filepath.name)
        try:
            content = filepath.read_text()
            tree = ast.parse(content)
        except Exception as e:
            print(f"Warning: Could not parse {filepath}: {e}")
            continue

        for node in ast.walk(tree):
            if not isinstance(node, ast.Call):
                continue
            setting = _extract_setting_from_call(node)
            if setting is None:
                continue
            setting["category"] = category
            env_only.append(setting)

    return env_only


def _format_constraints(setting: Dict[str, Any]) -> str:
    """Build a human-readable constraints string."""
    parts = []
    if (
        setting.get("min_value") is not None
        and setting.get("max_value") is not None
    ):
        parts.append(f"{setting['min_value']}..{setting['max_value']}")
    elif setting.get("min_value") is not None:
        parts.append(f">={setting['min_value']}")
    elif setting.get("max_value") is not None:
        parts.append(f"<={setting['max_value']}")

    if setting.get("allowed_values"):
        parts.append(", ".join(setting["allowed_values"]))

    return " | ".join(parts) if parts else ""


def generate_docs_content(root_dir: Optional[Path] = None) -> str:
    """Generate the full CONFIGURATION.md content as a string."""
    root_dir = root_dir or get_project_root()
    defaults_dir = root_dir / "src" / "local_deep_research" / "defaults"

    settings: Dict[str, Any] = {}

    # Recursively find all JSON files
    for json_file in sorted(defaults_dir.rglob("*.json")):
        try:
            with open(json_file, "r") as f:
                data = json.load(f)
                settings.update(data)
        except Exception as e:
            print(f"Warning: Could not load {json_file}: {e}")

    sorted_keys = sorted(settings.keys())

    # Get env-only settings
    env_only_settings = get_env_only_settings(root_dir)

    # Build markdown
    content = [
        "# Configuration Reference",
        "",
        "This document is automatically generated from the application's default settings.",
        "All settings can be configured via the Web UI (Settings page), or overridden via Environment Variables.",
        "",
        "## Environment Variables",
        "",
        "To override a setting using an environment variable, convert the key to uppercase, replace dots with underscores, and prefix with `LDR_`.",
        "For example, `app.debug` becomes `LDR_APP_DEBUG`.",
        "",
        "Configuration Priority: Web UI Config > Environment Variables > Default Values",
        "> Environmental Variables are used to override default values, easing installation, while allowing for adjustments to configuration via Web UI.",
        "",
        "### System Locking",
        "There is a special environment variable `LDR_LOCKED_SETTINGS` that allows administrators to strictly enforce specific settings.",
        "",
        "* **Variable**: `LDR_LOCKED_SETTINGS`",
        "* **Format**: Comma-separated list of setting keys (e.g., `llm.model,app.port`)",
        "* **Behavior**:",
        " 1. Any setting listed here **MUST** have a corresponding value defined in the environment variables (e.g., `LDR_LLM_MODEL`). If not, the application will fail to start.",
        " 2. The setting becomes **read-only** in the Web UI.",
        " 3. The **Environment Variable** value takes absolute precedence, ignoring any value in the database.",
        "",
        "**Priority for Locked Settings**: Environment Variable > Database (Ignored) > Default (Ignored)",
        "",
        "",
    ]

    # Env-only section with expanded columns
    if env_only_settings:
        content.extend(
            [
                "## Pre-Database (Env-Only) Settings",
                "",
                "These settings are **required before database initialization** and can only be set via environment variables.",
                "They are not available in the Web UI because they are needed to start the application.",
                "",
                "| Environment Variable | Type | Default | Required | Constraints | Description | Category | Deprecated Alias |",
                "|----------------------|------|---------|----------|-------------|-------------|----------|------------------|",
            ]
        )
        for setting in sorted(env_only_settings, key=lambda x: x["env_var"]):
            env_var = setting["env_var"]
            stype = setting["type"]
            default = setting["default"]
            required = "Yes" if setting.get("required") else "No"
            constraints = _format_constraints(setting).replace("|", "\\|")
            desc = setting["description"].replace("|", "\\|").replace("\n", " ")
            category = setting["category"]
            deprecated = setting.get("deprecated_env_var") or ""
            row = (
                f"| `{env_var}` | {stype} | `{default}` | {required} "
                f"| {constraints} | {desc} | {category} | {deprecated} |"
            )
            content.append(row)
        content.extend(["", ""])

    # Main settings list
    content.extend(
        [
            "## Settings List",
            "",
            "| Key | Environment Variable | Default Value | Description | Type |",
            "|-----|----------------------|---------------|-------------|------|",
        ]
    )

    for key in sorted_keys:
        setting = settings[key]
        env_var = setting.get("env_var") or get_env_var_name(key)
        default_val = format_value(setting.get("value"))
        description = (
            setting.get("description", "")
            .replace("\n", " ")
            .replace("|", "\\|")
        )
        setting_type = setting.get("type", "UNKNOWN")

        row = f"| `{key}` | `{env_var}` | `{default_val}` | {description} | {setting_type} |"
        content.append(row)

    content.append("")
    content.append("*Generated by scripts/generate_config_docs.py*")

    return "\n".join(content) + "\n"


def generate_docs(
    output_path: Optional[Path] = None,
    check: bool = False,
) -> int:
    """Generate (or check) CONFIGURATION.md.

    Returns 0 on success, 1 if check finds stale docs.
    """
    root_dir = get_project_root()
    output_file = output_path or (root_dir / "docs" / "CONFIGURATION.md")

    new_content = generate_docs_content(root_dir)

    if check:
        if not output_file.exists():
            print(
                f"FAIL: {output_file} does not exist. "
                "Run 'python scripts/generate_config_docs.py' to generate it."
            )
            return 1
        existing = output_file.read_text()
        if existing == new_content:
            print("OK: Configuration docs are up to date.")
            return 0
        else:
            print(
                f"FAIL: {output_file} is out of date. "
                "Run 'python scripts/generate_config_docs.py' to regenerate it."
            )
            return 1

    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(new_content)
    print(f"Wrote {output_file}")
    return 0


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Generate CONFIGURATION.md from defaults"
    )
    parser.add_argument(
        "--output",
        "-o",
        type=Path,
        help="Output file path (default: docs/CONFIGURATION.md)",
    )
    parser.add_argument(
        "--check",
        action="store_true",
        help="Check if docs are up to date (exit 1 if stale)",
    )
    args = parser.parse_args()
    sys.exit(generate_docs(output_path=args.output, check=args.check))
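
The script reads `*Setting(...)` definitions through the `ast` module rather than importing them, so no project code is executed while generating docs. The following is a minimal standalone sketch of that idea, not the script itself: the `SOURCE` string, the `IntegerSetting` call inside it, and the `extract_settings` helper are hypothetical and exist only for illustration.

```python
import ast
from typing import Any, Dict, List

# Hypothetical module source. IntegerSetting and its arguments are
# illustrative only and are never imported or executed.
SOURCE = '''
CACHE = IntegerSetting(
    key="db_config.cache_size_mb",
    description="Cache size in MB",
    default=64,
    min_value=1,
    max_value=10000,
)
'''


def extract_settings(source: str) -> List[Dict[str, Any]]:
    """Collect selected keyword arguments of every *Setting(...) call."""
    found: List[Dict[str, Any]] = []
    for node in ast.walk(ast.parse(source)):
        # Only plain-name calls such as IntegerSetting(...) are considered.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if not node.func.id.endswith("Setting"):
            continue
        kwargs = {k.arg: k.value for k in node.keywords if k.arg}
        key_node = kwargs.get("key")
        if not isinstance(key_node, ast.Constant):
            continue
        key = key_node.value
        found.append(
            {
                "key": key,
                # Class name minus the suffix, e.g. IntegerSetting -> Integer
                "type": node.func.id.replace("Setting", ""),
                # Defaults are kept as source text, not evaluated
                "default": (
                    ast.unparse(kwargs["default"])
                    if "default" in kwargs
                    else "None"
                ),
                "env_var": "LDR_" + key.replace(".", "_").upper(),
            }
        )
    return found


settings = extract_settings(SOURCE)
print(settings)
```

Because defaults are rendered with `ast.unparse`, they appear in the docs as source text (for example `'64'` as a string), which avoids evaluating arbitrary expressions.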

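The generated "Environment Variables" and "System Locking" text describes a priority order: Web UI (database) value, then environment variable, then default, with locked settings forcing the environment value. The sketch below illustrates that order under stated assumptions. The `resolve` and `parse_locked` helpers and the plain-dict inputs are hypothetical and are not the application's actual resolution code.

```python
from typing import Any, Mapping, Set


def env_var_name(key: str) -> str:
    """Documented convention: uppercase, dots to underscores, LDR_ prefix."""
    return f"LDR_{key.replace('.', '_').upper()}"


def parse_locked(env: Mapping[str, str]) -> Set[str]:
    """Split LDR_LOCKED_SETTINGS (comma-separated keys) into a set."""
    raw = env.get("LDR_LOCKED_SETTINGS", "")
    return {item.strip() for item in raw.split(",") if item.strip()}


def resolve(
    key: str,
    env: Mapping[str, str],
    db: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    """Hypothetical resolver following the documented priority order."""
    name = env_var_name(key)
    if key in parse_locked(env):
        # Locked: the env var must exist and overrides database and default.
        if name not in env:
            raise RuntimeError(f"{key} is locked but {name} is not set")
        return env[name]
    if key in db:  # value saved through the Web UI
        return db[key]
    if name in env:  # env var overrides the default only
        return env[name]
    return defaults.get(key)


env = {
    "LDR_LOCKED_SETTINGS": "llm.model",
    "LDR_LLM_MODEL": "model-a",
    "LDR_APP_PORT": "5000",
}
db = {"llm.model": "model-b", "app.port": "8080"}
defaults = {"llm.model": "model-c", "app.port": "5000", "app.debug": False}

locked_value = resolve("llm.model", env, db, defaults)
unlocked_value = resolve("app.port", env, db, defaults)
default_value = resolve("app.debug", env, db, defaults)
```

Here the locked `llm.model` resolves to the environment value, while the unlocked `app.port` resolves to the database value even though an environment variable is set.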

@@ -0,0 +1,339 @@
"""
Tests for scripts/generate_config_docs.py

Validates that the configuration documentation generator correctly:
- Converts setting keys to env var names
- Formats default values for markdown
- Auto-discovers all env_definitions modules
- Extracts all fields from Setting constructors
- Generates valid markdown covering all JSON files and env_definitions
- Detects stale documentation via --check mode
- Handles errors gracefully
"""

import json
import sys
from pathlib import Path

# Ensure the scripts directory is importable
REPO_ROOT = Path(__file__).resolve().parent.parent.parent
SCRIPTS_DIR = REPO_ROOT / "scripts"
sys.path.insert(0, str(SCRIPTS_DIR))

from generate_config_docs import (  # noqa: E402
    format_value,
    generate_docs,
    generate_docs_content,
    get_env_only_settings,
    get_env_var_name,
)

# ── Paths ──────────────────────────────────────────────────────────────
DEFAULTS_DIR = REPO_ROOT / "src" / "local_deep_research" / "defaults"
ENV_DEFS_DIR = (
    REPO_ROOT / "src" / "local_deep_research" / "settings" / "env_definitions"
)
OUTPUT_FILE = REPO_ROOT / "docs" / "CONFIGURATION.md"

# Expected env_definitions modules (excluding __init__.py and env_settings.py)
EXPECTED_ENV_MODULES = {
    "bootstrap",
    "db_config",
    "news_scheduler",
    "security",
    "testing",
}


# ═══════════════════════════════════════════════════════════════════════
# Unit tests
# ═══════════════════════════════════════════════════════════════════════


class TestGetEnvVarName:
    """Test dotted keys -> LDR_* env var conversion."""

    def test_simple_key(self):
        assert get_env_var_name("app.debug") == "LDR_APP_DEBUG"

    def test_nested_key(self):
        assert get_env_var_name("llm.openai.model") == "LDR_LLM_OPENAI_MODEL"

    def test_already_uppercase(self):
        assert get_env_var_name("DB.HOST") == "LDR_DB_HOST"

    def test_single_segment(self):
        assert get_env_var_name("port") == "LDR_PORT"


class TestFormatValue:
    """Test value formatting for markdown table cells."""

    def test_none(self):
        assert format_value(None) == "null"

    def test_bool_true(self):
        assert format_value(True) == "true"

    def test_bool_false(self):
        assert format_value(False) == "false"

    def test_dict(self):
        result = format_value({"a": 1})
        assert result.startswith("`")
        assert '"a"' in result

    def test_list(self):
        result = format_value([1, 2])
        assert result.startswith("`")
        assert "[1, 2]" in result

    def test_string(self):
        assert format_value("hello") == "hello"

    def test_int(self):
        assert format_value(42) == "42"


class TestGetEnvOnlySettings:
    """Test env_definitions AST extraction."""

    def test_returns_all_modules(self):
        """All 5 env_definitions modules should be represented."""
        settings = get_env_only_settings()
        categories = {s["category"].lower().replace(" ", "_") for s in settings}
        for module in EXPECTED_ENV_MODULES:
            # Category is derived from filename: db_config.py -> "Db Config" -> "db_config"
            assert module in categories, (
                f"Module '{module}' not found in categories: {categories}"
            )

    def test_extracts_all_fields(self):
        """Each setting should have the core fields populated."""
        settings = get_env_only_settings()
        assert len(settings) > 0
        required_fields = {
            "key",
            "env_var",
            "description",
            "default",
            "type",
            "required",
            "min_value",
            "max_value",
            "allowed_values",
            "deprecated_env_var",
        }
        for s in settings:
            missing = required_fields - set(s.keys())
            assert not missing, (
                f"Setting '{s.get('key', '?')}' is missing fields: {missing}"
            )

    def test_bootstrap_encryption_key_present(self):
        """The bootstrap encryption key should be discovered."""
        settings = get_env_only_settings()
        keys = {s["key"] for s in settings}
        assert "bootstrap.encryption_key" in keys

    def test_db_config_has_constraints(self):
        """db_config settings should have min/max values extracted."""
        settings = get_env_only_settings()
        cache_size = next(
            (s for s in settings if s["key"] == "db_config.cache_size_mb"),
            None,
        )
        assert cache_size is not None, "db_config.cache_size_mb not found"
        assert cache_size["min_value"] == 1
        assert cache_size["max_value"] == 10000

    def test_db_config_has_allowed_values(self):
        """Enum settings should have allowed_values extracted."""
        settings = get_env_only_settings()
        journal = next(
            (s for s in settings if s["key"] == "db_config.journal_mode"),
            None,
        )
        assert journal is not None, "db_config.journal_mode not found"
        assert journal["allowed_values"] is not None
        assert "WAL" in journal["allowed_values"]

    def test_deprecated_env_var_extracted(self):
        """Settings with deprecated_env_var should have it extracted."""
        settings = get_env_only_settings()
        cache_size = next(
            (s for s in settings if s["key"] == "db_config.cache_size_mb"),
            None,
        )
        assert cache_size is not None
        assert cache_size["deprecated_env_var"] == "LDR_DB_CACHE_SIZE_MB"


# ═══════════════════════════════════════════════════════════════════════
# Integration tests
# ═══════════════════════════════════════════════════════════════════════


class TestGenerateDocsIntegration:
    """Test the full generation pipeline."""

    def test_produces_valid_markdown(self):
        """Generated content should have the expected markdown structure."""
        content = generate_docs_content()
        assert content.startswith("# Configuration Reference\n")
        assert "## Pre-Database (Env-Only) Settings" in content
        assert "## Settings List" in content
        assert "| Key | Environment Variable |" in content
        assert "*Generated by scripts/generate_config_docs.py*" in content

    def test_includes_all_json_files(self):
        """All JSON config files should contribute settings to the output."""
        json_files = sorted(DEFAULTS_DIR.rglob("*.json"))
        assert len(json_files) >= 18, (
            f"Expected at least 18 JSON files, found {len(json_files)}"
        )

        content = generate_docs_content()

        # Collect all keys from all JSON files
        all_json_keys = set()
        for jf in json_files:
            try:
                data = json.loads(jf.read_text())
                all_json_keys.update(data.keys())
            except Exception:
                continue

        # Every key should appear in the generated doc
        for key in all_json_keys:
            assert f"`{key}`" in content, (
                f"Setting key '{key}' not found in generated docs"
            )

    def test_includes_all_env_definitions(self):
        """All env_definitions modules should be represented."""
        content = generate_docs_content()
        settings = get_env_only_settings()
        for s in settings:
            assert s["env_var"] in content, (
                f"Env var '{s['env_var']}' not found in generated docs"
            )

    def test_no_env_var_collisions(self):
        """No duplicate env var names should exist across all settings."""
        settings = get_env_only_settings()
        env_vars = [s["env_var"] for s in settings]
        duplicates = {v for v in env_vars if env_vars.count(v) > 1}
        assert not duplicates, f"Duplicate env var names found: {duplicates}"

    def test_env_only_table_has_expanded_columns(self):
        """The env-only table should include Type, Required, Constraints columns."""
        content = generate_docs_content()
        header_line = None
        for line in content.split("\n"):
            if (
                "Environment Variable" in line
                and "Type" in line
                and "Required" in line
            ):
                header_line = line
                break
        assert header_line is not None, (
            "Could not find expanded env-only table header"
        )
        assert "Constraints" in header_line
        assert "Deprecated Alias" in header_line


# ═══════════════════════════════════════════════════════════════════════
# Check mode tests
# ═══════════════════════════════════════════════════════════════════════


class TestCheckMode:
    """Test the --check flag behaviour."""

    def test_check_mode_passes_when_fresh(self, tmp_path):
        """Exit 0 when docs match the generated output."""
        out = tmp_path / "CONFIGURATION.md"
        # Generate fresh
        assert generate_docs(output_path=out, check=False) == 0
        # Check should pass
        assert generate_docs(output_path=out, check=True) == 0

    def test_check_mode_fails_when_stale(self, tmp_path):
        """Exit 1 when docs differ from the generated output."""
        out = tmp_path / "CONFIGURATION.md"
        # Generate, then tamper
        generate_docs(output_path=out, check=False)
        out.write_text("stale content\n")
        assert generate_docs(output_path=out, check=True) == 1

    def test_check_mode_fails_when_missing(self, tmp_path):
        """Exit 1 when the docs file doesn't exist."""
        out = tmp_path / "CONFIGURATION.md"
        assert generate_docs(output_path=out, check=True) == 1


# ═══════════════════════════════════════════════════════════════════════
# Error handling tests
# ═══════════════════════════════════════════════════════════════════════


class TestErrorHandling:
    """Test graceful handling of bad inputs."""

    def test_handles_missing_json_gracefully(self, tmp_path):
        """Corrupted/missing JSON in defaults dir doesn't crash generation."""
        # Create a minimal project structure with a bad JSON file
        defaults = tmp_path / "src" / "local_deep_research" / "defaults"
        defaults.mkdir(parents=True)
        (defaults / "bad.json").write_text("{invalid json")
        (defaults / "good.json").write_text(
            json.dumps(
                {
                    "test.key": {
                        "value": "hello",
                        "description": "A test",
                        "type": "STRING",
                    }
                }
            )
        )
        # Also create empty env_definitions dir
        env_defs = (
            tmp_path
            / "src"
            / "local_deep_research"
            / "settings"
            / "env_definitions"
        )
        env_defs.mkdir(parents=True)

        content = generate_docs_content(root_dir=tmp_path)
        # Good file should still be included
        assert "`test.key`" in content

    def test_handles_ast_parse_errors_gracefully(self, tmp_path):
        """Bad Python in env_definitions doesn't crash generation."""
        env_defs = (
            tmp_path
            / "src"
            / "local_deep_research"
            / "settings"
            / "env_definitions"
        )
        env_defs.mkdir(parents=True)
        (env_defs / "broken.py").write_text("def incomplete(")

        defaults = tmp_path / "src" / "local_deep_research" / "defaults"
        defaults.mkdir(parents=True)

        # Should not raise
        settings = get_env_only_settings(root_dir=tmp_path)
        assert isinstance(settings, list)
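
The suite above imports only the public helpers, so the `_format_constraints` logic is covered only indirectly through the generated table. As a rough sketch of what direct checks might look like, the block below copies that helper inline so it runs on its own; the input dicts are made up for illustration and do not come from the real env_definitions modules.

```python
from typing import Any, Dict


def _format_constraints(setting: Dict[str, Any]) -> str:
    """Inline copy of the script's helper, so this sketch is self-contained."""
    parts = []
    if (
        setting.get("min_value") is not None
        and setting.get("max_value") is not None
    ):
        parts.append(f"{setting['min_value']}..{setting['max_value']}")
    elif setting.get("min_value") is not None:
        parts.append(f">={setting['min_value']}")
    elif setting.get("max_value") is not None:
        parts.append(f"<={setting['max_value']}")

    if setting.get("allowed_values"):
        parts.append(", ".join(setting["allowed_values"]))

    return " | ".join(parts) if parts else ""


# Illustrative inputs only
both_bounds = _format_constraints({"min_value": 1, "max_value": 10000})
only_min = _format_constraints({"min_value": 0})
only_max = _format_constraints({"max_value": 5})
enum_only = _format_constraints({"allowed_values": ["DELETE", "WAL"]})
combined = _format_constraints(
    {"min_value": 1, "max_value": 8, "allowed_values": ["a", "b"]}
)
empty = _format_constraints({})
```

Note that a lower bound of `0` is still rendered, because the helper tests `is not None` rather than truthiness. The `" | "` separator in the combined case is why the table builder escapes pipes before writing the cell.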