fix: set search context in scheduler so rate limiting works (#3289)

* fix: set search context in scheduler so rate limiting works

The document scheduler ran downloads without setting search context,
causing the rate limiter to disable itself (returning 0.0 wait time).
This meant scheduler-initiated downloads had no throttling, potentially
flooding target servers.

Set search context with username and password before creating
DownloadService. Cleanup is handled by the existing @thread_cleanup
decorator on _process_user_documents().
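
For illustration, the failure mode described above can be sketched as follows. This is a minimal stand-in, not the project's code: the thread-local storage, `get_search_context`, `clear_search_context`, `get_wait_time`, and the 2.0 second base wait are assumptions made for the example. Only the names `set_search_context` and `thread_cleanup` and the context keys come from this commit.

```python
import threading
from functools import wraps

# Hypothetical stand-ins; the real helpers live in utilities.thread_context.
_local = threading.local()


def set_search_context(context):
    """Store a per-thread copy of the context dict."""
    _local.context = dict(context)


def get_search_context():
    """Return the current thread's context, or None if it was never set."""
    return getattr(_local, "context", None)


def clear_search_context():
    """Drop the current thread's context."""
    _local.context = None


def thread_cleanup(func):
    """Assumed behaviour: clear the context after func returns or raises."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            clear_search_context()
    return wrapper


def get_wait_time(base_wait=2.0):
    """Assumed limiter behaviour: without a context, throttling is off."""
    if get_search_context() is None:
        return 0.0
    return base_wait


@thread_cleanup
def process_user_documents(research_id, username, password):
    # Set the context before any download work so the limiter stays active.
    set_search_context(
        {
            "research_id": str(research_id),
            "username": username,
            "user_password": password,
            "research_phase": "document_scheduler",
        }
    )
    return get_wait_time()
```

In this sketch, calling `get_wait_time()` on a thread with no context yields 0.0, while the same call inside `process_user_documents` yields the base wait, and the decorator leaves the thread without a context afterwards.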

* fix: set search context before both download_pdfs and extract_text blocks

The context was only set inside the extract_text block, but download_pdfs
also creates a DownloadService that needs rate limiting. Move the
set_search_context call to before both blocks.
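
The ordering problem can be shown with a small sketch. Everything here is hypothetical apart from the names `set_search_context`, `download_pdfs`, and `extract_text`: the `wait_time` helper, the 1.0 second wait, and the two `run_*` functions exist only to contrast the old and new placement of the call.

```python
import threading

# Hypothetical per-thread storage standing in for the real thread context.
_local = threading.local()


def set_search_context(context):
    _local.context = context


def wait_time():
    """Assumed limiter: throttles only when a context is present."""
    return 1.0 if getattr(_local, "context", None) else 0.0


def run_before_fix(download_pdfs, extract_text):
    """Context is set only inside the extract_text block."""
    _local.context = None
    waits = {}
    if download_pdfs:
        waits["download_pdfs"] = wait_time()  # no context yet
    if extract_text:
        set_search_context({"username": "alice"})
        waits["extract_text"] = wait_time()
    return waits


def run_after_fix(download_pdfs, extract_text):
    """Context is set once, before both blocks."""
    _local.context = None
    set_search_context({"username": "alice"})
    waits = {}
    if download_pdfs:
        waits["download_pdfs"] = wait_time()
    if extract_text:
        waits["extract_text"] = wait_time()
    return waits
```

With both settings enabled, `run_before_fix` leaves the `download_pdfs` path unthrottled, while `run_after_fix` throttles both paths.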

* Merge branch 'main' into fix/scheduler-search-context-rate-limiting

---------

Co-authored-by: Daniel Petti <djpetti@gmail.com>
LearningCircuit
2026-04-19 12:29:48 +02:00
committed by GitHub
parent 42fc75f61d
commit 1a0d46e69c

@@ -827,6 +827,21 @@ class BackgroundJobScheduler:
    f"[DOC_SCHEDULER] Processing research {research.id} for user {username}"
)

# Set search context so rate limiting works in both
# download_pdfs and extract_text paths
from ...utilities.thread_context import (
    set_search_context,
)

set_search_context(
    {
        "research_id": str(research.id),
        "username": username,
        "user_password": password,
        "research_phase": "document_scheduler",
    }
)

# Call actual processing APIs
if settings.download_pdfs:
    logger.info(