fix: set search context in scheduler so rate limiting works (#3289)

* fix: set search context in scheduler so rate limiting works

  The document scheduler ran downloads without setting search context,
  causing the rate limiter to disable itself (returning 0.0 wait time).
  This meant scheduler-initiated downloads had no throttling, potentially
  flooding target servers.

  Set search context with username and password before creating
  DownloadService. Cleanup is handled by the existing @thread_cleanup
  decorator on _process_user_documents().

* fix: set search context before both download_pdfs and extract_text blocks

  The context was only set inside the extract_text block, but
  download_pdfs also creates a DownloadService that needs rate limiting.
  Move the set_search_context call to before both blocks.

* Merge branch 'main' into fix/scheduler-search-context-rate-limiting

---------

Co-authored-by: Daniel Petti <djpetti@gmail.com>
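
The failure mode described in the commit message can be illustrated with a minimal sketch, assuming the search context is stored per thread. Only `set_search_context` is a name taken from this commit; `get_search_context`, `get_wait_time`, `wait_time_in_worker`, and the 2.0 second base wait are hypothetical stand-ins, not the project's actual implementation.

```python
import threading

# Hypothetical stand-in for the project's per-thread context storage.
_local = threading.local()


def set_search_context(context: dict) -> None:
    """Store a copy of the context for the current thread only."""
    _local.context = dict(context)


def get_search_context():
    """Return the current thread's context, or None if none was set."""
    return getattr(_local, "context", None)


def get_wait_time(base_wait: float = 2.0) -> float:
    """Hypothetical limiter that disables itself when no context is set."""
    if get_search_context() is None:
        return 0.0  # no context: no throttling at all
    return base_wait


def wait_time_in_worker(context=None) -> float:
    """Run the limiter in a fresh thread, optionally setting context first."""
    result = []

    def worker():
        if context is not None:
            set_search_context(context)
        result.append(get_wait_time())

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result[0]


if __name__ == "__main__":
    print(wait_time_in_worker())                       # context never set
    print(wait_time_in_worker({"username": "alice"}))  # context set first
```

In this sketch a worker thread that never sets a context gets a zero wait, which mirrors the unthrottled scheduler downloads; setting the context before the limiter is consulted restores the base wait.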
2026-06-16 03:51:07 +03:00 · 2026-04-19 12:29:48 +02:00
parent 42fc75f61d
commit 1a0d46e69c
1 changed file with 15 additions and 0 deletions
--- a/src/local_deep_research/scheduler/background.py
+++ b/src/local_deep_research/scheduler/background.py
@@ -827,6 +827,21 @@ class BackgroundJobScheduler:
                            f"[DOC_SCHEDULER] Processing research {research.id} for user {username}"
                        )

+                        # Set search context so rate limiting works in both
+                        # download_pdfs and extract_text paths
+                        from ...utilities.thread_context import (
+                            set_search_context,
+                        )
+
+                        set_search_context(
+                            {
+                                "research_id": str(research.id),
+                                "username": username,
+                                "user_password": password,
+                                "research_phase": "document_scheduler",
+                            }
+                        )
+
                        # Call actual processing APIs
                        if settings.download_pdfs:
                            logger.info(