local-deep-research/tests/advanced_search_system
LearningCircuit b6a9ba026a fix(synthesis): strip <think> in synthesize_findings + knowledge gen; guard empty extractor answers (#4336)
Post-merge review of the #4334 citation refactor found it only half-fixed the
<think> leak it targeted, and introduced a small regression:

1. The MAIN synthesis path was never routed through get_llm_response_text, so
   reasoning models still leaked <think>...</think> into the user's final answer:
   - FindingsRepository.synthesize_findings (both timeout paths) returned raw
     response.content; its output is the final answer (current_knowledge) for the
     standard/iterdrag strategies and flows verbatim through format_findings.
   - StandardKnowledge.generate_knowledge / generate_sub_knowledge / compress_knowledge
     returned the same raw .content.
   Route all four through get_llm_response_text (strips <think>, handles str/None).
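The behavior described for get_llm_response_text (strip <think>, handle str/None) can be illustrated with a minimal sketch. This is not the project's helper. The name `response_text_sketch`, the regex, and the assumption that a response is None, a str, or an object exposing `.content` (as LangChain chat messages do) are all hypothetical.

```python
import re
from typing import Any

# Hypothetical: matches a complete <think>...</think> block, non-greedy, across newlines.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def response_text_sketch(response: Any) -> str:
    """Return the visible text of an LLM response with <think> blocks removed.

    Assumption: `response` is None, a plain str, or an object with a `.content`
    attribute. Anything else (or non-str content) yields ''.
    """
    if response is None:
        return ""
    text = response if isinstance(response, str) else getattr(response, "content", None)
    if not isinstance(text, str):
        return ""
    # Drop reasoning blocks, then trim the whitespace they leave behind.
    return _THINK_RE.sub("", text).strip()
```

Note that a think-only response strips to '', which is exactly the input that triggers the empty-answer case in item 2.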

2. Empty-answer regression from #4334: when _invoke_text returns '' (the LLM returned None,
   or a think-only response stripped to ''), several precision extractors and the forced
   _extract_direct_answer emitted '. <content>'. Guard each one to fall back to its existing
   non-LLM path (first score/year/number) or, failing that, to the raw content.
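The guard can be sketched for the number case. Everything here is hypothetical. `extract_number_answer` and `_NUMBER_RE` are illustrative names. `invoke_text` stands in for _invoke_text. The "{answer}. {content}" output shape is only inferred from the '. <content>' symptom and may not match the real extractors.

```python
import re
from typing import Callable

# Hypothetical pattern for the non-LLM "first number" fallback.
_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_number_answer(content: str, invoke_text: Callable[[str], str]) -> str:
    """Sketch of an extractor guarded against an empty LLM answer.

    `invoke_text` is assumed to return already-stripped LLM text, possibly ''.
    """
    answer = invoke_text(f"Extract the number from: {content}").strip()
    if answer:
        return f"{answer}. {content}"
    # Empty answer: without this guard the result would be '. <content>'.
    match = _NUMBER_RE.search(content)
    if match:
        return f"{match.group(0)}. {content}"
    # No number found either: return the content unchanged.
    return content
```

For example, with an LLM stub that always returns '', "It scored 42 points." becomes "42. It scored 42 points." instead of ". It scored 42 points.".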

Tests: +think-strip regression tests for synthesize_findings + standard_knowledge;
+empty-answer fallback tests for score/temporal/number + forced extractor; updated the
one test that asserted the old '. content' behavior.
mypy: 552 files clean. 1739 passed / 2 skipped across the 62 test files touching these
modules + tests/citation_handlers.
2026-05-25 16:37:18 +02:00