local-deep-research/tests/database/test_cache_models.py
LearningCircuit 12160e26e1 chore(lint): add ruff rules for logging, performance, exceptions, and print detection (#3211)
* chore(lint): add ruff rules for logging, performance, exceptions, and print detection

Add wave 2 lint rules: G, PERF, RET, TRY, T20, C4, ERA. All existing
violations are suppressed via ignore/per-file-ignores so this config
change is merge-safe. Follow-up PRs will fix violations and remove the
ignore entries incrementally.

* fix(lint): exempt pre-commit hooks from T201 print rule (#3270)

Pre-commit hooks are CLI scripts where print is the intended output
interface, same as scripts/ and cli/ directories already exempted.

* fix(lint): fix all low-count ruff violations instead of suppressing them (#3275)

* fix(lint): replace manual dict-building loops with dict comprehensions (PERF403)

* fix(lint): replace bare Exception raises with specific built-in types (TRY002)

Replace all `raise Exception(...)` in production code with appropriate
built-in exception types: RuntimeError for operational/state failures,
ValueError for invalid data, and ConnectionError for HTTP errors.
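
A minimal sketch of the TRY002 mapping described above. The three helpers (`parse_port`, `require_started`, `check_status`) are hypothetical names chosen for illustration and do not come from this repository:

```python
def parse_port(raw: str) -> int:
    """Hypothetical helper: invalid data maps to ValueError."""
    if not raw.isdigit():
        # Before the fix this would have been: raise Exception(...)
        raise ValueError(f"bad port: {raw!r}")
    return int(raw)


def require_started(started: bool) -> None:
    """Hypothetical helper: a state failure maps to RuntimeError."""
    if not started:
        raise RuntimeError("service has not been started")


def check_status(status_code: int) -> None:
    """Hypothetical helper: an HTTP failure maps to ConnectionError."""
    if status_code >= 400:
        raise ConnectionError(f"request failed with HTTP {status_code}")
```

Callers can then catch the narrow type they care about instead of a blanket `except Exception`.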

* fix(lint): resolve TRY004 and PERF402 ruff violations

Use TypeError instead of ValueError for isinstance/issubclass type
checks (TRY004), and replace manual for-loop list copies with
list.extend() (PERF402).
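
The two rules above can be sketched as follows. `ensure_str` and `merge_results` are hypothetical functions used only to show the pattern:

```python
def ensure_str(value: object) -> str:
    """Hypothetical helper illustrating TRY004."""
    if not isinstance(value, str):
        # TRY004: a failed isinstance check raises TypeError, not ValueError
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value


def merge_results(batches: list[list[int]]) -> list[int]:
    """Hypothetical helper illustrating PERF402."""
    merged: list[int] = []
    for batch in batches:
        # PERF402: replaces `for item in batch: merged.append(item)`
        merged.extend(batch)
    return merged
```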

* fix(lint): fix all low-count ruff violations instead of suppressing them

Fix all violations for 15 ruff rules that had ≤10 occurrences each,
rather than suppressing them with ignore directives:

- TRY002: raise-vanilla-class → use specific built-in exceptions
- TRY004: type-check-without-type-error → use TypeError
- C408: unnecessary-collection-call → use dict/list literals
- C401: unnecessary-generator-set → use set comprehensions
- C416: unnecessary-comprehension → use list()/set()
- C414: unnecessary-double-cast-or-process → simplify
- PERF403: manual-dict-comprehension → use dict comprehensions
- PERF102: incorrect-dict-iterator → use .values()/.keys()
- PERF402: manual-list-copy → use list.extend()
- RET506/RET507/RET508: superfluous else after raise/continue/break
- RET501/RET502/RET503: unnecessary or implicit return None

Adds per-file-ignores for tests/ and examples/ where these patterns
are acceptable (e.g. bare Exception in tests, dict() calls in fixtures).
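
A short sketch of what some of the collection and iteration fixes listed above look like in practice. The `prices` data is made up for illustration:

```python
prices = {"apple": 3, "pear": 5}

# PERF403: a dict comprehension instead of filling a dict inside a for loop
doubled = {name: price * 2 for name, price in prices.items()}

# C408: a literal instead of a dict() call (before: dict(verbose=True))
options = {"verbose": True}

# C401: a set comprehension instead of set(<generator>)
initials = {name[0] for name in prices}

# PERF102: iterate .values() when the keys are not used
# (before: for _, price in prices.items())
total = 0
for price in prices.values():
    total += price
```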

* fix(lint): enforce E722, ERA001, RET505 and fix pre-commit RET503 gap (#3276)

Remove three rules from the global ignore list by fixing all violations:

E722 (bare except) — 6 violations in tests:
  Replace `except:` with `except Exception:` to avoid swallowing
  KeyboardInterrupt and SystemExit.

ERA001 (commented-out code) — 25 violations:
  Delete 18 true positives (dead variables, disabled debug logs,
  commented-out imports). Add `# noqa: ERA001` to 7 false positives
  (template instructions, type annotations, documentation comments).

RET505 (superfluous else after return) — 413 violations:
  Auto-fix all occurrences. Also fixes 5 cascading RET506/RET507
  violations exposed by the RET505 removals.

Pre-commit hooks gap:
  Add RET503 to `.pre-commit-hooks/**` per-file-ignores alongside T201.
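
The E722 and RET505 items above can be sketched like this. `classify` and `safe_call` are hypothetical functions, not code from this repository:

```python
def classify(n: int) -> str:
    """Hypothetical function illustrating RET505."""
    if n < 0:
        return "negative"
    # RET505: no `else:` needed, the branch above already returned
    return "non-negative"


def safe_call(func):
    """Hypothetical wrapper illustrating E722."""
    try:
        return func()
    except Exception:  # E722: this was a bare `except:` before the fix
        # KeyboardInterrupt and SystemExit derive from BaseException,
        # so they are no longer swallowed here.
        return None
```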

* fix(lint): enforce RET504 and TRY301 — fix all violations (#3279)

* fix(lint): enforce RET504 — collapse unnecessary assign-before-return

Auto-fix all 46 RET504 violations via ruff unsafe-fixes: collapse
`result = expr; return result` into `return expr`.
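
For illustration, the collapse looks like this on a hypothetical function:

```python
def total_before(items: list[int]) -> int:
    result = sum(items)  # RET504 flags this assignment
    return result


def total_after(items: list[int]) -> int:
    # Same behavior, with the intermediate variable removed
    return sum(items)
```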

Remove RET504 from global ignore list. Add to tests/examples
per-file-ignores where intermediate variables aid test clarity.

Also removes TRY301 from global ignore (violations fixed in next commit).

* fix(lint): enforce TRY301 — fix raises inside broad try/except blocks

Structural fixes for 65 TRY301 violations:

Security-critical fixes:
- url_validator.py: move 6 validation raises before try block,
  replace isinstance-based re-raise with specific except clause
- path_validator.py: move validation outside try block
- env_settings.py: separate parsing (try) from validation (outside)

Route/service fixes:
- research_routes.py: replace raise-then-catch with direct error return
- mcp/server.py: move all 7 tool validations before try blocks
- news/api.py: move validation before try, noqa for db-session raises
- notifications: move rate limit and URL validation before try blocks
- iterative_refinement_strategy.py: move JSON validation after try

Added noqa for intentional patterns: re-raise in except handlers,
nested function definitions, db-session-dependent checks, rate limit
re-raises for base class retry logic.
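
A minimal sketch of the "move validation before try" pattern used in these fixes. `parse_payload_before` and `parse_payload_after` are hypothetical; the real modules are not reproduced here:

```python
import json


def parse_payload_before(raw: str) -> dict:
    """Hypothetical pre-fix shape: the raise sits inside the broad try."""
    try:
        if not raw:
            raise ValueError("empty payload")  # flagged by TRY301
        return json.loads(raw)
    except Exception as exc:
        # The broad handler also catches the validation error above
        return {"error": str(exc)}


def parse_payload_after(raw: str) -> dict:
    """Hypothetical post-fix shape: validate first, then try."""
    if not raw:
        raise ValueError("empty payload")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        return {"error": str(exc)}
```

Note that the two versions do not behave identically: after the move, the validation error reaches the caller instead of being absorbed by the handler. Where the handler has to see the error, keeping the raise inside the try with a `noqa` is the safer choice.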

* merge: resolve conflicts between wave2 lint branch and main

Resolve 14 merge conflicts by always starting from main's version
and re-applying lint fixes on top:

- mcp_strategy.py, ollama.py, security_settings.py, delete_routes.py:
  Take main's code, re-apply RET505 (remove else: after return)
- mcp/server.py (3 conflicts): Take main's ValidationError handlers
  and set_settings_context, re-apply TRY301 fixes, fix sensitive
  data logging
- research_routes.py: Take main, fix duplicate block (merge artifact)
- settings_routes.py: Take main's default-settings fallback feature
- meta_search_engine.py, parallel_search_engine.py: Take main's
  get_available_engines delegation, delete unreachable code
- search_engine_ddg.py, search_engine_google_pse.py: Take main's
  sanitization, re-apply RET506 (if not elif after raise)
- rag_routes.py: Accept main's deletion (route moved to delete_routes)
- encryption_check.py: Accept main's deletion (dead code)
- test_storage_coverage.py: Remove broken test classes referencing
  undefined stubs
- pre-commit hooks: extend per-file-ignores for ERA001, RET504

* fix: revert ValueError→TypeError changes that break tests and API contracts

Revert TRY004 fixes in 3 files where changing ValueError to TypeError
would break existing tests and HTTP status code contracts:

- card_factory.py: 5 tests assert pytest.raises(ValueError)
- base_rater.py: flask_api.py catches ValueError for HTTP 400 responses;
  TypeError would fall through to HTTP 500
- full_search.py: test asserts pytest.raises(ValueError)

Add # noqa: TRY004 to suppress the lint rule on these lines.
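
The status-code regression can be reproduced with a small sketch. The rater functions and `status_for` below are hypothetical stand-ins for base_rater.py and the handler in flask_api.py:

```python
def rate_with_value_error(value):
    """Hypothetical rater that keeps the original ValueError."""
    if not isinstance(value, int):
        raise ValueError("rating must be an integer")  # noqa: TRY004
    return value


def rate_with_type_error(value):
    """Hypothetical rater after the TRY004 autofix."""
    if not isinstance(value, int):
        raise TypeError("rating must be an integer")
    return value


def status_for(rater, value) -> int:
    """Hypothetical handler: ValueError means a client error (400)."""
    try:
        rater(value)
    except ValueError:
        return 400
    except Exception:
        return 500
    return 200
```

With this handler, `status_for(rate_with_value_error, "x")` gives 400, while `status_for(rate_with_type_error, "x")` falls through to 500.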

* fix: move benchmark_data check back inside try block

The ValueError for missing benchmark_data must be inside the try/except
so the except handler can mark the run as FAILED in the database.
Without this, the exception propagates unhandled in a daemon thread,
leaving the benchmark run stuck in RUNNING state permanently.
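
A sketch of why the check has to stay inside the try block. The `runs` dict is a hypothetical in-memory stand-in for the database, and `run_benchmark` is not the project's actual function:

```python
import threading

# Hypothetical stand-in for the benchmark-run table
runs = {"run-1": "RUNNING"}


def run_benchmark(run_id: str, benchmark_data) -> None:
    try:
        if benchmark_data is None:
            # Deliberately inside the try so the handler below sees it
            raise ValueError("benchmark_data is required")  # noqa: TRY301
        runs[run_id] = "COMPLETED"
    except Exception:
        # Without this handler the thread would die and leave "RUNNING"
        runs[run_id] = "FAILED"


worker = threading.Thread(
    target=run_benchmark, args=("run-1", None), daemon=True
)
worker.start()
worker.join()
```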

* chore(lint): remove ERA rule and suppress TRY004 globally

Remove ERA (eradicate — commented-out code detection) from ruff select:
- 28% false positive rate in our codebase (7 of 25 violations)
- No major Python project enables it (Django, FastAPI, Pydantic, Airflow)
- Ruff itself doesn't use it; autofix was demoted to manual-only
- 172 noqa suppressions provided zero enforcement value

Suppress TRY004 (type-check-without-type-error) globally:
- Ruff maintainer agreed the autofix "can change functionality"
- We already had to revert 3 TypeError changes that broke tests
  and HTTP 400→500 API contracts
- Django, Flask, pandas all use isinstance + ValueError routinely
- Pylint has no equivalent rule; near-zero PyPI adoption

Remove all 173 # noqa: ERA001 and 49 # noqa: TRY004 comments
from the codebase — no longer needed with rules disabled/suppressed.

* fix: resolve mypy errors, failing MCP test, and TRY301 noqa

- search_engine_factory.py: restore typed intermediate variable to fix
  mypy no-any-return (RET504 collapse lost the type annotation)
- search_engine_pubchem.py: add explicit list[str] type annotation
- test_edge_cases.py: fix assertion that expected engine name in
  sanitized error message
- mcp/server.py: add noqa: TRY301 to validation raises inside try
  blocks (from main's new merge code)
2026-03-29 17:01:23 +02:00

354 lines
11 KiB
Python

"""Tests for cache-related database models."""

import hashlib
import json
import time
from datetime import datetime, timedelta, timezone, UTC

import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from local_deep_research.database.models import Base, Cache, SearchCache


class TestCacheModels:
    """Test suite for cache-related models."""

    @pytest.fixture
    def engine(self):
        """Create an in-memory SQLite database for testing."""
        engine = create_engine("sqlite:///:memory:")
        Base.metadata.create_all(engine)
        yield engine
        engine.dispose()

    @pytest.fixture
    def session(self, engine):
        """Create a database session for testing."""
        Session = sessionmaker(bind=engine)
        session = Session()
        yield session
        session.close()

    def test_cache_creation(self, session):
        """Test creating basic cache entries."""
        cache_entry = Cache(
            cache_key="llm_response_12345",
            cache_text="This is the cached LLM response for the query about quantum physics.",
            cache_type="llm_response",
            source="openai",
            ttl_seconds=86400,  # 24 hours
            expires_at=datetime.now(UTC) + timedelta(hours=24),
            cache_value={
                "model": "gpt-4",
                "temperature": 0.7,
                "query": "explain quantum entanglement",
            },
            size_bytes=1024,
        )

        session.add(cache_entry)
        session.commit()

        # Verify cache entry
        saved = session.query(Cache).first()
        assert saved is not None
        assert saved.cache_key == "llm_response_12345"
        assert "quantum physics" in saved.cache_text
        assert saved.cache_type == "llm_response"
        assert saved.cache_value["model"] == "gpt-4"
        assert saved.hit_count == 0
        assert saved.size_bytes == 1024

    def test_cache_expiration(self, session):
        """Test cache expiration functionality."""
        now = datetime.now(UTC)

        # Create expired cache
        expired = Cache(
            cache_key="expired_cache",
            cache_text="Old data",
            cache_type="test",
            expires_at=now - timedelta(hours=1),
        )

        # Create valid cache
        valid = Cache(
            cache_key="valid_cache",
            cache_text="Fresh data",
            cache_type="test",
            expires_at=now + timedelta(hours=1),
        )

        # Create non-expiring cache
        permanent = Cache(
            cache_key="permanent_cache",
            cache_text="Never expires",
            cache_type="test",
            expires_at=None,
        )

        session.add_all([expired, valid, permanent])
        session.commit()

        # Test is_expired method
        assert expired.is_expired() is True
        assert valid.is_expired() is False
        assert permanent.is_expired() is False

        # Query non-expired entries
        non_expired = (
            session.query(Cache)
            .filter((Cache.expires_at.is_(None)) | (Cache.expires_at > now))
            .all()
        )

        assert len(non_expired) == 2
        keys = [c.cache_key for c in non_expired]
        assert "valid_cache" in keys
        assert "permanent_cache" in keys

    def test_search_cache(self, session):
        """Test search-specific cache functionality."""
        query = "quantum physics research"
        query_hash = hashlib.sha256(query.encode()).hexdigest()
        current_time = int(time.time())

        search_cache = SearchCache(
            query_hash=query_hash,
            query_text=query,
            results=json.dumps(
                [
                    {
                        "title": "Quantum Mechanics",
                        "url": "https://example.com/qm",
                    },
                    {"title": "Physics Today", "url": "https://example.com/pt"},
                ]
            ),
            created_at=current_time,
            expires_at=current_time + 21600,  # 6 hours
            last_accessed=current_time,
            access_count=1,
        )

        session.add(search_cache)
        session.commit()

        # Verify search cache
        saved = (
            session.query(SearchCache).filter_by(query_hash=query_hash).first()
        )
        assert saved is not None
        assert saved.query_text == query
        results = json.loads(saved.results)
        assert len(results) == 2
        assert saved.access_count == 1

    def test_cache_categories(self, session):
        """Test different cache categories."""
        categories = [
            ("llm_response", "AI generated content", "openai"),
            ("search_result", "Search engine results", "google"),
            ("api_response", "External API response", "external"),
            ("computation", "Expensive computation result", "local"),
        ]

        for cache_type, value, source in categories:
            cache = Cache(
                cache_key=f"{cache_type}_test",
                cache_text=value,
                cache_type=cache_type,
                source=source,
                expires_at=datetime.now(UTC) + timedelta(hours=1),
            )
            session.add(cache)

        session.commit()

        # Query by category
        llm_caches = (
            session.query(Cache).filter_by(cache_type="llm_response").all()
        )
        assert len(llm_caches) == 1
        assert llm_caches[0].cache_text == "AI generated content"
        assert llm_caches[0].source == "openai"

    def test_cache_hit_tracking(self, session):
        """Test cache hit counting and access time updates."""
        cache = Cache(
            cache_key="hit_test",
            cache_text="Test content",
            cache_type="test",
            hit_count=0,
        )

        session.add(cache)
        session.commit()

        # Record multiple hits
        original_accessed = cache.accessed_at
        for i in range(5):
            cache.record_hit()
            session.commit()

        assert cache.hit_count == 5
        assert cache.accessed_at > original_accessed

    def test_search_cache_deduplication(self, session):
        """Test that identical queries produce the same hash."""
        query1 = "machine learning algorithms"
        query2 = "machine learning algorithms"  # Same query
        query3 = "Machine Learning Algorithms"  # Different case

        hash1 = hashlib.sha256(query1.encode()).hexdigest()
        hash2 = hashlib.sha256(query2.encode()).hexdigest()
        hash3 = hashlib.sha256(query3.encode()).hexdigest()

        assert hash1 == hash2
        assert hash1 != hash3  # Different case produces different hash

    def test_cache_size_management(self, session):
        """Test tracking cache entry sizes."""
        large_text = "x" * 10000
        small_text = "small"

        large_cache = Cache(
            cache_key="large_entry",
            cache_text=large_text,
            cache_type="test",
            size_bytes=len(large_text.encode()),
        )

        small_cache = Cache(
            cache_key="small_entry",
            cache_text=small_text,
            cache_type="test",
            size_bytes=len(small_text.encode()),
        )

        session.add_all([large_cache, small_cache])
        session.commit()

        # Query total cache size - sum all sizes
        from sqlalchemy import func

        total_size = (
            session.query(func.sum(Cache.size_bytes))
            .filter(Cache.size_bytes.isnot(None))
            .scalar()
            or 0
        )

        assert large_cache.size_bytes > small_cache.size_bytes
        assert total_size > 10000

    def test_cache_metadata_usage(self, session):
        """Test storing and retrieving cache metadata."""
        metadata = {
            "model": "gpt-4",
            "temperature": 0.7,
            "max_tokens": 1000,
            "timestamp": "2024-01-01T00:00:00Z",
        }

        cache = Cache(
            cache_key="metadata_test",
            cache_text="Response text",
            cache_type="llm_response",
            cache_value=metadata,
        )

        session.add(cache)
        session.commit()

        saved = session.query(Cache).first()
        assert saved.cache_value == metadata
        assert saved.cache_value["model"] == "gpt-4"

    def test_search_cache_with_filters(self, session):
        """Test search cache with various filter parameters."""
        current_time = int(time.time())

        # Add multiple search caches
        queries = [
            ("python tutorials", current_time - 3600),  # 1 hour ago
            ("javascript frameworks", current_time - 7200),  # 2 hours ago
            ("rust programming", current_time - 86400),  # 1 day ago
        ]

        for query, created in queries:
            query_hash = hashlib.sha256(query.encode()).hexdigest()
            cache = SearchCache(
                query_hash=query_hash,
                query_text=query,
                results=json.dumps([{"title": f"Result for {query}"}]),
                created_at=created,
                expires_at=created + 86400,  # 24 hour TTL
                last_accessed=created,
                access_count=1,
            )
            session.add(cache)

        session.commit()

        # Query recent caches (last 3 hours)
        recent_threshold = current_time - 10800
        recent_caches = (
            session.query(SearchCache)
            .filter(SearchCache.created_at >= recent_threshold)
            .all()
        )

        assert len(recent_caches) == 2

    def test_cache_cleanup_old_entries(self, session):
        """Test cleanup of expired cache entries."""
        now = datetime.now(timezone.utc)

        # Create caches with different expiration times
        for i in range(10):
            cache = Cache(
                cache_key=f"cache_{i}",
                cache_text=f"Content {i}",
                cache_type="test",
                expires_at=now - timedelta(hours=i),  # Some expired, some not
            )
            session.add(cache)

        session.commit()

        # Delete expired entries
        session.query(Cache).filter(Cache.expires_at < now).delete()
        session.commit()

        # Verify cleanup
        remaining = session.query(Cache).count()
        assert remaining == 1  # Only cache_0 should remain (expires_at = now)

    def test_cache_update_operations(self, session):
        """Test updating cache entries."""
        cache = Cache(
            cache_key="update_test",
            cache_text="Original content",
            cache_type="test",
            ttl_seconds=3600,
        )
        cache.set_ttl(3600)  # Set TTL

        session.add(cache)
        session.commit()

        # Update content
        cache.cache_text = "Updated content"
        cache.cache_value = {"version": 2}
        session.commit()

        # Verify updates
        saved = session.query(Cache).filter_by(cache_key="update_test").first()
        assert saved.cache_text == "Updated content"
        assert saved.cache_value["version"] == 2
        assert saved.ttl_seconds == 3600
        assert saved.expires_at is not None