mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
- Fix missing newline at end of English translation - Add Chinese version as elasticsearch_search_engine.zh.md
4.5 KiB
4.5 KiB
Elasticsearch Search Engine
This document describes how to use and configure the Elasticsearch search engine within the Local Deep Research project.
Overview
The Elasticsearch search engine allows you to search for documents within Elasticsearch indices. This is particularly useful for scenarios requiring searches across large volumes of structured or unstructured text data.
Prerequisites
- A running Elasticsearch server (local or remote).
- Elasticsearch Python client library:
elasticsearch>=8.10.0.
Installation
Ensure that you have installed the elasticsearch Python package. If you are installing Local Deep Research from source, you can run:
pip install elasticsearch>=8.10.0
Basic Usage
Usage in Code
from local_deep_research.web_search_engines.engines.search_engine_elasticsearch import ElasticsearchSearchEngine
# Create search engine instance
es_search = ElasticsearchSearchEngine(
hosts=["http://localhost:9200"], # List of Elasticsearch hosts
index_name="my_index", # Name of the index to search
username="user", # Optional: Authentication username
password="pass", # Optional: Authentication password
max_results=10 # Maximum number of results to return
)
# Execute search
results = es_search.run("Your search query")
# Process results
for result in results:
print(f"Title: {result.get('title')}")
print(f"Snippet: {result.get('snippet')}")
print(f"Content: {result.get('content')}")
Advanced Search
The ES search engine supports various advanced search methods:
# Use Elasticsearch Query String syntax
results = es_search.search_by_query_string("title:keyword AND content:search_text")
# Use Elasticsearch DSL (Domain Specific Language)
results = es_search.search_by_dsl({
"query": {
"bool": {
"must": {"match": {"content": "search term"}},
"filter": {"term": {"category": "technology"}}
}
}
})
Configuration
Search Engine Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| hosts | List[str] | ["http://localhost:9200"] | List of Elasticsearch server addresses |
| index_name | str | "documents" | Name of the index to search |
| username | Optional[str] | None | Authentication username |
| password | Optional[str] | None | Authentication password |
| api_key | Optional[str] | None | API Key authentication |
| cloud_id | Optional[str] | None | Elastic Cloud ID |
| max_results | int | 10 | Maximum number of results |
| highlight_fields | List[str] | ["content", "title"] | Fields to highlight |
| search_fields | List[str] | ["content", "title"] | Fields to search |
| filter_query | Optional[Dict] | None | Optional filter query |
| llm | Optional[BaseLLM] | None | LLM for relevance filtering |
| max_filtered_results | Optional[int] | None | Maximum number of filtered results |
Indexing Data
For ease of use, we provide the ElasticsearchManager utility class to help index data:
from local_deep_research.utilities.es_utils import ElasticsearchManager
# Create ES Manager
es_manager = ElasticsearchManager(
hosts=["http://localhost:9200"]
)
# Create index
es_manager.create_index("my_index")
# Index a single document
es_manager.index_document(
index_name="my_index",
document={
"title": "Document Title",
"content": "Document content...",
}
)
# Bulk index documents
documents = [
{"title": "Document 1", "content": "Content 1"},
{"title": "Document 2", "content": "Content 2"}
]
es_manager.bulk_index_documents("my_index", documents)
# Index file (automatically extract content)
es_manager.index_file("my_index", "path/to/document.pdf")
# Index all files in a directory
es_manager.index_directory(
"my_index",
"path/to/docs",
file_patterns=["*.pdf", "*.docx", "*.txt"]
)
Examples
Refer to examples/elasticsearch_search_example.py for a complete usage example.
Running Examples
Ensure Elasticsearch is running, then execute:
python examples/elasticsearch_search_example.py
Troubleshooting
Cannot connect to Elasticsearch
- Ensure the Elasticsearch server is running.
- Verify the host address and port.
- Verify authentication credentials (if required).
- Check network connection and firewall settings.
Empty Search Results
- Ensure the index exists and contains data.
- Check if the search fields are correct.
- Try using a simpler query.
- Check Elasticsearch logs for more information.