TL;DR
ES|QL LOOKUP JOIN (GA in Elasticsearch 8.17+) brings joins to Elasticsearch, but with major constraints: 2 billion row limit on lookup indices, single-field equality joins only, no wildcards or dynamic index references, no cross-cluster use after aggregations, memory circuit breaker failures under load, and Elasticsearch-only (no external database joins). For complex or cross-source joins, use data federation platforms instead.
Table of Contents
Introduction
Elastic finally added joins to Elasticsearch. The ES|QL LOOKUP JOIN feature, released in late 2024, lets you join data across indices within Elasticsearch.
This is genuinely useful progress. But before you redesign your data architecture around LOOKUP JOIN, understand its documented limitations—some of which may be dealbreakers for your use case.
What ES|QL LOOKUP JOIN Does
LOOKUP JOIN enables enriching query results with data from a second index, similar to a SQL LEFT JOIN:
FROM logs-*
| WHERE @timestamp > NOW() - 1 DAY
| LOOKUP JOIN customers ON customer_id
| STATS count = COUNT(*) BY customer_name, plan_tier This query:
1. Queries the logs-* indices
2. Joins each row with the customers index on customer_id
3. Brings in customer_name and plan_tier from the lookup index
4. Aggregates results
For teams who’ve wanted joins in Elasticsearch, this is a significant step forward.
ES|QL LOOKUP JOIN Version Compatibility
LOOKUP JOIN isn’t available in all Elasticsearch versions. Here’s what’s supported where:
| Elasticsearch Version | LOOKUP JOIN Support | Cross-Cluster LOOKUP JOIN | Notes |
|---|---|---|---|
| 8.x and earlier | Not available | Not available | Use ENRICH policies for enrichment |
| 8.16 (Preview) | Technical preview | Not available | Initial release, not production-ready |
| 8.17 | Generally available | Not available | Single-cluster only |
| 9.0 – 9.1 | Generally available | Not available | Stable, single-cluster joins |
| 9.2.0+ | Generally available | Supported (with restrictions) | Cannot use after STATS, SORT, LIMIT |
Key version notes:
- If you’re on Elasticsearch 8.x below 8.16, LOOKUP JOIN is unavailable, use ENRICH policies or external federation instead
- The 8.16 technical preview has different syntax and behavior than the GA release – don’t rely on preview-era examples
- Cross-cluster LOOKUP JOIN (9.2.0+) has additional constraints covered below
The Documented Limitations
According to Elastic’s official documentation and Elasticsearch Labs, LOOKUP JOIN has specific constraints:
Limitation 1: Lookup Index Size Cap
The constraint: Lookup indices are limited to a single shard and 2 billion documents maximum (a Lucene limitation for one shard).
From the docs: > “Indices in lookup mode are always single-sharded. Lookup indices are limited to 2 billion documents, a Lucene limit for one shard.”
Impact:
- Your lookup index cannot exceed 2 billion rows
- For high-cardinality reference data, this may be insufficient
- No horizontal scaling of lookup indices
Workaround: None within ES|QL. If your reference data exceeds this limit, you need a different approach.
Limitation 2: Single Match Field Only
The constraint: LOOKUP JOIN only supports joining on a single field.
From the docs: > “LOOKUP JOIN can only use a single match field and a single index.”
What this means:
-- Supported: single field join
| LOOKUP JOIN customers ON customer_id
-- NOT supported: composite key join
| LOOKUP JOIN orders ON customer_id AND order_date
-- NOT supported: multiple lookup indices
| LOOKUP JOIN customers ON customer_id
| LOOKUP JOIN products ON product_id -- Can chain, but each is limited Impact:
- Composite key joins require workarounds
- Cannot join on multiple fields simultaneously
- Complex data models may not translate directly
Limitation 3: No Wildcards or Dynamic Index References
The constraint: You must specify exact index names, no wildcards, aliases, date math, or data streams.
From the docs: > “Only specific index names or aliases are supported; wildcards and remote cluster prefixes are not supported.”
What fails:
-- These don't work for lookup indices
| LOOKUP JOIN logs-* -- Wildcard: not supported
| LOOKUP JOIN logs-2024.* -- Date pattern: not supported
| LOOKUP JOIN @timestamp -- Dynamic: not supported Impact:
- Cannot join against time-partitioned lookup indices
- Must know exact index names at query time
- Rolling indices require query updates
Limitation 4: Cross-Cluster Limitations
The constraint: Cross-cluster LOOKUP JOIN has additional restrictions and version requirements.
From the docs: > “Cross cluster search is unsupported in versions prior to 9.2.0. Additionally, cross-cluster LOOKUP JOIN can not be used after aggregations (STATS), SORT and LIMIT commands, and coordinator-side ENRICH commands.”
What this means:
-- Cross-cluster LOOKUP JOIN restrictions
FROM remote_cluster:logs-*
| STATS count = COUNT(*) BY customer_id
| LOOKUP JOIN customers ON customer_id -- Fails: can't use after STATS
-- Must restructure query
FROM remote_cluster:logs-*
| LOOKUP JOIN customers ON customer_id -- Before aggregation
| STATS count = COUNT(*) BY customer_name Impact:
- Multi-cluster architectures have significant constraints
- Query order matters and limits flexibility
- Aggregations and joins don’t compose freely
Limitation 5: Equality Matching Only
The constraint: Joins only support exact equality matches.
From the docs: > “Only matching on equality is supported.”
What’s not supported:
-- NOT supported: range joins
| LOOKUP JOIN time_ranges
WHERE event_time BETWEEN start_time AND end_time
-- NOT supported: fuzzy matching
| LOOKUP JOIN products
WHERE SIMILARITY(product_name, search_term) > 0.8
-- NOT supported: inequality joins
| LOOKUP JOIN pricing WHERE tier >= customer_tier Impact:
- Time-window joins (common in event correlation) aren’t possible
- Approximate matching requires preprocessing
- Band joins for ranges need workarounds
Limitation 6: Output Ordering Not Guaranteed
The constraint: LOOKUP JOIN may not preserve row ordering.
From the docs: > “The output rows produced by LOOKUP JOIN can be in any order and may not respect preceding SORTs. To guarantee a certain ordering, place a SORT after any LOOKUP JOINs.”
What this means:
-- Ordering not preserved
FROM logs-*
| SORT @timestamp DESC
| LOOKUP JOIN customers ON customer_id -- May scramble order
| LIMIT 100 -- Wrong 100 rows?
-- Correct approach
FROM logs-*
| LOOKUP JOIN customers ON customer_id
| SORT @timestamp DESC -- Sort AFTER join
| LIMIT 100 Impact:
- Existing ES|QL queries may produce unexpected results when adding LOOKUP JOIN
- Must carefully consider query structure
- Pagination behavior changes
Limitation 7: Memory and Document Size Constraints
The constraint: Large matches or large documents can cause circuit breaker failures.
From the docs: > “The query will circuit break if there are too many matching documents in the lookup index or if documents are too large.”
“LOOKUP JOIN works in batches of normally about 10,000 rows and requires large amounts of heap space if matching documents are multiple megabytes or larger.”
Impact:
- High-cardinality joins may fail
- Large reference documents are problematic
- Must monitor and tune memory settings
- Unpredictable failures under load
What LOOKUP JOIN Can’t Replace
Given these limitations, LOOKUP JOIN doesn’t solve several common scenarios:
1. Joining Elasticsearch with External Databases
LOOKUP JOIN is Elasticsearch-to-Elasticsearch only. It cannot:
-- NOT possible with LOOKUP JOIN
FROM elasticsearch.logs
| JOIN mysql.customers ON customer_id
-- Also NOT possible
FROM elasticsearch.events
| JOIN postgresql.accounts ON account_id For cross-system joins, you need data federation or ETL.
2. Complex Multi-Table Joins
While you can chain LOOKUP JOINs, each has the single-field limitation:
-- Technically possible but increasingly constrained
FROM orders
| LOOKUP JOIN customers ON customer_id
| LOOKUP JOIN products ON product_id
| LOOKUP JOIN regions ON region_id
-- Each join adds memory pressure and latency
-- Still single-field equality only For complex analytical queries, purpose-built analytics platforms handle multi-source joins more flexibly.
3. Real-Time Enrichment at Scale
The memory constraints and circuit breaker behavior make LOOKUP JOIN risky for:
- High-throughput streaming scenarios
- Large lookup tables (approaching 2B limit)
- Documents with large nested structures
4. Time-Window Correlations
Security and observability use cases often need:
-- "Find all events within 5 minutes of login failure"
-- Cannot express in LOOKUP JOIN (no range predicates) These require either:
- Pre-computed enrichments via ENRICH policies
- External correlation engines
- Post-query processing
Alternatives for Complex Join Scenarios
Option 1: ENRICH Processor (Pre-Computed Joins)
For relatively static reference data, ENRICH policies pre-compute joins at ingest time:
PUT /_enrich/policy/customer-policy
{
"match": {
"indices": "customers",
"match_field": "customer_id",
"enrich_fields": ["customer_name", "plan_tier"]
}
} Trade-offs: – Data baked into documents at ingest – Stale when reference data changes – No dynamic filtering
Option 2: Data Federation Platforms
Platforms like Knowi join Elasticsearch with external sources at query time:
-- Cross-source federation
SELECT
es.event_type,
mysql.customer_name,
pg.account_balance
FROM elasticsearch.events es
JOIN mysql.customers mysql ON es.customer_id = mysql.id
JOIN postgresql.accounts pg ON mysql.account_id = pg.id Advantages:
- No Elasticsearch limitations
- Join any data source
- Real-time, no pre-computation
See How to Join Elasticsearch with Other Datasources for implementation details.
Option 3: Denormalization at Ingest
Embed related data directly in documents during ingestion:
{
"event_type": "purchase",
"customer": {
"id": "cust_123",
"name": "Acme Corp",
"tier": "enterprise"
},
"product": {
"id": "prod_456",
"name": "Widget Pro"
}
} Trade-offs:
- Larger documents
- Updates require re-indexing
- Works well for read-heavy, slowly-changing data
When to Use LOOKUP JOIN
LOOKUP JOIN works well when:
- Small lookup tables: Reference data well under 2B rows
- Single-field joins: Simple equality matching sufficient
- Elasticsearch-only: No need to join external databases
- Moderate scale: Memory constraints aren’t limiting
- Static references: Lookup data doesn’t change frequently
Good use case example:
-- Enrich log events with error code descriptions
FROM application_logs
| WHERE level = 'ERROR'
| LOOKUP JOIN error_codes ON error_code
| STATS count = COUNT(*) BY error_description Here, error_codes is small, static, and single-field joinable.
The Bottom Line
ES|QL LOOKUP JOIN is a welcome addition to Elasticsearch, enabling enrichment scenarios that previously required ENRICH policies or external processing.
But it’s not a general-purpose join solution:
- 2 billion row limit on lookup indices
- Single-field equality only
- No cross-cluster with aggregations
- Memory constraints can cause failures
- Elasticsearch-only—no external database joins
For complex analytical joins, multi-source queries, or high-scale enrichment, evaluate purpose-built analytics platforms that handle joins without these constraints.
Frequently Asked Questions
What is ES|QL LOOKUP JOIN in Elasticsearch?
ES|QL LOOKUP JOIN is a feature released in late 2024 that lets you enrich query results with data from a second Elasticsearch index, similar to a SQL LEFT JOIN. You specify a main query and a lookup index, and LOOKUP JOIN brings in additional fields based on a matching key. It works only within Elasticsearch, not across external databases.
What is the maximum size for a LOOKUP JOIN index?
Lookup indices are limited to a single shard with a maximum of 2 billion documents (a Lucene per-shard limitation). There’s no way to horizontally scale a lookup index across multiple shards. If your reference data exceeds this limit, you need an alternative approach like data federation or denormalization.
Can ES|QL LOOKUP JOIN join on multiple fields?
No. LOOKUP JOIN only supports joining on a single field with exact equality matching. Composite key joins (joining on two or more fields simultaneously), range joins, and fuzzy matching are not supported. If you need to join on a composite key, you’d need to create a combined field in both indices as a workaround.
Does ES|QL LOOKUP JOIN work across clusters?
Cross-cluster LOOKUP JOIN is supported starting in Elasticsearch 9.2.0, but with additional restrictions. You cannot use cross-cluster LOOKUP JOIN after aggregations (STATS), SORT, LIMIT commands, or coordinator-side ENRICH commands. This limits query flexibility in multi-cluster architectures.
Can LOOKUP JOIN replace ETL for enriching Elasticsearch data?
For simple, small-scale enrichment within Elasticsearch, yes. But LOOKUP JOIN has constraints that make it unsuitable as a general ETL replacement: the 2B row limit on lookup indices, single-field joins only, equality matching only, memory constraints that can cause circuit breaker failures, and no ability to join with external databases like MySQL or PostgreSQL.
Why does LOOKUP JOIN not preserve row ordering?
By design, LOOKUP JOIN may return rows in any order regardless of preceding SORT commands. If you need ordered results, you must place SORT after the LOOKUP JOIN, not before. This is a documented behavior that can produce unexpected results when adding LOOKUP JOIN to existing ES|QL queries.
Next Steps
Need joins beyond LOOKUP JOIN’s capabilities?
- Learn: How to Join Elasticsearch with Other Datasources
- Compare: Elasticsearch vs MySQL
- Explore: Native Elasticsearch Analytics
Request a Demo → Meet pur team to see live how you can go beyond the limitations of Elasticsearch joins.