Elasticsearch Aggregations Guide: Analytics and Reporting Queries

TL;DR 

Aggregations turn Elasticsearch from a search engine into a real-time analytics engine. 

  • Use bucket aggs to group data
  • metric aggs to compute numbers, 
  • pipeline aggs to transform results (moving averages, derivatives), and 
  • matrix aggs for correlations. 

Combine with Kibana or a BI layer (like Knowi) to build dashboards, alerts, and embedded analytics.

In this guide you’ll learn:

  1. What aggregations are and how they compare to SQL.
  2. The main aggregation types – bucket, metric, pipeline, and matrix.
  3. Real reporting queries and analytics examples.
  4. Best practices for performance and memory optimization.
  5. How to visualize data using Kibana or third-party dashboards like Knowi.
  6. Real-world use cases and FAQs.

Table of Contents

  1. What is Elasticsearch?
  2. What Are Aggregations in Elasticsearch?
    1. Why Are Aggregations Essential for Analytics and Reporting?
    2. Common Use Cases for Elasticsearch Aggregations
    3. Dashboards
    4. Trend Analysis
    5. Anomaly Detection
    6. Funnel Analysis
    7. Log Analytics
    8. The Four Types of Aggregations
      1. Bucket Aggregations
        1. Terms Aggregation
        2. Range Aggregation
        3. Date Histogram
        4. Filters Aggregation
        5. Composite Aggregation
        6. RECAP: Quick Summary
      2. Metric Aggregations: “What numbers do we want?”
        1. Basic Metric Aggregations
        2. Cardinality (Distinct Counts)
        3. Extended Stats
        4. Percentiles
        5. Percentile Ranks
      3. Pipeline Aggregations
        1. Moving Average (moving_fn / moving_avg)
        2. Derivative
        3. Cumulative Sum
        4. Bucket Script
        5. Bucket Sort
      4. Matrix Aggregations
        1. Matrix Stats Aggregation
        2. When to Use Matrix Aggregations
      5. Summary of Aggregation Types So Far
    9. Common Reporting Queries
      1. Top N Values – “Top Products, Users, or Regions”
      2. Trend Over Time – “How Does This Metric Change Each Week or Month?”
        1. Adding Trend Metrics
      3. Funnel or Sequential Analysis
        1. Simple Funnel Example – Using Filters Aggregation
        2. Calculating Conversion Rates
        3. Common Uses
        4. Key takeaway:
      4. Time Series Anomaly Detection
        1. Pattern 1 – Compare to a Moving Average
        2. Pattern 2 – Rate of Change (Derivative)
        3. Pattern 3 – Standard Deviation Bands
        4. Practical Tips
    10. Best Practices for Using Aggregations
      1. High cardinality hurts memory
      2. Filters vs post_filter
      3. terms tuning
      4. Error visibility
      5. Cache & filter context
      6. Reduce buckets first
      7. Mind global limits
      8. Mapping hygiene
      9. Tiny checklist before you run a big agg
    11. Limitations of Elasticsearch for Analytics
      1. 1. No Joins Across Indices
      2. 2. Complexity in Multi-Dimensional Grouping
      3. 3. Limited Visualization & Reporting
      4. Quick Recap
    12. How to Build Dashboards on Elasticsearch
      1. Option 1: Kibana (Official Visualization Tool)
        1. What it is:
        2. Key Features:
        3. How it works:
        4. Example Use Cases:
        5. Pros:
        6. Cons:
      2. Option 2: Custom Dashboards Using Elasticsearch APIs
        1. What it is:
        2. Example Workflow:
        3. Pros:
        4. Cons:
      3. Option 3: Third-Party Analytics Platforms
        1. What they do:
        2. Examples:
        3. Pros:
        4. Cons:
      4. Which Option to Choose?
    13. Real-World Use Cases of Elasticsearch Aggregations
      1. 1. Log Analytics: Errors Grouped by Type or Service
      2. 2. E-commerce: Sales by Region / Product / Time
      3. 3. IoT Analytics: Sensor Summaries & Anomalies
      4. 4. SaaS Product Analytics: Feature Usage Trends
      5. Summary of Use Cases
    14. What aggregations unlock:
    15. Where ES fits in the analytics stack:
    16. When to reach for something else:
    17. Elasticsearch handles search. Knowi handles analytics.
    18. Related Resources
    19. Frequently Asked Questions
      1. What’s the difference between bucket and metric aggregations?
      2. Can Elasticsearch aggregations replace SQL analytics?
      3. How do I aggregate across multiple indices?
      4. How do I speed up slow aggregations?
      5. How do I get “Top N” accurately?
      6. What’s the best way to do anomaly detection?

What is Elasticsearch?

Elasticsearch is a search and analytics engine – it’s often used to search through large amounts of text (like logs, documents, or records), but it’s also very powerful for analytics because it can summarize data quickly.

For example:
Imagine you have millions of sales records.
You could use Elasticsearch to answer:

  • “What’s the average sale amount per region?”
  • “How many sales happened this week?”
  • “Which products have the highest revenue?”

These kinds of summary questions are done using something called Aggregations.

What Are Aggregations in Elasticsearch?

Aggregations are Elasticsearch’s analytics engine – they let you group, filter, and calculate metrics on data stored in your index.

Aggregations are Elasticsearch’s version of “GROUP BY” and “SUM/AVG/COUNT” in SQL.

They let you:

  • Group data (like by country or category)
  • Calculate numbers (like totals, averages, max, min)
  • Build charts (like histograms, date histograms, pie charts, etc.)

In simple terms:

Search = “Find matching documents.”

Aggregation = “Summarize data across those documents.”

Why Are Aggregations Essential for Analytics and Reporting?

Because they let you:

  • Analyze large datasets without exporting to SQL or BI tools.
  • Build dashboards that auto-update as data changes.
  • Perform real-time analytics on logs, metrics, or business data.
  • Do multi-level grouping (e.g., sales by country → by product → by month).

They’re especially powerful when speed matters – Elasticsearch can aggregate across millions of documents in milliseconds.
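
For example, the multi-level grouping above can be written as nested aggregations. A minimal sketch, assuming an index named sales with fields country.keyword, product.keyword, order_date, and revenue:

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "by_country": {
      "terms": { "field": "country.keyword" },
      "aggs": {
        "by_product": {
          "terms": { "field": "product.keyword" },
          "aggs": {
            "by_month": {
              "date_histogram": { "field": "order_date", "calendar_interval": "month" },
              "aggs": {
                "monthly_revenue": { "sum": { "field": "revenue" } }
              }
            }
          }
        }
      }
    }
  }
}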

Common Use Cases for Elasticsearch Aggregations

Dashboards

Goal: Monitor business or operational KPIs in real time.

Example:
An e-commerce company stores each order as a document:

{ "region": "US", "category": "Electronics", "revenue": 200, "status": "Delivered" }

Using terms and sum aggregations, they can track:

  • Total revenue by region
  • Average order value by category
  • Count of pending vs delivered orders

These results feed directly into a dashboard (like Kibana or Knowi) that updates instantly as new orders come in.
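
As a rough sketch, one request could feed several of these dashboard tiles at once (assuming the orders live in an index named orders with the fields shown above):

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "revenue_by_region": {
      "terms": { "field": "region.keyword" },
      "aggs": { "total_revenue": { "sum": { "field": "revenue" } } }
    },
    "avg_order_value_by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": { "avg_order_value": { "avg": { "field": "revenue" } } }
    },
    "orders_by_status": {
      "terms": { "field": "status.keyword" }
    }
  }
}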

Trend Analysis

Goal: Understand how something changes over time.

Example:
A SaaS company wants to see user signups per week.

They use a date_histogram on the signup_date field to group users by week; each bucket’s doc_count (or a value_count sub-aggregation) gives the weekly total.

This reveals seasonal spikes, growth trends, or declines.

You can also combine it with avg session time or revenue metrics to correlate trends.

Anomaly Detection

Goal: Spot unexpected changes – spikes, drops, or unusual behavior.

Example:
A monitoring system tracks server errors per minute.

It uses:

  • date_histogram → group logs by minute
  • count → total errors per minute
  • moving_avg → rolling average over time
  • derivative → rate of change between buckets

If today’s error count suddenly doubles compared to the moving average, the system flags an anomaly.

Funnel Analysis

Goal: Measure how users progress through sequential stages.

Example:
An online store tracks users through stages:

  1. Visited homepage
  2. Added item to cart
  3. Completed purchase

Using filters or composite aggregations, Elasticsearch counts how many users reached each stage.

By comparing stage-to-stage percentages, you can identify where users drop off (e.g., only 30% complete checkout).

Log Analytics

Goal: Make sense of application or system logs.

Example:
Each log record includes:

{ "status": 500, "service": "checkout", "response_time": 120 }

With terms and avg aggregations, you can answer:

  • Which services throw the most 5xx errors?
  • What’s the average response time per service?
  • How many errors occurred per hour?

This helps teams troubleshoot, spot slow APIs, or detect outages – all inside Elasticsearch, no external SQL or ETL needed.
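
A minimal sketch of the first two questions in a single query, assuming the logs live in an index named logs with the fields shown above:

GET /logs/_search
{
  "size": 0,
  "aggs": {
    "by_service": {
      "terms": { "field": "service.keyword" },
      "aggs": {
        "avg_response_time": { "avg": { "field": "response_time" } },
        "errors_5xx": { "filter": { "range": { "status": { "gte": 500 } } } }
      }
    }
  }
}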

The Four Types of Aggregations

There are four main families of aggregations in Elasticsearch.

Bucket Aggregations

Think of these as “grouping” operations.

They divide your documents into sets (buckets) based on shared criteria. Each bucket can then have its own sub-aggregations (metrics, filters, etc.).

Let’s say you have documents like this:

{ "category": "electronics", "price": 100 }
{ "category": "books", "price": 20 }
{ "category": "electronics", "price": 200 }

You want to know: What’s the average price per category?

You’d use a terms aggregation on “category”, and an avg aggregation on “price”.

Here’s the Elasticsearch query for: “average price per category.”

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "avg_price_by_category": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "average_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
  1. "size": 0
    Means: don’t return the raw documents, only the aggregated results.
    (We’re doing analytics, not search results.)
  2. "aggs": { … }
    This section defines your aggregations.
  3. "avg_price_by_category": { "terms": { "field": "category.keyword" } }
    Creates buckets for each unique category (like “electronics”, “books”, etc.).
    • "terms" = bucket type
    • "field": "category.keyword" = the field you’re grouping by
  4. Inside that, another "aggs":
    "average_price": { "avg": { "field": "price" } }
    This calculates the average of price within each category bucket.

The results might look something like:

"aggregations": {
  "avg_price_by_category": {
    "buckets": [
      { "key": "electronics", "doc_count": 2, "average_price": { "value": 150 } },
      { "key": "books", "doc_count": 1, "average_price": { "value": 20 } }
    ]
  }
}

That means:

  • Electronics → avg price = 150
  • Books → avg price = 20

Terms Aggregation

What it does: Groups by unique field values (like SQL GROUP BY).
Example use: Revenue by region or category.

"aggs": {
  "revenue_by_region": {
    "terms": { "field": "region.keyword" },
    "aggs": {
      "total_revenue": { "sum": { "field": "revenue" } }
    }
  }
}

Range Aggregation

What it does: Groups numeric or date values into defined ranges.
Example use: Price bands or age groups.

"aggs": {
  "price_ranges": {
    "range": {
      "field": "price",
      "ranges": [
        { "to": 100 },
        { "from": 100, "to": 500 },
        { "from": 500 }
      ]
    },
    "aggs": {
      "avg_price": { "avg": { "field": "price" } }
    }
  }
}

Output might show:

  • < 100 USD → 200 items
  • 100 – 500 USD → 350 items
  • > 500 USD → 50 items

Date Histogram

What it does: Buckets by time intervals – day, week, month, etc.
Example use: Orders per week, log counts per hour.

"aggs": {
  "orders_per_week": {
    "date_histogram": {
      "field": "order_date",
      "calendar_interval": "week"
    },
    "aggs": {
      "revenue": { "sum": { "field": "price" } }
    }
  }
}

This is useful for trend analysis and time-series charts.

Filters Aggregation

What it does: Creates fixed buckets based on specific conditions.
Example use: Compare different regions or product categories in one query.

"aggs": {
  "region_comparison": {
    "filters": {
      "filters": {
        "US":  { "term": { "region.keyword": "US" } },
        "EU":  { "term": { "region.keyword": "EU" } },
        "APAC":{ "term": { "region.keyword": "APAC" } }
      }
    },
    "aggs": { "revenue": { "sum": { "field": "revenue" } } }
  }
}

Composite Aggregation

What it does: Combines multiple fields for multi-level grouping and supports pagination (fetching results in batches).
Example use: Revenue per (country, month) pair when data is huge.

"aggs": {
  "sales_by_country_month": {
    "composite": {
      "sources": [
        { "country": { "terms": { "field": "country.keyword" } } },
        { "month":   { "date_histogram": { "field": "order_date", "calendar_interval": "month" } } }
      ]
    },
    "aggs": {
      "total_revenue": { "sum": { "field": "revenue" } }
    }
  }
}

Composite aggregations are great for scrolling through large result sets without overloading memory.
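
To page through the buckets, read after_key from each response and pass it back as after in the next request. A minimal sketch of a follow-up page (the after values are placeholders copied from the previous response’s after_key):

"aggs": {
  "sales_by_country_month": {
    "composite": {
      "size": 100,
      "sources": [
        { "country": { "terms": { "field": "country.keyword" } } },
        { "month":   { "date_histogram": { "field": "order_date", "calendar_interval": "month" } } }
      ],
      "after": { "country": "US", "month": 1735689600000 }
    },
    "aggs": {
      "total_revenue": { "sum": { "field": "revenue" } }
    }
  }
}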

RECAP: Quick Summary

Aggregation Type | Purpose / What It Groups By | Typical Use Cases | Example Output
Terms | Groups by unique field values | Top categories, users, countries | “US – 500 orders”, “UK – 200 orders”
Range | Groups numeric fields into custom ranges | Price ranges, age groups, salary bands | “0–100: 200 items”, “100–500: 350 items”
Date Histogram | Groups by time intervals | Weekly sales, monthly signups, hourly logs | “Jan: 1000 orders”, “Feb: 1500 orders”
Filters | Groups documents into predefined filters | Compare KPIs by region, plan, or environment | “US: $20k revenue”, “EU: $18k revenue”
Composite | Multi-field grouping, supports pagination | Large datasets (e.g., region + month + category) | “US-Jan: $10k”, “UK-Feb: $5k”

Metric Aggregations: “What numbers do we want?”

These aggregations calculate numeric values – sums, averages, unique counts, percentiles, and more.

They’re the building blocks for KPIs, dashboards, and summaries.

Basic Metric Aggregations 

Metric Type | What It Does | Example
avg | Calculates average | Average product price
sum | Adds up values | Total revenue
min / max | Finds smallest or largest | Lowest or highest order value
value_count | Counts documents with a value in a field | Count of orders that have a price
stats / extended_stats | Gives multiple metrics (min, max, avg, sum, count, std deviation) | Full price summary

Example 1: Average Price

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "average_price": {
      "avg": { "field": "price" }
    }
  }
}

Output:

"aggregations": {
  "average_price": { "value": 150.0 }
}

Example 2: Total Revenue

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "total_revenue": {
      "sum": { "field": "price" }
    }
  }
}

If you had 3 sales of $10, $20, $30 → the total is $60.

So, metrics tell you “how much”, while buckets tell you “per what” (like per region, per month, per category).

Cardinality (Distinct Counts)

What it does: Counts unique values in a field – like COUNT(DISTINCT user_id) in SQL.

Use case: Count unique users, customers, or sessions.

"aggs": {
  "unique_users": {
    "cardinality": { "field": "user_id" }
  }
}

Example result:

"unique_users": { "value": 10423 }

⚠️ Note: It’s an approximate count using a hashing algorithm (HyperLogLog++). It’s fast and memory-efficient but may differ by 1–2%.

Extended Stats

What it does: Returns extra statistical values beyond the basic stats – like variance, standard deviation, and sum of squares.

Use case: Useful for detecting spread or consistency in data (e.g., revenue variance across stores).

"aggs": {
  "price_distribution": {
    "extended_stats": { "field": "price" }
  }
}

Output includes:

{
  "count": 1000,
  "min": 10,
  "max": 950,
  "avg": 420,
  "sum": 420000,
  "std_deviation": 85,
  "variance": 7225
}

Percentiles

What it does: Shows values at given percentiles (e.g., 50th, 90th, 99th). Tells you thresholds – “90% of requests took less than X ms.”

Use case: Performance monitoring, response time analytics.

"aggs": {
  "response_time_percentiles": {
    "percentiles": {
      "field": "response_time",
      "percents": [50, 90, 99]
    }
  }
}

Example Output:

"values": {
  "50.0": 120,
  "90.0": 250,
  "99.0": 900
}

Percentile Ranks

What it does: Opposite of Percentiles – given a value, it tells what percent of data falls below it.

Use case: “What percent of requests complete in under 200 ms?”

"aggs": {
  "fast_requests": {
    "percentile_ranks": {
      "field": "response_time",
      "values": [200]
    }
  }
}

Output:

"values": { "200.0": 85.7 }

Means: 85.7% of requests finish within 200ms.

To Summarize: 

Metric aggregations = your numerical summaries.

  • Use cardinality for unique counts.
  • Use percentiles for performance analysis.
  • Use extended_stats for deeper statistical modeling.

Pipeline Aggregations

Pipeline aggregations take the output of another aggregation and perform calculations on those results. They are a bit like Excel formulas applied to an existing chart.

So instead of working directly on documents, they work on aggregated data.

Moving Average (moving_fn / moving_avg)

What it does: Smooths out fluctuations in time-series data.
Use case: Detect trends in sales, traffic, or errors over time.

"aggs": {
  "sales_over_time": {
    "date_histogram": { "field": "order_date", "calendar_interval": "day" },
    "aggs": {
      "daily_revenue": { "sum": { "field": "price" } },
      "moving_avg_revenue": {
        "moving_fn": {
          "buckets_path": "daily_revenue",
          "window": 7,
          "script": "MovingFunctions.unweightedAvg(values)"
        }
      }
    }
  }
}

Result: A 7-day rolling average that smooths daily sales spikes.

Derivative

What it does: Calculates the rate of change between buckets.
Use case: Find growth rates or sudden jumps in metrics.

"aggs": {
  "users_over_time": {
    "date_histogram": { "field": "signup_date", "calendar_interval": "week" },
    "aggs": {
      "weekly_signups": { "value_count": { "field": "user_id" } },
      "growth_rate": {
        "derivative": { "buckets_path": "weekly_signups" }
      }
    }
  }
}

Result: Tells you how signups increased or decreased week-to-week.

Cumulative Sum

What it does: Adds up values across buckets to create a running total.
Use case: Show total sales or revenue accumulated over time.

"aggs": {
  "revenue_over_time": {
    "date_histogram": { "field": "order_date", "calendar_interval": "month" },
    "aggs": {
      "monthly_sales": { "sum": { "field": "revenue" } },
      "total_revenue": {
        "cumulative_sum": { "buckets_path": "monthly_sales" }
      }
    }
  }
}

Result: “Month 1: $5K, Month 2: $11K, Month 3: $18K…” (running total)

Bucket Script

What it does: Lets you do math between multiple aggregations.
Use case: Calculate ratios like conversion rate, profit margin, or error rate.

"aggs": {
  "orders": {
    "date_histogram": { "field": "order_date", "calendar_interval": "month" },
    "aggs": {
      "total_orders": { "value_count": { "field": "order_id" } },
      "completed_orders": {
        "filter": { "term": { "status": "completed" } }
      },
      "completion_rate": {
        "bucket_script": {
          "buckets_path": {
            "completed": "completed_orders>_count",
            "total": "total_orders"
          },
          "script": "params.completed / params.total * 100"
        }
      }
    }
  }
}

Result: % of orders completed each month.

Bucket Sort

What it does: Sorts or filters bucket results after they’ve been calculated.
Use case: Get the Top N months, regions, or users after applying metrics.

"aggs": {
  "region_revenue": {
    "terms": { "field": "region.keyword" },
    "aggs": {
      "total_sales": { "sum": { "field": "revenue" } },
      "top_regions": {
        "bucket_sort": {
          "sort": [{ "total_sales": { "order": "desc" } }],
          "size": 5
        }
      }
    }
  }
}

Result: Top 5 regions by total revenue.

To summarize

Pipeline Type | Purpose | Example
moving_fn / moving_avg | Smooth time series | Rolling 7-day sales
derivative | Calculate change | Weekly growth
cumulative_sum | Running total | Total revenue to date
bucket_script | Math between metrics | Conversion rates
bucket_sort | Sort or filter results | Top 5 by revenue

Matrix Aggregations

Matrix aggregations are used for statistical or correlation analysis when you have multiple numeric fields and want to understand how they relate to each other.

Unlike the other types (which group or summarize data), matrix aggregations compute relationships between fields – kind of like what you’d do in data science or ML preprocessing.

Matrix Stats Aggregation

What it does:
Calculates a statistical matrix including:

  • count
  • mean
  • variance
  • skewness
  • kurtosis
  • covariance and correlation between fields

Use case:
You want to see if two metrics are related – e.g., are higher ad spends correlated with higher conversions?

Example:

"aggs": {
  "campaign_stats": {
    "matrix_stats": {
      "fields": ["ad_spend", "conversions", "clicks"]
    }
  }
}

Example Output:

"aggregations": {
  "campaign_stats": {
    "fields": [
      { "name": "ad_spend", "count": 100, "mean": 2500, "variance": 50000 },
      { "name": "conversions", "count": 100, "mean": 60, "variance": 300 },
      { "name": "clicks", "count": 100, "mean": 1200, "variance": 20000 }
    ],
    "correlations": {
      "ad_spend": { "conversions": 0.87, "clicks": 0.92 }
    }
  }
}

Here, a correlation of 0.87 between ad_spend and conversions means they’re strongly positively related – when one goes up, so does the other.

When to Use Matrix Aggregations

  • Data science prep: To understand feature relationships before modeling.
  • Finance: To find correlations between stock prices or revenue streams.
  • Marketing analytics: To link spend, reach, clicks, and conversions.

But they’re not typically used for dashboards or business KPIs – they’re more for statistical analysis.

Summary of Aggregation Types So Far

Type | Example Use | Works On
Bucket | Group by category, date, range | Documents
Metric | Calculate totals, averages, unique counts | Document fields
Pipeline | Do math on other aggregations | Aggregated results
Matrix | Correlate multiple numeric fields | Statistical analysis

Common Reporting Queries

We’ll now see how everything you’ve learned about aggregations gets applied to real analytics questions.

We’ll start with Top N Values (one of the most common patterns for dashboards).

Top N Values – “Top Products, Users, or Regions”

Goal: Find which items contribute the most – top-selling products, busiest users, most-visited pages, etc.

Example Query: Top 5 products by total sales

GET /sales/_search
{ 
  "size": 0,
  "aggs": {
    "top_products": {
      "terms": {
        "field": "product.keyword",
        "order": { "total_sales": "desc" },
        "size": 5
      },
      "aggs": {
        "total_sales": { "sum": { "field": "price" } }
      }
    }
  }
}

Result:

"buckets": [
  { "key": "Laptop", "total_sales": { "value": 90000 } },
  { "key": "Phone",  "total_sales": { "value": 82000 } },
  ...
]

Use this pattern for “Top 10 customers by revenue,” “Top regions by traffic,” etc.

Trend Over Time – “How Does This Metric Change Each Week or Month?”

Goal: See how a number (like sales, users, or errors) changes across time periods – daily, weekly, monthly, etc.

This is where the date_histogram bucket really shines.

Example: Weekly Signups Over Time

GET /users/_search
{
  "size": 0,
  "aggs": {
    "signups_over_time": {
      "date_histogram": {
        "field": "signup_date",
        "calendar_interval": "week"
      },
      "aggs": {
        "weekly_signups": {
          "value_count": { "field": "user_id" }
        }
      }
    }
  }
}

What happens:

  1. date_histogram groups users by week.
  2. value_count counts how many users signed up in each week.

Sample Output:

"buckets": [
  { "key_as_string": "2025-09-01", "doc_count": 240 },
  { "key_as_string": "2025-09-08", "doc_count": 310 },
  { "key_as_string": "2025-09-15", "doc_count": 450 }
]

Perfect for plotting line charts in Kibana or Knowi.

Adding Trend Metrics

You can easily combine this with a derivative or moving average to show growth rate or smoothed trends.

Example: Weekly Growth Rate

"aggs": {
  "signups_over_time": {
    "date_histogram": { "field": "signup_date", "calendar_interval": "week" },
    "aggs": {
      "weekly_signups": { "value_count": { "field": "user_id" } },
      "growth_rate": {
        "derivative": { "buckets_path": "weekly_signups" }
      }
    }
  }
}

This shows how much signups increased or decreased compared to the previous week.

In short:

  • Use date_histogram for grouping by time.
  • Use sum / count / avg to measure something per interval.
  • Optionally use derivative / moving_avg for growth and trends.

Funnel or Sequential Analysis

Goal: Track how users progress through stages of a journey – like a marketing funnel or signup process.

Example:

Visitors → Signups → Purchases

Each stage filters the data differently, and you compare how many users reach each.

Simple Funnel Example – Using Filters Aggregation

Let’s say your index stores user actions like:

{ "user_id": 1, "event": "visit" }
{ "user_id": 1, "event": "add_to_cart" }
{ "user_id": 1, "event": "purchase" }

You can count how many users did each action:

GET /events/_search
{
  "size": 0,
  "aggs": {
    "funnel_stages": {
      "filters": {
        "filters": {
          "Visited": { "term": { "event.keyword": "visit" } },
          "Added_to_Cart": { "term": { "event.keyword": "add_to_cart" } },
          "Purchased": { "term": { "event.keyword": "purchase" } }
        }
      },
      "aggs": {
        "unique_users": { "cardinality": { "field": "user_id" } }
      }
    }
  }
}

Result:

"buckets": {
  "Visited":        { "unique_users": { "value": 5000 } },
  "Added_to_Cart":  { "unique_users": { "value": 2000 } },
  "Purchased":      { "unique_users": { "value": 700 } }
}

This gives you stage-by-stage counts.

Calculating Conversion Rates

You can use a bucket_script to compute the ratio. One caveat: bucket_script is a parent pipeline aggregation, so it has to sit next to funnel_stages inside a multi-bucket parent (for example, a date_histogram over time); for a single overall number it’s often simplest to divide the two bucket counts client-side. The buckets_path can point at the keyed filter buckets directly:

"aggs": {
  "conversion_rate": {
    "bucket_script": {
      "buckets_path": {
        "purchased": "funnel_stages['Purchased']>unique_users",
        "visited": "funnel_stages['Visited']>unique_users"
      },
      "script": "params.purchased / params.visited * 100"
    }
  }
}

That’ll give you something like: “14% of visitors completed a purchase.”

Common Uses

  • Marketing funnels: ad clicks → signups → paid subscriptions
  • Product usage funnels: trial started → dashboard viewed → report shared
  • Sales funnels: leads → opportunities → closed deals

Key takeaway:

  • Use filters for stage-based counts.
  • Combine with cardinality for unique users.
  • Optionally add bucket_script for conversion %.

Time Series Anomaly Detection

Goal: Flag unusual spikes or dips in a metric over time (errors, latency, signups, sales).

Pattern 1 – Compare to a Moving Average

  1. Bucket by time
  2. Compute the metric per bucket
  3. Smooth it with a moving window
  4. Alert when actual deviates a lot from the smoothed value

GET /logs/_search
{
  "size": 0,
  "aggs": {
    "per_min": {
      "date_histogram": { "field": "ts", "fixed_interval": "1m" },
      "aggs": {
        "err_count": { "filter": { "range": { "status": { "gte": 500 } } } },
        "count": { "value_count": { "field": "status" } },
        "moving_avg": {
          "moving_fn": {
            "buckets_path": "count",
            "window": 15,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        },
        "delta": {
          "bucket_script": {
            "buckets_path": { "c": "count", "m": "moving_avg" },
            "script": "params.c - params.m"
          }
        },
        "spike": {
          "bucket_selector": {
            "buckets_path": { "d": "delta", "m": "moving_avg" },
            "script": "params.m > 0 && params.d > 3 * Math.sqrt(params.m)"
          }
        }
      }
    }
  }
}

What this does

  • date_histogram: 1-minute buckets
  • err_count: 5xx responses per minute (not used by the selector below, but handy if you’d rather alert on errors than on total events)
  • count: events per minute
  • moving_fn: rolling average (window=15)
  • delta: how far current is from the rolling average
  • bucket_selector: keeps only points where delta is > 3×√mean (a quick Poisson-ish spike check)

Result: Only “spiky” minutes remain in per_min.buckets.

Pattern 2 – Rate of Change (Derivative)

Detect sudden changes vs. the last bucket.

{
  "aggs": {
    "per_min": {
      "date_histogram": { "field": "ts", "fixed_interval": "1m" },
      "aggs": {
        "count": { "value_count": { "field": "id" } },
        "change": { "derivative": { "buckets_path": "count" } },
        "big_jump": {
          "bucket_selector": {
            "buckets_path": { "chg": "change" },
            "script": "params.chg != null && params.chg > 100"
          }
        }
      }
    }
  }
}

Use when: Big jumps matter more than absolute values.

Pattern 3 – Standard Deviation Bands

Compare to mean ± N×std dev over a moving window.

{
  "aggs": {
    "per_5m": {
      "date_histogram": { "field": "ts", "fixed_interval": "5m" },
      "aggs": {
        "latency_p50": { "percentiles": { "field": "latency_ms", "percents": [50] } },
        "roll": {
          "moving_fn": {
            "buckets_path": "latency_p50[50.0]",
            "window": 12,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        },
        "diff": {
          "bucket_script": {
            "buckets_path": {
              "p50": "latency_p50[50.0]",
              "avg": "roll"
            },
            "script": "params.p50 - params.avg"
          }
        },
        "outlier": {
          "bucket_selector": {
            "buckets_path": { "d": "diff" },
            "script": "Math.abs(params.d) > 75"
          }
        }
      }
    }
  }
}

Use when: You monitor a latency percentile (p50/p95) and want to flag big deviations. Tune the threshold (75 ms here) to what’s normal for your data.

Practical Tips

  • Pick the right interval: too small = noisy; too big = slow to react.
  • Window size: 7–20 buckets often works; tune per data’s seasonality.
  • Guardrails: add a min doc count per bucket to avoid false positives from sparse data.
  • Post-filtering: bucket_selector is your friend for alert-style queries.
  • Visualize: plot actual vs. moving average and highlight kept buckets.

Best Practices for Using Aggregations

High cardinality hurts memory

  • Prefer lower-cardinality group-bys; for uniques use cardinality with a sensible precision_threshold.
  • Don’t aggregate on text; use .keyword.
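
For example, a small sketch of tuning that trade-off on a unique-user count (the 10000 below is illustrative; higher values are more accurate but use more memory):

"aggs": {
  "unique_users": {
    "cardinality": {
      "field": "user_id",
      "precision_threshold": 10000
    }
  }
}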

Filters vs post_filter

  • Filter/query affects hits and aggs.
  • post_filter narrows hits only (keep facet counts stable).
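
A minimal sketch of the difference: the bool.filter below narrows both hits and aggregations, while post_filter trims only the returned hits, so the category counts stay stable (index and field names are illustrative):

GET /sales/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "order_date": { "gte": "now-30d/d" } } }
      ]
    }
  },
  "aggs": {
    "all_categories": { "terms": { "field": "category.keyword" } }
  },
  "post_filter": { "term": { "category.keyword": "electronics" } }
}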

terms tuning

  • size = final top buckets; start with 10–50.
  • Increase shard_size if top terms look wrong.
  • Use min_doc_count to drop noise.
  • For “all buckets”/paging, use composite.
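
Putting those knobs together in one sketch (the values are illustrative starting points, not recommendations):

"aggs": {
  "top_products": {
    "terms": {
      "field": "product.keyword",
      "size": 20,
      "shard_size": 100,
      "min_doc_count": 5
    }
  }
}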

Error visibility

  • Turn on show_term_doc_count_error; inspect doc_count_error_upper_bound.
  • If large, increase shard_size or prefilter.
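
For example, a small sketch of enabling the per-term error estimate on a terms aggregation; the response then reports doc_count_error_upper_bound for each bucket:

"aggs": {
  "top_regions": {
    "terms": {
      "field": "region.keyword",
      "size": 10,
      "show_term_doc_count_error": true
    }
  }
}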

Cache & filter context

  • Put repeatable clauses in bool.filter (cacheable).
  • Request cache works best with size: 0 and steady data.
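
A sketch of a cache-friendly shape: repeatable clauses in bool.filter, a fixed date range (requests that use now generally aren’t cached), and size: 0 (index and field names are illustrative):

GET /logs/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "service.keyword": "checkout" } },
        { "range": { "ts": { "gte": "2025-10-01", "lt": "2025-10-08" } } }
      ]
    }
  },
  "aggs": {
    "errors_per_day": {
      "date_histogram": { "field": "ts", "calendar_interval": "day" }
    }
  }
}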

Reduce buckets first

  • Reasonable date_histogram intervals; avoid 1s unless needed.
  • Add pipelines after you’ve trimmed buckets.

Mind global limits

  • search.max_buckets (~65k) caps total buckets; tighten intervals or switch to composite.

Mapping hygiene

  • Group-by fields as keyword/numeric/date; avoid enabling fielddata on text.

Tiny checklist before you run a big agg

  • Is the group-by field low/medium cardinality?
  • Can I prefilter the time/window?
  • Can I reduce interval / size / sub-aggs?
  • Should I use composite instead of massive terms?
  • Do I need accurate counts, or are approximate OK (cardinality)?
  • Am I in filter context for cacheability?

Limitations of Elasticsearch for Analytics

1. No Joins Across Indices

Elasticsearch isn’t a relational database.
You can’t easily do queries that join data from multiple indices (tables), such as:

“Show total sales per customer where customer info is in another index.”

Why:
Each index is designed for fast, distributed searching – not for relational linking.

Workarounds:

  • Denormalize data: store related fields together during ingestion.
  • Use parent-child or nested fields, though they have limits and higher query cost.
  • External join tools: Some platforms (like Knowi) handle cross-index joins automatically before pushing the query to Elasticsearch.

2. Complexity in Multi-Dimensional Grouping

Elasticsearch can group data easily on one or two fields,
but multi-level “cube-style” aggregations (e.g., region → category → product → month) can explode in size.

Why:
Each level creates buckets, and the total buckets grow exponentially with dimensions – quickly hitting the search.max_buckets limit (≈65,000).

Workarounds:

  • Use composite aggregations to scroll through results in manageable chunks.
  • Pre-aggregate data at ingest time (e.g., daily summaries).
  • Keep dashboards simple: limit to 2–3 grouping dimensions.
  • Consider rollup indices for time-series compression.

3. Limited Visualization & Reporting

Out of the box, Elasticsearch only returns JSON data – no charts, dashboards, or formatted reports.

Why:
It’s an engine, not a visualization tool.

Workarounds:

  • Kibana: Free, native tool from Elastic for visualization and dashboards.
  • Third-party tools: Knowi, Grafana, Tableau (via connectors) provide richer visualization, SQL-like query layers, and alerting.
  • Custom dashboards: Use Elasticsearch APIs + charting libraries (D3, Plotly, etc.) if you want a bespoke setup.

Quick Recap

Limitation | What It Means | Common Fix
No joins | Can’t combine data from multiple indices like SQL joins | Denormalize or use a BI tool that handles joins
Multi-dimensional grouping | Too many buckets = heavy memory & limits | Use composite or rollups
No built-in visualization | Raw JSON output only | Use Kibana or external analytics platforms

How to Build Dashboards on Elasticsearch

Elasticsearch by itself doesn’t render charts – it just provides the data. To visualize it, you need a layer on top. You have three main choices:

Option 1: Kibana (Official Visualization Tool)

What it is:

Kibana is the native visualization tool built by Elastic, designed specifically for Elasticsearch.

Key Features:

  • Real-time dashboards (auto-refreshing)
  • Charts: bar, line, pie, maps, heatmaps, gauges
  • Powerful search bar using Lucene / KQL syntax
  • Alerting and anomaly detection via Watcher / Machine Learning plugin

How it works:

  • You connect Kibana to your Elasticsearch cluster.
  • You use “Visualize Library” or “Lens” to drag and drop fields.
  • You build dashboards by combining multiple visualizations.

Example Use Cases:

  • Log analytics dashboards (ELK Stack)
  • Security monitoring (SIEM)
  • Application performance metrics

Pros:

  • Tight integration with Elasticsearch
  • Real-time updates
  • Open-source and free (basic tier)

Cons:

  • Limited for business KPIs or multi-source data
  • Harder to customize for executive-style dashboards
  • Requires Elastic Stack setup (and some admin overhead)

Option 2: Custom Dashboards Using Elasticsearch APIs

What it is:

You can query Elasticsearch directly through its REST API and render results in your own web app using libraries like Chart.js, Plotly, or D3.js.

Example Workflow:

  1. Send an aggregation query:

POST /sales/_search
{
  "size": 0,
  "aggs": { "sales_by_month": { "date_histogram": { "field": "date", "calendar_interval": "month" } } }
}

  2. Parse the JSON response.
  3. Feed the data into your chart library in your frontend app.

Pros:

  • Fully customizable look and feel.
  • Can integrate with any front-end (React, Angular, Vue).
  • Ideal for embedding analytics inside a product.

Cons:

  • Requires engineering work.
  • No built-in dashboard management, sharing, or alerting.

Option 3: Third-Party Analytics Platforms

What they do:

These platforms connect to Elasticsearch (and other data sources) to provide unified analytics, joins, AI capabilities, and visualization without extra setup.

Examples:

  • Knowi: Purpose-built for Elasticsearch, supports cross-index joins, NLP queries (“Ask Data Anything”), and embedded dashboards.
  • Grafana: Great for time-series metrics and DevOps monitoring.
  • Tableau / Power BI: Possible via connectors, but less efficient with NoSQL or nested data.

Pros:

  • Multi-source integration (Elasticsearch + SQL + APIs + files)
  • Friendly UI for non-technical users
  • Built-in scheduling, alerts, and AI insights
  • Easier to share dashboards across teams

Cons:

  • May require licensing or cloud setup
  • Some abstraction over raw queries (less control)

Which Option to Choose?

Use Case | Best Option
Log or system monitoring | Kibana
Developer-built app or portal | Custom via APIs
Unified business dashboards or embedded analytics | Knowi
Real-time time-series metrics | Grafana

Real-World Use Cases of Elasticsearch Aggregations

1. Log Analytics: Errors Grouped by Type or Service

Goal: Identify which services or error types occur most frequently.

Data Example:

{ "timestamp": "2025-10-08", "service": "checkout", "status": 500, "error_type": "TimeoutError" }

Aggregation Query:

{
  "size": 0,
  "aggs": {
    "errors_by_service": {
      "terms": { "field": "service.keyword" },
      "aggs": {
        "error_types": { "terms": { "field": "error_type.keyword" } },
        "error_count": { "value_count": { "field": "status" } }
      }
    }
  }
}

Insight:

  • Checkout → 300 errors, mostly “TimeoutError”
  • Payments → 250 errors, mostly “AuthFailure”

Business Value: Quickly identify failing services or common error causes.

2. E-commerce: Sales by Region / Product / Time

Goal: Track revenue trends and top-performing products.

Data Example:

{ "order_id": 123, "region": "US", "category": "Electronics", "price": 450, "order_date": "2025-09-15" }

Aggregation Query:

{
  "size": 0,
  "aggs": {
    "sales_by_region": {
      "terms": { "field": "region.keyword" },
      "aggs": {
        "sales_by_month": {
          "date_histogram": { "field": "order_date", "calendar_interval": "month" },
          "aggs": { "total_sales": { "sum": { "field": "price" } } }
        }
      }
    }
  }
}

Insight:

  • US revenue trending up 20% month-over-month.
  • Electronics and Apparel are top-grossing categories.

Business Value: Track KPIs like revenue growth and product performance.

3. IoT Analytics: Sensor Summaries & Anomalies

Goal: Detect abnormal sensor readings and summarize device data.

Data Example:

{ "device_id": "A100", "temperature": 82.5, "timestamp": "2025-10-08T10:00:00Z" }

Aggregation Query:

{
  "size": 0,
  "aggs": {
    "avg_temp_per_device": {
      "terms": { "field": "device_id.keyword" },
      "aggs": {
        "avg_temp": { "avg": { "field": "temperature" } },
        "max_temp": { "max": { "field": "temperature" } }
      }
    }
  }
}

Insight:

  • Device A100 average = 82.5°F
  • Device B200 average = 97.2°F (potential overheating anomaly)

Business Value: Enables proactive maintenance and real-time alerts.

4. SaaS Product Analytics: Feature Usage Trends

Goal: Measure adoption and retention across product features.

Data Example:

{ "user_id": 22, "feature": "dashboard_export", "event": "used", "timestamp": "2025-09-20" }

Aggregation Query:

{
  "size": 0,
  "aggs": {
    "usage_over_time": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "week" },
      "aggs": {
        "features": { "terms": { "field": "feature.keyword" } }
      }
    }
  }
}

Insight:

  • Dashboard exports up 40% in the last month.
  • “Alert setup” feature adoption is declining.

Business Value: Product teams can prioritize improvements or redesign underused features.

Summary of Use Cases

Domain | Key Metrics | Example Aggregations | Insights
Logs | Error count, type | terms, filters, count | Identify failing services
E-commerce | Revenue, orders | terms, date_histogram, sum | Revenue by region & product
IoT | Sensor stats | terms, avg, max, derivative | Detect anomalies
SaaS | Feature usage, retention | terms, date_histogram, cardinality | Track adoption trends

What aggregations unlock:

Elasticsearch aggregations turn raw documents into live analytics – from simple KPIs (sum, avg) to time-series insights (date_histogram + pipelines) and even multi-field stats (matrix). You can answer “how much,” “per what,” and “how it’s changing” without exporting data.

Where ES fits in the analytics stack:

  • Ingest + Store: Elasticsearch (fast, scalable, search + analytics in one).
  • Analyze: Aggregations (bucket + metric + pipeline + matrix).
  • Visualize/Act: Kibana, custom apps, or a BI layer (e.g., Knowi) for joins, sharing, and embedded workflows.

When to reach for something else:

  • Heavy cross-index joins or complex multi-dimensional cubes → consider denormalization, rollups, or a BI layer that handles joins.
  • Pixel-perfect reporting or blended data from many systems → purpose-built BI/analytics tools.

Elasticsearch handles search. Knowi handles analytics.

Visualize any aggregation instantly, run joins across indices, and use AI-powered NLQ to ask your Elasticsearch data anything.

From terms and date_histogram to cross-index joins, alerts, and embedded dashboards, Knowi brings complete analytics on top of Elasticsearch – without ETL or complex setup.

Book a demo here –  https://www.knowi.com/lp/elasticsearch-analytics/.

Related Resources

If you enjoyed this guide, you’ll find these next reads helpful:

  1. Best Tools for Elasticsearch Analytics: Kibana vs Knowi vs Grafana: A detailed comparison of the top visualization and analytics platforms for Elasticsearch.
  2. Kibana Tutorial 2025: Elasticsearch Visualization Made Simple: A hands-on walkthrough of building dashboards in Kibana – from setting up index patterns to using visual builder and Lens for metrics and trends.

Frequently Asked Questions

What’s the difference between bucket and metric aggregations?

  • Bucket aggs group documents (by term, time, range, filters, composite).

Metric aggs compute numbers inside those groups (sum, avg, min/max, percentiles, cardinality).

Can Elasticsearch aggregations replace SQL analytics?

  • For many dashboards and time-series analyses, yes.

For multi-table joins, complex window functions, and ad-hoc SQL across many sources, you’ll likely combine ES with a BI/ETL layer or denormalize data upfront.

How do I aggregate across multiple indices?

  • Use index patterns (e.g., logs-*) when the schema matches.

True joins across unrelated indices aren’t native; use denormalization, parent-child/nested models (with trade-offs), or a BI layer that joins before/after the ES query.
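
For example, when daily log indices share a mapping, a single pattern query can aggregate across all of them (index and field names are illustrative):

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "errors_by_service": {
      "terms": { "field": "service.keyword" }
    }
  }
}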

How do I speed up slow aggregations?

  • Narrow the time window or add pre-filters.
  • Prefer keyword/numeric/date fields for group-bys (not text).
  • Tune size/shard_size (or switch to composite for huge group-bys).
  • Use filter context (cacheable), sensible date_histogram intervals, and avoid heavy runtime scripts.

Watch cardinality; use approximate cardinality metrics and pre-aggregate where possible.

How do I get “Top N” accurately?

  • Use a terms agg with order by your metric and reasonable size (e.g., 10–50).
  • If results seem off (skewed shards), increase shard_size or prefilter.

For full lists or stable paging, use composite.

What’s the best way to do anomaly detection?

  • Build a date_histogram, compute your metric per bucket, then add pipeline aggs:
    • moving_fn (rolling average)
    • derivative (rate of change)
    • bucket_selector (flag outliers)
  • Visualize actual vs moving average and alert on deviations.
