OpenSearch: The Open-Source Search and Analytics Suite

OpenSearch: Overview, Deployment, and Working

What is OpenSearch?

OpenSearch is an open-source, distributed search and analytics suite forked from the last open-source releases of Elasticsearch and Kibana (version 7.10.2). Amazon initiated the project in response to Elastic’s licensing changes, and a wider community has since embraced it. OpenSearch is built to be compatible with Elasticsearch versions up to 7.10, so users can reuse existing Elasticsearch skills and tools. It is designed for scalability, offers powerful full-text search, and supports both structured and unstructured data. OpenSearch has since developed into a standalone platform with its own features and capabilities. This blog post aims to demystify OpenSearch, explain how it works, and explore its applications.

How Does OpenSearch Work?

OpenSearch organizes data into JSON documents. These documents are stored in indices, much as rows are stored in database tables, which allows for efficient data retrieval and management.

Core Components of OpenSearch

Search Engine and Data Store: At its heart, OpenSearch is a distributed search and analytics engine based on Apache Lucene. It enables full-text searches across large datasets, offering features such as field-specific searches, multi-index queries, results ranking, and data aggregation. Besides its primary function as a search engine, OpenSearch can also act as a NoSQL data store, with the database behavior primarily designed to enhance its search and analytics capabilities.

OpenSearch Dashboards: Serving as the visualization layer and user interface, OpenSearch Dashboards lets users explore, visualize, and analyze their data in real time. It supports use cases such as application monitoring, threat detection, incident management, and personalized search. Built using TypeScript, it enables users to construct queries using the Dashboards Query Language (DQL) and other query languages.

Figure 1: OpenSearch Dashboards (Source: OpenSearch.org)

Interacting with OpenSearch

REST API: Developers interact with OpenSearch clusters through a REST API, which gives them great flexibility. The API can be called with tools like curl or from any programming language capable of sending HTTP requests.
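
As a minimal sketch of what "any language that can send HTTP requests" means in practice, the snippet below builds a search request using only the Python standard library. The base URL, the index name `my-index`, the field `title`, and the credentials are all assumptions for illustration; the request is constructed but not sent.

```python
import base64
import json
import urllib.request

# Assumption: a cluster reachable locally on the default port.
BASE_URL = "https://localhost:9200"


def build_request(method, path, body=None, username=None, password=None):
    """Build (but do not send) an HTTP request for an OpenSearch endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if username is not None and password is not None:
        # HTTP basic authentication: base64("user:password").
        raw = f"{username}:{password}".encode("utf-8")
        req.add_header("Authorization", "Basic " + base64.b64encode(raw).decode("ascii"))
    return req


# Hypothetical index, field, and credentials.
search = build_request(
    "POST",
    "/my-index/_search",
    {"query": {"match": {"title": "keyboard"}}},
    username="admin",
    password="example-password",
)
print(search.get_method(), search.full_url)
```

Passing `search` to `urllib.request.urlopen` would send it. A cluster that uses self-signed certificates would also need an appropriate SSL context.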

Query Languages: Developers can interact with OpenSearch using several query languages, including Query DSL, OpenSearch SQL, and Piped Processing Language (PPL).
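
To make the difference between the three languages concrete, here is a hedged sketch that expresses roughly the same question ("the first ten documents with status 500") in each of them. The index name `logs` and the field `status` are hypothetical. The `/_plugins/_sql` and `/_plugins/_ppl` paths are the SQL plugin endpoints as documented for recent OpenSearch versions; check them against the version you run.

```python
import json

INDEX = "logs"  # hypothetical index name

# Each entry is (HTTP method, path, JSON body).
requests_by_language = {
    # Query DSL: a JSON query sent to the index's _search endpoint.
    "query_dsl": ("POST", f"/{INDEX}/_search",
                  {"query": {"term": {"status": 500}}, "size": 10}),
    # OpenSearch SQL: a SQL string wrapped in a small JSON envelope.
    "sql": ("POST", "/_plugins/_sql",
            {"query": f"SELECT * FROM {INDEX} WHERE status = 500 LIMIT 10"}),
    # PPL: commands chained with the pipe character.
    "ppl": ("POST", "/_plugins/_ppl",
            {"query": f"source={INDEX} | where status = 500 | head 10"}),
}

for language, (method, path, body) in requests_by_language.items():
    print(language, method, path, json.dumps(body))
```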

Node and Cluster Management

Nodes: In OpenSearch, a node is a single running instance of the engine. While multiple instances can technically run on a single server, best practice is to run one instance per server.

Clusters: A cluster is a collection of one or more nodes sharing the same cluster name. Within a cluster, the elected cluster manager node coordinates overall operation, tracks the state of the cluster, and allocates shards to nodes.

Figure 2: Clusters in OpenSearch (Source: OpenSearch.org)
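
One common way to inspect this state is the cluster health API (`GET /_cluster/health`). The sketch below interprets a response of that shape. The sample values are made up for illustration and do not come from a real cluster.

```python
# What each health status means for shard assignment.
STATUS_MEANING = {
    "green": "all primary and replica shards are assigned",
    "yellow": "all primary shards are assigned, but some replicas are not",
    "red": "at least one primary shard is unassigned",
}


def summarize_health(health: dict) -> str:
    """Turn a cluster health response into a one-line summary."""
    status = health["status"]
    return (
        f"{health['cluster_name']}: {status} ({STATUS_MEANING[status]}); "
        f"{health['number_of_nodes']} node(s), "
        f"{health['unassigned_shards']} unassigned shard(s)"
    )


# Hypothetical response: a single-node cluster whose replicas have nowhere to go.
sample_health = {
    "cluster_name": "demo-cluster",
    "status": "yellow",
    "number_of_nodes": 1,
    "active_primary_shards": 3,
    "unassigned_shards": 3,
}
print(summarize_health(sample_health))
```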

Document and Index Management

Document: In OpenSearch, a document is a JSON object stored in an index. It is analogous to a database record.
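
As a small illustration, the snippet below serializes a hypothetical product record to JSON and builds the request that would store it under an explicit ID. The index name `products`, the field names, and the ID are assumptions; nothing is sent to a cluster.

```python
import json

# Hypothetical record; the field names are illustrative only.
document = {
    "title": "Wireless keyboard",
    "price": 49.99,
    "tags": ["electronics", "accessories"],
    "in_stock": True,
}


def index_request(index: str, doc_id: str, doc: dict) -> tuple:
    """Return (method, path, body) for storing one document under a known ID."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(doc)


method, path, body = index_request("products", "1", document)
print(method, path)
print(body)
```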

Index: An index in OpenSearch is a logical namespace akin to a table in a relational database. It is linked to one or more primary shards and can have zero or more replica shards.
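
The shard counts are typically chosen when the index is created. The following sketch builds a create-index request body with the `number_of_shards` and `number_of_replicas` settings. The index name and the counts are arbitrary examples, not recommendations.

```python
import json


def create_index_request(index: str, primaries: int, replicas: int) -> tuple:
    """Return (method, path, body) for creating an index with explicit shard counts."""
    if primaries < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    body = {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                # Replicas are counted per primary shard.
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"/{index}", json.dumps(body)


# Hypothetical index with three primaries and one replica of each.
create_method, create_path, create_body = create_index_request("products", 3, 1)
print(create_method, create_path, create_body)
```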

Sharding and Replica Mechanisms

Primary and Replica Shards: Each document is stored in exactly one primary shard and then copied to that shard’s replicas. Primary shards split the data, enabling parallel query processing, while replica shards provide failover and additional read capacity.
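
In simplified form, a document is routed to a primary shard by hashing its routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below illustrates that idea only: it uses `zlib.crc32` as a stand-in hash, which is an assumption and not the hash function OpenSearch uses internally, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_primary_shard(routing_value: str, num_primary_shards: int) -> int:
    """Illustrative routing: stand-in hash of the routing value, modulo the shard count."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Each primary shard plus its replicas counts as one copy apiece."""
    return primaries * (1 + replicas_per_primary)


# Hypothetical document IDs spread across three primary shards.
for doc_id in ["order-1", "order-2", "order-3", "order-4"]:
    print(doc_id, "-> shard", pick_primary_shard(doc_id, 3))

print("shard copies for 3 primaries with 1 replica each:", total_shard_copies(3, 1))
```

Because the shard is derived from the number of primaries, that number cannot simply be changed on an existing index without reorganizing the data, while the replica count can be adjusted at any time.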

Plug-ins and Extended Functionality

OpenSearch enhances its core functionalities with several features and plugins. Notable among these are:

Anomaly Detection for identifying unusual data patterns
k-NN for nearest neighbor searches in vector data
Performance Analyzer for optimizing cluster performance
SQL and Piped Processing Language (PPL) for data querying
Index State Management for automating index operations
ML Commons for machine learning model training and execution
Asynchronous Search for background search requests
Cross-Cluster Replication for duplicating data across multiple OpenSearch clusters

These features and corresponding OpenSearch Dashboards plugins provide a user-friendly and unified interface for comprehensive data management and analysis.
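
To give one of these plugins a concrete shape, here is a hedged sketch of how a k-NN index and query are commonly defined: the index enables the `knn` setting and maps a `knn_vector` field, and the query asks for the `k` nearest vectors. The field name `embedding`, the dimension, and the query vector are assumptions for illustration; production embeddings usually have hundreds of dimensions.

```python
import json

DIMENSION = 3  # tiny on purpose; illustrative only

# Index body: turn on k-NN for the index and declare a vector field.
knn_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {"type": "knn_vector", "dimension": DIMENSION}
        }
    },
}


def knn_query(vector: list, k: int) -> dict:
    """Build a search body asking for the k nearest neighbors of `vector`."""
    if len(vector) != DIMENSION:
        raise ValueError(f"expected a vector of length {DIMENSION}")
    return {"size": k, "query": {"knn": {"embedding": {"vector": vector, "k": k}}}}


print(json.dumps(knn_index_body))
print(json.dumps(knn_query([0.1, 0.2, 0.3], k=2)))
```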

Deployment and Operation

OpenSearch can be self-hosted or managed as a service (SaaS), with various cloud services offering OpenSearch hosting. The architecture supports various node types, including data, coordinating, and cluster manager nodes, each serving distinct roles in the data processing and search operations.
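
For a self-hosted cluster, node roles are usually assigned in each node's `opensearch.yml`. The sketch below renders the relevant lines for three hypothetical nodes. The cluster and node names are made up. The role names `cluster_manager` and `data`, and the convention that an empty role list yields a coordinating-only node, reflect the configuration documentation for recent versions and should be checked against the version you deploy.

```python
def render_node_config(cluster_name: str, node_name: str, roles: list) -> str:
    """Render the opensearch.yml lines that identify a node and assign its roles."""
    return "\n".join([
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        f"node.roles: [{', '.join(roles)}]",
    ])


# Hypothetical three-node layout, one node per role.
layout = {
    "manager-1": ["cluster_manager"],  # tracks cluster state, allocates shards
    "data-1": ["data"],                # holds shards, runs searches and indexing
    "coord-1": [],                     # no roles: only routes and merges requests
}

for name, roles in layout.items():
    print(render_node_config("demo-cluster", name, roles))
    print("---")
```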

OpenSearch Uses and Applications

OpenSearch offers a comprehensive solution for various data management and analysis needs. OpenSearch enables the following use cases:

1. Fast, scalable full-text search

2. Application and infrastructure monitoring

3. Security and event information management

4. Operational health tracking

OpenSearch, like Elasticsearch, is versatile. Its applications range from powering website search features to analyzing large volumes of log data. Below are some of the primary use cases:

Site Search: Enhancing website search capabilities, allowing users to efficiently search through large volumes of content.
Log Analytics: Ingesting, storing, and analyzing log data from various sources, enabling real-time insights into system performance and user behavior.
Business Intelligence: Facilitating data analysis and visualization, helping businesses glean actionable insights from their data.
Security and Compliance: Monitoring and analyzing security logs for threats and vulnerabilities.
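
Taking log analytics as an example, a typical flow is to ingest log lines in batches through the bulk API and then summarize them with an aggregation. The sketch below builds both request bodies. The index name `app-logs`, the log fields, and the `level.keyword` subfield (which assumes default dynamic mapping for text fields) are all illustrative assumptions.

```python
import json


def bulk_body(index: str, docs: list) -> str:
    """Build a newline-delimited bulk body: one action line, then one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical log records.
logs = [
    {"level": "ERROR", "service": "checkout", "message": "payment timeout"},
    {"level": "INFO", "service": "checkout", "message": "order placed"},
]
ingest_body = bulk_body("app-logs", logs)  # would be sent as POST /_bulk

# Count log lines per level without returning the documents themselves.
levels_query = {
    "size": 0,
    "aggs": {"by_level": {"terms": {"field": "level.keyword"}}},
}

print(ingest_body)
print(json.dumps(levels_query))
```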

Why OpenSearch Stands Out

Open-Source Nature: One of the key appeals of OpenSearch is its commitment to being fully open source under the Apache 2.0 license, ensuring freedom to use, modify, and distribute the software.
Community-Driven Development: OpenSearch thrives on community contributions, leading to a diverse and rapidly evolving feature set.
Compatibility with Elasticsearch: OpenSearch maintains backward compatibility with Elasticsearch versions up to 7.10, easing the transition for existing Elasticsearch users.

Security

Security in OpenSearch is multifaceted, encompassing four primary features:

Encryption: OpenSearch employs encryption both in transit, using the TLS protocol for secure data movement within the cluster, and at rest, managed by the operating system of each node.
Authentication: For user authentication, it supports various methods like basic credentials and TLS certificates and allows integration with standard protocols like LDAP and SAML.
Access control: Access control in OpenSearch is role-based, enabling detailed permissions for users at the cluster, index, document, and field levels, including field masking for sensitive data.
Audit logging and compliance measures: The platform also offers comprehensive audit logging and compliance features for monitoring cluster activities and ensuring data integrity, which is essential for post-breach analysis and compliance requirements.

Collectively, these features provide robust protection for sensitive data through layered defense mechanisms and varied access levels. Additional security functionalities include multi-tenancy in OpenSearch Dashboards and cross-cluster search capabilities seamlessly integrated with OpenSearch’s access control infrastructure.
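
The access control levels described above can be sketched as a single role definition for the Security plugin's REST API. The role name, index pattern, and field names below are hypothetical. The keys `index_permissions`, `dls`, `fls`, and `masked_fields` follow the Security plugin's documented role format and should be verified against the version in use.

```python
import json

role_name = "support_readonly"  # hypothetical role

role_body = {
    "cluster_permissions": [],
    "index_permissions": [
        {
            # Index level: which indices the role applies to, and what it may do.
            "index_patterns": ["customers-*"],
            "allowed_actions": ["read"],
            # Document level: only documents matching this query are visible.
            # The query is passed as a JSON string.
            "dls": json.dumps({"term": {"region": "emea"}}),
            # Field level: a leading "~" excludes the field from results.
            "fls": ["~internal_notes"],
            # Field masking: the value is returned in anonymized form.
            "masked_fields": ["email"],
        }
    ],
}

role_request = ("PUT", f"/_plugins/_security/api/roles/{role_name}", json.dumps(role_body))
print(role_request[0], role_request[1])
print(role_request[2])
```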

Summary

OpenSearch is a search engine that is versatile and scalable, supported by a robust open-source community. Its ability to quickly and efficiently handle large volumes of different data types makes it a formidable choice for various applications. If you want to learn more about OpenSearch and try it out for yourself, you can get started here. For more advanced use cases in which you need to join and blend your OpenSearch data across multiple indexes and other SQL/NoSQL/REST-API data sources, check out Knowi, an analytics platform that natively integrates with OpenSearch and is accessible to both technical and non-technical users. For those looking to explore OpenSearch or migrate from Elasticsearch, you can also read our blog post on Elasticsearch Vs OpenSearch.

Frequently Asked Questions

What is OpenSearch?

OpenSearch is an open-source search and analytics engine, originally forked from Elasticsearch 7.10.2 and Kibana. It is maintained by a community led by Amazon.

How is OpenSearch different from Elasticsearch?

OpenSearch remains fully open-source under the Apache 2.0 license. It is compatible with Elasticsearch up to version 7.10 but has since added its own unique features and plugins.

Who maintains OpenSearch?

Amazon initiated OpenSearch, but it is now community-driven and open to contributions from developers and organizations worldwide.

Is OpenSearch good for business intelligence and analytics?

Not natively. While OpenSearch is great for search and log analysis, it lacks advanced BI features like multi-source joins, drag-and-drop dashboards, and predictive analytics.

What are the main limitations of using OpenSearch for analytics?

No support for complex joins across indices or external data sources

Limited visualization options compared to full BI tools

Lacks native support for business metrics, KPIs, and user-friendly dashboards

Difficult for non-technical users to explore or analyze data

Manual effort needed to blend structured and unstructured data

Why is querying across multiple indices in OpenSearch difficult?

OpenSearch supports querying multiple indices but doesn't support relational joins or transformations natively, making multi-index analytics cumbersome and manual.
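
As a rough sketch of what that manual work looks like, the snippet below builds a search path that targets two indices at once and then joins the returned rows in application code. The index names, fields, and sample rows are hypothetical stand-ins for search hits.

```python
def multi_index_search_path(indices: list) -> str:
    """A single search can target several indices, separated by commas in the path."""
    return "/" + ",".join(indices) + "/_search"


def join_rows(left: list, right: list, key: str) -> list:
    """Client-side inner join of two lists of dicts on a shared key."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in right_by_key.get(left_row[key], []):
            # On a field name clash, the left row's value wins.
            joined.append({**right_row, **left_row})
    return joined


# Hypothetical rows, as they might look after extracting _source from the hits.
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 2, "total": 12},
    {"customer_id": 3, "total": 5},
]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]

print(multi_index_search_path(["orders", "customers"]))
print(join_rows(orders, customers, "customer_id"))
```

The search itself returns documents from both indices side by side; relating them to each other is left to the caller, as `join_rows` does here.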

Can OpenSearch replace a full BI platform?

No. OpenSearch is optimized for search and log analytics but doesn’t offer the full suite of BI capabilities like scheduling, alerts, dashboard interactivity, or governed semantic layers.

What’s the workaround for analytics on OpenSearch?

Pair OpenSearch with a BI platform like Knowi that supports native OpenSearch integration, multi-source joins, AI-generated dashboards, and easy data exploration.

About the Author:

Puja Ambalgekar

Puja is a technical content writer at Knowi with 3+ years of experience writing technical content in the IT field. She has experience delivering product documentation, how-to guides, user manuals, developer documentation, blog posts, SRS specifications, whitepapers, copywriting content, and SEO optimization. She also has an MS in Computer Science and 4+ years of experience as a Software Developer.
