What is OpenSearch?
OpenSearch is an open-source, distributed search and analytics suite forked from the last Apache 2.0-licensed releases of Elasticsearch and Kibana (version 7.10.2). It was initiated by Amazon in response to Elastic’s licensing changes and has since been embraced by a wider community. OpenSearch is built to be compatible with Elasticsearch versions up to 7.10, allowing users to leverage existing Elasticsearch skills and tools. It is designed for scalability, offering powerful full-text search capabilities and supporting both structured and unstructured data. OpenSearch has rapidly developed into a standalone platform with unique features and capabilities. This blog post aims to demystify OpenSearch, explain how it works, and explore its various applications.
How Does OpenSearch Work?
OpenSearch operates on the principle of organizing data into JSON-based documents. These documents are stored in indices, which play a role similar to tables in a relational database and allow for efficient data retrieval and management.
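To make the document model concrete, here is a minimal sketch of what storing one document looks like at the request level. The index name `reports`, the document ID, and all field names are hypothetical; the snippet only builds the request and does not contact a cluster.

```python
import json

# Hypothetical document; the field names are illustrative only.
document = {
    "title": "Wind turbine maintenance report",
    "author": "j.doe",
    "published": "2024-01-15",
    "tags": ["maintenance", "energy"],
}


def index_request(index: str, doc_id: str, doc: dict) -> tuple:
    """Return (method, path, body) for storing one document under an explicit ID."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(doc)


method, path, body = index_request("reports", "1", document)
print(method, path)
print(body)
```

Sending that body to the given path on a running cluster would store the document in the `reports` index under ID `1`.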
Core Components of OpenSearch
Search Engine and Data Store: At its heart, OpenSearch is a distributed search and analytics engine based on Apache Lucene. It enables full-text searches across large datasets, offering features such as field-specific searches, multi-index queries, results ranking, and data aggregation. Besides its primary function as a search engine, OpenSearch can also act as a NoSQL data store, with the database behavior primarily designed to enhance its search and analytics capabilities.
OpenSearch Dashboards: Serving as the visualization and user interface, OpenSearch Dashboards lets users explore, visualize, and analyze their data in real time. It supports use cases like application monitoring, threat detection, incident management, and personalized search. Built using TypeScript, it enables users to construct queries using the Dashboards Query Language (DQL) and other query languages.
Figure 1: OpenSearch Dashboards (Source: OpenSearch.org)
Interacting with OpenSearch
REST API: Developers interact with OpenSearch clusters through a REST API, which offers great flexibility. The API can be called with tools like ‘curl’ or from any programming language capable of sending HTTP requests.
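As a rough illustration of calling the REST API from a programming language, the sketch below builds a search request with Python's standard library. The base URL and the `reports` index are assumptions, and the request is only constructed, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

BASE_URL = "https://localhost:9200"  # assumed address of a local cluster


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Search the hypothetical "reports" index for every document.
search = build_request("GET", "/reports/_search", {"query": {"match_all": {}}})
print(search.get_method(), search.full_url)

# To send it against a reachable cluster (credentials and TLS setup not shown):
# with urllib.request.urlopen(search) as response:
#     print(json.load(response))
```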
Query Languages: Developers can query OpenSearch using several languages, including Query DSL (a JSON-based query syntax), OpenSearch SQL, and Piped Processing Language (PPL).
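To give a feel for how the three query languages differ, here is a small sketch that expresses the same filter in each. The `reports` index and its `title` and `pages` fields are hypothetical; the endpoints shown are the ones exposed by the SQL plugin, so they assume that plugin is installed.

```python
INDEX = "reports"  # hypothetical index with "title" and "pages" fields


def as_query_dsl(min_pages: int):
    """Query DSL: a JSON body sent to the index's _search endpoint."""
    body = {"_source": ["title"], "query": {"range": {"pages": {"gt": min_pages}}}}
    return "POST", f"/{INDEX}/_search", body


def as_sql(min_pages: int):
    """OpenSearch SQL: a SQL string wrapped in a JSON body."""
    statement = f"SELECT title FROM {INDEX} WHERE pages > {int(min_pages)}"
    return "POST", "/_plugins/_sql", {"query": statement}


def as_ppl(min_pages: int):
    """PPL: commands chained with pipes, wrapped in a JSON body."""
    statement = f"source={INDEX} | where pages > {int(min_pages)} | fields title"
    return "POST", "/_plugins/_ppl", {"query": statement}


for build in (as_query_dsl, as_sql, as_ppl):
    print(build(10))
```

All three ask for the titles of reports longer than ten pages; which one to use is mostly a matter of what your team already knows.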
Node and Cluster Management
Nodes: In OpenSearch, a node is a single running instance of the engine. While multiple instances can technically run on a single server, best practices suggest limiting it to one instance per server.
Clusters: A cluster is a collection of one or more nodes sharing the same cluster name. Within a cluster, an elected cluster manager node tracks the state of the cluster and allocates shards to nodes.
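One way to see cluster state in practice is the cluster health API (`GET /_cluster/health`). The sketch below summarizes a response of that shape; the sample values are made up for illustration, and only a handful of the fields the API returns are shown.

```python
# Illustrative response; the values are invented, not taken from a real cluster.
sample_health = {
    "cluster_name": "demo-cluster",
    "status": "yellow",
    "number_of_nodes": 3,
    "active_primary_shards": 5,
    "active_shards": 8,
    "unassigned_shards": 2,
}

STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def summarize_health(health: dict) -> str:
    """Turn a cluster health response into a one-line human-readable summary."""
    meaning = STATUS_MEANING.get(health["status"], "unknown status")
    return (
        f"{health['cluster_name']}: {health['number_of_nodes']} node(s), "
        f"status {health['status']} ({meaning})"
    )


print(summarize_health(sample_health))
```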
Figure 2: Clusters in OpenSearch (Source: OpenSearch.org)
Document and Index Management
Document: In OpenSearch, a document is a JSON object stored in an index. It is analogous to a row (record) in a relational database.
Index: An index in OpenSearch is a logical namespace akin to a table in a relational database. Its data is split across one or more primary shards, each of which can have zero or more replica shards.
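Shard and replica counts are typically chosen when an index is created. As a minimal sketch, the helper below builds the create-index request for a hypothetical `reports` index; the numbers are arbitrary examples, not recommendations.

```python
def create_index_request(name: str, primaries: int, replicas: int):
    """Return (method, path, body) for creating an index with explicit shard settings."""
    if primaries < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    body = {
        "settings": {
            "index": {
                "number_of_shards": primaries,    # primary shards
                "number_of_replicas": replicas,   # replica copies per primary
            }
        }
    }
    return "PUT", f"/{name}", body


print(create_index_request("reports", primaries=3, replicas=1))
```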
Sharding and Replica Mechanisms
Primary and Replica Shards: Each document is stored in exactly one primary shard and then copied to that shard's replicas. Primary shards split the data, enabling parallel query processing, while replica shards provide failover and additional read capacity.
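The idea behind splitting data across primary shards can be sketched as hash-based routing: a document's routing value (its ID by default) is hashed, and the result modulo the number of primary shards picks the shard. The snippet below uses CRC32 purely as a stand-in hash, so the shard numbers it prints will not match what a real cluster would choose.

```python
import zlib


def pick_primary_shard(doc_id: str, num_primaries: int) -> int:
    """Map a document ID to a primary shard number.

    CRC32 is a stand-in here; OpenSearch uses its own hash function internally.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primaries


def total_shard_copies(num_primaries: int, replicas_per_primary: int) -> int:
    """Each primary plus its replicas counts as one shard copy."""
    return num_primaries * (1 + replicas_per_primary)


for doc_id in ("1", "2", "3", "4"):
    print(doc_id, "-> shard", pick_primary_shard(doc_id, 3))

print("shard copies for 3 primaries, 1 replica each:", total_shard_copies(3, 1))
```

Because the shard is derived from the number of primaries, changing that number on an existing index would send the same ID to a different shard, which is why the primary count is normally fixed at index creation.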
Plug-ins and Extended Functionality
OpenSearch enhances its core functionalities with several features and plugins. Notable among these are:
- Anomaly Detection for identifying unusual data patterns
- k-NN for nearest neighbor searches in vector data
- Performance Analyzer for optimizing cluster performance
- SQL and Piped Processing Language (PPL) for data querying
- Index State Management for automating index operations
- ML Commons plugin for machine learning model training and execution
- Asynchronous search for background search requests
- Cross-cluster replication to duplicate data across multiple OpenSearch clusters
Together with their corresponding OpenSearch Dashboards plugins, these features provide a user-friendly, unified interface for comprehensive data management and analysis.
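To illustrate one of these plugins, here is a small sketch around k-NN search. It builds an index body and a query body in the shape the k-NN plugin expects, and adds a brute-force scan in plain Python to show what "nearest neighbors" means. The field name `embedding`, the document IDs, and the vectors are all hypothetical, and the plugin itself typically uses approximate algorithms rather than a full scan.

```python
import math

# Hypothetical 3-dimensional embeddings keyed by document ID.
VECTORS = {
    "doc-a": [0.0, 0.0, 1.0],
    "doc-b": [1.0, 0.0, 0.0],
    "doc-c": [0.9, 0.1, 0.0],
}


def knn_index_body(field: str, dimension: int) -> dict:
    """Index settings and mapping for a vector field (k-NN plugin assumed installed)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dimension}}
        },
    }


def knn_query_body(field: str, vector: list, k: int) -> dict:
    """Search body asking for the k vectors closest to the given one."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


def brute_force_knn(query: list, k: int) -> list:
    """Exact nearest neighbors by Euclidean distance, for illustration only."""
    ranked = sorted(VECTORS, key=lambda doc_id: math.dist(VECTORS[doc_id], query))
    return ranked[:k]


print(knn_index_body("embedding", 3))
print(knn_query_body("embedding", [1.0, 0.0, 0.0], 2))
print(brute_force_knn([1.0, 0.0, 0.0], 2))  # ['doc-b', 'doc-c']
```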
Deployment and Operation
OpenSearch can be self-hosted or managed as a service (SaaS), with various cloud services offering OpenSearch hosting. The architecture supports various node types, including data, coordinating, and cluster manager nodes, each serving distinct roles in the data processing and search operations.
OpenSearch Uses and Applications
OpenSearch covers a broad range of data management and analysis needs. It enables the following use cases:
1. Fast, scalable full-text search
2. Application and infrastructure monitoring
3. Security information and event management (SIEM)
4. Operational health tracking
Like Elasticsearch, OpenSearch is versatile: its applications range from powering website search features to analyzing large volumes of log data. In practice, common applications include:
- Site Search: Enhancing website search capabilities, allowing users to efficiently search through large volumes of content.
- Log Analytics: Ingesting, storing, and analyzing log data from various sources, enabling real-time insights into system performance and user behavior.
- Business Intelligence: Facilitating data analysis and visualization, helping businesses glean actionable insights from their data.
- Security and Compliance: Monitoring and analyzing security logs for threats and vulnerabilities.
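As an example of the log analytics case, the sketch below builds a Query DSL aggregation that counts recent error logs per service and then reads the buckets out of a response. The `level`, `service`, and `@timestamp` fields are assumptions about how the logs were indexed (with `level` and `service` as keyword fields), and the sample response values are invented.

```python
def errors_per_service_query(minutes: int) -> dict:
    """Count ERROR-level log entries per service over the last `minutes` minutes."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
        "aggs": {"per_service": {"terms": {"field": "service", "size": 10}}},
    }


def error_counts(response: dict) -> dict:
    """Extract {service: count} from the aggregation part of a search response."""
    buckets = response["aggregations"]["per_service"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Made-up response fragment in the shape a terms aggregation returns.
sample_response = {
    "aggregations": {
        "per_service": {
            "buckets": [
                {"key": "checkout", "doc_count": 42},
                {"key": "search", "doc_count": 7},
            ]
        }
    }
}

print(errors_per_service_query(60))
print(error_counts(sample_response))
```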
Why OpenSearch Stands Out
- Open-Source Nature: One of the key appeals of OpenSearch is its commitment to being fully open source under the Apache 2.0 license, ensuring freedom to use, modify, and distribute the software.
- Community-Driven Development: OpenSearch thrives on community contributions, leading to a diverse and rapidly evolving feature set.
- Compatibility with Elasticsearch: OpenSearch maintains backward compatibility with Elasticsearch versions up to 7.10, easing the transition for existing Elasticsearch users.
Security in OpenSearch is multifaceted, encompassing four primary features:
- Encryption: OpenSearch employs encryption both in transit, using the TLS protocol for secure data movement within the cluster, and at rest, managed by the operating system of each node.
- Authentication: For user authentication, it supports various methods like basic credentials and TLS certificates and allows integration with standard protocols like LDAP and SAML.
- Access control: Access control in OpenSearch is role-based, enabling detailed permissions for users at the cluster, index, document, and field levels, including field masking for sensitive data.
- Audit logging and compliance measures: The platform also offers comprehensive audit logging and compliance features for monitoring cluster activities and ensuring data integrity, which is essential for post-breach analysis and compliance requirements.
Collectively, these features provide robust protection for sensitive data through layered defense mechanisms and varied access levels. Additional security functionalities include multi-tenancy in OpenSearch Dashboards and cross-cluster search capabilities seamlessly integrated with OpenSearch’s access control infrastructure.
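To show how authentication and role-based access control fit together at the request level, here is a rough sketch. It builds a basic-auth header and a role definition in the shape the security plugin's REST API accepts (for example via `PUT /_plugins/_security/api/roles/<role-name>`). The index pattern, field names, and credentials are hypothetical, and nothing is sent to a cluster.

```python
import base64
import json


def basic_auth_header(user: str, password: str) -> dict:
    """HTTP Basic authentication header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def read_only_role(index_pattern, hidden_fields, masked_fields, dls_query) -> dict:
    """Role granting read access with document-level and field-level restrictions."""
    return {
        "cluster_permissions": [],
        "index_permissions": [
            {
                "index_patterns": [index_pattern],
                "dls": json.dumps(dls_query),             # document-level security filter
                "fls": [f"~{f}" for f in hidden_fields],  # "~" excludes a field
                "masked_fields": masked_fields,           # values returned in masked form
                "allowed_actions": ["read"],
            }
        ],
    }


role = read_only_role(
    index_pattern="reports-*",
    hidden_fields=["internal_notes"],
    masked_fields=["author"],
    dls_query={"term": {"department": "energy"}},
)
print(json.dumps(role, indent=2))
print(basic_auth_header("analyst", "example-password"))
```

A user mapped to this role would see only documents from the `energy` department, never see `internal_notes`, and get a masked value for `author`.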
OpenSearch is a versatile, scalable search engine supported by a robust open-source community. Its ability to quickly and efficiently handle large volumes of different data types makes it a formidable choice for various applications. If you want to learn more about OpenSearch and try it out for yourself, you can get started here. For more advanced use cases in which you need to join and blend your OpenSearch data across multiple indices and other SQL/NoSQL/REST-API data sources, check out Knowi, an analytics platform that natively integrates with OpenSearch and is accessible to both technical and non-technical users. For those looking to explore OpenSearch or migrate from Elasticsearch, you can also read our blog post on Elasticsearch vs. OpenSearch.