Blog â€ș Native Analytics on Apache Cassandra with Knowi
a

Native Analytics on Apache Cassandra with Knowi

Apache Cassandra

Table of Contents

Introduction

Apache Cassandra is a highly scalable, high-performance, distributed database. It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. The speed, scalability, and resiliency of Cassandra make it ideal for storing and querying large amounts of data–especially high throughput data. It’s no wonder it serves as the backbone for streaming giant Netflix and is often used in IoT analytics and real-time data analytics use cases.

Knowi is an analytics platform that natively integrates with Cassandra so you can leverage all the query power of Cassandra to visualize large amounts of data rapidly. Knowi allows you to query Cassandra directly using CQL, or use a drag-and-drop interface to build queries quickly without prior knowledge of the query syntax. Knowi also natively integrates with over 30 SQL/NoSQL/REST-API data sources, allowing you to join your Cassandra data with any combination of these on the fly to create brand new datasets that can be used for downstream analytics. From there, you can choose from a host of visualizations options to create custom interactive dashboards, run ad-hoc analysis, ask questions from your data with a Google search-like analytics feature, apply built-in machine learning algorithms, and more.

For Datastax Analytics, Knowi’s native integration also extends to Datastax Astra, DataStax’s cloud-native database-as-a-service (DBaaS) built on Apache Cassandra.

This post will walk you through the steps of using Knowi for Cassandra analytics, including setting up connectivity to your Cassandra data source and create interactive visualizations from it. In this demonstration, we’ll be analyzing sample marketing data pertaining to an email sending campaign. Among the topics we’ll cover include:

  1. Connecting your Datastax Cassandra data source to Knowi
  2. Writing your first query
  3. Joining across multiple data sources
  4. Post-processing with Cloud9QL
  5. Creating your first visualization
  6. Adding Drilldowns
  7. Using Knowi’s search-based natural language processing (NLP)

Sign up for a free Knowi account here to get started.

Connecting to Cassandra

Knowi has broad native integration to other NoSQL, SQL, REST-API and JSON/CSV data sources. To get started, select your data source and configure the connection. Your data stays in the source so there are no ETL processes to build or ODBC drivers to install.

After logging in to Knowi, we’ll start by establishing a connection to your Datastax Cassandra data source.

Steps:

  1. From the Playground dashboard, select “Data sources” on the side panel menu
  2. Select Datastax from the list of data sources
  3. Enter your data source credentials and give your data source a name
  4. Test the connection to confirm successful connection to your data source
  5. Hit “Save” to start querying your Keyspace
Connecting to Datastax (Cassandra) datasource (Source - knowi.com)
Connecting to Datastax (Cassandra) datasource (Source – knowi.com)

Writing Your First Query

Once connected to your Cassandra data source, Knowi auto-detects the tables, keys, and fields within. Knowi also supports Cassandra data types and collections (i.e. ascii, blob, counter) and collections (i.e. list, map, set). To start building your queries, Knowi gives you the option to auto-generate your queries using its drag and drop Query Builder via the UI. This is especially useful for users not as familiar with CQL. For more advanced CQL users, you also have the option to write your queries directly in the smart Query Editor, a versatile text editor specialized for editing code.

In this example, we’ll select the demo_data table (which contains email sending activity data) and select the fields we want to analyze from the auto-generated fields from the Query Builder.

Steps:

  1. In the “Edit Query” screen, navigate to the Query Builder section
  2. In the “Tables” drop-down menu, select the demo_data table
  3. In the “Metrics” drop-down menu, select the fields customer, date, message_type, sent, opened, delivered, and conversions
    1. Notice that in the Query Editor to the right, a native CQL query is being auto-generated
    2. If you already knew the query syntax, you could instead write or paste it in directly
  4. Click “Preview” to instantly preview the results, which will be returned in tabular format
  5. After previewing the results, give your query a name then hit “Save”
Writing Your First Cassandra Query (Source - knowi.com)
Writing Your First Cassandra Query (Source – knowi.com)

Joining Across Data Sources

Knowi facilitates joins across multiple data sources to process, blend, and store combined results seamlessly. Traditionally, to perform lookups from one data source to another requires engineering efforts along with mapping storage for the results. Knowi enables you to join across the same or disparate SQL or NoSQL databases to stitch the data back together into a single result along with the ability to store and track it. All join types are supported, including INNER JOIN, LEFT OUTER JOIN, FULL OUTER JOIN, RIGHT OUTER JOIN, and LOOP JOIN.

In this example, we’ll connect to another data source, a MySQL database that stores customer information. Once connected, we’ll perform a join against the MySQL database to pull customer addresses. We will then preview each section of the join, then do a final preview on the combined dataset.

Steps:

  1. Repeat the steps from the “Connecting to Cassandra” section, this time, select a MySQL database as the data source
  2. Accept the MySQL default credentials, test the connection, give it a name, then Save
  3. Return to your original Cassandra query and click “Join” near the bottom of the screen
  4. Select the MySQL data source you just saved as the data source
  5. This will populate the “Join Fields” section and another “Query Builder” and “Query Editor” sections below the first one
  6. In the “Tables” drop-down menu, select the customer table
  7. In the “Metrics” drop-down menu, select the fields customer, street, and state
  8. In the “Join Fields” section click “Join Builder”. Note that you can also type in the join free-hand in the text bar by entering “customer=customer”
  9. Once the fields are retrieved, select ‘INNER JOIN’ as the ‘Join Type. Under ‘Left Field’ (Cassandra side), select the key field customer. Under ‘Right Field’ (MySQL side), you’ll also select the key field customer then Save
  10. Optional — Use the Eye icon on the left-hand side of the page near each of the queries to preview the result of each join part separately
  11. Click “Preview’ to view the new blended dataset results
Joining Cassandra Data with a MySQL datasource (Source - knowi.com)
Joining Cassandra Data with a MySQL datasource (Source – knowi.com)

Post-Processing With Cloud9QL

Knowi also allows you to analyze and transform this new blended dataset using Cloud9QL, its proprietary SQL-like syntax that enables users to aggregate, manipulate, and calculate new data directly without the need for additional data prep tools. Cloud9QL is particularly useful for post-processing and transforming the returned data to complement native queries, generating alternative views from the same data within Know, and querying file-based data or data stores that have limited query support. Note that Cloud9QL is not a replacement for the underlying query but offers powerful analytics functions on the returned results.

Below, we’ll use Cloud9QL to calculate the conversion rate of the email marketing campaign from the delivered and conversions fields. We’ll also apply a simple Cloud9QL date operator to extract the week value from the date field.

Steps:

  1. Navigate to the “Cloud9QL Post Query” text box at the bottom of the screen. This is where you’ll write your Cloud9QL queries
  2. Enter the syntax below:
    1. select *,
    2. (conversions/delivered)*100 as conversion_rate,
    3. week(date) as week
  3. Syntax breakdown:
    1. Selects all fields from the blended dataset
    2. Returns the conversion rate and creates a new field conversion_rate
    3. Uses the date operator week to extract the week value (Truncates to a date to the beginning of the week — Monday) and creates a new field week
  4. Click “Preview’ to view the updated results
  5. Once all changes have been made, Hit “Save & Run Now”
Post-processing Combined Query Results with Cloud9QL (Source - knowi.com)
Post-processing Combined Query Results with Cloud9QL (Source – knowi.com)

Creating Your First Visualization

Once the query is saved, Knowi creates a “Virtual Dataset” from the query results and stores it in Knowi’s “Elastic Store” data warehouse that can store and track the results. Unlike traditional warehouses that require complex ETL processes and pre-defined schema, the elastic store is a flexible, scalable, schema-less warehouse. The stored virtual dataset is reusable, and will be the foundation for most of what you’ll do in Knowi, like creating visualizations, adding them to dashboards, and much more.

In this example, we will be creating a dashboard with 3 visualizations:

  • A basic data grid with the full results of the original query
  • A stacked column bar chart that shows the total delivered emails for each customer by message type
  • A time-series line chart that compares the average conversion rate for each customer by week

Creating A New Dashboard

Steps:

  1. On the left  side panel, click “Dashboards”
  2. Hit the “+” icon to create a new dashboard and give it a name then click “OK”
  3. Drag the widget/report from the query you previously created into the dashboard. By default, it will be in grid form
Creating a New Dashboard in Knowi (Source - knowi.com)
Creating a New Dashboard in Knowi (Source – knowi.com)

Creating A Stacked Column Chart

Steps:

  1. On the top-right corner of the widget, click the “Natural Language/Self Service Analytics icon. This will take you to the Analyze screen
  2. Drag customer and message_type to the “Groupings/Dimensions” section
  3. Drag delivered to the “Fields/Metrics” section. In the “Operation” dropdown, select “Sum”
  4. At the top of the screen, click the “Visualization” tab. This takes you to the visualization settings screen. We want to create a stacked column chart with customer in the x-axis and sum of delivered in the y-axis
  5. In the “Settings” section under the “Visualization Type” dropdown, select “Stacked Column”
  6. In the “Options” section under the “Grouping/Legend” dropdown, select message_type
  7. Hit the “Clone” icon at the top right to create a new widget derived from the original data grid. This allows us to keep the original widget as is while having another version that we can freely change
  8. Give the cloned widget a name, then add it to the dashboard. You now have a new widget that visualizes the total emails delivered by message type for each customer
 Apache Cassandra
Creating a Stacked Column Chart Visualization in Knowi (Source – knowi.com)

Creating A Time-Series Line Chart

Steps:

  1. Replicating the previous steps, go back to the Analyze screen of the data grid widget by clicking the “Natural Language/Self Service Analytics” icon
  2. Drag week and customer to the “Groupings/Dimensions” section
  3. Drag conversion_rate to the “Fields/Metrics” section. In the “Operation” dropdown, select “Avg”
  4. At the top of the screen, click the “Visualization” tab. This takes you to the visualization settings screen. We want to create a line chart with week in the x-axis and avg of conversion_rate in the y-axis
  5. In the “Settings” section under the “Visualization Type” dropdown, select “Line Chart”
  6. In the “Options” section under the “Grouping/Legend” dropdown, select customer
  7. In the “Y-Axis Tick Interval” text box, enter .10
  8. Hit the “Clone” icon at the top right to create a another widget derived from the original data grid
  9. Give the cloned widget a name, then add it to the dashboard. You now have another widget that visualizes the conversion rate trend for each customer by week
 Apache Cassandra
Crating a Time-Series Line Chart Visualization in Knowi (Source – knowi.com)

Adding Drilldowns

Drilldowns allow you to visually navigate and analyze data in powerful ways. They can be set into another widget, another dashboard, or the same dashboard. Drilldowns can be many levels deep with support for combining different drilldown modes. Data from the parent widget can be used as keys into the drilldown widget or dashboard to filter the data specifically for the point selected. Drilldowns can be configured using the ‘Drilldowns’ menu option on each widget in the dashboard.

In this example, we’ll set up a “Connected Widgets” drilldown from the stacked column chart (Parent) widget that filters the data on the dashboard when you click on a specific customer.

Steps:

  1. On the top-right corner of the stacked column widget, click the ‘More Settings’ icon then select ‘Drilldowns’. The drilldown menu box will appear
  2. In the “Drilldown type” dropdown, select “Connected Widgets”
  3. In the “When clicked” dropdown, select customer
  4. For “Optional Drilldown Filters” select customer = customer
  5. Hit Save and Close
  6. Go to the stacked column chart widget and click on any of the bars representing each customer (i.e. Wells Fargo)
  7. This will filter all of the widgets on the dashboard based on the customer value “Wells Fargo”
  8. To return the dashboard to its default state, click the “Back” button on the top-left corner of the dashboard
Adding Drilldowns to Your Dashboard (Source - knowi.com)
Adding Drilldowns to Your Dashboard (Source – knowi.com)

Knowi’s Natural Language capabilities is a powerful way to enable self-service analytics to non-technical users, by asking questions in plain English to drive insights and visualizations quickly. This gives anyone the ability to make better, data-driven decisions, any time. The technology works by translating your natural language query to a Cloud9QL statement to resolve your request. This search-based analytics feature is available for use across all datasets and widgets within a dashboard. It can be accessed using the Natural Language/Self Serve Analytics icon on a widget.

Below, we’ll use this feature to ask a simple question from the data: What is the total number of emails sent for each customer? We will then create a brand new pie chart widget from the answer.

Steps:

  1. In the NLP text bar at the top of the dashboard, type “total sent by customer”
  2. Notice that as you type, Knowi makes auto-suggestions about the question you are asking
  3. When the results are returned, you can check to confirm that the “Sum” operation was performed on the sent field
  4. At the top of the screen, click the “Visualization” tab. You’ll see that Knowi already selected a pie chart as the default visualization. You may change it as needed
  5. Click “Create Widget”, give it a name, then click “+Create” and it to the dashboard
Ask Questions From Your Data in Plain-English with Knowi's Search-based Analytics (Source - knowi.com)
Ask Questions From Your Data in Plain-English with Knowi’s Search-based Analytics (Source – knowi.com)

By simply typing in questions, you were able to instantly analyze and create a visualization from your data. You can try asking the questions below on your own to get a better feel for it.

Simple field selection:

"bounced, sent, customer"
"Show me all for Wells Fargo"
"bounced and sent for Wells Fargo"

Aggregations:

"Sum of sent sum of open by customer"
"Total sent by customer by week"

Dates:

"sum delivered weekly"
"average sent monthly"
"conversion rate by date by customer"

Summary

In summary, we used Knowi to connect to your Cassandra (or Datastax) data source and query it using a dynamic query builder. We demonstrated how to blend data on the fly between Cassandra and a MySQL database, apply post-processing to the blended dataset with Cloud9QL, create multiple visualizations from the dataset, and use search-based analytics to ask questions and gain insights from the data instantly. Visit Knowi to learn more about how its analytics capabilities can leverage the strength of your Cassandra implementation.

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email
About the Author:

RELATED POSTS