Table of Contents
- Introduction
- Connecting to Cassandra
- Writing Your First Query
- Joining Across Data Sources
- Post-Processing With Cloud9QL
- Creating Your First Visualization
- Adding Drilldowns
- Search-Based Analytics & Self-Service Analytics
- Summary
Introduction
Apache Cassandra is a highly scalable, high-performance, distributed database. It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. The speed, scalability, and resiliency of Cassandra make it ideal for storing and querying large amounts of data–especially high throughput data. Itâs no wonder it serves as the backbone for streaming giant Netflix and is often used in IoT analytics and real-time data analytics use cases.
Knowi is an analytics platform that natively integrates with Cassandra so you can leverage all the query power of Cassandra to visualize large amounts of data rapidly. Knowi allows you to query Cassandra directly using CQL, or use a drag-and-drop interface to build queries quickly without prior knowledge of the query syntax. Knowi also natively integrates with over 30 SQL/NoSQL/REST-API data sources, allowing you to join your Cassandra data with any combination of these on the fly to create brand new datasets that can be used for downstream analytics. From there, you can choose from a host of visualizations options to create custom interactive dashboards, run ad-hoc analysis, ask questions from your data with a Google search-like analytics feature, apply built-in machine learning algorithms, and more.
For Datastax Analytics, Knowiâs native integration also extends to Datastax Astra, DataStaxâs cloud-native database-as-a-service (DBaaS) built on Apache Cassandra.
This post will walk you through the steps of using Knowi for Cassandra analytics, including setting up connectivity to your Cassandra data source and create interactive visualizations from it. In this demonstration, weâll be analyzing sample marketing data pertaining to an email sending campaign. Among the topics weâll cover include:
- Connecting your Datastax Cassandra data source to Knowi
- Writing your first query
- Joining across multiple data sources
- Post-processing with Cloud9QL
- Creating your first visualization
- Adding Drilldowns
- Using Knowiâs search-based natural language processing (NLP)
Sign up for a free Knowi account here to get started.
Connecting to Cassandra
Knowi has broad native integration to other NoSQL, SQL, REST-API and JSON/CSV data sources. To get started, select your data source and configure the connection. Your data stays in the source so there are no ETL processes to build or ODBC drivers to install.
After logging in to Knowi, weâll start by establishing a connection to your Datastax Cassandra data source.
Steps:
- From the Playground dashboard, select âData sourcesâ on the side panel menu
- Select Datastax from the list of data sources
- Enter your data source credentials and give your data source a name
- Test the connection to confirm successful connection to your data source
- Hit âSaveâ to start querying your Keyspace
Writing Your First Query
Once connected to your Cassandra data source, Knowi auto-detects the tables, keys, and fields within. Knowi also supports Cassandra data types and collections (i.e. ascii, blob, counter) and collections (i.e. list, map, set). To start building your queries, Knowi gives you the option to auto-generate your queries using its drag and drop Query Builder via the UI. This is especially useful for users not as familiar with CQL. For more advanced CQL users, you also have the option to write your queries directly in the smart Query Editor, a versatile text editor specialized for editing code.
In this example, weâll select the demo_data table (which contains email sending activity data) and select the fields we want to analyze from the auto-generated fields from the Query Builder.
Steps:
- In the âEdit Queryâ screen, navigate to the Query Builder section
- In the âTablesâ drop-down menu, select the demo_data table
- In the âMetricsâ drop-down menu, select the fields customer, date, message_type, sent, opened, delivered, and conversions
- Notice that in the Query Editor to the right, a native CQL query is being auto-generated
- If you already knew the query syntax, you could instead write or paste it in directly
- Click âPreviewâ to instantly preview the results, which will be returned in tabular format
- After previewing the results, give your query a name then hit âSaveâ
Joining Across Data Sources
Knowi facilitates joins across multiple data sources to process, blend, and store combined results seamlessly. Traditionally, to perform lookups from one data source to another requires engineering efforts along with mapping storage for the results. Knowi enables you to join across the same or disparate SQL or NoSQL databases to stitch the data back together into a single result along with the ability to store and track it. All join types are supported, including INNER JOIN, LEFT OUTER JOIN, FULL OUTER JOIN, RIGHT OUTER JOIN, and LOOP JOIN.
In this example, weâll connect to another data source, a MySQL database that stores customer information. Once connected, weâll perform a join against the MySQL database to pull customer addresses. We will then preview each section of the join, then do a final preview on the combined dataset.
Steps:
- Repeat the steps from the âConnecting to Cassandraâ section, this time, select a MySQL database as the data source
- Accept the MySQL default credentials, test the connection, give it a name, then Save
- Return to your original Cassandra query and click âJoinâ near the bottom of the screen
- Select the MySQL data source you just saved as the data source
- This will populate the âJoin Fieldsâ section and another âQuery Builderâ and âQuery Editorâ sections below the first one
- In the âTablesâ drop-down menu, select the customer table
- In the âMetricsâ drop-down menu, select the fields customer, street, and state
- In the âJoin Fieldsâ section click âJoin Builderâ. Note that you can also type in the join free-hand in the text bar by entering âcustomer=customerâ
- Once the fields are retrieved, select âINNER JOINâ as the âJoin Type. Under âLeft Fieldâ (Cassandra side), select the key field customer. Under âRight Fieldâ (MySQL side), youâll also select the key field customer then Save
- Optional — Use the Eye icon on the left-hand side of the page near each of the queries to preview the result of each join part separately
- Click âPreviewâ to view the new blended dataset results
Post-Processing With Cloud9QL
Knowi also allows you to analyze and transform this new blended dataset using Cloud9QL, its proprietary SQL-like syntax that enables users to aggregate, manipulate, and calculate new data directly without the need for additional data prep tools. Cloud9QL is particularly useful for post-processing and transforming the returned data to complement native queries, generating alternative views from the same data within Know, and querying file-based data or data stores that have limited query support. Note that Cloud9QL is not a replacement for the underlying query but offers powerful analytics functions on the returned results.
Below, weâll use Cloud9QL to calculate the conversion rate of the email marketing campaign from the delivered and conversions fields. Weâll also apply a simple Cloud9QL date operator to extract the week value from the date field.
Steps:
- Navigate to the âCloud9QL Post Queryâ text box at the bottom of the screen. This is where youâll write your Cloud9QL queries
- Enter the syntax below:
- select *,
- (conversions/delivered)*100 as conversion_rate,
- week(date) as week
- Syntax breakdown:
- Selects all fields from the blended dataset
- Returns the conversion rate and creates a new field conversion_rate
- Uses the date operator week to extract the week value (Truncates to a date to the beginning of the week — Monday) and creates a new field week
- Click âPreviewâ to view the updated results
- Once all changes have been made, Hit âSave & Run Nowâ
Creating Your First Visualization
Once the query is saved, Knowi creates a âVirtual Datasetâ from the query results and stores it in Knowiâs âElastic Storeâ data warehouse that can store and track the results. Unlike traditional warehouses that require complex ETL processes and pre-defined schema, the elastic store is a flexible, scalable, schema-less warehouse. The stored virtual dataset is reusable, and will be the foundation for most of what youâll do in Knowi, like creating visualizations, adding them to dashboards, and much more.
In this example, we will be creating a dashboard with 3 visualizations:
- A basic data grid with the full results of the original query
- A stacked column bar chart that shows the total delivered emails for each customer by message type
- A time-series line chart that compares the average conversion rate for each customer by week
Creating A New Dashboard
Steps:
- On the left side panel, click âDashboardsâ
- Hit the â+â icon to create a new dashboard and give it a name then click âOKâ
- Drag the widget/report from the query you previously created into the dashboard. By default, it will be in grid form
Creating A Stacked Column Chart
Steps:
- On the top-right corner of the widget, click the âNatural Language/Self Service Analytics icon. This will take you to the Analyze screen
- Drag customer and message_type to the âGroupings/Dimensionsâ section
- Drag delivered to the âFields/Metricsâ section. In the âOperationâ dropdown, select âSumâ
- At the top of the screen, click the âVisualizationâ tab. This takes you to the visualization settings screen. We want to create a stacked column chart with customer in the x-axis and sum of delivered in the y-axis
- In the âSettingsâ section under the âVisualization Typeâ dropdown, select âStacked Columnâ
- In the âOptionsâ section under the âGrouping/Legendâ dropdown, select message_type
- Hit the âCloneâ icon at the top right to create a new widget derived from the original data grid. This allows us to keep the original widget as is while having another version that we can freely change
- Give the cloned widget a name, then add it to the dashboard. You now have a new widget that visualizes the total emails delivered by message type for each customer
Creating A Time-Series Line Chart
Steps:
- Replicating the previous steps, go back to the Analyze screen of the data grid widget by clicking the âNatural Language/Self Service Analyticsâ icon
- Drag week and customer to the âGroupings/Dimensionsâ section
- Drag conversion_rate to the âFields/Metricsâ section. In the âOperationâ dropdown, select âAvgâ
- At the top of the screen, click the âVisualizationâ tab. This takes you to the visualization settings screen. We want to create a line chart with week in the x-axis and avg of conversion_rate in the y-axis
- In the âSettingsâ section under the âVisualization Typeâ dropdown, select âLine Chartâ
- In the âOptionsâ section under the âGrouping/Legendâ dropdown, select customer
- In the âY-Axis Tick Intervalâ text box, enter .10
- Hit the âCloneâ icon at the top right to create a another widget derived from the original data grid
- Give the cloned widget a name, then add it to the dashboard. You now have another widget that visualizes the conversion rate trend for each customer by week
Adding Drilldowns
Drilldowns allow you to visually navigate and analyze data in powerful ways. They can be set into another widget, another dashboard, or the same dashboard. Drilldowns can be many levels deep with support for combining different drilldown modes. Data from the parent widget can be used as keys into the drilldown widget or dashboard to filter the data specifically for the point selected. Drilldowns can be configured using the ‘Drilldowns’ menu option on each widget in the dashboard.
In this example, weâll set up a âConnected Widgetsâ drilldown from the stacked column chart (Parent) widget that filters the data on the dashboard when you click on a specific customer.
Steps:
- On the top-right corner of the stacked column widget, click the âMore Settingsâ icon then select âDrilldownsâ. The drilldown menu box will appear
- In the âDrilldown typeâ dropdown, select âConnected Widgetsâ
- In the âWhen clickedâ dropdown, select customer
- For âOptional Drilldown Filtersâ select customer = customer
- Hit Save and Close
- Go to the stacked column chart widget and click on any of the bars representing each customer (i.e. Wells Fargo)
- This will filter all of the widgets on the dashboard based on the customer value âWells Fargoâ
- To return the dashboard to its default state, click the âBackâ button on the top-left corner of the dashboard
Search-Based Analytics & Self-Service Analytics
Knowiâs Natural Language capabilities is a powerful way to enable self-service analytics to non-technical users, by asking questions in plain English to drive insights and visualizations quickly. This gives anyone the ability to make better, data-driven decisions, any time. The technology works by translating your natural language query to a Cloud9QL statement to resolve your request. This search-based analytics feature is available for use across all datasets and widgets within a dashboard. It can be accessed using the Natural Language/Self Serve Analytics icon on a widget.
Below, weâll use this feature to ask a simple question from the data: What is the total number of emails sent for each customer? We will then create a brand new pie chart widget from the answer.
Steps:
- In the NLP text bar at the top of the dashboard, type âtotal sent by customerâ
- Notice that as you type, Knowi makes auto-suggestions about the question you are asking
- When the results are returned, you can check to confirm that the âSumâ operation was performed on the sent field
- At the top of the screen, click the âVisualizationâ tab. Youâll see that Knowi already selected a pie chart as the default visualization. You may change it as needed
- Click âCreate Widgetâ, give it a name, then click â+Createâ and it to the dashboard
By simply typing in questions, you were able to instantly analyze and create a visualization from your data. You can try asking the questions below on your own to get a better feel for it.
Simple field selection:
"bounced, sent, customer" "Show me all for Wells Fargo" "bounced and sent for Wells Fargo"
Aggregations:
"Sum of sent sum of open by customer"
"Total sent by customer by week"
Dates:
"sum delivered weekly" "average sent monthly" "conversion rate by date by customer"
Summary
In summary, we used Knowi to connect to your Cassandra (or Datastax) data source and query it using a dynamic query builder. We demonstrated how to blend data on the fly between Cassandra and a MySQL database, apply post-processing to the blended dataset with Cloud9QL, create multiple visualizations from the dataset, and use search-based analytics to ask questions and gain insights from the data instantly. Visit Knowi to learn more about how its analytics capabilities can leverage the strength of your Cassandra implementation.