Multi Data Source Joins - Cross-Database Analytics

Imagine you need access to two unrelated datasources: one that stores data about the supplier, with a customer identifier, and another that stores data about the customer and billing information. When the data is left in separate datasources, it is difficult to combine the information and understand what it means. Knowi enables you to use joins to stitch the results together.

Overview

Increasingly, enterprises are adopting polyglot persistence architectures that allow them to choose the right database for the right job. As a result, you will often need to correlate or combine data from one source with data from another. Knowi enables join queries across multiple disparate datasources, including NoSQL databases, without the need for large-scale data wrangling efforts.

UI

Step 1:

An example of how to query datasets stored within the warehouse:

  1. Create a new query from the Mongo datasource (if one does not exist).
  2. In this case, we will use the Query Generator to dynamically generate the queries.

    a. Select "sendingActivity" from the collections drop-down menu.
    b. Select "sent" and "customer" from the metrics drop-down menu.
    c. Select "date" from the dimensions drop-down menu. Once it has been selected, click on date again and select the "date" option from the date grouping drop-down menu.

  3. Click "preview" to view the results.

Step 2:

Once step 1 has been completed:

An example of how to combine queried datasets stored within the warehouse into one chart:

  1. Click "join".
  2. Add a new MySQL datasource from the drop-down menu.
  3. Use the Query Generator to dynamically generate the queries.

    a. Select "customer" from the collections drop-down menu.

  4. Map fields from previous queries to the current one using "join fields".

    a. Enter "customer=customer" into the join fields box.

  5. Click "preview" to view the results.
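To illustrate what a join field mapping such as "customer=customer" expresses, here is a minimal Python sketch that matches rows from two result sets on a shared field. It is not Knowi's implementation: the helper join_on, the sample rows, and the "region" column are hypothetical, and inner-join semantics are assumed.

```python
from collections import defaultdict


def join_on(left_rows, right_rows, left_key, right_key):
    """Inner-join two lists of dicts where left_key equals right_key."""
    # Index the right-hand rows by join key so each left row is matched quickly.
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)

    joined = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            # Merge both rows; on a field-name clash the left value wins.
            joined.append({**right, **left})
    return joined


# Hypothetical rows standing in for the MongoDB "sendingActivity" result.
sending_activity = [
    {"customer": "Acme", "sent": 120, "date": "2014-07-01"},
    {"customer": "Globex", "sent": 75, "date": "2014-07-01"},
]
# Hypothetical rows standing in for the MySQL "customer" table.
customers = [
    {"customer": "Acme", "region": "West"},
    {"customer": "Initech", "region": "East"},
]

# Equivalent in spirit to the join field mapping "customer=customer".
result = join_on(sending_activity, customers, "customer", "customer")
print(result)
```

Only "Acme" appears in both inputs, so it is the only customer in the joined result under the inner-join assumption.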

Note: An additional Cloud9QL section appears at the bottom of the page as soon as a joined datasource is selected. This section can be used to apply a Cloud9QL query as the very last step of processing.

Note: A fish eye icon appears on the left-hand side of the page near each Query Builder section as soon as a joined datasource is selected. This icon can be used to preview the result of each part of the join separately.

Cloud9Agent Configuration

As an alternative to the UI-based connectivity above, you can use Cloud9Agent inside your network to pull from Couchbase securely. See Cloud9Agent to download your agent, along with instructions to run it.

Highlights:

  * Pull data using N1QL.
  * Execute queries on a schedule or one time.

The agent contains a datasourceexamplecouchbase.json and queryexamplecouchbase.json under the examples folder of the agent installation to get you started.

  * Edit those to point to your database and modify the queries to pull your data.
  * Move them into the config directory (datasource_XXX.json files first if the Agent is running).
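The "datasource files first" ordering can be scripted. The following Python sketch moves edited JSON files into a config directory, handling files whose names start with "datasource" before the rest. The function name deploy_configs and both directory arguments are hypothetical; the actual paths depend on your agent installation.

```python
import shutil
from pathlib import Path


def deploy_configs(source_dir, config_dir):
    """Move edited JSON configs into config_dir, datasource files first."""
    source, target = Path(source_dir), Path(config_dir)
    target.mkdir(parents=True, exist_ok=True)

    files = sorted(source.glob("*.json"))
    # Stable sort: names starting with "datasource" sort ahead of everything else.
    files.sort(key=lambda p: not p.name.startswith("datasource"))

    moved = []
    for path in files:
        shutil.move(str(path), str(target / path.name))
        moved.append(path.name)
    return moved
```

Calling deploy_configs with your edited examples folder and the agent's config folder returns the file names in the order they were moved.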

Datasource Configuration:

| Parameter | Comments |
| --- | --- |
| name | Unique datasource name. |
| datasource | Set value to couchbase. |
| url | URL to connect to, where applicable for the datasource. Example for Couchbase: localhost:3306/test |
| userId | User ID to connect, where applicable. |
| password | Password, where applicable. |

Query Configuration:

| Query Config Params | Comments |
| --- | --- |
| entityName | Dataset name identifier. |
| identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
| dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
| queryStr | Couchbase N1QL query to execute. Required. |
| frequencyType | One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as a one-time query, executed upon Cloud9Agent startup (or when the query is first saved). |
| frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
| startTime | Optional. Specifies when the query should be run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run will be on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm |
| c9QLFilter | Optional post-processing of the results using Cloud9QL. This is uncommon against SQL-based datastores. |
| overrideVals | Enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This replaces all existing records in Knowi that have the same fieldA and fieldB with the current data, and inserts records where they are not present. |
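To make the interaction of startTime, frequencyType, and frequency concrete, the Python sketch below computes run times as the parameter descriptions state them. It is an illustration only, not the Agent's scheduler: the function run_times and the UNIT_TO_DELTA mapping are hypothetical, only the MM/dd/yyyy HH:mm format is parsed, and months is left out because a month has no fixed length.

```python
from datetime import datetime, timedelta

# Fixed-length units only; "months" is intentionally omitted in this sketch.
UNIT_TO_DELTA = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
}


def run_times(start_time, frequency_type, frequency, count):
    """Return the first `count` run times, beginning at start_time."""
    # "%m/%d/%Y %H:%M" corresponds to the MM/dd/yyyy HH:mm format.
    first = datetime.strptime(start_time, "%m/%d/%Y %H:%M")
    step = UNIT_TO_DELTA[frequency_type] * frequency
    return [first + step * i for i in range(count)]


# The weekly example from the startTime description.
weekly = run_times("07/01/2014 13:30", "weeks", 1, 2)
print([t.strftime("%m/%d/%Y %H:%M") for t in weekly])
# prints ['07/01/2014 13:30', '07/08/2014 13:30']
```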

Ordering of join elements

Optionally, you can control the order of join elements by clicking the up/down arrow buttons in the left-hand sidebar. These buttons become available as soon as at least one join dataset is selected.

Examples

Datasource Example:

    [
        {
            "name": "demoMySQL",
            "url": "localhost:3306/test",
            "datasource": "mysql",
            "userId": "a",
            "password": "b"
        }
    ]

Query Examples:

    [
        {
            "entityName": "Couchbase Demo",
            "queryStr": "select `brewery_id`, avg(`abv`), count(`name`) from `beer-sample` where `type` = \"beer\" group by `brewery_id` limit 10000",
            "c9QLFilter": "",
            "dsName": "demoCouchbase",
            "overrideVals": {
                "replaceAll": true
            },
            "frequencyType": "minutes",
            "frequency": 10
        }
    ]

The query runs every 10 minutes and replaces all data for that dataset in Knowi.
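The three storage strategies that overrideVals can express (append by default, replaceAll, and replaceValuesForKey) can be pictured with the following Python sketch. It models a dataset as a list of dicts and is only an approximation of the described behavior, not Knowi's storage code; the function apply_override and the sample rows are hypothetical.

```python
def apply_override(existing, incoming, override_vals=None):
    """Return the dataset rows after storing `incoming` under the given strategy."""
    override_vals = override_vals or {}

    if override_vals.get("replaceAll"):
        # Replace strategy: discard everything stored so far.
        return list(incoming)

    keys = override_vals.get("replaceValuesForKey")
    if keys:
        # Upsert strategy: drop stored rows whose key fields match an incoming
        # row, then add all incoming rows.
        incoming_keys = {tuple(row[k] for k in keys) for row in incoming}
        kept = [
            row for row in existing
            if tuple(row.get(k) for k in keys) not in incoming_keys
        ]
        return kept + list(incoming)

    # Default: append the new results to the existing dataset.
    return list(existing) + list(incoming)


existing = [
    {"fieldA": 1, "fieldB": "x", "value": 10},
    {"fieldA": 2, "fieldB": "y", "value": 20},
]
incoming = [
    {"fieldA": 1, "fieldB": "x", "value": 11},
    {"fieldA": 3, "fieldB": "z", "value": 30},
]

appended = apply_override(existing, incoming)
replaced = apply_override(existing, incoming, {"replaceAll": True})
upserted = apply_override(
    existing, incoming, {"replaceValuesForKey": ["fieldA", "fieldB"]}
)
print(len(appended), len(replaced), len(upserted))
```

In the upsert case, the stored row with fieldA 1 and fieldB "x" is superseded by the incoming row, the row with fieldA 2 is untouched, and the row with fieldA 3 is inserted.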