Multi Data Source Joins


Knowi facilitates joins across multiple datasources to process, blend and store combined results seamlessly.

For example, if one datasource stores data about a supplier with a customer identifier, and another that stores data about the customer and billing information. Traditionally, to perform look ups from one to another requires engineering efforts along with mapping storage for the results. Knowi enables you to join across the same or disparate SQL or NoSQL databases to stitch the data back together into a single result along with the ability to store and track it.

Notes: Joins can be across disparate datasources or within the same datasource.

Joins can be setup via the UI or using the Cloud9Agent (inside your network).


In the following example, we'll connect to a MongoDB database to query some data that includes a customer name and then perform a look up against a MySQL database to pull customer addresses.

  1. From the datasource menu on the left hand side bar, click on MongoDB. Accept the defaults (points to a MongoDB instance hosted by us).

  2. Set up a MySQL datasource and accept the defaults.

  3. Generate a Mongo Query. In this case we will use data the Query Generator to dynamically generate the queries.

    a. select "sendingactivity" from the collections drop down menu. b. select "sent", "customer" from the metrics drop down menu. Multi Datasource

  4. Click "preview" to view the results.

  5. Click "join".

  6. Add a new MySQL datasource from the dropdown menu.

  7. Use Query Generator to dynamically generate the queries.

    a. select "customer" from the collections drop down menu

  8. Map fields from previous queries to the current one using "join fields".

    a. enter in "customer=customer" into the join fields box

  9. Click "preview" to view the results.

    Multi Datasource Join

Note: To specify multiple keys use 'AND': customer=customer AND joinAfield=joinBField

The default join type is INNER JOIN when a join field is present. When a join field is not present, the data is blended together across the two datasets.

Other Join types supported: LEFT OUTER JOIN, FULL OUTER JOIN, RIGHT OUTER JOIN, LOOP JOIN. To specify a different join type, add the join type into the Join Fields. Example: FULL OUTER JOIN customer=customer

Notes: Additional Cloud9QL section appears at the bottom of the page as soon as a joined data source is selected. This section can be used to apply a Cloud9QL query as the very last step of processing.

Notes: Fish eye icon appears on the left hand side of the page near each of the Query Builder section as soon as a joined data source is selected. This icon can be used to preview the result of each join part separately. Preview per join


Loop Join is a special join type where results that implements a for..loop join where the first query results is fed into the second query one row at a time based on token(s) defined by the join condition. For each row of the first query's result, the second query will be executed and its result will be combined with the current row.

For example, consider 2 datasets:

id | user                      user | sex
---------                      ----------
 1 | xxx                       xxx  |  M
 2 | yyy                       yyy  |  F

First query:

select *

Second query:

select * where user = USER_TOKEN

Now the LOOP JOIN condition can be defined as:


With the above definition, the execution will be as follows:

  1. Execute the first query which gives 2 rows as result.
  2. For each row of the above result, execute the second query with USER_TOKEN token replaced with actual value of user field of that row and join with it. >

        select * where user = xxx 
        select * where user = yyy

    The final result of the join will be:

    id | user | sex
    1 | xxx  |  M
    2 | yyy  |  F

    Note: For REST API datasources, the any tokens defined in the end point or headers will be replaced.

Large Datasets: Joins are performed on the results of the two queries, where the join is performed in memory within Knowi, with a disk swap for larger datasets. Note that data will be transferred over the network (into Knowi for cloud based modes, or on the agent if using an agent) for the results of each query before the join is performed. For larger dataset joins that spans millions of records, use the agent.


As an alternative to the UI based approach, joins can be specified on your agent directly.

For more details on agent setup and configuring datasource and query files, see Cloud9Agent.

Example query_xxx.json with joins:

        "entityName":"Join Example",
        "queryStr":"db.pagehits.find({lastAccessTime: { $exists: true}}).limit(10)",
        "c9SQLFilter":"select date(lastAccessTime) as date, count(*) as hits_count group by date(lastAccessTime)",
        "joinQueries": [
                "queryStr":"db.pagehits.find({lastAccessTime: { $exists: true}}).limit(10)",
                "c9SQLFilter":"select date(lastAccessTime) as date, sum(hits) as hits_total group by date(lastAccessTime)",
                "joinKeys": { "date": "date" }