Queries & Data Exploration
The Query section provides a powerful set of features that helps you build visual insights quickly:
- Data exploration & query generation capabilities
- Execute native queries on your database
- Run Cloud9QL queries on top of the returned result set
- Ability to join multiple datasources and query against them
- Ability to save query results into the ElasticStore or run directly against the database in real-time
- Option to schedule query executions
- Drag & Drop analytics on the results
- Click on Datasources on the left toolbar. Choose the datasource you want to work with. By default, Knowi provides live databases for demo purposes to test end to end flow.
- Connectivity: We provide multiple connectivity options
- Once the datasource is saved, configure queries by clicking on the link to enter the query page.
You have the option to either write the queries directly into the given query box or use the Query Generator to generate queries for you.
Schema on Read and Field Exploration
Depending on the datasource, a list of collections/tables along with field samples (or column names in case of relational databases) will be automatically detected.
There are no data models to define or schema definitions required upfront.
For unstructured/semi-structured sources, Knowi uses native APIs for the database to infer fields where supported by the database, or fall back to sampling the data to determine fields.
Query Builder & Writing Queries
Choose Filters, Metrics and dimensions to build your query. The queries are automatically generated on the right as your filters/metrics/dimensions are applied. Filters provide the ability to build a complex query by using logic operators 'And', 'Or'. Also, the 'In' filter condition allows to use several values for each filter. Click on Preview button to preview the data.
Instead of using the query generator, queries can directly be created via the Query box on the right.
Cloud9QL can be used to post process/transform the return data, to complement native queries. This is not a replacement to the underlying query, but offers powerful analytics functions on the results returned.
The query editor is a versatile text editor specialized for editing code and comes with a number of language modes and add-ons that implement more advanced editing functionality. Some of the supported functionality include key-maps such as:
- Start searching: (Ctrl-F / Cmd-F)
- Find next: (Ctrl-G / Cmd-G)
- Find previous: (Shift-Ctrl-G / Shift-Cmd-G)
- Replace: (Shift-Ctrl-F / Cmd-Option-F)
- Replace all: (Shift-Ctrl-R / Shift-Cmd-Option-F)
- Persistent search (dialog doesn't autoclose, enter to find next, Shift-Enter to find previous): (Alt-F)
- Jump to line: (Alt-G)
Use the Preview button to preview the results of your query. From here you can fine-tune your desired output, configure
visualizations, and when done, select "Save and Run" to save and apply your changes.
Click on Join Query button (or the icon) to join queries from different datasources.
- Choose a datasource to join with
- Generate or Write Queries
- Specify Join Keys and the type of Join. If there are no join keys applicable if you are blending in data across two datasources, leave it empty. Otherwise, specify the keys to join with. Join Keys must be
a field returned in the queries.
customer=customer AND joinAfield=joinBField
The default join type is INNER JOIN when a join field is present. Other Join types supported: LEFT OUTER JOIN, FULL OUTER JOIN, RIGHT OUTER JOIN. To specify
a different join type, add the join type into the Join Fields. Example: FULL OUTER JOIN customer=customer. See Multi-Datasource Joins for more details.
Direct Query vs ElasticStore
Queries can either be "Direct" or the results of a query can be cached. Direct mode will always execute the query directly against the database live when an associated visualization is displayed. The associated
load time is directly proportional to the time it takes for the query to execute.
Alternatively, the query results can be stored into the Knowi ElasticStore seamlessly. This is particularly useful in the following cases:
- Long running queries.
- Reduce load on your database for reporting workloads by offloading it to the ElasticStore.
- The results can be used as a parent dataset to other derived queries off this dataset. Useful in cases where the resulting dataset tracked in the ElasticStore runs into the millions of records.
- In cases with large raw datasets in the underlying database, this can be used to incrementally update the ElasticStore for that dataset without running the entire historical query.
In Warehousing Strategy section, check Direct Query for direct. By default, the results will be stored into the ElasticStore. If ElasticStore is use, specify a schedule for asynchronous query execution.
For more details, see Elastic Store documentation.
Once ready, click on Save to just save the query or hit Save & Run Now button to immediately execute the query.
With ElasticStore usage, Overwrite strategy provides you powerful control on how the data is updated for the dataset when the query is run. The values can be one of the following:
All will replace the existing values in the store for this dataset with new incoming values during query execution.
Empty will leave the existing results as is and append incoming data to it.
One or more field names, comma separated: Will update the existing dataset with new values for the same values of the fields, insert new records when a match is not found and leave existing recrods as is. This is useful for incrementally updating the dataset.
For example, let's say the raw data that a query operates on has 100's of millions of events, and say the query is doing a grouping by count, date and event type. In this case, trolling through the entire 100's of millions of records each time the query runs can induce loads into your system. You can structure the query in the
to something like the following:
select count(*) as Totals, date(eventTime) as Date, EventType
where eventTime >= $c9_yesterday
group by EventType, date(eventTime)
with Overwrite Strategy of Date, EventType and scheduled every few hours. In this example, it'll query only data from yesterday and the results are then upserted into the dataset.
<date field name>-<retention period>: Retention overwrite strategy for a date type field can be defined by
specifying how far back in time we would like to keep our data. The retention period can be defined in the
Time Unit format. For example, Date-1m will remove all existing rows in our
elastic-store which have Date field's value less than 1 month before the MIN value of the incoming data's Date field.
This can also be used in conjunction with our existing field value overwrite (#3) above. For example, to add
retention of 3 months to the example in #3 above, we just need to make the Overwrite Strategy to be Date, EventType, Date-3m
ForceAll : When a query returns no data, the default behaviour is that any previous data that exists in the ElasticStore is left as is. ForceAll is an alternative to All that removes all data
associated to that dataset if no data was found for the most recent query.
Queries can be templated with runtime parameters that can be passed in dynamically (from embedded applications to filters). For more details, see Runtime Parameters documentation.
Once report is saved, it will be listed in the Queries section.
Each query listing comes with several options on the right:
Run Now: This option runs the query. By default, all the queries are saved in the ElasticStore so every time, this option is chosen, the query is executed against the warehouse. If chosen otherwise, the query is executed real-time against the database.
Edit: This option lets you edit the queries and save them again.
Data Config: This option tells you about the underlying structure of the query. For example, the figure below gives the following information:
Query Report named TestReport is executed on the dataset named TestReport and is represented by the widget called TestReport
Add a Derived Dataset: This option allows you to create new derived queries and datasets from the data. The original dataset will be the parent and the new one will be derived from it. Applicable for data stored in ElasticStore.
Share: Lets you share the query report with other user groups or users.
Clone Lets you build a query, using the existing query as a baseline.
Delete: Deletes a query. You can opt to remove all widgets/visualizations associated with the particular query.
History: This option provides you with the list of all historical changes made for this query. All the changes are tracked and you can easily get back to them in case of necessity.
Notes: Additional revert button is available as soon as any historical item is expanded. You can use this button in order to revert the query state to the one that you selected.
Click on Download Queries button if you want to download the queries.
Cancel: Cancel a currently running query. The behavior of this action depends on datasource type of the running query.
Widgets/Visualizations/Adding to Dashboards
When a report is created, a visualization is automatically created for it.
To add the visualization into to a dashboard as a widget, go to a dashboard (or create a new one), drag and drop the widget into the dashboard.