Cloud9QL is an optional SQL-like syntax to transform data in powerful ways when needed.
Transform, clean and post-process data.
Complement existing API's/query syntax of datastores.
For example, if your data is in MongoDB, Cassandra or any other datasource that we support, Cloud9QL can be used a post processor to native query syntax.
Generate alternate views from the same data within Knowi.
Query file based data or datastores with no/limited query support.
Paste your own data below and write your own queries to experiment (or use the default data and queries):
Keyword 'select' is optional.
Select specific fields:
select Sent, Date
select Sent as Sent Messages, Date
select * where opened > 100000 select * where campaign_name like artists select * where Message Type = Transactional and sent > 200000
Supported Operators: >, >=, <, <=, !=, like, not like
Logical Operators: and, or
select * where opened > 100000 order by opened desc
Supported order types: asc, desc
select * where opened > 100000 order by opened desc limit 1
select distinct * select distinct customer
Aggregations functions enable grouping/dimensions from the data.
Without GROUP BY
select sum(sent) select sum(sent), avg(sent), count(*), median(sent), max(sent), min(sent)
Supported: sum, count, avg, median, max, min
With GROUP BY
Enables aggregations based on one or more groups/dimensions.
select sum(sent) as Total Sent, Customer group by Customer
Combines values on multiple rows of a given field into an array based on group by field(s).
ARRAY(<field>, <remove-duplicates-flag>, <filter-out-null-flag>)
select Stock, array(Price) as Trends group by Stock select Stock, array(Price, true, true) as Trends group by Stock
Arithmetic operations can be used within the query
select (opened/sent)*100 as Open Rate, Customer
Useful to determine variance of a set of values;
select sd(opened) as Std Deviation, Customer group by customer
Useful to access data from a previous row with an optional row offset;
select LAG([, offset[, default]]) select LAG(customer, 3)
Cloud9QL will automatically attempt to parse various date formats.
Use str_to_date(<date>,<format>) for unsupported formats.
Converts an Epoch number of seconds to a readable date format.
select epoch_to_date(date) as datetime
Truncates a date to midnight. When used within group by performs aggregation by date.
select date(date), sent select date(date) as Sent Date, sum(sent) as Total Sent group by date(date)
Day name of the week (Sunday, Monday etc)
select day_of_week(date), sum(sent) as Total Sent group by day_of_week(date)
Day of the month (1, 2, 3 etc)
select day_of_month(date), sum(sent) as Total Sent group by day_of_month(date)
Length of the month in days (28, 29, 30, 31 etc)
select days_in_month(date), sum(sent) as Total Sent group by days_in_month(date)
Truncates to a date to the beginning of the week (Sunday). When used within group by performs aggregation by week.
select week(date) as Sent Week, sum(sent) as Total Sent group by week(date)
Week Number integer for the input date
select week_of_year(date) as Sent Week, sum(sent) as Total Sent group by week_of_year(date)
Truncates to the 1st of the month. Aggregates data on a monthly basis when used within group by.
select month(date) as Sent Month, sum(sent) as Total Sent group by month(date)
Month of the year (1, 2, 7, 12 etc)
select month_of_year(date), sum(sent) as Total Sent group by month_of_year(date)
Truncates to the beginning of the quarter. Aggregates data on a quarterly basis when used within group by.
select quarter(date) as Sent Quarter, sum(sent) as Total Sent group by quarter(date)
Truncates to the 1st of the year. Aggregates data on a yearly basis when used within group by.
select year(date) as Sent Quarter, sum(sent) as Total Sent group by year(date)
Truncates to the hour for dates with timestamps
select HOUR(date) as Sent Hour, sum(sent) as Total Sent group by hour(date)
Truncates/Groups to the minute for dates with timestamps
select MINUTE(timestamp) as Sent Hour, sum(sent) as Total Sent group by MINUTE(timestamp)
Converts a date into another format
select date_format(date,dd-MMM) as Display Format
|w||Week of Year|
|W||Week in month|
|D||Day in Year|
|d||Day in Month|
|F||Day of Week in Month|
|E||Day name in week. Example: Tuesday,Tue|
|H||Hour in day (0-23)|
|h||Hour in am/pm (1-12)|
|m||Minute in hour|
|s||Second in minute|
Date conversion from a String Converts a string into date using a provided format
select str_to_date(date,dd-MMM-yy HH:mm) as Converted Date
The following reserved tokens enable date queries based on current date/time:
|$c9_thishour||00:00 of the Current hour|
|$c9_today||Midnight of the current date|
|$c9_thisweek||Start of the current week (Sunday midnight)|
|$c9_lastweek||Start of last week (Sunday midnight)|
|$c9_thismonth||Midnight of the 1st of the current month|
|$c9_lastmonth||Midnight of the 1st of the last month|
|$c9_thisquarter||Midnight of the 1st of the current quarter (Jan, April, July, Oct)|
|$c9_lastquarter||Midnight of the 1st of the last quarter (Jan, April, July, Oct)|
|$c9_thisyear||Midnight, Jan 1, of the current year|
|$c9_lastyear||Midnight, Jan 1, of the last year|
select * where date > $c9_thisyear
In addition, these can be further manipulated with +/- operands along with time unit identifiers. For example:
select * where date > $c9_thisyear+2m
Gets data from March onwards
select * where date > $c9_yesterday+2h
Data from 2:00 AM yesterday
The following are the list of supported time units:
Default timezone is US/Pacific for date display within Knowi. On-premise agents inherit the server timezone.
Custom Timezones can be set in the query using:
Full list of Timezones here.
This calculates the amount of time between two date-times objects to a date/time unit. The result will be a positive whole number, even if the end is before the start.
For example, number of minutes between two date-times:
MINUTES_DELTA(<date field>,<date field>)
select minutes_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as minutes_delta; select minutes_delta(now(), date) as minutes_delta;
Number of hours:
HOURS_DELTA(<date field>,<date field>)
select hours_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as hours_delta; select hours_delta(now(), date) as hours_delta;
DAYS_DELTA(<date field>,<date field>)
select days_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as days_delta; select days_delta(now(), date) as days_delta;
MONTHS_DELTA(<date field>,<date field>)
select months_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as months_delta; select months_delta(now(), date) as months_delta;
Rolling & Cumulative
Cloud9QL provides a set of operations which can be utilized to calculate rolling and cumulative operations such as accumulate, growth, delta, simple moving average, cumulative moving average, and time moving average.
The standard usecase for these operations is to allow computation <operation> of a <value field> across a set of <dimension field(s)> and optionally grouping by a set of <grouping field(s)>.
<operation>(<value field>[, <grouping field(s)>]);
For example, compute the DELTA of Sent across Week grouping by Customer. In this example:
<operation>: DELTA <value field>: Sent <dimension field(s)>: Week <grouping field(s)>: Customer
select Customer, delta(Sent, Customer) as SentDelta
There is one important restriction when using these Cloud9 QL functions: the input data need to be ordered by the <grouping field(s)> and <dimension field(s)> in that order.
Creates cumulative totals for a field between records, given a sorted dataset.
accumulate(<value field>[, <grouping field(s)>]);
select accumulate(sent), date
The above example returns a cumulative sum of sent count for a pre-sorted date order.
Calculates a growth percentage for a field between records, for a sorted dataset.
growth(<value field>[, <grouping field(s)>]);
select growth(sent), date
Calculates a difference for a field between records, for a sorted dataset.
delta(<value field>[, <grouping field(s)>]);
select delta(sent), date
Simple moving average based on a field and a window size for it. Assumes a sorted dataset.
SMA(<value field>, <window size>[, <grouping field(s)>]);
select sma(sent, 10)
Cumulative moving average returns the moving average of all data up to the current data point.
CMA(<value field>[, <grouping field(s)>]);
Time moving average based on a field, date field, and a window time unit size for it. See Time Units for all available time units. Assumes a sorted dataset
TMA(<value field>, <date field>, <time unit window>[, <grouping field(s)>]);
select tma(sent, date, 1w)
For more details on moving average definitions, see http://en.wikipedia.org/wiki/Moving_average
String & Number Operators
Specify the number of decimal points to display
ROUND(<field>, <decimal points>)
Substring between start and end indexes.
SUBSTRING(<field to check against>, < startIndex>,< endIndex>)
select substring(Message Type,0,10)
Substring before the first occurrence of a delimiter for a field value.
SUBSTRING_BEFORE(<field to check against>, < delimiter>)
select substring_before(Message Type,someDelim)
Substring after the first occurrence of a delimiter for a field value.
SUBSTRING_AFTER(<field to check against>, < delimiter>)
select substring_after(Message Type,someDelim)
Concatenates multiple columns together. When a field name does not exist in the current dataset, a fixed string is used.
CONCAT(<field name>, < anotherfield>, < yetanotherfield>,...)
select concat(Customer, for Week of, Week)
Join elements of an array value together separated by separator. When a field name does not exist in the current dataset, a fixed string is used.
ARRAYTOSTRING(<field name>, <separator>)
select array_to_string(Customer, ", ")
Upper cases a string
Lower cases a string
Removes leading and trailing spaces from a string
Returns the length of a string.
Formats a number to a locale specific currency format. Defaults to US currency format (en_US) if locale is not specified.
CURRENCY_FORMAT(<field name>, <locale>)
CURRENCY_FORMAT(<field name>, <decimal points>)
CURRENCY_FORMAT(<field name>, <locale>, <decimal points>)
Example with Locale:
This function allows you to control the display of leading and trailing zeros, prefixes and suffixes, grouping (thousands) separators, and the decimal separator.
select number_format(clicks,##,###.00) as Number of clicks
The output for the preceding lines of code is described in the following table. The value is the number, a double , that is to be formatted. The pattern is the String that specifies the formatting properties. The output, which is a String, represents the formatted number:
|123456.789||###,###.###||123,456.789||The pound sign (#) denotes a digit, the comma is a placeholder for the grouping separator, and the period is a placeholder for the decimal separator.|
|123456.789||###.##||123456.79||The value has three digits to the right of the decimal point, but the pattern has only two. The format method handles this by rounding up.|
|123.78||000000.000||000123.780||The pattern specifies leading and trailing zeros, because the 0 character is used instead of the pound sign (#).|
|12345.67||$###,###.###||$12,345.67||The first character in the pattern is the dollar sign ($). Note that it immediately precedes the leftmost digit in the formatted output.|
|12345.67||\u00A5###,###.###||¥12,345.67||The pattern specifies the currency sign for Japanese yen (¥) with the Unicode value 00A5.|
Replaces each substring of this string that matches the given regular expression with the given replacement.
In case replacement parameter is not provided an empty string value "" is used as default replacement.
REGEX_REPLACE(<field name>, < regex>) REGEX_REPLACE(<field name>, < regex>, < replacement>)
For example, to replace all occurrences of white spaces in a string
select regex_replace('Morris Park Bake Shop', '\s') as regex_replaced; ==> MorrisParkBakeShop
Extract and return all matches (non-overlapped) for the regular expression from the given input string field.
In case there is no match, NULL will be returned.
REGEX_EXTRACT(<field name>, <regex>, [<extract groups>])
For example, to extract all string occurrences between (and include) '%' characters
select regex_extract("|Morris Park| |Bake Shop|", "\|([^|]*)\|"); ==> ["|Morris Park|","|Bake Shop|"] select regex_extract("|Morris Park| |Bake Shop|", "\|(([^|]*))\|", true); ==> [ ["|Morris Park|","Morris Park"],["|Bake Shop|","Bake Shop"] ]
Returns an alternate value to be used in case the specified field does not exist or the value is NULL.
IFNULL(<field name>, <alternate value>)
select IFNULL(CustomerName, "N/A")
You can also specify an alternate column in place of an alternate value
IFNULL(<field name>, <another field name>)
select IFNULL(CustomerName, CustomerId)
CASE WHEN (IF..ELSE)
CASE WHEN statements provide great flexibility when dealing with buckets of results or when you need to find a way to filter out certain results. Another way to think of it is it's a conditional logic similar to IF-THEN statements in other programming languages.
When using a CASE WHEN statement, it's important to remember you need a condition, what to do when that condition is met, and an END clause. A simple example is below:
CASE WHEN condition THEN result ELSE other_result END
SELECT CASE WHEN country = 'USA' THEN 'North America' WHEN country = 'Mexico' THEN 'North America' ELSE country END AS 'West Region'
Returns the value of the field for the specified percentile rank.
Pivots row values for a field to columns
TRANSPOSE(<field to transpose>, < current column name>)
select transpose(Message Type,Sent)
To collapse based on a key field, use:
TRANSPOSE(<field to transpose>, <current column name>, <key column name>)
select transpose(Message Type, Sent, Customer)
Opposite of Transpose, folds columns into rows.
REVERSE_TRANSPOSE(<New ID column>, <New Value column>, <Value column 1>, ...., <Value column N>)(, , , ...., )
As you can see, I have initial data with 5 columns. After executing: select reversetranspose(NEWID, NEWV, V1, V2, V3) I get them value columns folded into the new column, where NEWID's value is the old value column names (V1, V2, V3 one for each new row) and the NEW_V contains the corresponding value.
When you have multiple columns that you want to fold in, specify all the columns you want to pin it by, specify the "pin" columns at the front, followed by new ID and the value column, followed by a *.
REVERSE_TRANSPOSE(Customer, Campaign, State, <New ID column>, <New Value column>, *)
This will fold in all the columns except for the columns in the first section.
Injects last value records in for a date range when the values are not present for that date.
For example, if a sensor emits data point 100 for 01/01/2016 and and the next change of value is at 200 10 days later, you can use the inject function to inject 100 into all dates in between that range.
INJECT(<Date Field>, <Start Date for Injecting>, <End Date for Injecting>, <Injection Frequency> [, <Select Fields>])
[group by <Dimension 1>[, ..., <Dimension N>]]
The optional <Select Field> can either be * (for all fields) or a comma separated list of selected fields from input data.
select inject(date, start_range_date, end_range_date, 1d, Name, Division, Score) group by Name, Division
Combines the results of two queries into the same dataset
(select sum(sent) as Facebook where customer=Facebook) append (select sum(sent) as LinkedIn Sent where customer=Linkedin)
Use the dot notation to query nested elements and the array notation for selecting items in an array. The example below uses a JSON nested string and uses Cloud9QL to parse it.
To unwind/expand an array, use the expand syntax.
select customer, nestedObj.secondLevel.y as Nested; select expand(Nested);
Note that expand must be specified on its own, without any other elements within the select.
Multiple statements can be chained one after the other using a semi-colon delimiter, where the results of the first statement is passed in to the second and so on.
select sum(sent) as Sent, sum(opened) as opened, customer, month(date) as Month group by customer, month(date); select (opened/sent)*100 as Open Rate, Opened, Sent, Customer, Month; select round(Open Rate,1) as Open Rate, Opened, Sent, Customer, Month where Month > $c9_thisyear+2m and Open Rate > 20 order by Open Rate asc
a) gets the Total Sent and Opens on a monthly basis for each customer,
b) then calculates the open rate based on the data from previous step since March,
c) adds additional criteria along with rounding the open rate.
IP to Geo-Location
IP to GEO function enables city level geo location from IP addresses.
Note: This uses GeoLite2 data created by MaxMind.
select ip_to_geo(IP), Some Field
- Queries the MaxMind database to determine location fields.
- The fields are added as separate columns to the result.
Geocoding - Lat/Long from Address
Retrieves lat/longs from addresses using geocod.io.
Note: This requires your own API key from geocod.io.
- Issues a batch geocoding request to geocod.io.
- The fields are added as separate columns to the result.
Cloud9QL supports the ability to forecast/predict future data points based on previous data history for any dataset.
To predict 1 point in the future:
select PREDICT(<field to predict>, <date field>, <start date for prediction>[, <prediction model>])
To predict multiple data point in the future:
select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>, <Prediction Frequency>, <Prediction Data Points>[, <prediction model>])
To predict multiple data point in the future based on secondary dimension(s) (ie: grouping(s)):
select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>, <Prediction Frequency>, <Prediction Data Points>[, <prediction model>]) group by <Dimension 1>, ..., <Dimension N>
- Loads the data points from input data.
- Determines the optimum prediction model.
- Predicts future data point(s).
- <prediction frequency> is in the format of <Number><TimeUnits> for example 1d means daily 2m specifies every 2 months.
- <prediction model> you can choose a specific model to be applied from one of the following supported models (case-sensitive):
Cohorts are useful to determine a grouping of data across a date dimension. For example, how many users remain active n months after initial signup, etc.
We currently support 2 types of input data.
select COHORT(<Date Column>, <Cohort Date Definition>, <Cohort Period Definition>), <Cohort Operation> group by <Cohort Date>, <Cohort Period>
Input data needs to be sorted by Date ascending order.
Cohort Period returns a number (ie: the period) or a date. Example:
a. 1m: Cohort Period as number
b. (1m): Cohort Period as Date
Example 1: If we already have the cohort date populated
Example 2: If we only have transactional events like the following example:
Example 3: Cohorts can be used in combination with transpose to flatten the result based on date
Example 4: A common cohort is retention in percentage format which can be computed as follows:
Unwinds nested field values: This function will expand array, map, or array of maps data structure into rows.
select EXPAND(<Column with Nested Value>)
Example: In this example, Name field's value is an array of map of Last Name and First Name.
Note that this function must be used in isolation, i.e., cannot be used in combination with others. Use query chaining to manipulate the results:
select expand(Name); select * where First Name like John
In case your data is a JSON string, PARSE function can be used to convert it to object which can be further manipulated and processed.
select PARSE(<JSON String Column>)
Example: In this example, Name field's value is an array of map of Last Name and First Name.
In case your data needs to be encrypted/obfuscated for storing, you can use the ENCRYPT and DECRYPT functions to achieve this goal.
select ENCRYPT(<Column>) select DECRYPT(<Column>)