Cloud9QL

Cloud9QL is an optional SQL-like syntax to transform data in powerful ways when needed.

Uses:

Basics

Paste your own data below and write your own queries to experiment (or use the default data and queries):

Select everything:

select *

Keyword 'select' is optional.

Select specific fields:

select Sent, Date

Field Aliases:

select Sent as Sent Messages, Date

Conditions:

select * where opened > 100000
select * where campaign_name like artists
select * where Message Type = Transactional and sent > 200000

Supported Operators: >, >=, <, <=, !=, like, not like

Logical Operators: and, or

Ordering:

select * where opened > 100000 order by opened desc

Supported order types: asc, desc

Limit:

select * where opened > 100000 order by opened desc limit 1

Unique records:

 select distinct *
 select distinct customer

Aggregations

Aggregations functions enable grouping/dimensions from the data.

Without GROUP BY

 select sum(sent)
 select sum(sent), avg(sent), count(*), median(sent), max(sent), min(sent)

Supported: sum, count, avg, median, max, min

With GROUP BY

Enables aggregations based on one or more groups/dimensions.

  select sum(sent) as Total Sent, Customer group by Customer

ARRAY

Combines values on multiple rows of a given field into an array based on group by field(s).

ARRAY(<field>)
ARRAY(<field>, <remove-duplicates-flag>)
ARRAY(<field>, <remove-duplicates-flag>, <filter-out-null-flag>)

select Stock, array(Price) as Trends group by Stock
select Stock, array(Price, true, true) as Trends group by Stock

Functions

Arithmetic

Arithmetic Operations

Arithmetic operations can be used within the query

select (opened/sent)*100 as Open Rate, Customer

Supported operators:

+
-
*
+
/
^
%
abs
acos
asin
atan
cbrt
ceil
cos
cosh
floor
sqrt
tan
Standard Deviation

Useful to determine variance of a set of values;

select sd(opened) as Std Deviation, Customer group by customer

Date Operations

Cloud9QL will automatically attempt to parse various date formats.

Use str_to_date(<date>,<format>) for unsupported formats.

EPOCH_SECS

Allows for the differentiation between the epoch with milliseconds vs without milliseconds. This is specifically for REST queries

For example: date={$c9today,epochsecs} will format today into epoch seconds format

EPOCH_TO_DATE

Converts an Epoch number of seconds to a readable date format.

select epoch_to_date(date) as datetime
DATE

Truncates a date to midnight. When used within group by performs aggregation by date.

select date(date), sent
select date(date) as Sent Date, sum(sent) as Total Sent group by date(date)
DAY_OF_WEEK

Day name of the week (Sunday, Monday etc)

select day_of_week(date), sum(sent) as Total Sent group by day_of_week(date)
DAY_OF_MONTH

Day of the month (1, 2, 3 etc)

select day_of_month(date), sum(sent) as Total Sent group by day_of_month(date)
DAYS_IN_MONTH

Length of the month in days (28, 29, 30, 31 etc)

select days_in_month(date), sum(sent) as Total Sent group by days_in_month(date)
WEEK

Truncates to a date to the beginning of the week (Sunday). When used within group by performs aggregation by week.

select week(date) as Sent Week, sum(sent) as Total Sent group by week(date)

By default, WEEK(<date-field>) returns the Sunday date at midnight of a given week but with WEEK(<date-field>, offset), you can offset the days and alter the day of the week returned.

Offset Reference:

1 - Monday

2 - Tuesday

3 - Wednesday

4 - Thursday

5 - Friday

6 - Saturday

Example: WEEK(<date-field>, 1): returns the Monday date at midnight of a given week.

WEEK_OF_YEAR

Week Number integer for the input date

select week_of_year(date) as Sent Week, sum(sent) as Total Sent group by week_of_year(date)

NOTES:

  1. A week is defined from Monday-Sunday regardless of month or year.

  2. All weeks are 7 days long

  3. Weeks are not dependent on the month but it is possible to have 5 weeks associated with a day in a month depending on how the days are aligned, For example, if the 1st of a month falls on a Saturday, then that day will belong to the week starting on Monday which falls in the previous month.

  4. The first week of a year will always follow the last week in December of the previous year

MONTH

Truncates to the 1st of the month. Aggregates data on a monthly basis when used within group by.

select month(date) as Sent Month, sum(sent) as Total Sent group by month(date)
MONTH_OF_YEAR

Month of the year (1, 2, 7, 12 etc)

select month_of_year(date), sum(sent) as Total Sent group by month_of_year(date)
QUARTER

Truncates to the beginning of the quarter. Aggregates data on a quarterly basis when used within group by.

select quarter(date) as Sent Quarter, sum(sent) as Total Sent group by quarter(date)
YEAR

Truncates to the 1st of the year. Aggregates data on a yearly basis when used within group by.

select year(date) as Sent Quarter, sum(sent) as Total Sent group by year(date)
HOUR

Truncates to the hour for dates with timestamps

select HOUR(date) as Sent Hour, sum(sent) as Total Sent group by hour(date)
MINUTE

Truncates/Groups to the minute for dates with timestamps

select MINUTE(timestamp) as Sent Hour, sum(sent) as Total Sent group by MINUTE(timestamp)
NOW

Current date/time

select now()
DATE_FORMAT

Converts a date into another format

DATE_FORMAT(<date>,<format>)

select date_format(date,dd-MMM) as Display Format

Options:

y Year
M Month
w Week of Year
W Week in month
D Day in Year
d Day in Month
F Day of Week in Month
E Day name in week. Example: Tuesday,Tue
a Am/PM marker
H Hour in day (0-23)
h Hour in am/pm (1-12)
m Minute in hour
s Second in minute
S Millisecond
z Time zone
Z Time zone
   
DATE_ADD

Add a datetime amount to a date

DATE_ADD(<date>,<amount>)

select date_add(date,+1y) as Date
STR_TO_DATE

Date conversion from a String Converts a string into date using a provided format

STR_TO_DATE(<date>,<format>)

select str_to_date(date,dd-MMM-yy HH:mm) as Converted Date
Date Tokens

The following reserved tokens enable date queries based on current date/time:

$c9_now Current Time
$c9_thishour 00:00 of the Current hour
$c9_today Midnight of the current date
$c9_yesterday Midnight, yesterday
$c9_thisweek Start of the current week (Sunday midnight)
$c9_lastweek Start of last week (Sunday midnight)
$c9_thismonth Midnight of the 1st of the current month
$c9_lastmonth Midnight of the 1st of the last month
$c9_thisquarter Midnight of the 1st of the current quarter (Jan, April, July, Oct)
$c9_lastquarter Midnight of the 1st of the last quarter (Jan, April, July, Oct)
$c9_thisyear Midnight, Jan 1, of the current year
$c9_lastyear Midnight, Jan 1, of the last year

select * where date > $c9_thisyear

In addition, these can be further manipulated with +/- operands along with time unit identifiers. For example:

select * where date > $c9_thisyear+2m

Gets data from March onwards

select * where date > $c9_yesterday+2h

Data from 2:00 AM yesterday

Time Units

The following are the list of supported time units:

min Minutes
h Hours
d Days
w Weeks
m Months
q Quarters
y Years

Timezones

Default timezone is US/Pacific for date display within Knowi. On-premise agents inherit the server timezone.

Custom Timezones can be set in the query using:

set time_zone=US/Eastern;

Full list of Timezones here.

Example:

Date Deltas

This calculates the amount of time between two date-times objects to a date/time unit. The result will be a positive whole number, even if the end is before the start.

For example, number of minutes between two date-times:

MINUTES_DELTA(<date field>,<date field>)

select minutes_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as minutes_delta;
select minutes_delta(now(), date) as minutes_delta;

Number of hours:

HOURS_DELTA(<date field>,<date field>)

select hours_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as hours_delta;
select hours_delta(now(), date) as hours_delta;

Days:

DAYS_DELTA(<date field>,<date field>)

select days_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as days_delta;
select days_delta(now(), date) as days_delta;

Months:

MONTHS_DELTA(<date field>,<date field>)

select months_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as months_delta;
select months_delta(now(), date) as months_delta;

Rolling & Cumulative

Cloud9QL provides a set of operations which can be utilized to calculate rolling and cumulative operations such as accumulate, growth, delta, simple moving average, cumulative moving average, and time moving average.

The standard usecase for these operations is to allow computation <operation> of a <value field> across a set of <dimension field(s)> and optionally grouping by a set of <grouping field(s)>.

<operation>(<value field>[, <grouping field(s)>]);

For example, compute the DELTA of Sent across Week grouping by Customer. In this example:

<operation>: DELTA
<value field>: Sent
<dimension field(s)>: Week
<grouping field(s)>: Customer

Example:

select Customer, delta(Sent, Customer) as SentDelta

There is one important restriction when using these Cloud9 QL functions: the input data need to be ordered by the <grouping field(s)> and <dimension field(s)> in that order.

ACCUMULATE

Creates cumulative totals for a field between records, given a sorted dataset.

accumulate(<value field>[, <grouping field(s)>]);

select accumulate(sent), date

The above example returns a cumulative sum of sent count for a pre-sorted date order.

GROWTH

Calculates a growth percentage for a field between records, for a sorted dataset.

growth(<value field>[, <grouping field(s)>]);

 select growth(sent), date
DELTA

Calculates a difference for a field between records, for a sorted dataset.

delta(<value field>[, <grouping field(s)>]);

 select delta(sent), date
SMA

Simple moving average based on a field and a window size for it. Assumes a sorted dataset.

SMA(<value field>, <window size>[, <grouping field(s)>]);

select sma(sent, 10)

CMA

Cumulative moving average returns the moving average of all data up to the current data point.

CMA(<value field>[, <grouping field(s)>]);

select cma(sent)

TMA

Time moving average based on a field, date field, and a window time unit size for it. See Time Units for all available time units. Assumes a sorted dataset

TMA(<value field>, <date field>, <time unit window>[, <grouping field(s)>]);

select tma(sent, date, 1w)

For more details on moving average definitions, see http://en.wikipedia.org/wiki/Moving_average

TMS

Time moving sum based on a field, date field, and window time unit size.

TMS(<value field>, <date field>, <time unit window>[, <grouping field(s)>]);

select tms(sent, date, 1w)
RANK

Rank of records, given a sorted dataset.

rank([<grouping field(s)>]);

select rank(), date

The above example returns the rank (increment by 1) of each row for a pre-sorted date order.

String & Number Operators

ROUND

Specify the number of decimal points to display

ROUND(<field>, <decimal points>)

select round(sent,1)
SUBSTRING

Substring between start and end indexes.

SUBSTRING(<field to check against>, < startIndex>,< length>)

 select substring(Message Type,0,10)
SUBSTRING_BEFORE

Substring before the first occurrence of a delimiter for a field value.

SUBSTRING_BEFORE(<field to check against>, < delimiter>)

 select substring_before(Message Type,someDelim)
SUBSTRING_AFTER

Substring after the first occurrence of a delimiter for a field value.

SUBSTRING_AFTER(<field to check against>, < delimiter>)

  select substring_after(Message Type,someDelim)
CONCAT

Concatenates multiple columns together. When a field name does not exist in the current dataset, a fixed string is used.

CONCAT(<field name>, < anotherfield>, < yetanotherfield>,...)

 select concat(Customer, for Week of, Week)
CONV

For converting hex strings into integers.

CONV (<field name>, <field radix>, <desired radix>)

 select conv(hexfield, 16, 10)
SPLIT

Split a string of elements separated by separator into an array. If separator is not specified, comma will be used.

SPLIT(<field name>, <separator>)

 select split(Customer, ",")
ARRAY_TO_STRING

Join elements of an array value together separated by separator. When a field name does not exist in the current dataset, a fixed string is used.

ARRAY_TO_STRING(<field name>, <separator>)

 select array_to_string(Customer, ", ")
UPPER

Upper cases a string

UPPER(<field name>)

 select upper(Customer)
LOWER

Lower cases a string

LOWER(<field name>)

 select lower(Customer)
TRIM

Removes leading and trailing spaces from a string

TRIM(<field name>)

 select trim(address)
LENGTH

Returns the length of a string.

LENGTH(<field name>)

 select length(address)
CURRENCY_FORMAT

Formats a number to a locale specific currency format. Defaults to US currency format (en_US) if locale is not specified.

CURRENCY_FORMAT(<field name>, <locale>)

CURRENCY_FORMAT(<field name>, <decimal points>)

CURRENCY_FORMAT(<field name>, <locale>, <decimal points>)

  select currency_format(revenue)

Example with Locale:

  select currency_format(revenue,en-GBP)
NUMBER_FORMAT

This function allows you to control the display of leading and trailing zeros, prefixes and suffixes, grouping (thousands) separators, and the decimal separator.

NUMBER_FORMAT(<number>,<format>)

select number_format(clicks,##,###.00) as Number of clicks

The output for the preceding lines of code is described in the following table. The value is the number, a double , that is to be formatted. The pattern is the String that specifies the formatting properties. The output, which is a String, represents the formatted number:

value pattern output explanation
123456.789 ###,###.### 123,456.789 The pound sign (#) denotes a digit, the comma is a placeholder for the grouping separator, and the period is a placeholder for the decimal separator.
123456.789 ###.## 123456.79 The value has three digits to the right of the decimal point, but the pattern has only two. The format method handles this by rounding up.
123.78 000000.000 000123.780 The pattern specifies leading and trailing zeros, because the 0 character is used instead of the pound sign (#).
12345.67 $###,###.### $12,345.67 The first character in the pattern is the dollar sign ($). Note that it immediately precedes the leftmost digit in the formatted output.
12345.67 \u00A5###,###.### ¥12,345.67 The pattern specifies the currency sign for Japanese yen (¥) with the Unicode value 00A5.
REGEX_REPLACE

Replaces each substring of this string that matches the given regular expression with the given replacement.

In case replacement parameter is not provided an empty string value "" is used as default replacement.

REGEX_REPLACE(<field name>, < regex>) REGEX_REPLACE(<field name>, < regex>, < replacement>)

For example, to replace all occurrences of white spaces in a string

  select regex_replace('Morris Park Bake Shop', '\s') as regex_replaced; 
  ==> MorrisParkBakeShop
REGEX_EXTRACT

Extract and return all matches (non-overlapped) for the regular expression from the given input string field.

In case there is no match, NULL will be returned.

REGEX_EXTRACT(<field name>, <regex>, [<extract groups>])

For example, to extract all string occurrences between (and include) '%' characters

  select regex_extract("|Morris Park| |Bake Shop|", "\|([^|]*)\|");
  ==> ["|Morris Park|","|Bake Shop|"]
  select regex_extract("|Morris Park| |Bake Shop|", "\|(([^|]*))\|", true); 
  ==> [ ["|Morris Park|","Morris Park"],["|Bake Shop|","Bake Shop"] ]
IFNULL

Returns an alternate value to be used in case the specified field does not exist or the value is NULL.

IFNULL(<field name>, <alternate value>)

  select IFNULL(CustomerName, "N/A")

You can also specify an alternate column in place of an alternate value

IFNULL(<field name>, <another field name>)

  select IFNULL(CustomerName, CustomerId)

Other

LAG

Useful to access data from a previous row with an optional row offset;

select LAG([, offset[, default]])
select LAG(customer, 3)

Users can also group data in the LAG function and compare the grouped results with the previous values. Use the keyword Group by for grouping.

select LAG([, offset[, default]]) Group by (field_name)
select LAG(cuisine, 3) Group by region
CASE WHEN (IF..ELSE)

CASE WHEN statements provide great flexibility when dealing with buckets of results or when you need to find a way to filter out certain results. Another way to think of it is it's a conditional logic similar to IF-THEN statements in other programming languages.

When using a CASE WHEN statement, it's important to remember you need a condition, what to do when that condition is met, and an END clause. A simple example is below:

CASE 
    WHEN condition
        THEN result
    ELSE other_result
END

For example,

SELECT
    CASE
      WHEN country = 'USA'
        THEN 'North America'
      WHEN country = 'Mexico'
        THEN 'North America'
      ELSE country
END AS 'West Region'
PERCENTILE

Returns the value of the field for the specified percentile rank.

PERCENTILE(<field>, <percentile>)

select percentile(sent,75)
TRANSPOSE

Pivots row values for a field to columns

TRANSPOSE(<field to transpose>, < current column name>)

select transpose(Message Type,Sent)

To collapse based on a key field, use:

TRANSPOSE(<field to transpose>, <current column name>, <key column name>)

select transpose(Message Type, Sent, Customer)
REVERSE_TRANSPOSE

REVERSE TRANSPOSE

Opposite of Transpose, folds columns into rows

Syntax:

REVERSE_TRANSPOSE(<New ID column>, <New Value column>, <Value column 1>, ...., <Value column N>)(, , , ...., )

Example: http://recordit.co/8HRx7aJtmB

As you can see, I have initial data with 5 columns. After executing: select reversetranspose(NEWID, NEWV, V1, V2, V3) I get them value columns folded into the new column, where NEWID's value is the old value column names (V1, V2, V3 one for each new row) and the NEW_V contains the corresponding value.

When you have multiple columns that you want to fold in, specify all the columns you want to pin it by, specify the "pin" columns at the front, followed by new ID and the value column, followed by a *.

Syntax:

REVERSE_TRANSPOSE(Customer, Campaign, State, <New ID column>, <New Value column>, *)

This will fold in all the columns except for the columns in the first section.

INJECT

Injects last value records in for a date range when the values are not present for that date.

For example, if a sensor emits data point 100 for 01/01/2016 and and the next change of value is at 200 10 days later, you can use the inject function to inject 100 into all dates in between that range.

INJECT(<Date Field>, <Start Date for Injecting>, <End Date for Injecting>, <Injection Frequency> [, <Select Fields>])
[group by <Dimension 1>[, ..., <Dimension N>]]

The optional <Select Field> can either be * (for all fields) or a comma separated list of selected fields from input data.

select inject(date, start_range_date, end_range_date, 1d, Name, Division, Score)
group by Name, Division
APPEND

Combines the results of two queries into the same dataset

(select sum(sent) as Facebook where customer=Facebook) append (select sum(sent) as LinkedIn Sent where customer=Linkedin)


Nested

Use the dot notation to query nested elements and the array notation for selecting items in an array. The example below uses a JSON nested string and uses Cloud9QL to parse it.

To unwind/expand an array, use the expand syntax.

Example:

select customer, nestedObj.secondLevel.y as Nested;
select expand(Nested);

Note that expand must be specified on its own, without any other elements within the select.


Chaining Statements

Multiple statements can be chained one after the other using a semi-colon delimiter, where the results of the first statement is passed in to the second and so on.

Example:

select sum(sent) as Sent, sum(opened) as opened, customer, month(date) as Month group by customer, month(date);
select (opened/sent)*100 as Open Rate, Opened, Sent, Customer, Month;
select round(Open Rate,1) as Open Rate, Opened, Sent, Customer, Month where Month > $c9_thisyear+2m and Open Rate > 20 order by Open Rate asc

This:

a) gets the Total Sent and Opens on a monthly basis for each customer,

b) then calculates the open rate based on the data from previous step since March,

c) adds additional criteria along with rounding the open rate.


IP to Geo-Location

IP to GEO function enables city level geo location from IP addresses.

Note: This uses GeoLite2 data created by MaxMind.

Example:

select ip_to_geo(IP), Some Field

This:


Geocoding - Lat/Long from Address

Retrieves lat/longs from addresses using geocod.io.

Note: This requires your own API key from geocod.io.

Example:

select geocode(<fullAddress>,<apiKey>)

This:


Forecasting/Predictions

Cloud9QL supports the ability to forecast/predict future data points based on previous data history for any dataset.

To predict 1 point in the future:

select PREDICT(<field to predict>, <date field>, <start date for prediction>[, <prediction model>])

To predict multiple data point in the future:

select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>,
    <Prediction Frequency>, <Prediction Data Points>[, <prediction model>])

To predict multiple data point in the future based on secondary dimension(s) (ie: grouping(s)):

select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>,
    <Prediction Frequency>, <Prediction Data Points>[, <prediction model>])
group by <Dimension 1>, ..., <Dimension N>

This:

Example:


Cohort

Cohorts are useful to determine a grouping of data across a date dimension. For example, how many users remain active n months after initial sign up, etc.

We currently support 2 types of input data.

select COHORT(<Date Column>, <Cohort Date Definition>, <Cohort Period Definition>), <Cohort Operation>
group by <Cohort Date>, <Cohort Period>

Note:

  1. Input data needs to be sorted by Date ascending order.

  2. Cohort Period returns a number (ie: the period) or a date. Example:

    a. 1m: Cohort Period as number

    b. (1m): Cohort Period as Date

Example 1: If we already have the cohort date populated

Example 2: If we only have transactional events like the following example:

Example 3: Cohorts can be used in combination with transpose to flatten the result based on date

Example 4: A common cohort is retention in percentage format which can be computed as follows:


Expand

Unwinds nested field values: This function will expand array, map, or array of maps data structure into rows.

select EXPAND(<Column with Nested Value>)

Example: In this example, Name field's value is an array of map of Last Name and First Name.

Note that this function must be used in isolation, i.e., cannot be used in combination with others. Use query chaining to manipulate the results:

select expand(Name);
select * where First Name like John


EXPAND_ARRAYS

Unwinds multiple nested field values: This function can expand multiple arrays, maps, or arrays of maps data structures into rows.

select EXPAND_ARRAYS(<Column with Nested Value 1>, ..., <Column with Nested Value N>)

Example: In this example, there are two nested objects Grade and Address. Grade field's value is an array of map of three fields date, grade, and score. Address field's value is a map of four fields building, coord, street, zipcode.

Note that this function must be used in isolation, i.e., cannot be used in combination with others. Use query chaining to manipulate the results:

select nestedObj1 as Nested1, nestedObj2.secondLevel.y as Nested2;
select expand_arrays(Nested1, Nested2);

Nested fields before unwinding with EXPAND_ARRAYS() function:

expandarray

To unwind/expand multiple arrays, use the expand_arrays syntax.

Example:

select nestedObj1 as Nested1, nestedObj2.secondLevel.y as Nested2;
select expand_arrays(Nested1, Nested2);

Nested fields after unwinding with EXPAND_ARRAYS() function: afterexpandarray

Note that expand_arrays must be specified on its own, without any other elements within the select.

EXPANDARRAYSWITH_DEFAULT

A more powerful version of EXPAND_ARRAYS. This function can expand multiple arrays, maps, or arrays of maps data structures into rows. It also allows you to fill in blank fields of expanded arrays with a default value like nulls or a chosen value if the arrays are different in size.

EXPAND_ARRAYS_WITH_DEFAULTS(<field 1>, 0, <field 2>, now(), <field 3>, LAST, <field 4>, <value from another field on the same row>, ..., <field N>, NULL)

Example: In this example, there are two nested objects Grade and Address. Grade field's value is an array of map of three fields date, grade, and score. Address field's value is a map of four fields building, coord, street, zipcode.

When you unwind grade and coord with EXPAND_ARRAYS(), you find that grade has more rows than coord.

Expand Array

With this function, you can choose a default value to fill in the blank spaces. This value will follow that object in the function. Below, we chose to fill in the blank spaces for the coord field with 0.

Note that a default value is required for all fields. If there is no particular default value you wish to add, simply enter null after the field as seen below for the field grade.

select grade, address.coord as coord;
select expand_arrays_with_defaults(grade, null, coord, 0);

The Result:

Expand Array

To unwind/expand a multiple arrays and fill in values in the function, use the expand_arrays_with_defaults syntax.

Example:

select nestedObj1 as Nested1, nestedObj2.secondLevel.y as Nested2;
select expand_arrays_with_defaults(Nested1,null, Nested2,0);

Note that expand_arrays_with defaults must be specified on its own, without any other elements within the select.

Parse

In case your data is a JSON string, PARSE function can be used to convert it to object which can be further manipulated and processed.

select PARSE(<JSON String Column>)

Example: In this example, Name field's value is an array of map of Last Name and First Name.


Encryption/Obfuscation

In case your data needs to be encrypted/obfuscated for storing, you can use the ENCRYPT and DECRYPT functions to achieve this goal.

select ENCRYPT(<Column>)
select DECRYPT(<Column>)