asdfasdfsadf

Cloud9QL

Cloud9QL is an optional SQL-like syntax to transform data in powerful ways when needed.

Uses:

  • Transform, clean and post-process data.

  • Complement existing API's/query syntax of datastores.

    For example, if your data is in MongoDB, Cassandra or any other datasource that we support, Cloud9QL can be used a post processor to native query syntax.

  • Generate alternate views from the same data within Knowi.

  • Query file based data or datastores with no/limited query support.

Basics

Paste your own data below and write your own queries to experiment (or use the default data and queries):

Select everything:

select *

Keyword 'select' is optional.

Select specific fields:

select Sent, Date

Field Aliases:

select Sent as Sent Messages, Date

Conditions:

select * where opened > 100000
select * where campaign_name like artists
select * where Message Type = Transactional and sent > 200000

Supported Operators: >, >=, <, <=, !=, like, not like

Logical Operators: and, or

Ordering:

select * where opened > 100000 order by opened desc

Supported order types: asc, desc

Limit:

select * where opened > 100000 order by opened desc limit 1

Unique records:

 select distinct *
 select distinct customer

Aggregations

Aggregations functions enable grouping/dimensions from the data.

Without GROUP BY

 select sum(sent)
 select sum(sent), avg(sent), count(*), median(sent), max(sent), min(sent)

Supported: sum, count, avg, median, max, min

With GROUP BY

Enables aggregations based on one or more groups/dimensions.

  select sum(sent) as Total Sent, Customer group by Customer
ARRAY

Combines values on multiple rows of a given field into an array based on group by field(s).

ARRAY(<field>)
ARRAY(<field>, <remove-duplicates-flag>)
ARRAY(<field>, <remove-duplicates-flag>, <filter-out-null-flag>)

select Stock, array(Price) as Trends group by Stock
select Stock, array(Price, true, true) as Trends group by Stock

Functions

Arithmetic

Arithmetic Operations

Arithmetic operations can be used within the query

select (opened/sent)*100 as Open Rate, Customer

Supported operators:

+
-
*
+
/
^
%
abs
acos
asin
atan
cbrt
ceil
cos
cosh
floor
sqrt
tan
Standard Deviation

Useful to determine variance of a set of values;

select sd(opened) as Std Deviation, Customer group by customer

Date Operations

Cloud9QL will automatically attempt to parse various date formats.

Use str_to_date(<date>,<format>) for unsupported formats.

DATE

Truncates a date to midnight. When used within group by performs aggregation by date.

select date(date), sent
select date(date) as Sent Date, sum(sent) as Total Sent group by date(date)
DAY_OF_WEEK

Day name of the week (Sunday, Monday etc)

select day_of_week(date), sum(sent) as Total Sent group by day_of_week(date)
WEEK

Truncates to a date to the beginning of the week (Monday). When used within group by performs aggregation by week.

select week(date) as Sent Week, sum(sent) as Total Sent group by week(date)
WEEK_OF_YEAR

Week Number integer for the input date

select week_of_year(date) as Sent Week, sum(sent) as Total Sent group by week_of_year(date)
MONTH

Truncates to the 1st of the month. Aggregates data on a monthly basis when used within group by.

select month(date) as Sent Month, sum(sent) as Total Sent group by month(date)
QUARTER

Truncates to the beginning of the quarter. Aggregates data on a quarterly basis when used within group by.

select quarter(date) as Sent Quarter, sum(sent) as Total Sent group by quarter(date)
YEAR

Truncates to the 1st of the year. Aggregates data on a yearly basis when used within group by.

select year(date) as Sent Quarter, sum(sent) as Total Sent group by year(date)
HOUR

Truncates to the hour for dates with timestamps

select HOUR(date) as Sent Hour, sum(sent) as Total Sent group by hour(date)
MINUTE

Truncates/Groups to the minute for dates with timestamps

select MINUTE(timestamp) as Sent Hour, sum(sent) as Total Sent group by MINUTE(timestamp)
NOW()

Current date/time

select now()
DATE_FORMAT

Converts a date into another format

DATE_FORMAT(<date>,<format>)

select date_format(date,dd-MMM) as Display Format

Options:

y Year
M Month
w Week of Year
W Week in month
D Day in Year
d Day in Month
F Day of Week in Month
E Day name in week. Example: Tuesday,Tue
a Am/PM marker
H Hour in day (0-23)
h Hour in am/pm (1-12)
m Minute in hour
s Second in minute
S Millisecond
z Time zone
Z Time zone
   
STR_TO_DATE

Date conversion from a String Converts a string into date using a provided format

STR_TO_DATE(<date>,<format>)

select str_to_date(date,dd-MMM-yy HH:mm) as Converted Date
Date Tokens

The following reserved tokens enable date queries based on current date/time:

$c9_now Current Time
$c9_thishour 00:00 of the Current hour
$c9_today Midnight of the current date
$c9_yesterday Midnight, yesterday
$c9_thisweek Start of the current week (Monday midnight)
$c9_lastweek Start of last week (Monday midnight)
$c9_thismonth Midnight of the 1st of the current month
$c9_lastmonth Midnight of the 1st of the last month
$c9_thisquarter Midnight of the 1st of the current quarter (Jan, April, July, Oct)
$c9_lastquarter Midnight of the 1st of the last quarter (Jan, April, July, Oct)
$c9_thisyear Midnight, Jan 1, of the current year
$c9_lastyear Midnight, Jan 1, of the last year

select * where date > $c9_thisyear

In addition, these can be further manipulated with +/- operands along with time unit identifiers. For example:

select * where date > $c9_thisyear+2m

Gets data from March onwards

select * where date > $c9_yesterday+2h

Data from 2:00 AM yesterday

Time Units

The following are the list of supported time units:

min Minutes
h Hours
d Days
w Weeks
m Months
q Quarters
y Years

Timezones

Default timezone is US/Pacific for date display within Knowi. On-premise agents inherit the server timezone.

Custom Timezones can be set in the query using:

set time_zone=US/Eastern;

Full list of Timezones here.

Example:

Rolling & Cumulative

Cloud9QL provides a set of operations which can be utilized to calculate rolling and cumulative operations such as accumulate, growth, delta, simple moving average, cumulative moving average, and time moving average.

The standard usecase for these operations is to allow computation <operation> of a <value field> across a set of <dimension field(s)> and optionally grouping by a set of <grouping field(s)>.

<operation>(<value field>[, <grouping field(s)>]);

For example, compute the DELTA of Sent across Week grouping by Customer. In this example:

<operation>: DELTA
<value field>: Sent
<dimension field(s)>: Week
<grouping field(s)>: Customer

Example:

select Customer, delta(Sent, Customer) as SentDelta

There is one important restriction when using these Cloud9 QL functions: the input data need to be ordered by the <grouping field(s)> and <dimension field(s)> in that order.

ACCUMULATE

Creates cumulative totals for a field between records, given a sorted dataset.

accumulate(<value field>[, <grouping field(s)>]);

select accumulate(sent), date

The above example returns a cumulative sum of sent count for a pre-sorted date order.

GROWTH

Calculates a growth percentage for a field between records, for a sorted dataset.

growth(<value field>[, <grouping field(s)>]);

 select growth(sent), date
DELTA

Calculates a difference for a field between records, for a sorted dataset.

delta(<value field>[, <grouping field(s)>]);

 select delta(sent), date
SMA

Simple moving average based on a field and a window size for it. Assumes a sorted dataset.

SMA(<value field>, <window size>[, <grouping field(s)>]);

select sma(sent, 10)

CMA

Cumulative moving average returns the moving average of all data up to the current data point.

CMA(<value field>[, <grouping field(s)>]);

select cma(sent)

TMA

Time moving average based on a field, date field, and a window time unit size for it. See Time Units for all available time units. Assumes a sorted dataset

TMA(<value field>, <date field>, <time unit window>[, <grouping field(s)>]);

select tma(sent, date, 1w)

For more details on moving average definitions, see http://en.wikipedia.org/wiki/Moving_average

String & Number Operators

ROUND

Specify the number of decimal points to display

ROUND(<field>, <decimal points>)

select round(sent,1)
SUBSTRING

Substring between start and end indexes.

SUBSTRING(<field to check against>, < startIndex>,< endIndex>)

 select substring(Message Type,0,10)
SUBSTRING_BEFORE

Substring before the first occurrence of a delimiter for a field value.

SUBSTRING_BEFORE(<field to check against>, < delimiter>)

 select substring_before(Message Type,someDelim)
SUBSTRING_AFTER

Substring after the first occurrence of a delimiter for a field value.

SUBSTRING_AFTER(<field to check against>, < delimiter>)

  select substring_before(Message Type,someDelim)
CONCAT

Concatenates multiple columns together. When a field name does not exist in the current dataset, a fixed string is used.

CONCAT(<field name>, < anotherfield>, < yetanotherfield>,...)

 select concat(Customer, for Week of, Week)
UPPER

Upper cases a string

UPPER(<field name>)

 select upper(Customer)
LOWER

Lower cases a string

LOWER(<field name>)

 select lower(Customer)
CURRENCY_FORMAT

Formats a number to a locale specific currency format. Defaults to US currency format (en_US) if locale is not specified.

CURRENCY_FORMAT(<field name>, <locale>)

CURRENCY_FORMAT(<field name>, <decimal points>)

CURRENCY_FORMAT(<field name>, <locale>, <decimal points>)

  select currency_format(revenue)

Example with Locale:

  select currency_format(revenue,en-GBP)
NUMBER_FORMAT

This function allows you to control the display of leading and trailing zeros, prefixes and suffixes, grouping (thousands) separators, and the decimal separator.

NUMBER_FORMAT(<number>,<format>)

select number_format(clicks,##,###.00) as Number of clicks

The output for the preceding lines of code is described in the following table. The value is the number, a double , that is to be formatted. The pattern is the String that specifies the formatting properties. The output, which is a String, represents the formatted number:

value pattern output explanation
123456.789 ###,###.### 123,456.789 The pound sign (#) denotes a digit, the comma is a placeholder for the grouping separator, and the period is a placeholder for the decimal separator.
123456.789 ###.## 123456.79 The value has three digits to the right of the decimal point, but the pattern has only two. The format method handles this by rounding up.
123.78 000000.000 000123.780 The pattern specifies leading and trailing zeros, because the 0 character is used instead of the pound sign (#).
12345.67 $###,###.### $12,345.67 The first character in the pattern is the dollar sign ($). Note that it immediately precedes the leftmost digit in the formatted output.
12345.67 \u00A5###,###.### ¥12,345.67 The pattern specifies the currency sign for Japanese yen (¥) with the Unicode value 00A5.
IFNULL

Provide an alternate value to be used in case the specified field doesn't exist or the value is NULL.

IFNULL(<field name>, < alternate value>)

  select IFNULL(Customer, "N/A")

Other

CASE (IF..ELSE)

Conditional processing

case when <fieldName> <condition> <value> then <displayvalue else <displayelsevalue> end

  select case when sent > 1000000 then Large Sent when sent < 1000000 then Small Sent else NA end as Category, customer
PERCENTILE

Returns the value of the field for the specified percentile rank.

PERCENTILE(<field>, <percentile>)

select percentile(sent,75)
TRANSPOSE

Pivots row values for a field to columns

TRANSPOSE(<field to transpose>, < current column name>)

select transpose(Message Type,Sent)

To collapse based on a key field, use:

TRANSPOSE(<field to transpose>, <current column name>, <key column name>)

select transpose(Message Type, Sent, Customer)
INJECT

Injects last value records in for a date range when the values are not present for that date.

For example, if a sensor emits data point 100 for 01/01/2016 and and the next change of value is at 200 10 days later, you can use the inject function to inject 100 into all dates in between that range.

INJECT(<Date Field>, <Start Date for Injecting>, <End Date for Injecting>, <Injection Frequency> [, <Select Fields>])
[group by <Dimension 1>[, ..., <Dimension N>]]

The optional <Select Field> can either be * (for all fields) or a comma separated list of selected fields from input data.

select inject(date, start_range_date, end_range_date, 1d, Name, Division, Score)
group by Name, Division
APPEND

Combines the results of two queries into the same dataset

(select sum(sent) as Facebook where customer=Facebook) append (select sum(sent) as LinkedIn Sent where customer=Linkedin)


Nested

Use the dot notation to query nested elements and the array notation for selecting items in an array. The example below uses a JSON nested string and uses Cloud9QL to parse it.

To unwind/expand an array, use the expand syntax.

Example:

select customer, nestedObj.secondLevel.y as Nested;
select expand(Nested);

Note that expand must be specified on its own, without any other elements within the select.


Chaining Statements

Multiple statements can be chained one after the other using a semi-colon delimiter, where the results of the first statement is passed in to the second and so on.

Example:

select sum(sent) as Sent, sum(opened) as opened, customer, month(date) as Month group by customer, month(date);
select (opened/sent)*100 as Open Rate, Opened, Sent, Customer, Month;
select round(Open Rate,1) as Open Rate, Opened, Sent, Customer, Month where Month > $c9_thisyear+2m and Open Rate > 20 order by Open Rate asc

This:

a) gets the Total Sent and Opens on a monthly basis for each customer,

b) then calculates the open rate based on the data from previous step since March,

c) adds additional criteria along with rounding the open rate.


IP to Geo-Location

IP to GEO function enables city level geo location from IP addresses.

Note: This uses GeoLite2 data created by MaxMind.

Example:

select ip_to_geo(IP), Some Field

This:

  • Queries the MaxMind database to determine location fields.
  • The fields are added as separate columns to the result.

Geocoding - Lat/Long from Address

Retrieves lat/longs from addresses using geocod.io.

Note: This requires your own API key from geocod.io.

Example:

select geocode(<fullAddress>,<apiKey>)

This:

  • Issues a batch geocoding request to geocod.io.
  • The fields are added as separate columns to the result.

Forecasting/Predictions

Cloud9QL supports the ability to forecast/predict future data points based on previous data history for any dataset.

To predict 1 point in the future:

select PREDICT(<field to predict>, <date field>, <start date for prediction>[, <prediction model>])

To predict multiple data point in the future:

select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>,
    <Prediction Frequency>, <Prediction Data Points>[, <prediction model>])

To predict multiple data point in the future based on secondary dimension(s) (ie: grouping(s)):

select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>,
    <Prediction Frequency>, <Prediction Data Points>[, <prediction model>])
group by <Dimension 1>, ..., <Dimension N>

This:

  • Loads the data points from input data.
  • Determines the optimum prediction model.
  • Predicts future data point(s).
  • <prediction frequency> is in the format of <Number><TimeUnits> for example 1d means daily 2m specifies every 2 months.
  • <prediction model> you can choose a specific model to be applied from one of the following supported models (case-sensitive):
    • Regression
    • PolynomialRegression
    • MultipleLinearRegression

Example:


Cohort

Cohorts are useful to determine a grouping of data across a date dimension. For example, how many users remain active n months after initial signup, etc.

We currently support 2 types of input data.

select COHORT(<Date Column>, <Cohort Date Definition>, <Cohort Period Definition>), <Cohort Operation>
group by <Cohort Date>, <Cohort Period>

Note:

  1. Input data needs to be sorted by Date ascending order.

  2. Cohort Period returns a number (ie: the period) or a date. Example:

    a. 1m: Cohort Period as number

    b. (1m): Cohort Period as Date

Example 1: If we already have the cohort date populated

Example 2: If we only have transactional events like the following example:

Example 3: Cohorts can be used in combination with transpose to flatten the result based on date

Example 4: A common cohort is retention in percentage format which can be computed as follows:


Expand

Unwinds nested field values: This function will expand array, map, or array of maps data structure into rows.

select EXPAND(<Column with Nested Value>)

Example: In this example, Name field's value is an array of map of Last Name and First Name.

Note that this function must be used in isolation, i.e., cannot be used in combination with others. Use query chaining to manipulate the results:

select expand(Name);
select * where First Name like John

Parse

In case your data is a JSON string, PARSE function can be used to convert it to object which can be further manipulated and processed.

select PARSE(<JSON String Column>)

Example: In this example, Name field's value is an array of map of Last Name and First Name.


Encryption/Obfuscation

In case your data needs to be encrypted/obfuscated for storing, you can use the ENCRYPT and DECRYPT functions to achieve this goal.

select ENCRYPT(<Column>)
select DECRYPT(<Column>)