Cloud9Agent

For data inside your network, where a direct connection from Knowi is not feasible, use the Cloud9Agent to establish secure connectivity to your datasources.

Highlights:

  • Connects securely and privately to datasources inside your network.
  • Does not require opening ports or firewall rules.
  • Queries data using the native query syntax of each datasource.
  • Optionally transforms data, with additional processing or calculations.
  • Schedules queries.
  • Sends the resulting data to Knowi, with optional warehousing strategies.
  • Queries can be configured directly within the agent or from the UI in our cloud.

The agent is a headless process that establishes a persistent connection to our servers; datasources and queries defined within the Knowi portal are synced by the agent in real time.

Install

To download the agent, select the Cloud9Agent icon from the Settings menu on the left, then select Download.

Requirements:

  • The location of the install needs to be able to connect to the datasources you want to pull data from.
  • Java Runtime Environment 1.7 or above:

    Type java -version on the command line to check the Java version. If required, download it here.
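
The version check above can also be scripted. The following is a minimal sketch, assuming the banner formats printed by common Java distributions (for example java version "1.7.0_80" or openjdk version "11.0.2"); the function names are illustrative and are not part of the agent.

```python
import re
import subprocess


def java_feature_version(version_output):
    """Return the Java feature version (7, 8, 11, ...) from `java -version` text."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    # Releases before Java 9 report themselves as 1.x (1.7 is Java 7).
    return minor if major == 1 else major


def installed_java_feature_version():
    """Run `java -version` and parse its banner (requires java on the PATH)."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_feature_version(result.stderr)
```

On a machine with Java installed, installed_java_feature_version() >= 7 would indicate that the requirement above is met.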

Install & Run:
  1. Unzip the file.
  2. Navigate to the unzipped base directory and execute the run script:

    ./run.sh  (or run.bat in Windows)
    To run as a background process in unix based systems: ./run.sh &
    

    By default, the process allocates a minimum of 128MB and a maximum of 2048MB of memory. If required, change -Xms128m -Xmx2024m to appropriate values within the JAVA_OPTS variable inside the script.

The distribution is pre-configured with your API key.

Basics

The Agent can be operated in two modes:
  1. Automated sync from the UI: Datasources and queries can be set up directly via our UI in the cloud. On startup, the agent connects to our servers (by default), pulls the configuration, and executes queries immediately or on a schedule. This mode also supports previewing query results directly from the UI.

  2. StandAlone: If you do not wish to use the UI to manage datasource and query configurations, they can be added directly to the agent. In this mode, datasources and queries are self-contained within the agent; the UI has no knowledge of them, and only the results of query executions are sent to Knowi.

UI To Agent Sync

The following example uses UI-to-Agent sync to configure and execute queries against your datasource inside the network.

  1. Select Datasources from the Settings menu on the left and select a datasource.
  2. Specify the datasource parameters:
     a. Enter the database connectivity parameters that the agent will use to connect.
     b. Check the Internal Datasource checkbox to assign the datasource to the agent.
     c. Click Save.

    The datasource will be synced by the agent immediately.

  3. Specify a query and select Preview. The query is synced by the agent in real time to display the results.
  4. Save (with or without a schedule), then add it to the dashboard.

StandAlone

Notes:

  • File names that start with datasource in the config directory are treated as a datasource configuration. Similarly, file names that start with query in the config directory are treated as a query configuration.

  • Datasource or query files placed in the config directory are picked up automatically; no restart is necessary.

  • The examples directory contains datasource and query examples against various datasources. Move appropriate files to the config directory (datasource_XXX.json files first if the Agent is running).
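
The naming and ordering rules in these notes can be illustrated with a small helper. This is a sketch only: deployment_order is a hypothetical function, the file names in the usage line are made up, and the assumption that the prefix match is case-sensitive is ours, not something the agent documents.

```python
def deployment_order(file_names):
    """Order example files so datasource configs are copied before query configs.

    Files whose names start with neither prefix are left out, since the
    notes above only describe datasource and query files.
    """
    datasources = sorted(n for n in file_names if n.startswith("datasource"))
    queries = sorted(n for n in file_names if n.startswith("query"))
    return datasources + queries


# Hypothetical file names, for illustration only.
print(deployment_order(["query_mysql.json", "datasource_mysql.json", "notes.txt"]))
```

Copying files in this order means a query file never appears in the config directory before the datasource it refers to.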

Example datasource config:

[
  {
    "name":"demoMySQL",
    "url":"localhost:3306/test",
    "datasource":"mysql",
    "userId":"userA",
    "password":"passA"
  },
  {
    "name":"demoMongo",
    "url":"dharma.mongohq.com:10071/mongoA",
    "datasource":"mongo",
    "userId":"mongoA",
    "password":"mongoPassA"
  }
]

Two datasources are configured above: one pointing to MySQL and another to MongoDB.

Example Query config:

[
  {
    "entityName":"Weekly Sent By Message Type",
    "dsName":"demoMySQL",
    "queryStr":"select sum(sent) as `Total Sent`, message_type,Week from demo_sent group by message_type,week order by week asc",
    "overrideVals":{
      "replaceAll":true
    }
  },
  {
    "entityName":"Page Hits Over Time",
    "dsName":"demoMongo",
    "queryStr":"db.pagehits.find({hits: { $gte: 1}})",
    "c9QLFilter":"select date(lastAccessTime) as Date, count(*) as Page Hits group by date(lastAccessTime) order by Date asc",
    "frequencyType":"daily",
    "frequency":1,
    "startTime": "04:00",
    "overrideVals":{
      "replaceValuesForKey":["Date"]
    }
  }
]
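
Each query's dsName refers to the name of a datasource entry, so a quick consistency check before dropping files into the config directory can catch typos. The sketch below is not part of the agent; unresolved_queries is a hypothetical helper, and the "Orphaned Query" entry and its "demoPostgres" datasource are invented to show a failing case.

```python
import json

# Trimmed copies of the example configurations, plus one invented bad entry.
DATASOURCES = """[
  {"name": "demoMySQL", "url": "localhost:3306/test", "datasource": "mysql"},
  {"name": "demoMongo", "url": "dharma.mongohq.com:10071/mongoA", "datasource": "mongo"}
]"""

QUERIES = """[
  {"entityName": "Weekly Sent By Message Type", "dsName": "demoMySQL"},
  {"entityName": "Page Hits Over Time", "dsName": "demoMongo"},
  {"entityName": "Orphaned Query", "dsName": "demoPostgres"}
]"""


def unresolved_queries(datasources_json, queries_json):
    """Return entityNames of queries whose dsName matches no datasource name."""
    known = {ds["name"] for ds in json.loads(datasources_json)}
    return [q["entityName"] for q in json.loads(queries_json) if q["dsName"] not in known]


print(unresolved_queries(DATASOURCES, QUERIES))
```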

Agent Configuration

Note: Unless you wish to tweak any settings, no changes are required out of the box.

The file config.json contains global settings for the Agent. It's pre-configured with your API key and Agent ID.

The top level Agent installation directory contains the following three directories:

  • config : JSON based Datasource configurations and Query configuration files that are dropped here will be picked up and executed.
  • examples : Example Demo Datasource and Query file configurations.
  • lib : Libraries used for the Agent to connect to various datasources.

    Under the config directory, config.json contains the following core configuration parameters for the agent.

    Property            Comments
    apiKey              Unique API key used to send data. Do not modify.
    realtimeUpdate      Default: true. Enables connectivity to our servers and real-time execution of queries configured in the UI.
    autoUpdate          Default: true. When true, the agent checks for new Cloud9Agent versions once a day, then automatically updates and restarts.
    autoDownloadLib     Default: true. Downloads the necessary libraries from Knowi when a new datasource is used. For example, when connecting to MongoDB, if the drivers are not present, the agent downloads the necessary files from us.
    maxThreads          Default: 40. Maximum number of concurrent query threads. For larger workloads, queries are queued until a thread becomes available.
    connectorId         Agent ID. Do not modify unless recommended by our team.
    loadedDataSources   Datasources currently in use, for which libraries have been downloaded.
    configDir           Directory used for datasource and query configurations.
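
To inspect the settings in effect, the documented defaults can be overlaid with the values found in config.json. This is a sketch under two assumptions that are not confirmed by the agent's documentation: that config.json is a flat JSON object keyed by the property names above, and that a missing property falls back to its documented default. The apiKey value is a placeholder.

```python
import json

# Defaults as documented in the parameter list above.
DOCUMENTED_DEFAULTS = {
    "realtimeUpdate": True,
    "autoUpdate": True,
    "autoDownloadLib": True,
    "maxThreads": 40,
}


def effective_settings(config_text):
    """Overlay values parsed from config.json text on the documented defaults."""
    settings = dict(DOCUMENTED_DEFAULTS)
    settings.update(json.loads(config_text))
    return settings


# Placeholder content; a real config.json ships with your own apiKey and connectorId.
sample = '{"apiKey": "example-key", "maxThreads": 10}'
print(effective_settings(sample))
```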

Datasources

NoSQL Databases

MongoDB, Cloudant, Cassandra/DataStax, MarkLogic, Aerospike


Relational Databases

MySQL, Oracle, PostgreSQL, SQL Server


Data Warehouses

Redshift, Snowflake, Cloud9 Data Warehouse


File Based Data

CSV, JSON, Excel, Files on S3, FTP, HTTP, Email Attachments


Other

Google Analytics, BigQuery, Salesforce


Combining Data - Multiple Datasources

Data from multiple datasources can be merged into a single dataset within Knowi. To do so, send the data to the same entityName (identifier) in the query file.

Example:

  [
      {
        "entityName":"Visitor Data",
        "dsName":"demoGA",
        "gaMetrics":"ga:visitors,ga:newVisits",
        "gaDimensions":"ga:date",
        "gaDateRange":"10d",
        "gaMaxResults":"1000",
        "gaSort":"-ga:date",
        "c9QLFilter":"select ga:visitors as Visitors, ga:newVisits as New Visits, ga:date as Date, \"GA Data\" as Type",
        "frequencyType":"hour",
        "frequency":2,
        "overrideVals":{
          "replaceValuesForKey":["Date","Type"]
        }
      },
      {
        "entityName":"Visitor Data",
        "dsName":"demoMongo",
        "queryStr":"db.pagehits.find({hits: { $gte: 1}})",
        "c9QLFilter":"select date(lastAccessTime) as Date, count(*) as Page Hits, \"Mongo Data\" as Type group by date(lastAccessTime) order by Date asc",
        "frequencyType":"daily",
        "frequency":1,
        "startTime": "04:00",
        "overrideVals":{
          "replaceValuesForKey":["Date","Type"]
        }
      }
  ]

The above example gets data from Google Analytics and MongoDB (on a schedule) and upserts data into the Visitor Data dataset based on Date and Type.
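
One way to picture an upsert keyed on Date and Type is sketched below. It reflects our reading of replaceValuesForKey (existing rows that share the key values with an incoming row are replaced, all others are kept); it is not Knowi's implementation, and the sample rows and numbers are invented.

```python
def upsert(dataset, new_rows, key_fields):
    """Replace rows whose key_fields match an incoming row, then append the incoming rows."""
    incoming_keys = {tuple(row.get(f) for f in key_fields) for row in new_rows}
    kept = [
        row for row in dataset
        if tuple(row.get(f) for f in key_fields) not in incoming_keys
    ]
    return kept + list(new_rows)


# Invented sample data: one row per source for the same date.
existing = [
    {"Date": "2014-05-01", "Type": "GA Data", "Visitors": 120},
    {"Date": "2014-05-01", "Type": "Mongo Data", "Page Hits": 300},
]
incoming = [{"Date": "2014-05-01", "Type": "GA Data", "Visitors": 135}]

merged = upsert(existing, incoming, ["Date", "Type"])
print(merged)
```

Because Type is part of the key, a refreshed Google Analytics row replaces only the earlier Google Analytics row for that date and leaves the MongoDB row in place.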

IP To Geo-Location Support

IP to geo-location conversion can be done at the Agent or via the UI. For more on IP to geo-location, see here.

The default Cloud9Agent distribution does not bundle the MaxMind database. To enable it:

  1. Download the MaxMind database from http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz.
  2. Unzip the file and place it into the lib folder.
  3. Restart the agent.

Example:

[
  {
    "entityName":"Location Data",
    "dsName":"demoMongo",
    "queryStr":"db.users.find()",
    "c9QLFilter":"select ip_to_geo(ipAddress); select distinct * where latitude is not null and city is not null",
    "frequencyType":"daily",
    "frequency":1,
    "startTime": "04:00",
    "overrideVals":{
      "replaceAll":true
    }
  }
]

The above example gets data from MongoDB and for the ipAddress field, executes ip_to_geo to obtain location information. The second statement further manipulates the data to filter out empty city and latitude data.
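
The effect of the second statement (distinct rows with a non-null latitude and city) can be sketched in plain Python. This only illustrates the filtering logic and is not how Cloud9QL is implemented; the sample rows are invented, and the fields that ip_to_geo adds beyond latitude and city are not assumed here.

```python
def distinct_located(rows):
    """Keep one copy of each row that has both a latitude and a city."""
    seen = set()
    result = []
    for row in rows:
        if row.get("latitude") is None or row.get("city") is None:
            continue  # drop rows with missing location data
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Invented rows, for illustration only.
rows = [
    {"ipAddress": "203.0.113.7", "latitude": 37.77, "city": "San Francisco"},
    {"ipAddress": "203.0.113.7", "latitude": 37.77, "city": "San Francisco"},
    {"ipAddress": "198.51.100.4", "latitude": None, "city": None},
]
print(distinct_located(rows))
```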

Custom Logic

In the uncommon scenario where you need custom processing of data beyond the native query support and Cloud9QL, custom processors can be plugged into datasource and query files to manipulate or process data from your datasources. See Custom Processing for more details.