Machine Learning (Beta)

Buddha once said, "To reach Enlightenment, you must turn data into insight and insight into action." Actually, no, he didn't say that, but
Knowi goes beyond enlightenment to blend your hindsight with foresight and drive actions from your data.

Currently, Knowi supports classification-type machine learning use cases, with regression, clustering, and deep learning coming soon.


In this document, we will quickly demonstrate how to train/build, create, and apply a classification model to data.

For demonstration purposes, we will use the "Default of Credit Card Clients Data Set" from the UCI Machine Learning Repository. It contains data on 30,000 credit card clients with 24 attributes, including:

  1. Personal characteristics such as age, education, gender, and marital status.
  2. Credit line limit information.
  3. Billing/payment history for the 6-month period from April to September 2005.

The dataset in CSV form can be downloaded here.

The goal is to predict whether a client will default in the next payment period, based on the input attributes.
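To make the setup concrete, here is a minimal sketch of how such a dataset separates into input attributes and the class field to predict. It uses a tiny inline sample rather than the real file. The class field name "default payment next month" comes from this walkthrough; the other column names and all values are illustrative assumptions.

```python
import csv
import io

# Tiny inline stand-in for the real CSV. Column names other than the
# class field, and all values, are illustrative assumptions.
SAMPLE_CSV = """ID,LIMIT_BAL,AGE,PAY_0,default payment next month
1,20000,24,2,1
2,120000,26,-1,1
3,90000,34,0,0
4,50000,37,0,0
"""

TARGET = "default payment next month"


def load_rows(text):
    """Parse CSV text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: int(value) for name, value in row.items()} for row in reader]


def split_features_target(rows, target=TARGET, drop=("ID",)):
    """Separate the input attributes from the class field to be predicted."""
    features = [
        {k: v for k, v in row.items() if k != target and k not in drop}
        for row in rows
    ]
    labels = [row[target] for row in rows]
    return features, labels


rows = load_rows(SAMPLE_CSV)
features, labels = split_features_target(rows)
print(len(rows), "rows;", sum(labels), "defaults")  # 4 rows; 2 defaults
```

The ID column is dropped from the inputs here because an identifier carries no predictive information; which attributes to include is a choice made later in the workspace.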

Key steps:

  1. Data Preparation
  2. Training Models
  3. Optimization and Saving the Model
  4. Applying Saved Model to Data


Data Preparation

  1. Log in to Knowi and click the Machine Learning icon in the left menu bar.
  2. On the next page, click the New Workspace button at the top right. This creates a new Workspace for us to work with.
  3. The Workspace gives you the option to either select the training dataset from your existing datasets or upload a new one from a CSV file.
  4. After selecting your dataset, click the Prepare Data button to populate the workspace.

    a. We can view our training data by clicking the Preview Data button.
    b. The algorithm list is populated with supported algorithms.
    c. The list of fields from our training data is populated as well.

Following is an animated GIF to illustrate the steps above (hover to play):

Preparation Workspace and Training Data

Training Models

  1. Select the algorithm(s) from the algorithm list. For the first run, we will select all with default settings.
  2. Select attributes to be included in the training from the field list.
  3. Select the predict (class) field from the dropdown at the bottom of the field list. In our case, we select "default payment next month" for this.
  4. Click the Train button and wait for training to complete to view the results.

    a. The Results panel will be populated with your run results (one per algorithm). You can expand each entry to see more detailed results by clicking the + symbol. You can also compare the output of your model with the training values by clicking the eye symbol (more on this in the next section). Last but not least, you can publish your model by clicking the save icon.
    b. At the bottom of the page is the History section, which lists all the runs that you have done in this workspace. Selecting a run updates the Results panel with detailed information for that run.

Following is an animated GIF to illustrate the steps above (hover to play):

Training Models
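For readers who want a feel for what "training" means here, the following is a toy sketch of the simplest possible decision tree: a single split with two leaf nodes. It is not Knowi's implementation, and the field name and data are hypothetical. It only illustrates the idea of learning a rule from the selected attributes and the designated class field.

```python
def train_stump(rows, labels, field):
    """Find the threshold on `field` that classifies the most training rows correctly.

    The learned rule predicts 1 (default) when row[field] > threshold, else 0.
    Returns (threshold, training_accuracy).
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({row[field] for row in rows}):
        correct = sum(
            1
            for row, label in zip(rows, labels)
            if (1 if row[field] > threshold else 0) == label
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(rows)


# Hypothetical training data: one attribute and the class field.
rows = [{"PAY_0": 2}, {"PAY_0": -1}, {"PAY_0": 0}, {"PAY_0": 3}, {"PAY_0": 0}, {"PAY_0": 1}]
labels = [1, 0, 0, 1, 0, 1]

threshold, accuracy = train_stump(rows, labels, "PAY_0")
print(threshold, accuracy)  # 0 1.0
```

A real decision tree repeats this kind of split recursively across many attributes, which is why it has settings that bound how far the tree can grow.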

Optimization and Saving the Model

  1. From the Results panel, we can gather important performance information for the selected models. In our example, we can see that the Decision Tree algorithm yields the best result.
  2. By clicking the Result Data (eye) symbol, we can compare the training value with the value output by our model for each row of the training data (the next-to-last and last columns, respectively) to identify any combinations of attributes misclassified by our model.
  3. Armed with this information, we can now pick a different subset of algorithms, fine-tune an algorithm's parameters by clicking the corresponding settings symbol, and rerun the training. For our example, we rerun only Decision Tree, with Maximum Number of Leaf Nodes decreased from 100 to 90.
  4. We can use the History panel at the bottom of the page to view and compare results between runs. In our example, the last run has a slightly better accuracy of 81.39%, compared to 81.3267% for the previous run (a difference too small to matter).
  5. Once satisfied with the model, we publish (save) the latest Decision Tree model. It can then be applied to any future queries.

Following is an animated GIF to illustrate the steps above (hover to play):

Optimization and Save Model
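The row-by-row comparison and the accuracy figures discussed in this section can be sketched as follows. The data is hypothetical. The last calculation assumes the accuracy is measured over all 30,000 rows of the dataset, which this walkthrough does not state explicitly.

```python
def accuracy(actual, predicted):
    """Fraction of rows where the model output matches the training value."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def misclassified(rows, actual, predicted):
    """Return the rows whose predicted class differs from the actual class."""
    return [row for row, a, p in zip(rows, actual, predicted) if a != p]


# Hypothetical rows with their actual and predicted class values.
rows = [{"AGE": 24}, {"AGE": 26}, {"AGE": 34}, {"AGE": 37}, {"AGE": 57}]
actual = [1, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 1]

print(accuracy(actual, predicted))             # 0.6
print(misclassified(rows, actual, predicted))  # [{'AGE': 26}, {'AGE': 57}]

# How many rows separate the two runs, ASSUMING both accuracies were
# computed over all 30,000 rows of the dataset.
TOTAL_ROWS = 30000
gap_rows = round((0.8139 - 0.813267) * TOTAL_ROWS)
print(gap_rows)  # 19
```

Under that assumption, the two runs differ on roughly 19 of 30,000 rows, which is consistent with the difference being too small to matter.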

Applying Saved Model to Data

  1. Go to the query page by clicking the Data Feeds / Queries icon in the left menu bar, and create a new query (or edit an existing one).
  2. At the bottom of the query editing page, there is an "Apply Model" button. Clicking it shows a dropdown with a list of published models.
  3. Select our newly created model to apply it to the data. A few important notes:

    a. The model is applied after the query has been executed. This allows us to perform all necessary data massaging using the query section as long as the result of our query contains all the attribute fields required by the model.

    b. The predict (class) "default payment next month" attribute is automatically added to the result after applying the model.

    c. The model is applied before the ad hoc (grid) query. This allows further manipulation of the output data.

    d. Existing query functionality remains as is.

Following is an animated GIF to illustrate the steps above (hover to play):

Applying Saved Model to Data
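The ordering described in the notes above (query first, then the model, then any ad hoc post-processing) can be sketched like this. Everything here is a hypothetical stand-in: `toy_model`, `REQUIRED_FIELDS`, and the sample rows are illustrative and do not reflect Knowi internals.

```python
# Hypothetical: the attribute fields the saved model was trained on.
REQUIRED_FIELDS = ("LIMIT_BAL", "PAY_0")
PREDICT_FIELD = "default payment next month"


def toy_model(row):
    """Hypothetical stand-in for a published model."""
    return 1 if row["PAY_0"] > 0 else 0


def apply_model(rows, model, required=REQUIRED_FIELDS, predict_field=PREDICT_FIELD):
    """Append the predicted class to each row of a query result.

    Raises KeyError if the query result lacks a field the model requires.
    """
    scored = []
    for row in rows:
        missing = [f for f in required if f not in row]
        if missing:
            raise KeyError(f"query result is missing model fields: {missing}")
        scored.append({**row, predict_field: model(row)})
    return scored


# Step 1: the query runs first and produces the input rows.
query_result = [
    {"ID": 1, "LIMIT_BAL": 20000, "PAY_0": 2},
    {"ID": 2, "LIMIT_BAL": 90000, "PAY_0": 0},
]

# Step 2: the model is applied, adding the predicted class field.
scored = apply_model(query_result, toy_model)

# Step 3: post-processing (standing in for the ad hoc query) sees the prediction.
at_risk = [row["ID"] for row in scored if row[PREDICT_FIELD] == 1]
print(at_risk)  # [1]
```

Because the prediction is added as an ordinary field, any later step can filter, sort, or aggregate on it like any other column.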

Trigger Notification and Actions

Triggers and actions can be applied to the results. For example, in the use case above, you can send an alert or a webhook into your application for users with a high risk of default. The process for setting up triggers and alerts on a query with machine learning is the same as for a normal dataset/query. For more details, see Alerts.
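As a rough illustration of what the receiving application might work with, the sketch below filters scored rows for predicted defaulters and builds a JSON body. The payload shape, field names, and sample rows are all assumptions made for this example; they are not Knowi's webhook format.

```python
import json

PREDICT_FIELD = "default payment next month"


def build_alert_payload(scored_rows, id_field="ID"):
    """Build a JSON string listing clients predicted to default.

    The payload structure is hypothetical, chosen only for illustration.
    """
    at_risk = [row[id_field] for row in scored_rows if row[PREDICT_FIELD] == 1]
    return json.dumps(
        {"alert": "high default risk", "count": len(at_risk), "client_ids": at_risk}
    )


# Hypothetical query results after the model has been applied.
scored_rows = [
    {"ID": 1, PREDICT_FIELD: 1},
    {"ID": 2, PREDICT_FIELD: 0},
    {"ID": 3, PREDICT_FIELD: 1},
]

payload = build_alert_payload(scored_rows)
print(payload)
```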