Trino Docker Setup Guide: Connect PostgreSQL, MySQL & Run Queries

TL;DR

Trino is a distributed SQL engine you can get up and running with Docker in minutes. This guide walks you through a single-node setup: configuring multiple data sources (PostgreSQL + MySQL), loading sample data, running cross-source queries, and integrating with Knowi for visual analytics. Perfect for local development, testing, or a proof of concept.

Introduction

Trino is a distributed SQL query engine designed for fast, interactive analytics across multiple data sources, including both structured (SQL) and semi-structured or unstructured (NoSQL, object storage) data – all without requiring ETL or data movement.

Why Use Trino: Key Advantages

1. Federated SQL Querying Across Data Sources

Trino allows engineers and backend systems to query multiple sources, such as PostgreSQL, MySQL, Hive, MongoDB, Cassandra, S3, and more, through a single SQL interface. This is especially useful for powering data pipelines or backend services that require direct access to distributed data without ETL.
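
For example, a single statement can join tables that live in different systems. This sketch assumes the postgresql and mysql catalogs configured later in this guide (Step 2), along with the sample tables from Step 3:

-- Revenue by customer country, joining PostgreSQL and MySQL in one query
SELECT c.country, SUM(t.amount) AS revenue
FROM postgresql.ecommerce.customers c
JOIN postgresql.ecommerce.orders o ON o.customer_id = c.customer_id
JOIN mysql.sales.sales_transactions t ON t.transaction_id = o.order_id
GROUP BY c.country;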

2. Full ANSI SQL Support

Trino provides strong SQL coverage, including joins, aggregations, subqueries, and window functions, making it ideal for technical teams who prefer standardized query languages across all their data sources.
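
As a small illustration, here is a standard window function run against the built-in TPC-H catalog (configured in Step 1), ranking each customer's orders by price:

-- Rank orders within each customer by total price
SELECT custkey, orderkey, totalprice,
       RANK() OVER (PARTITION BY custkey ORDER BY totalprice DESC) AS price_rank
FROM tpch.sf1.orders
LIMIT 10;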

3. Distributed Query Execution at Scale

Trino is optimized for large-scale, in-memory processing across clusters. It supports parallel execution for faster backend querying on large or complex datasets.

4. Stateless and Lightweight

As a stateless engine, Trino executes queries without persisting data or maintaining intermediate storage. This makes it a good fit for ephemeral workloads and elastic environments.

5. Extensible with 30+ Connectors

Trino supports a wide range of structured and semi-structured data sources through its modular connector architecture, ideal for backend systems that need data federation.

6. Open and Vendor-Agnostic

Trino is fully open-source, with active development from its original creators. Its pluggable design makes it compatible with a variety of architectures, including lakehouses, cloud-native stacks, and custom pipelines.

Challenges with Trino

1. No Native Storage or Caching

Trino doesn’t persist or cache query results. This limits its use for repeated access patterns or dashboarding without pairing it with a caching or visualization layer like Knowi.

2. Memory-Intensive Workloads

As an in-memory engine, Trino requires careful tuning to avoid out-of-memory errors when dealing with large joins, aggregations, or skewed data distributions.

3. Minimal Security and Governance Features

Trino has basic authentication and authorization but lacks robust user management, role-based access, or row-level security, which are often required for business-facing use.

4. Uneven Connector Maturity

Not all Trino connectors offer equal performance or pushdown support. Query performance can vary significantly depending on the source and connector implementation.

5. Operational Overhead for Production Use

Running Trino in production requires DevOps ownership to manage worker scaling, fault tolerance, upgrades, and system monitoring.

6. No Built-in Visualization or BI Layer

Trino does not offer dashboards, visualizations, or reporting capabilities. For business insights, it must be integrated with a BI platform like Knowi.

When to Use Trino 

1. Backend Federated Query Engine

Trino is ideal for teams building data platforms or services that need to query multiple data sources without moving or transforming the data.

2. Low-Latency SQL on Data Lakes

For lakehouse architectures, Trino provides fast SQL access over formats like Iceberg, Hive, and Delta Lake, especially when paired with S3, GCS, or HDFS.

3. Semantic Query Layer for BI Tools

Trino can serve as a virtual SQL endpoint across systems, enabling BI tools like Knowi, Superset, or Tableau to query without managing multiple data connections.

4. Data Science Exploration 

Technical users can use Trino to run ad-hoc queries across raw datasets from different sources, without having to ingest or model the data first.

5. Query Acceleration Layer

Trino can serve as a high-performance engine to accelerate BI workloads by pre-processing complex logic or joins before passing results to downstream tools like Knowi.
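
A minimal sketch of this pattern uses the memory catalog from Step 1 to materialize an aggregate; note the memory connector holds data in RAM only, so keep result sets small and expect them to vanish on restart:

-- Pre-aggregate store revenue once...
CREATE TABLE memory.default.store_revenue AS
SELECT s.store_name, SUM(t.amount) AS total_sales
FROM mysql.sales.sales_transactions t
JOIN mysql.sales.stores s ON t.store_id = s.store_id
GROUP BY s.store_name;

-- ...then point downstream tools at the cached table
SELECT * FROM memory.default.store_revenue;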

6. Transitional Access Layer During Migrations

When migrating from legacy to modern systems, Trino allows unified access to both, enabling teams to maintain continuity in analytics workflows.

Prerequisites

Before starting, ensure you have:

  • Docker Desktop installed (Download Docker)
  • At least 4GB of free memory
  • Basic familiarity with SQL and the command line

Step 1: Set Up a Trino Server with Docker

1. Simple Single-Node Trino

Create a docker-compose.yml file:

version: '3.8'

services:
  trino:
    image: trinodb/trino:latest
    container_name: trino-server
    ports:
      - "8080:8080"
    environment:
      - JAVA_TOOL_OPTIONS=-Xmx4G

Start Trino:

docker-compose up -d

Access Trino: once the container is up, open the web UI at http://localhost:8080 (this setup has no authentication, so any username works at the login prompt).

2. Trino with Built-in Catalogs

Create an enhanced docker-compose.yml:

version: '3.8'

services:
  trino:
    image: trinodb/trino:latest
    container_name: trino-server
    ports:
      - "8080:8080"
    environment:
      - JAVA_TOOL_OPTIONS=-Xmx4G
    volumes:
      - ./trino-config:/etc/trino/catalog
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/v1/info"]
      interval: 30s
      timeout: 10s
      retries: 5

Create catalog configurations:

TPC-H Catalog (trino-config/tpch.properties):

connector.name=tpch

TPC-DS Catalog (trino-config/tpcds.properties):

connector.name=tpcds

Memory Catalog (trino-config/memory.properties):

connector.name=memory
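
After restarting with docker-compose up -d, you can confirm the catalogs loaded; for example, from the Trino CLI:

-- List registered catalogs, then query TPC-H's generated data
SHOW CATALOGS;
SELECT name, regionkey FROM tpch.sf1.nation LIMIT 5;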

Step 2: Connect PostgreSQL & MySQL as Trino Catalogs 

Complete Setup with PostgreSQL and MySQL

Create a comprehensive docker-compose.yml:

version: '3.8'

networks:
  trino-network:
    driver: bridge

services:
  trino:
    image: trinodb/trino:latest
    container_name: trino-server
    ports:
      - "8080:8080"
    environment:
      - JAVA_TOOL_OPTIONS=-Xmx4G
    volumes:
      - ./trino-config:/etc/trino/catalog
      - ./trino-data:/var/trino/data
    networks:
      - trino-network
    depends_on:
      - postgres
      - mysql
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/v1/info"]
      interval: 30s
      timeout: 10s
      retries: 5

  postgres:
    image: postgres:14
    container_name: trino-postgres
    environment:
      POSTGRES_USER: trino
      POSTGRES_PASSWORD: trino123
      POSTGRES_DB: analytics
    ports:
      - "5432:5432"
    volumes:
      - ./postgres-init:/docker-entrypoint-initdb.d
      - postgres-data:/var/lib/postgresql/data
    networks:
      - trino-network

  mysql:
    image: mysql:8.0
    container_name: trino-mysql
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: sales
      MYSQL_USER: trino
      MYSQL_PASSWORD: trino123
    ports:
      - "3306:3306"
    volumes:
      - ./mysql-init:/docker-entrypoint-initdb.d
      - mysql-data:/var/lib/mysql
    networks:
      - trino-network

volumes:
  postgres-data:
  mysql-data:

Catalog Configurations

PostgreSQL Catalog (trino-config/postgresql.properties):

connector.name=postgresql
connection-url=jdbc:postgresql://postgres:5432/analytics
connection-user=trino
connection-password=trino123

MySQL Catalog (trino-config/mysql.properties):

connector.name=mysql
connection-url=jdbc:mysql://mysql:3306/sales
connection-user=trino
connection-password=trino123
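
With both files in place, restart the stack and verify that Trino can reach each database (the schema and database names here match the compose file above):

SHOW SCHEMAS FROM postgresql;
SHOW TABLES FROM mysql.sales;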

Step 3: Load Sample Data into PostgreSQL & MySQL

PostgreSQL Sample Data

Create postgres-init/01-create-tables.sql:

CREATE SCHEMA IF NOT EXISTS ecommerce;

CREATE TABLE ecommerce.customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    country VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE ecommerce.products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(200),
    category VARCHAR(100),
    price DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE ecommerce.orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES ecommerce.customers(customer_id),
    order_date DATE,
    total_amount DECIMAL(10, 2),
    status VARCHAR(50)
);

INSERT INTO ecommerce.customers (first_name, last_name, email, country) VALUES
    ('John', 'Doe', 'john.doe@email.com', 'USA'),
    ('Jane', 'Smith', 'jane.smith@email.com', 'Canada'),
    ('Bob', 'Johnson', 'bob.j@email.com', 'UK');

INSERT INTO ecommerce.products (product_name, category, price) VALUES
    ('Laptop Pro', 'Electronics', 1299.99),
    ('Wireless Mouse', 'Electronics', 29.99),
    ('Office Chair', 'Furniture', 249.99);

INSERT INTO ecommerce.orders (customer_id, order_date, total_amount, status) VALUES
    (1, '2024-01-15', 1329.98, 'Completed'),
    (2, '2024-01-16', 249.99, 'Completed'),
    (3, '2024-01-17', 29.99, 'Processing');

MySQL Sample Data

Create mysql-init/01-create-tables.sql:

CREATE TABLE IF NOT EXISTS sales_transactions (
    transaction_id INT AUTO_INCREMENT PRIMARY KEY,
    store_id INT,
    product_id INT,
    quantity INT,
    sale_date DATE,
    amount DECIMAL(10, 2)
);

CREATE TABLE IF NOT EXISTS stores (
    store_id INT PRIMARY KEY,
    store_name VARCHAR(100),
    city VARCHAR(50),
    state VARCHAR(50)
);

INSERT INTO stores VALUES
    (1, 'Downtown Store', 'New York', 'NY'),
    (2, 'Mall Store', 'Los Angeles', 'CA'),
    (3, 'Airport Store', 'Chicago', 'IL');

INSERT INTO sales_transactions (store_id, product_id, quantity, sale_date, amount) VALUES
    (1, 1, 2, '2024-01-15', 2599.98),
    (2, 2, 5, '2024-01-15', 149.95),
    (3, 3, 1, '2024-01-16', 249.99);

Start Everything

# Create necessary directories
mkdir -p trino-config trino-data postgres-init mysql-init

# Start all services
docker-compose up -d

# View service status
docker-compose ps

# Check logs
docker-compose logs trino
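
Once the Trino container reports healthy, a quick sanity check from the CLI (docker exec -it trino-server trino) confirms the sample data landed in both sources:

SELECT COUNT(*) AS customers FROM postgresql.ecommerce.customers;
SELECT COUNT(*) AS stores FROM mysql.sales.stores;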

Step 4: Visualize Trino Data with Knowi

Connection Parameters

  1. For Local Development:
  • Host: localhost
  • Port: 8080
  • Catalog: tpch (or postgresql, mysql, etc.)
  • Schema: sf1 (for TPC-H) or appropriate schema
  • Username: (leave empty or any username)
  • Password: (leave empty)
  2. For Docker-to-Docker Connection: If Knowi is also running in Docker:
  • Host: trino-server (container name)
  • Port: 8080
  • Other parameters remain the same

Sample Queries to Test

Query across multiple catalogs:

-- Join PostgreSQL and MySQL data
SELECT 
    s.store_name,
    s.city,
    SUM(t.amount) AS total_sales
FROM 
    mysql.sales.sales_transactions t
JOIN 
    postgresql.ecommerce.orders o ON t.transaction_id = o.order_id
JOIN 
    mysql.sales.stores s ON t.store_id = s.store_id
GROUP BY 
    s.store_name, s.city
ORDER BY 
    total_sales DESC;

TPC-H Sample Query:

-- Top customers by order value
SELECT 
    c.name,
    SUM(o.totalprice) as total_spent
FROM 
    tpch.sf1.customer c
JOIN 
    tpch.sf1.orders o ON c.custkey = o.custkey
GROUP BY 
    c.name
ORDER BY 
    total_spent DESC
LIMIT 10;

Step 5: Common Errors & How to Fix Them

Container won’t start

Check logs

docker-compose logs trino

Ensure enough memory

docker system info | grep Memory

Connection refused

Check if Trino is running

curl http://localhost:8080/v1/info

Check container status

docker ps

Out of memory errors

Increase memory allocation

docker-compose down

Edit docker-compose.yml, change JAVA_TOOL_OPTIONS to -Xmx8G

docker-compose up -d

Cannot connect to PostgreSQL/MySQL

Ensure containers are on same network

docker network ls
docker network inspect trino-network

Step 6: Performance Optimization Tips

Memory Configuration

Create trino-config/config.properties. Note that the compose file above mounts ./trino-config at /etc/trino/catalog, where Trino treats every .properties file as a catalog definition, so mount this file individually (for example, add - ./trino-config/config.properties:/etc/trino/config.properties under the trino service's volumes):

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
query.max-memory=5GB
query.max-memory-per-node=1GB

JVM Configuration

Create trino-config/jvm.config and mount it to /etc/trino/jvm.config the same way. If you use this file, drop the JAVA_TOOL_OPTIONS=-Xmx4G environment variable from docker-compose.yml so the two heap settings don't conflict:

-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Query Optimization Tips

  1. Use appropriate data types
  2. Partition large tables
  3. Use predicate pushdown (verify with EXPLAIN, as shown below)
  4. Monitor query performance via the Web UI at http://localhost:8080
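
A simple way to check whether a predicate is pushed down to the source database is to inspect the query plan. In this sketch (against the PostgreSQL sample tables from Step 3), a pushed-down filter appears inside the connector's table scan rather than as a separate Trino filter stage:

EXPLAIN
SELECT *
FROM postgresql.ecommerce.orders
WHERE status = 'Completed';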

Next Steps

  1. Security: Configure authentication and SSL
  2. Monitoring: Set up Prometheus/Grafana
  3. Scale: Add worker nodes for distributed queries
  4. Production: Use Kubernetes for orchestration

Useful Commands

Stop all containers

docker-compose down

Remove all data (fresh start)

docker-compose down -v

View real-time logs

docker-compose logs -f trino

Execute Trino CLI

docker exec -it trino-server trino

Check catalog list

docker exec trino-server trino --execute "SHOW CATALOGS"

Check schemas

docker exec trino-server trino --execute "SHOW SCHEMAS FROM tpch"

Ready to Visualize Trino Data?

Once your Trino instance is up and running, it’s time to unlock real insights.

Knowi connects natively to Trino and lets you:

  • Build dashboards directly on top of your Trino queries
  • Run cross-source analytics with zero ETL
  • Use Natural Language Queries to ask questions in plain English
  • Blend Trino results with MongoDB, REST APIs, or spreadsheets
  • Share, embed, and automate insights across your teams

Try Knowi for free or request a 15-minute demo to see how AI-powered analytics on Trino works in action.

Frequently Asked Questions

What is Trino and why should I use it?

Trino is an open-source distributed SQL query engine that lets you run federated queries across multiple data sources—including relational databases, NoSQL, and object stores—without needing to move or transform the data. It’s ideal for interactive, large-scale analytics.

Can I run Trino locally using Docker?

Yes. This blog walks you through setting up a single-node Trino server using Docker, which is perfect for local development, testing, or POCs (proof of concepts).

What types of data sources can I connect to Trino?

Trino supports over 30 connectors. In this guide, we cover PostgreSQL and MySQL, but you can also connect to MongoDB, Cassandra, S3, Hive, Elasticsearch, and more.

Do I need to load data into Trino?

No. Trino doesn’t store data—it connects to external systems where your data lives and queries it in place. However, for this tutorial, we show how to load sample data into PostgreSQL and MySQL to test queries.

What are some common issues I might face during setup?

Typical issues include Docker containers failing to start (usually memory-related), Trino not recognizing catalogs (misconfigured files), or connection errors between Trino and databases (network or credential problems). The blog has a dedicated troubleshooting section to help with these.

How do I visualize Trino data?

You can use Knowi, a BI platform with native Trino integration. Knowi lets you run SQL queries on Trino, build dashboards, and even ask questions in plain English using Natural Language Queries—no ETL required.

Is this setup production-ready?

No. This is a development/test setup. For production, you’ll need to configure security, scaling, authentication, monitoring, and possibly run Trino in a Kubernetes or cloud-native environment.
