TL;DR
Trino is a distributed SQL engine you can get up and running with Docker in just minutes. This guide walks you through a single-node setup, configuring multiple data sources (PostgreSQL + MySQL), loading sample data, running cross-source queries, and integrating with Knowi for visual analytics. Perfect for local development, testing, or a proof of concept.
Table of Contents
- Introduction
- Why use Trino: Key Advantages
- Challenges with Trino
- When to Use Trino
- Prerequisites
- Step 1: Set Up a Trino Server with Docker
- Step 2: Connect PostgreSQL & MySQL as Trino Catalogs
- Step 3: Load Sample Data into PostgreSQL & MySQL
- Step 4: Visualize Trino Data with Knowi
- Common Errors & How to Fix Them
- Performance Optimization Tips
- Next Steps
- Useful Commands
- Ready to Visualize Trino Data?
- Frequently Asked Questions
Introduction
Trino is a distributed SQL query engine designed for fast, interactive analytics across multiple data sources, including both structured (SQL) and semi-structured or unstructured (NoSQL, object storage) data – all without requiring ETL or data movement.
Why use Trino: Key Advantages
1. Federated SQL Querying Across Data Sources
Trino allows engineers and backend systems to query multiple sources, such as PostgreSQL, MySQL, Hive, MongoDB, Cassandra, S3, and more, through a single SQL interface. This is especially useful for powering data pipelines or backend services that require direct access to distributed data without ETL.
2. Full ANSI SQL Support
Trino provides strong SQL coverage, including joins, aggregations, subqueries, and window functions, making it ideal for technical teams who prefer standardized query languages across all their data sources.
3. Distributed Query Execution at Scale
Trino is optimized for large-scale, in-memory processing across clusters. It supports parallel execution for faster backend querying on large or complex datasets.
4. Stateless and Lightweight
As a stateless engine, Trino executes queries without persisting data or maintaining intermediate storage. This makes it a good fit for ephemeral workloads and elastic environments.
5. Extensible with 30+ Connectors
Trino supports a wide range of structured and semi-structured data sources through its modular connector architecture, ideal for backend systems that need data federation.
6. Open and Vendor-Agnostic
Trino is fully open-source, with active development from its original creators. Its pluggable design makes it compatible with a variety of architectures, including lakehouses, cloud-native stacks, and custom pipelines.
Challenges with Trino
1. No Native Storage or Caching
Trino doesn’t persist or cache query results. This limits its use for repeated access patterns or dashboarding without pairing it with a caching or visualization layer like Knowi.
2. Memory-Intensive Workloads
As an in-memory engine, Trino requires careful tuning to avoid out-of-memory errors when dealing with large joins, aggregations, or skewed data distributions.
3. Minimal Security and Governance Features
Trino has basic authentication and authorization but lacks robust user management, role-based access, or row-level security, which are often required for business-facing use.
4. Uneven Connector Maturity
Not all Trino connectors offer equal performance or pushdown support. Query performance can vary significantly depending on the source and connector implementation.
5. Operational Overhead for Production Use
Running Trino in production requires DevOps ownership to manage worker scaling, fault tolerance, upgrades, and system monitoring.
6. No Built-in Visualization or BI Layer
Trino does not offer dashboards, visualizations, or reporting capabilities. For business insights, it must be integrated with a BI platform like Knowi.
When to Use Trino
1. Backend Federated Query Engine
Trino is ideal for teams building data platforms or services that need to query multiple data sources without moving or transforming the data.
2. Low-Latency SQL on Data Lakes
For lakehouse architectures, Trino provides fast SQL access over formats like Iceberg, Hive, and Delta Lake, especially when paired with S3, GCS, or HDFS.
3. Semantic Query Layer for BI Tools
Trino can serve as a virtual SQL endpoint across systems, enabling BI tools like Knowi, Superset, or Tableau to query without managing multiple data connections.
4. Data Science Exploration
Technical users can use Trino to run ad-hoc queries across raw datasets from different sources, without having to ingest or model the data first.
5. Query Acceleration Layer
Trino can serve as a high-performance engine to accelerate BI workloads by pre-processing complex logic or joins before passing results to downstream tools like Knowi.
6. Transitional Access Layer During Migrations
When migrating from legacy to modern systems, Trino allows unified access to both, enabling teams to maintain continuity in analytics workflows.
Prerequisites
Before starting, ensure you have:
- Docker Desktop installed – Download Docker
- At least 4GB of free memory
- Basic familiarity with SQL and the command line
Step 1: Set Up a Trino Server with Docker
1. Simple Single-Node Trino
Create a docker-compose.yml file:
version: '3.8'

services:
  trino:
    image: trinodb/trino:latest
    container_name: trino-server
    ports:
      - "8080:8080"
    environment:
      - JAVA_TOOL_OPTIONS=-Xmx4G
Start Trino:
docker-compose up -d
Access Trino:
- Web UI: http://localhost:8080
- Default user: any username (no password required)
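For a quick sanity check, run a query through the Trino CLI bundled in the container (the server can take roughly 30 seconds to finish starting, so retry if the connection is refused at first):
docker exec trino-server trino --execute "SELECT 'trino is up'"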
2. Trino with Built-in Catalogs
Create an enhanced docker-compose.yml:
version: '3.8'

services:
  trino:
    image: trinodb/trino:latest
    container_name: trino-server
    ports:
      - "8080:8080"
    environment:
      - JAVA_TOOL_OPTIONS=-Xmx4G
    volumes:
      - ./trino-config:/etc/trino/catalog
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/v1/info"]
      interval: 30s
      timeout: 10s
      retries: 5
Create catalog configurations:
TPC-H Catalog (trino-config/tpch.properties):
connector.name=tpch
TPC-DS Catalog (trino-config/tpcds.properties):
connector.name=tpcds
Memory Catalog (trino-config/memory.properties):
connector.name=memory
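None of these three connectors needs an external system, so they are handy for a first test drive. A few starter queries you can run in the CLI or Web UI:
-- List all configured catalogs
SHOW CATALOGS;
-- TPC-H generates data on the fly; tiny is the smallest scale factor
SELECT name, regionkey FROM tpch.tiny.nation LIMIT 5;
-- The memory catalog stores tables in RAM; they vanish on restart
CREATE TABLE memory.default.nations AS SELECT * FROM tpch.tiny.nation;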
Step 2: Connect PostgreSQL & MySQL as Trino Catalogs
Complete Setup with PostgreSQL and MySQL
Create a comprehensive docker-compose.yml:
version: '3.8'

networks:
  trino-network:
    driver: bridge

services:
  trino:
    image: trinodb/trino:latest
    container_name: trino-server
    ports:
      - "8080:8080"
    environment:
      - JAVA_TOOL_OPTIONS=-Xmx4G
    volumes:
      - ./trino-config:/etc/trino/catalog
      - ./trino-data:/var/trino/data
    networks:
      - trino-network
    depends_on:
      - postgres
      - mysql
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/v1/info"]
      interval: 30s
      timeout: 10s
      retries: 5

  postgres:
    image: postgres:14
    container_name: trino-postgres
    environment:
      POSTGRES_USER: trino
      POSTGRES_PASSWORD: trino123
      POSTGRES_DB: analytics
    ports:
      - "5432:5432"
    volumes:
      - ./postgres-init:/docker-entrypoint-initdb.d
      - postgres-data:/var/lib/postgresql/data
    networks:
      - trino-network

  mysql:
    image: mysql:8.0
    container_name: trino-mysql
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: sales
      MYSQL_USER: trino
      MYSQL_PASSWORD: trino123
    ports:
      - "3306:3306"
    volumes:
      - ./mysql-init:/docker-entrypoint-initdb.d
      - mysql-data:/var/lib/mysql
    networks:
      - trino-network

volumes:
  postgres-data:
  mysql-data:
Catalog Configurations
PostgreSQL Catalog (trino-config/postgresql.properties):
connector.name=postgresql
connection-url=jdbc:postgresql://postgres:5432/analytics
connection-user=trino
connection-password=trino123
MySQL Catalog (trino-config/mysql.properties):
connector.name=mysql
connection-url=jdbc:mysql://mysql:3306/sales
connection-user=trino
connection-password=trino123
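After the stack is running (Step 3 shows how to start it and load data), a few metadata queries are a quick way to confirm both catalogs are registered and reachable:
SHOW CATALOGS;
SHOW SCHEMAS FROM postgresql;
SHOW TABLES FROM mysql.sales;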
Step 3: Load Sample Data into PostgreSQL & MySQL
PostgreSQL Sample Data
Create postgres-init/01-create-tables.sql:
CREATE SCHEMA IF NOT EXISTS ecommerce;
CREATE TABLE ecommerce.customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    country VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE ecommerce.products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(200),
    category VARCHAR(100),
    price DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE ecommerce.orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES ecommerce.customers(customer_id),
    order_date DATE,
    total_amount DECIMAL(10, 2),
    status VARCHAR(50)
);
INSERT INTO ecommerce.customers (first_name, last_name, email, country) VALUES
('John', 'Doe', 'john.doe@email.com', 'USA'),
('Jane', 'Smith', 'jane.smith@email.com', 'Canada'),
('Bob', 'Johnson', 'bob.j@email.com', 'UK');
INSERT INTO ecommerce.products (product_name, category, price) VALUES
('Laptop Pro', 'Electronics', 1299.99),
('Wireless Mouse', 'Electronics', 29.99),
('Office Chair', 'Furniture', 249.99);
INSERT INTO ecommerce.orders (customer_id, order_date, total_amount, status) VALUES
(1, '2024-01-15', 1329.98, 'Completed'),
(2, '2024-01-16', 249.99, 'Completed'),
(3, '2024-01-17', 29.99, 'Processing');
MySQL Sample Data
Create mysql-init/01-create-tables.sql:
CREATE TABLE IF NOT EXISTS sales_transactions (
    transaction_id INT AUTO_INCREMENT PRIMARY KEY,
    store_id INT,
    product_id INT,
    quantity INT,
    sale_date DATE,
    amount DECIMAL(10, 2)
);
CREATE TABLE IF NOT EXISTS stores (
    store_id INT PRIMARY KEY,
    store_name VARCHAR(100),
    city VARCHAR(50),
    state VARCHAR(50)
);
INSERT INTO stores VALUES
(1, 'Downtown Store', 'New York', 'NY'),
(2, 'Mall Store', 'Los Angeles', 'CA'),
(3, 'Airport Store', 'Chicago', 'IL');
INSERT INTO sales_transactions (store_id, product_id, quantity, sale_date, amount) VALUES
(1, 1, 2, '2024-01-15', 2599.98),
(2, 2, 5, '2024-01-15', 149.95),
(3, 3, 1, '2024-01-16', 249.99);
Start Everything
# Create necessary directories
mkdir -p trino-config trino-data postgres-init mysql-init
# Start all services
docker-compose up -d
# View service status
docker-compose ps
# Check logs
docker-compose logs trino
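Once the health check passes, a quick count against each source confirms the init scripts ran, using the CLI bundled in the Trino container:
# Verify the sample data is visible through Trino (each should return 3)
docker exec trino-server trino --execute "SELECT count(*) FROM postgresql.ecommerce.customers"
docker exec trino-server trino --execute "SELECT count(*) FROM mysql.sales.stores"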
Step 4: Visualize Trino Data with Knowi
Connection Parameters
- For Local Development:
  - Host: localhost
  - Port: 8080
  - Catalog: tpch (or postgresql, mysql, etc.)
  - Schema: sf1 (for TPC-H) or the appropriate schema
  - Username: (leave empty or use any username)
  - Password: (leave empty)
- For Docker-to-Docker Connection (if Knowi is also running in Docker):
  - Host: trino-server (the container name)
  - Port: 8080
  - Other parameters remain the same
Sample Queries to Test
Query across multiple catalogs. Note that Trino addresses every table as catalog.schema.table; the PostgreSQL database name (analytics) belongs to the JDBC URL, not the query path:
-- Join MySQL transaction data with PostgreSQL order data
SELECT
    s.store_name,
    s.city,
    SUM(t.amount) AS total_sales
FROM mysql.sales.sales_transactions t
JOIN postgresql.ecommerce.orders o
    ON t.transaction_id = o.order_id
JOIN mysql.sales.stores s
    ON t.store_id = s.store_id
GROUP BY s.store_name, s.city
ORDER BY total_sales DESC;
TPC-H Sample Query:
-- Top customers by order value
SELECT
    c.name,
    SUM(o.totalprice) AS total_spent
FROM tpch.sf1.customer c
JOIN tpch.sf1.orders o
    ON c.custkey = o.custkey
GROUP BY c.name
ORDER BY total_spent DESC
LIMIT 10;
Common Errors & How to Fix Them
Container won’t start
# Check the logs for startup errors
docker-compose logs trino
# Ensure Docker has enough memory available
docker system info | grep Memory
Connection refused
# Check whether Trino is running and responding
curl http://localhost:8080/v1/info
# Check container status
docker ps
Out of memory errors
# Stop the stack, raise JAVA_TOOL_OPTIONS in docker-compose.yml to -Xmx8G,
# then start it again
docker-compose down
docker-compose up -d
Cannot connect to PostgreSQL/MySQL
# Ensure all containers are on the same network
docker network ls
docker network inspect trino-network
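Catalog not recognized is another frequent hiccup (the FAQ below mentions misconfigured catalog files as a common cause). A quick check, assuming the container and mount names used in this guide:
# Confirm the .properties files are mounted where Trino expects them
docker exec trino-server ls /etc/trino/catalog
# Restart Trino after adding or editing catalog files
docker-compose restart trino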
Performance Optimization Tips
Memory Configuration
Create trino-etc/config.properties (kept in a separate directory from the catalog files; see the mounting note at the end of this section):
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
query.max-memory=5GB
query.max-memory-per-node=1GB
JVM Configuration
Create trino-etc/jvm.config:
-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
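A caveat before these files can take effect: the compose files in this guide mount ./trino-config at /etc/trino/catalog, and Trino treats every .properties file in the catalog directory as a catalog definition, so config.properties must not live there. One way to wire both files in (a sketch, using the assumed trino-etc/ directory from above) is to mount each one to its expected path under /etc/trino:
volumes:
  - ./trino-config:/etc/trino/catalog
  - ./trino-etc/config.properties:/etc/trino/config.properties
  - ./trino-etc/jvm.config:/etc/trino/jvm.config
Recreate the container with docker-compose up -d for the changes to apply.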
Query Optimization Tips
- Use appropriate data types
- Partition large tables
- Use predicate pushdown (verify it with EXPLAIN, as sketched below)
- Monitor query performance via Web UI
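To see whether a filter is actually pushed down to the source database, inspect the query plan: if the connector pushes the predicate down, the WHERE clause is absorbed into the table scan instead of appearing as a separate filter operator above it. A quick check against the sample data from Step 3:
EXPLAIN
SELECT order_id, total_amount
FROM postgresql.ecommerce.orders
WHERE status = 'Completed';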
Next Steps
- Security: Configure authentication and SSL
- Monitoring: Set up Prometheus/Grafana
- Scale: Add worker nodes for distributed queries
- Production: Use Kubernetes for orchestration
Useful Commands
# Stop all containers
docker-compose down
# Remove all data (fresh start)
docker-compose down -v
# View real-time logs
docker-compose logs -f trino
# Open the Trino CLI
docker exec -it trino-server trino
# Check catalog list
docker exec trino-server trino --execute "SHOW CATALOGS"
# Check schemas
docker exec trino-server trino --execute "SHOW SCHEMAS FROM tpch"
Ready to Visualize Trino Data?
Once your Trino instance is up and running, it’s time to unlock real insights.
Knowi connects natively to Trino and lets you:
- Build dashboards directly on top of your Trino queries
- Run cross-source analytics with zero ETL
- Use Natural Language Queries to ask questions in plain English
- Blend Trino results with MongoDB, REST APIs, or spreadsheets
- Share, embed, and automate insights across your teams
Try Knowi for free or request a 15-minute demo to see AI-powered analytics on Trino in action.
Frequently Asked Questions
What is Trino and why should I use it?
Trino is an open-source distributed SQL query engine that lets you run federated queries across multiple data sources—including relational databases, NoSQL, and object stores—without needing to move or transform the data. It’s ideal for interactive, large-scale analytics.
Can I run Trino locally using Docker?
Yes. This blog walks you through setting up a single-node Trino server using Docker, which is perfect for local development, testing, or POCs (proof of concepts).
What types of data sources can I connect to Trino?
Trino supports over 30 connectors. In this guide, we cover PostgreSQL and MySQL, but you can also connect to MongoDB, Cassandra, S3, Hive, Elasticsearch, and more.
Do I need to load data into Trino?
No. Trino doesn’t store data—it connects to external systems where your data lives and queries it in place. However, for this tutorial, we show how to load sample data into PostgreSQL and MySQL to test queries.
What are some common issues I might face during setup?
Typical issues include Docker containers failing to start (usually memory-related), Trino not recognizing catalogs (misconfigured files), or connection errors between Trino and databases (network or credential problems). The blog has a dedicated troubleshooting section to help with these.
How do I visualize Trino data?
You can use Knowi, a BI platform with native Trino integration. Knowi lets you run SQL queries on Trino, build dashboards, and even ask questions in plain English using Natural Language Queries—no ETL required.
Is this setup production-ready?
No. This is a development/test setup. For production, you’ll need to configure security, scaling, authentication, monitoring, and possibly run Trino in a Kubernetes or cloud-native environment.