This guide walks you through deploying Trino on AWS using EC2, EMR, or EKS, whether you need a quick test environment or a production-grade setup. It covers everything from installation and configuration to security hardening, monitoring, and connecting with Knowi.
Deployment Options:
- EC2: Fast and simple for dev/testing
- EMR: Managed, scalable clusters for production
- EKS: Kubernetes-native deployment for advanced users
Data Source Integrations:
- S3 (Glue/Hive)
- PostgreSQL / MySQL (RDS)
- Redshift
- DynamoDB
Security Best Practices:
- IAM roles (no access keys)
- VPC + private subnets
- TLS encryption & MFA
- Use AWS Systems Manager instead of SSH
Monitoring & Scaling:
- CloudWatch agent setup for logs
- Auto-scaling workers via EKS HPA
- Query resource groups for cost control
Cost Optimization Tips:
- Use Spot Instances for workers
- Enable S3 requester pays if needed
- Right-size EC2/EMR resources
Knowi Integration:
- Connect via public IP or VPC peering
- Use basic auth or OAuth
- Specify catalog and port (8080 or 8443)
Note: Testing examples may use relaxed security—review the Security Configuration section before deploying in production.
Introduction
This guide covers multiple approaches to deploying Trino on AWS, from simple single-node setups for testing to production-ready distributed clusters. Choose the option that best fits your needs:
- EC2 Setup: Quick and simple for testing and development
- EMR Setup: Managed service with automatic scaling
- EKS Setup: Kubernetes-based for maximum flexibility
⚠️ SECURITY NOTICE: This guide includes examples for both testing and production environments. Testing examples may include simplified security settings that are NOT suitable for production. Always follow the security best practices section for production deployments. Never expose services to 0.0.0.0/0 or use hardcoded credentials in production.
Architecture Overview
Typical Trino AWS Architecture
Trino Users ──▶ Load Balancer ──▶ Trino Coordinator
                                          │
                      ┌───────────────────┼───────────────────┐
                      ▼                   ▼                   ▼
               Trino Worker 1      Trino Worker 2      Trino Worker N
                      │                   │                   │
                      └───────────────────┼───────────────────┘
                                          ▼
           Amazon S3 │ Amazon RDS │ Amazon Redshift │ Amazon Athena / ...
Prerequisites
Make sure you have:
- AWS account with admin access or appropriate IAM permissions
- AWS CLI configured (aws configure)
- SSH key pair
- Familiarity with EC2, IAM, VPC
- (For EKS) kubectl, eksctl, and helm installed
Option 1: EC2 Quick Setup
Best for: Testing and development environments
Step 1: Launch and Configure EC2
```bash
# Set variables
REGION="us-east-1"
KEY_NAME="your-key-pair"
SECURITY_GROUP="sg-trino"
# Create security group
aws ec2 create-security-group \
--group-name $SECURITY_GROUP \
--description "Security group for Trino" \
--region $REGION
# Allow SSH from your IP only (replace with your actual IP)
MY_IP=$(curl -s https://checkip.amazonaws.com)
aws ec2 authorize-security-group-ingress \
--group-name $SECURITY_GROUP \
--protocol tcp \
--port 22 \
--cidr ${MY_IP}/32 \
--region $REGION \
--group-rule-description "SSH access from my IP"
# For production, use ALB/NLB instead of direct access
# For testing, restrict to specific IPs or VPN
aws ec2 authorize-security-group-ingress \
--group-name $SECURITY_GROUP \
--protocol tcp \
--port 8080 \
--cidr ${MY_IP}/32 \
--region $REGION \
--group-rule-description "Trino UI access from my IP"
# Better practice: Use Systems Manager Session Manager for SSH
# aws ec2 authorize-security-group-ingress \
# --group-name $SECURITY_GROUP \
# --protocol tcp \
# --port 443 \
# --source-group $ALB_SECURITY_GROUP \
# --region $REGION
# Create IAM role for Trino EC2 instance
aws iam create-role \
--role-name TrinoEC2Role \
--assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
--role-name TrinoEC2Role \
--policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam create-instance-profile \
--instance-profile-name TrinoEC2Profile
aws iam add-role-to-instance-profile \
--instance-profile-name TrinoEC2Profile \
--role-name TrinoEC2Role
# Launch EC2 instance with IAM role
# (the AMI below is Amazon Linux 2 in us-east-1; substitute the latest AMI for your region)
aws ec2 run-instances \
--image-id ami-0c02fb55956c7d316 \
--instance-type m5.xlarge \
--key-name $KEY_NAME \
--security-groups $SECURITY_GROUP \
--region $REGION \
--iam-instance-profile Name=TrinoEC2Profile \
--block-device-mappings DeviceName=/dev/xvda,Ebs={VolumeSize=100,Encrypted=true} \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=trino-server}]' \
--metadata-options "HttpTokens=required,HttpPutResponseHopLimit=2,HttpEndpoint=enabled"
```
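The create-role step above references trust-policy.json, which is not shown elsewhere in this guide. A minimal sketch of a trust policy that lets EC2 instances assume the role (create this file before running the IAM commands):

```bash
# Create trust-policy.json allowing EC2 to assume TrinoEC2Role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```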
Step 2: Install Trino
SSH into your instance and run:
```bash
#!/bin/bash
# install-trino.sh
# Update system
sudo yum update -y
# Install Java 22 (required by Trino 450) via Amazon Corretto
sudo rpm --import https://yum.corretto.aws/corretto.key
sudo curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
sudo yum install -y java-22-amazon-corretto-devel
# Download and install Trino
cd /opt
sudo wget https://repo1.maven.org/maven2/io/trino/trino-server/450/trino-server-450.tar.gz
sudo tar -xzf trino-server-450.tar.gz
sudo mv trino-server-450 trino
sudo rm trino-server-450.tar.gz
# Create directories
sudo mkdir -p /opt/trino/etc/catalog
sudo mkdir -p /var/trino/data
# Create node.properties
sudo tee /opt/trino/etc/node.properties > /dev/null <<EOF
node.environment=production
node.id=$(uuidgen)
node.data-dir=/var/trino/data
EOF
# Create JVM config
sudo tee /opt/trino/etc/jvm.config > /dev/null <<EOF
-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
EOF
# Create config.properties for single-node
sudo tee /opt/trino/etc/config.properties > /dev/null <<EOF
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
query.max-memory=5GB
query.max-memory-per-node=1GB
EOF
# Create catalog for S3 data (using IAM role)
# Note: this expects a Hive metastore listening on port 9083; alternatively use the
# Glue catalog shown in "Configuring Data Sources"
sudo tee /opt/trino/etc/catalog/hive.properties > /dev/null <<EOF
connector.name=hive
hive.metastore.uri=thrift://localhost:9083
# Use IAM role instead of access keys
hive.s3.use-instance-credentials=true
hive.s3.region=us-east-1
EOF
# Create systemd service
sudo tee /etc/systemd/system/trino.service > /dev/null <<EOF
[Unit]
Description=Trino Server
After=network.target
[Service]
Type=forking
ExecStart=/opt/trino/bin/launcher start
ExecStop=/opt/trino/bin/launcher stop
User=ec2-user
Group=ec2-user
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# Set permissions and start service
sudo chown -R ec2-user:ec2-user /opt/trino /var/trino
sudo systemctl daemon-reload
sudo systemctl enable trino
sudo systemctl start trino
# Check status
sudo systemctl status trino
```
Step 3: Access Trino
# Get public IP
PUBLIC_IP=$(aws ec2 describe-instances \
--filters "Name=tag:Name,Values=trino-server" \
--query "Reservations[0].Instances[0].PublicIpAddress" \
--output text)
echo "Trino Web UI: http://$PUBLIC_IP:8080"
Option 2: Production Setup with EMR
Step 1: Create EMR Cluster with Trino
# Create EMR cluster with Trino
aws emr create-cluster \
--name "Trino-EMR-Cluster" \
--release-label emr-6.10.0 \
--applications Name=Trino Name=Hive \
--instance-type m5.xlarge \
--instance-count 3 \
--use-default-roles \
--region $REGION \
--log-uri s3://your-bucket/emr-logs/ \
--configurations file://emr-configurations.json
Create emr-configurations.json:
[
  {
    "Classification": "trino-config",
    "Properties": {
      "query.max-memory": "20GB",
      "query.max-memory-per-node": "8GB"
    }
  },
  {
    "Classification": "trino-connector-hive",
    "Properties": {
      "hive.s3.endpoint": "s3.amazonaws.com",
      "hive.s3.path-style-access": "false"
    }
  }
]
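After submitting the cluster, wait for it to reach the running state and grab the master node's DNS name. A sketch, where the cluster ID placeholder is the j-… value returned by create-cluster:

```bash
# Capture the ClusterId printed by create-cluster (placeholder shown)
CLUSTER_ID=j-XXXXXXXXXXXXX

# Block until the cluster is up, then print the master public DNS name
aws emr wait cluster-running --cluster-id $CLUSTER_ID
aws emr describe-cluster --cluster-id $CLUSTER_ID \
  --query 'Cluster.MasterPublicDnsName' --output text
```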
Step 2: Configure Additional Catalogs
SSH to master node and add catalogs:
# PostgreSQL RDS connector (password via Trino secrets / AWS Secrets Manager)
sudo tee /etc/trino/conf.dist/catalog/postgresql.properties > /dev/null <<'EOF'
connector.name=postgresql
connection-url=jdbc:postgresql://your-rds-endpoint.amazonaws.com:5432/database
connection-user=username
# For production, load the password from an environment variable populated from
# AWS Secrets Manager or Parameter Store (Trino secrets syntax):
# connection-password=${ENV:POSTGRESQL_PASSWORD}
connection-password=CHANGE_ME_USE_SECRETS_MANAGER
EOF
# Redshift connector (password via Trino secrets / AWS Secrets Manager)
sudo tee /etc/trino/conf.dist/catalog/redshift.properties > /dev/null <<'EOF'
connector.name=redshift
connection-url=jdbc:redshift://your-cluster.redshift.amazonaws.com:5439/database
connection-user=username
# For production, load the password from an environment variable populated from
# AWS Secrets Manager or Parameter Store (Trino secrets syntax):
# connection-password=${ENV:REDSHIFT_PASSWORD}
connection-password=CHANGE_ME_USE_SECRETS_MANAGER
EOF
# Example: retrieve a password from AWS Secrets Manager and expose it to Trino
# (the variable must be present in the Trino server's environment)
# export POSTGRESQL_PASSWORD=$(aws secretsmanager get-secret-value --secret-id trino/postgresql/password --query SecretString --output text)
# Restart Trino on the EMR node
sudo systemctl restart trino-server
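To confirm the new catalogs are registered, run a quick check from the master node. A sketch, assuming the EMR-provided trino-cli command is on the path:

```bash
# List catalogs known to the coordinator
trino-cli --execute "SHOW CATALOGS"

# Browse schemas in the new PostgreSQL catalog
trino-cli --catalog postgresql --execute "SHOW SCHEMAS"
```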
Option 3: EKS Deployment
Step 1: Create EKS Cluster
# Install eksctl if needed
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
# Create cluster
eksctl create cluster \
--name trino-cluster \
--version 1.27 \
--region $REGION \
--nodegroup-name standard-workers \
--node-type m5.xlarge \
--nodes 3 \
--managed
Step 2: Deploy Trino with Helm
# Add Trino Helm repository
helm repo add trino https://trinodb.github.io/charts
helm repo update
# Create values.yaml
cat > values.yaml <<EOF
image:
  tag: "450"
server:
  workers: 2
coordinator:
  resources:
    requests:
      memory: "8Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "2"
worker:
  resources:
    requests:
      memory: "8Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "2"
additionalCatalogs:
  postgresql: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://your-rds-endpoint:5432/database
    connection-user=username
    connection-password=password
  s3: |
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore:9083
    hive.s3.endpoint=s3.amazonaws.com
EOF
# Deploy Trino
helm install trino trino/trino -f values.yaml
# Get service endpoint
kubectl get service trino
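Before exposing anything publicly, you can sanity-check the deployment through a local port-forward. A minimal sketch, assuming the release was installed with the name trino as above:

```bash
# Forward the Trino service to localhost
kubectl port-forward svc/trino 8080:8080 &

# Basic liveness check against the coordinator REST API
curl -s http://localhost:8080/v1/info
```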
Step 3: Expose Trino
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: trino-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: trino
                port:
                  number: 8080
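Applying this manifest assumes the AWS Load Balancer Controller is installed in the cluster; without it, the ALB annotations have no effect:

```bash
kubectl apply -f ingress.yaml

# The controller provisions an ALB; its DNS name appears in the ADDRESS column
kubectl get ingress trino-ingress
```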
Configuring Data Sources
Amazon S3 with Glue Catalog
# /opt/trino/etc/catalog/glue.properties
connector.name=hive
hive.metastore=glue
hive.metastore.glue.region=us-east-1
hive.metastore.glue.default-warehouse-dir=s3://your-bucket/warehouse/
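Once the catalog file is in place and Trino has been restarted, Glue databases appear as schemas under the glue catalog. A quick check using the Trino CLI (names are placeholders):

```bash
# List Glue databases exposed through the catalog
trino --server http://localhost:8080 --execute "SHOW SCHEMAS FROM glue"

# Query a table registered in Glue (replace with your own database and table)
trino --server http://localhost:8080 --execute "SELECT * FROM glue.your_database.your_table LIMIT 10"
```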
Amazon RDS
# /opt/trino/etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://your-rds.amazonaws.com:3306
connection-user=admin
connection-password=password
Amazon Redshift
# /opt/trino/etc/catalog/redshift.properties
connector.name=redshift
connection-url=jdbc:redshift://your-cluster.redshift.amazonaws.com:5439/dev
connection-user=admin
connection-password=password
Amazon DynamoDB
# /opt/trino/etc/catalog/dynamodb.properties
# Note: open-source Trino does not bundle a DynamoDB connector; the properties below
# assume a third-party or commercial DynamoDB connector plugin is installed.
connector.name=dynamodb
dynamodb.region=us-east-1
# Use IAM role authentication - no access keys needed
# Ensure EC2 instance has IAM role with DynamoDB permissions
Security Configuration
Security Best Practices
IMPORTANT: Never use 0.0.0.0/0 in production! The examples above use restricted IP access. For production deployments:
- Network Security:
- Use VPC with private subnets for Trino nodes
- Deploy Application Load Balancer (ALB) in public subnets
- Use VPC endpoints for S3 access (see the example after this list)
- Implement VPC Flow Logs for monitoring
- Access Control:
- Use AWS Systems Manager Session Manager instead of SSH
- Implement SAML/OAuth authentication via ALB
- Use IAM roles instead of access keys
- Enable MFA for administrative access
- Encryption:
- Enable encryption at rest for all data stores
- Use TLS 1.2+ for all connections
- Encrypt data in transit between nodes
- Use AWS KMS for key management
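For the VPC endpoint item above, a gateway endpoint keeps S3 traffic on the AWS network instead of the public internet. A sketch, with a placeholder route table ID:

```bash
# Create a gateway VPC endpoint for S3 (us-east-1) and attach it to a route table
aws ec2 create-vpc-endpoint \
  --vpc-id $VPC_ID \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123456789abcdef0
```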
Production Security Group Configuration
# Create VPC and subnets first
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query 'Vpc.VpcId' --output text)
PRIVATE_SUBNET=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.0.1.0/24 --query 'Subnet.SubnetId' --output text)
PUBLIC_SUBNET=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.0.2.0/24 --query 'Subnet.SubnetId' --output text)
# ALB Security Group (public facing)
ALB_SG=$(aws ec2 create-security-group \
--group-name trino-alb-sg \
--description "Security group for Trino ALB" \
--vpc-id $VPC_ID \
--query 'GroupId' --output text)
# Allow HTTPS only from specific IP ranges or CloudFront
aws ec2 authorize-security-group-ingress \
--group-id $ALB_SG \
--protocol tcp \
--port 443 \
--cidr YOUR_OFFICE_IP_RANGE/24 \
--group-rule-description "HTTPS from office network"
# Trino Security Group (private)
TRINO_SG=$(aws ec2 create-security-group \
--group-name trino-nodes-sg \
--description "Security group for Trino nodes" \
--vpc-id $VPC_ID \
--query 'GroupId' --output text)
# Allow traffic only from ALB
aws ec2 authorize-security-group-ingress \
--group-id $TRINO_SG \
--protocol tcp \
--port 8080 \
--source-group $ALB_SG \
--group-rule-description "HTTP from ALB only"
# Allow inter-node communication
aws ec2 authorize-security-group-ingress \
--group-id $TRINO_SG \
--protocol tcp \
--port 8080 \
--source-group $TRINO_SG \
--group-rule-description "Inter-node communication"
IAM Role for EC2
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket/*",
        "arn:aws:s3:::your-bucket"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
Enable HTTPS
# config.properties additions
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/path/to/keystore.jks
http-server.https.keystore.key=password
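The keystore referenced above must exist before HTTPS is enabled. For testing you can generate a self-signed keystore with keytool; for production, use a certificate issued by your CA or terminate TLS at an ALB with ACM:

```bash
# Generate a self-signed keystore (testing only)
keytool -genkeypair \
  -alias trino \
  -keyalg RSA -keysize 2048 \
  -validity 365 \
  -keystore /path/to/keystore.jks \
  -storepass changeit \
  -dname "CN=trino.example.com"
```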
Authentication
# config.properties for basic auth
http-server.authentication.type=PASSWORD
# password-authenticator.properties
password-authenticator.name=file
file.password-file=/opt/trino/etc/password.db
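The password file holds bcrypt entries and can be created with htpasswd (part of httpd-tools on Amazon Linux); the user name below is just an example:

```bash
# Create a bcrypt-hashed password file with an initial user
sudo yum install -y httpd-tools
htpasswd -B -C 10 -c /opt/trino/etc/password.db admin
```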
Monitoring and Management
CloudWatch Integration
# Install CloudWatch agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm
# Configure for Trino logs
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/config.json > /dev/null <<EOF
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/trino/data/var/log/server.log",
            "log_group_name": "trino-logs",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
EOF
# Load the configuration and start the CloudWatch agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json -s
Auto Scaling
# For EKS deployment - HPA for Trino workers
# (scaleTargetRef must match the worker resource created by the Helm chart,
# typically a Deployment named trino-worker or <release>-trino-worker)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: trino-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: trino-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
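The HPA relies on the Kubernetes metrics-server to report CPU utilization; apply the manifest (saved here as trino-worker-hpa.yaml, name it as you like) and verify it picks up metrics:

```bash
kubectl apply -f trino-worker-hpa.yaml

# TARGETS should show current/target CPU once metrics-server is reporting
kubectl get hpa trino-worker-hpa
```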
Cost Optimization
1. Use Spot Instances for Workers
# EMR with spot instances
aws emr create-cluster \
--instance-groups \
InstanceGroupType=MASTER,InstanceType=m5.xlarge,InstanceCount=1 \
InstanceGroupType=CORE,InstanceType=m5.xlarge,InstanceCount=2,BidPrice=0.10
2. Implement Query Resource Groups
# resource-groups.properties
resource-groups.configuration-manager=file
resource-groups.config-file=/opt/trino/etc/resource-groups.json
Example resource-groups.json (each group needs maxQueued, and the selectors section is what routes incoming queries to a group; adjust patterns to your workload):
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "subGroups": [
        {
          "name": "analytics",
          "softMemoryLimit": "50%",
          "hardConcurrencyLimit": 20,
          "maxQueued": 200
        },
        {
          "name": "adhoc",
          "softMemoryLimit": "30%",
          "hardConcurrencyLimit": 5,
          "maxQueued": 50
        }
      ]
    }
  ],
  "selectors": [
    {
      "source": "knowi.*",
      "group": "global.analytics"
    },
    {
      "group": "global.adhoc"
    }
  ]
}
3. Enable S3 Requester Pays
# In hive catalog
hive.s3.requester-pays.enabled=true
Troubleshooting
Common Issues
Out of Memory Errors
# Check memory usage
curl http://localhost:8080/v1/cluster/memory
# Increase heap size in jvm.config
-Xmx16G
S3 Access Denied
# Check IAM role
aws sts get-caller-identity
# Verify S3 permissions
aws s3 ls s3://your-bucket/
Connection Timeouts
# Check security groups
aws ec2 describe-security-groups --group-names sg-trino
# Test connectivity
telnet trino-server 8080
Performance Tuning
- Query Optimization
-- Use EXPLAIN to analyze query plans
EXPLAIN (TYPE DISTRIBUTED)
SELECT * FROM large_table WHERE date > '2024-01-01';
- JVM Tuning
# Advanced JVM settings
-XX:+UnlockDiagnosticVMOptions
-XX:G1NumCollectionsKeepPinned=10000000
- Network Optimization
# Enable enhanced networking
aws ec2 modify-instance-attribute \
--instance-id i-xxxxx \
--ena-support
Connecting from Knowi
Connection Parameters
- Public Endpoint:
- Host: your-alb-endpoint.amazonaws.com or EC2 public IP
- Port: 8080 (or 8443 for HTTPS)
- Catalog: Your configured catalog (e.g., hive, postgresql)
- Username/Password: As configured
- VPC Peering:
- Set up VPC peering between Knowi and Trino VPCs
- Use internal endpoints for better security and performance
Best Practices
- Use SSL/TLS for production
- Implement proper authentication
- Set up query resource limits
- Monitor query performance
- Regular backups of configuration
Next Steps
- Set up monitoring dashboards in CloudWatch or Grafana
- Implement disaster recovery with multi-region setup
- Optimize query performance with proper partitioning
- Integrate with AWS Lake Formation for data governance
- Set up CI/CD for configuration management
Suggested Knowi CTA
Unlock your AWS data with AI-powered analytics.
Connect Trino to Knowi and start exploring cross-source insights in minutes.
Book a 15-minute demo or Start a free trial today.
Frequently Asked Questions
What is Trino and how does it help with analytics?
Trino is a distributed SQL query engine that allows you to run fast, interactive queries across multiple data sources (like S3, RDS, and Redshift) without moving or duplicating data.
Why pair Trino with Knowi?
Knowi adds a visual analytics and AI layer on top of Trino. You can create dashboards, ask natural-language questions, and generate AI-driven insights, all without writing complex SQL.
Does Knowi require ETL when using Trino?
No. Knowi connects natively to Trino’s federated query engine, so you can analyze data across different sources without additional ETL pipelines.
How secure is a Knowi–Trino integration?
Knowi supports VPC peering, SSL/TLS encryption, and IAM-based authentication. Your data remains in your AWS environment; Knowi only queries it.
Can Knowi handle large datasets queried through Trino?
Yes. Knowi is built for high-volume, real-time analytics, and can visualize results from large, multi-source Trino queries efficiently.
What kinds of visualizations can I build?
From interactive dashboards and time-series charts to map visualizations and AI-generated insights, Knowi offers a wide variety of visualization options out of the box.
How quickly can I get started?
Typically within minutes: simply connect Knowi to your Trino coordinator endpoint, select your catalogs, and begin building dashboards or using natural-language queries.