
Setting Up Trino on AWS: Complete Deployment Guide with Knowi Integration


This guide walks you through deploying Trino on AWS using EC2, EMR, or EKS, whether you need a quick environment for testing or a production-grade setup. It covers everything from installation and configuration to security hardening, monitoring, and connecting to Knowi.

Deployment Options:

  • EC2: Fast and simple for dev/testing
  • EMR: Managed, scalable clusters for production
  • EKS: Kubernetes-native deployment for advanced users

Data Source Integrations:

  • S3 (Glue/Hive)
  • PostgreSQL / MySQL (RDS)
  • Redshift
  • DynamoDB

Security Best Practices:

  • IAM roles (no access keys)
  • VPC + private subnets
  • TLS encryption & MFA
  • Use AWS Systems Manager instead of SSH

Monitoring & Scaling:

  • CloudWatch agent setup for logs
  • Auto-scaling workers via EKS HPA
  • Query resource groups for cost control

Cost Optimization Tips:

  • Use Spot Instances for workers
  • Enable S3 requester pays if needed
  • Right-size EC2/EMR resources

Knowi Integration:

  • Connect via public IP or VPC peering
  • Use basic auth or OAuth
  • Specify catalog and port (8080 or 8443)

Note: Testing examples may use relaxed security—review the Security Configuration section before deploying in production.

Introduction

This guide covers multiple approaches to deploying Trino on AWS, from simple single-node setups for testing to production-ready distributed clusters. Choose the option that best fits your needs:

  • EC2 Setup: Quick and simple for testing and development
  • EMR Setup: Managed service with automatic scaling
  • EKS Setup: Kubernetes-based for maximum flexibility

⚠️ SECURITY NOTICE: This guide includes examples for both testing and production environments. Testing examples may include simplified security settings that are NOT suitable for production. Always follow the security best practices section for production deployments. Never expose services to 0.0.0.0/0 or use hardcoded credentials in production.

Architecture Overview

Typical Trino AWS Architecture

┌─────────────┐     ┌───────────────┐     ┌───────────────────┐
│ Trino Users │────▶│ Load Balancer │────▶│ Trino Coordinator │
└─────────────┘     └───────────────┘     └─────────┬─────────┘
                                                    │
              ┌─────────────────────────────────────┼─────────────────────────────────────┐
              ▼                                     ▼                                     ▼
     ┌────────────────┐                    ┌────────────────┐                    ┌────────────────┐
     │ Trino Worker 1 │                    │ Trino Worker 2 │        ...         │ Trino Worker N │
     └────────┬───────┘                    └────────┬───────┘                    └────────┬───────┘
              │                                     │                                     │
              └─────────────────────────────────────┼─────────────────────────────────────┘
                                                    │
           ┌──────────────────┬─────────────────────┼──────────────────────┐
           ▼                  ▼                     ▼                      ▼
    ┌──────────────┐  ┌───────────────┐   ┌─────────────────┐   ┌──────────────────────┐
    │  Amazon S3   │  │  Amazon RDS   │   │ Amazon Redshift │   │ Amazon Athena / ...  │
    └──────────────┘  └───────────────┘   └─────────────────┘   └──────────────────────┘

Prerequisites

Make sure you have:

  • AWS account with admin or IAM permissions
  • AWS CLI configured (aws configure)
  • SSH key pair
  • Familiarity with EC2, IAM, VPC
  • (For EKS) kubectl, eksctl, and helm installed

Option 1: EC2 Quick Setup

Best for: Testing, local development

Step 1: Launch and Configure EC2

```bash
# Set variables
REGION="us-east-1"
KEY_NAME="your-key-pair"
SECURITY_GROUP="sg-trino"

# Create security group
aws ec2 create-security-group \
  --group-name $SECURITY_GROUP \
  --description "Security group for Trino" \
  --region $REGION

# Allow SSH from your IP only (replace with your actual IP)
MY_IP=$(curl -s https://checkip.amazonaws.com)
aws ec2 authorize-security-group-ingress \
  --group-name $SECURITY_GROUP \
  --protocol tcp \
  --port 22 \
  --cidr ${MY_IP}/32 \
  --region $REGION \
  --group-rule-description "SSH access from my IP"

# For production, use ALB/NLB instead of direct access
# For testing, restrict to specific IPs or VPN
aws ec2 authorize-security-group-ingress \
  --group-name $SECURITY_GROUP \
  --protocol tcp \
  --port 8080 \
  --cidr ${MY_IP}/32 \
  --region $REGION \
  --group-rule-description "Trino UI access from my IP"

# Better practice: use Systems Manager Session Manager instead of SSH, and allow
# Trino traffic only from an ALB security group, e.g.:
# aws ec2 authorize-security-group-ingress \
#   --group-name $SECURITY_GROUP \
#   --protocol tcp \
#   --port 443 \
#   --source-group $ALB_SECURITY_GROUP \
#   --region $REGION

# Create IAM role for Trino EC2 instance
aws iam create-role \
  --role-name TrinoEC2Role \
  --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy \
  --role-name TrinoEC2Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

aws iam create-instance-profile \
  --instance-profile-name TrinoEC2Profile

aws iam add-role-to-instance-profile \
  --instance-profile-name TrinoEC2Profile \
  --role-name TrinoEC2Role

# Launch EC2 instance with IAM role
aws ec2 run-instances \
  --image-id ami-0c02fb55956c7d316 \
  --instance-type m5.xlarge \
  --key-name $KEY_NAME \
  --security-groups $SECURITY_GROUP \
  --region $REGION \
  --iam-instance-profile Name=TrinoEC2Profile \
  --block-device-mappings DeviceName=/dev/xvda,Ebs={VolumeSize=100,Encrypted=true} \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=trino-server}]' \
  --metadata-options "HttpTokens=required,HttpPutResponseHopLimit=2,HttpEndpoint=enabled"
```
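
The create-role call above references trust-policy.json, which is not shown. A standard EC2 trust policy that lets instances assume the role looks like this; save it before running the IAM commands:

```bash
# trust-policy.json: allow EC2 instances to assume TrinoEC2Role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```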

Step 2: Install Trino

Connect to your instance (via SSH or Session Manager) and run:

```bash
#!/bin/bash
# install-trino.sh

# Update system
sudo yum update -y

# Install Java 22 (required by Trino 450); Amazon Corretto shown here
sudo rpm --import https://yum.corretto.aws/corretto.key
sudo curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
sudo yum install -y java-22-amazon-corretto-devel

# Download and install Trino
cd /opt
sudo wget https://repo1.maven.org/maven2/io/trino/trino-server/450/trino-server-450.tar.gz
sudo tar -xzf trino-server-450.tar.gz
sudo mv trino-server-450 trino
sudo rm trino-server-450.tar.gz

# Create directories
sudo mkdir -p /opt/trino/etc/catalog
sudo mkdir -p /var/trino/data

# Create node.properties
sudo tee /opt/trino/etc/node.properties > /dev/null <<EOF
node.environment=production
node.id=$(uuidgen)
node.data-dir=/var/trino/data
EOF

# Create JVM config
sudo tee /opt/trino/etc/jvm.config > /dev/null <<EOF
-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
EOF

# Create config.properties for single-node
sudo tee /opt/trino/etc/config.properties > /dev/null <<EOF
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
query.max-memory=5GB
query.max-memory-per-node=1GB
EOF

# Create catalog for S3 data (using IAM role)
# Assumes a Hive metastore is reachable at localhost:9083; to use AWS Glue as the
# metastore instead, see the Glue catalog example later in this guide.
sudo tee /opt/trino/etc/catalog/hive.properties > /dev/null <<EOF
connector.name=hive
hive.metastore.uri=thrift://localhost:9083
# Use IAM role instead of access keys
hive.s3.use-instance-credentials=true
hive.s3.region=us-east-1
EOF

# Create systemd service
sudo tee /etc/systemd/system/trino.service > /dev/null <<EOF
[Unit]
Description=Trino Server
After=network.target

[Service]
Type=forking
ExecStart=/opt/trino/bin/launcher start
ExecStop=/opt/trino/bin/launcher stop
User=ec2-user
Group=ec2-user
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Set permissions and start service
sudo chown -R ec2-user:ec2-user /opt/trino /var/trino
sudo systemctl daemon-reload
sudo systemctl enable trino
sudo systemctl start trino

# Check status
sudo systemctl status trino
```

Step 3: Access Trino

```bash
# Get public IP
PUBLIC_IP=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=trino-server" \
  --query "Reservations[0].Instances[0].PublicIpAddress" \
  --output text)

echo "Trino Web UI: http://$PUBLIC_IP:8080"
```

Option 2: Production Setup with EMR

Step 1: Create EMR Cluster with Trino

```bash
# Create EMR cluster with Trino
aws emr create-cluster \
  --name "Trino-EMR-Cluster" \
  --release-label emr-6.10.0 \
  --applications Name=Trino Name=Hive \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --region $REGION \
  --log-uri s3://your-bucket/emr-logs/ \
  --configurations file://emr-configurations.json
```

Create emr-configurations.json:

```json
[
  {
    "Classification": "trino-config",
    "Properties": {
      "query.max-memory": "20GB",
      "query.max-memory-per-node": "8GB"
    }
  },
  {
    "Classification": "trino-connector-hive",
    "Properties": {
      "hive.s3.endpoint": "s3.amazonaws.com",
      "hive.s3.path-style-access": "false"
    }
  }
]
```
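
Cluster provisioning takes several minutes. One way to check progress and find the master node's DNS name (the cluster ID j-XXXXXXXXXXXXX below is a placeholder; use the ID returned by create-cluster):

```bash
# Check cluster state and master node DNS
aws emr describe-cluster \
  --cluster-id j-XXXXXXXXXXXXX \
  --query 'Cluster.[Status.State,MasterPublicDnsName]' \
  --output text
```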

Step 2: Configure Additional Catalogs

SSH to the master node (or use Session Manager) and add catalogs:

```bash
# PostgreSQL RDS connector (credentials via AWS Secrets Manager)
sudo tee /etc/trino/conf.dist/catalog/postgresql.properties > /dev/null <<'EOF'
connector.name=postgresql
connection-url=jdbc:postgresql://your-rds-endpoint.amazonaws.com:5432/database
connection-user=username
# For production, pull the password from AWS Secrets Manager, expose it to the
# Trino process as an environment variable, and reference it with Trino's
# environment-variable secrets syntax:
# connection-password=${ENV:POSTGRESQL_PASSWORD}
connection-password=CHANGE_ME_USE_SECRETS_MANAGER
EOF

# Redshift connector (credentials via AWS Secrets Manager)
sudo tee /etc/trino/conf.dist/catalog/redshift.properties > /dev/null <<'EOF'
connector.name=redshift
connection-url=jdbc:redshift://your-cluster.redshift.amazonaws.com:5439/database
connection-user=username
# connection-password=${ENV:REDSHIFT_PASSWORD}
connection-password=CHANGE_ME_USE_SECRETS_MANAGER
EOF

# Example: retrieve a password from AWS Secrets Manager
# aws secretsmanager get-secret-value --secret-id trino/postgresql/password --query SecretString --output text

# Restart Trino
sudo systemctl restart trino-server
```

Option 3: EKS Deployment

Step 1: Create EKS Cluster

```bash
# Install eksctl if needed
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

# Create cluster
eksctl create cluster \
  --name trino-cluster \
  --version 1.27 \
  --region $REGION \
  --nodegroup-name standard-workers \
  --node-type m5.xlarge \
  --nodes 3 \
  --managed
```
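
eksctl writes the kubeconfig for the new cluster automatically; a quick sanity check before deploying anything:

```bash
# If the kubeconfig was not written, fetch it explicitly
# aws eks update-kubeconfig --name trino-cluster --region $REGION

# Verify the worker nodes registered
kubectl get nodes
```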

Step 2: Deploy Trino with Helm

```bash
# Add Trino Helm repository
helm repo add trino https://trinodb.github.io/charts
helm repo update

# Create values.yaml
cat > values.yaml <<EOF
image:
  tag: "450"

server:
  workers: 2

coordinator:
  resources:
    requests:
      memory: "8Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "2"

worker:
  resources:
    requests:
      memory: "8Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "2"

additionalCatalogs:
  postgresql: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://your-rds-endpoint:5432/database
    connection-user=username
    # For production, source this from a Kubernetes Secret rather than inlining it
    connection-password=password

  s3: |
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore:9083
    hive.s3.endpoint=s3.amazonaws.com
EOF

# Deploy Trino
helm install trino trino/trino -f values.yaml

# Get service endpoint
kubectl get service trino
```
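
Before exposing the service, you can verify the deployment from your workstation by port-forwarding the coordinator service (the service name assumes the default release name trino used above):

```bash
# Forward the coordinator locally and hit the REST API
kubectl port-forward svc/trino 8080:8080 &
curl -s http://localhost:8080/v1/info
```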

Step 3: Expose Trino

```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: trino-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: trino
                port:
                  number: 8080
```
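
Apply the manifest and wait for an ALB to be provisioned; this assumes the AWS Load Balancer Controller is already installed in the cluster, since it is what handles the alb ingress class:

```bash
kubectl apply -f ingress.yaml
# The ADDRESS column shows the ALB DNS name once it is ready
kubectl get ingress trino-ingress
```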

Configuring Data Sources

Amazon S3 with Glue Catalog

# /opt/trino/etc/catalog/glue.properties
connector.name=hive
hive.metastore=glue
hive.metastore.glue.region=us-east-1
hive.metastore.glue.default-warehouse-dir=s3://your-bucket/warehouse/
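
Once the catalog file is in place and Trino is restarted, a quick way to confirm Glue is wired up is to list its schemas from the Trino CLI (assuming the CLI from the EC2 section is available); the database and table names below are hypothetical placeholders:

```bash
# List Glue databases exposed through the catalog
trino --server http://localhost:8080 --execute "SHOW SCHEMAS FROM glue"

# Query a table registered in Glue (placeholder names)
trino --server http://localhost:8080 --execute "SELECT * FROM glue.my_database.my_table LIMIT 10"
```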

Amazon RDS

# /opt/trino/etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://your-rds.amazonaws.com:3306
connection-user=admin
connection-password=password

Amazon Redshift

# /opt/trino/etc/catalog/redshift.properties
connector.name=redshift
connection-url=jdbc:redshift://your-cluster.redshift.amazonaws.com:5439/dev
connection-user=admin
connection-password=password

Amazon DynamoDB

Note: a DynamoDB connector is not included in open-source Trino; it is available through commercial distributions and third-party plugins. If you use one, the catalog typically looks like this:

# /opt/trino/etc/catalog/dynamodb.properties
connector.name=dynamodb
dynamodb.region=us-east-1
# Use IAM role authentication - no access keys needed
# Ensure the instance's IAM role has DynamoDB permissions

Security Configuration

Security Best Practices

IMPORTANT: Never use 0.0.0.0/0 in production! The examples above use restricted IP access. For production deployments:

  1. Network Security:
  • Use VPC with private subnets for Trino nodes
  • Deploy Application Load Balancer (ALB) in public subnets
  • Use VPC endpoints for S3 access (see the example after this list)
  • Implement VPC Flow Logs for monitoring
  2. Access Control:
  • Use AWS Systems Manager Session Manager instead of SSH
  • Implement SAML/OAuth authentication via ALB
  • Use IAM roles instead of access keys
  • Enable MFA for administrative access
  3. Encryption:
  • Enable encryption at rest for all data stores
  • Use TLS 1.2+ for all connections
  • Encrypt data in transit between nodes
  • Use AWS KMS for key management
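
For example, a gateway VPC endpoint keeps S3 traffic from Trino's private subnets off the public internet; a sketch with placeholder VPC and route table IDs:

```bash
# Gateway endpoint so Trino nodes reach S3 without a NAT gateway
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxx \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxxxxxxx
```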

Production Security Group Configuration

```bash
# Create VPC and subnets first
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query 'Vpc.VpcId' --output text)
PRIVATE_SUBNET=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.0.1.0/24 --query 'Subnet.SubnetId' --output text)
PUBLIC_SUBNET=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.0.2.0/24 --query 'Subnet.SubnetId' --output text)

# ALB Security Group (public facing)
ALB_SG=$(aws ec2 create-security-group \
  --group-name trino-alb-sg \
  --description "Security group for Trino ALB" \
  --vpc-id $VPC_ID \
  --query 'GroupId' --output text)

# Allow HTTPS only from specific IP ranges or CloudFront
aws ec2 authorize-security-group-ingress \
  --group-id $ALB_SG \
  --protocol tcp \
  --port 443 \
  --cidr YOUR_OFFICE_IP_RANGE/24 \
  --group-rule-description "HTTPS from office network"

# Trino Security Group (private)
TRINO_SG=$(aws ec2 create-security-group \
  --group-name trino-nodes-sg \
  --description "Security group for Trino nodes" \
  --vpc-id $VPC_ID \
  --query 'GroupId' --output text)

# Allow traffic only from ALB
aws ec2 authorize-security-group-ingress \
  --group-id $TRINO_SG \
  --protocol tcp \
  --port 8080 \
  --source-group $ALB_SG \
  --group-rule-description "HTTP from ALB only"

# Allow inter-node communication
aws ec2 authorize-security-group-ingress \
  --group-id $TRINO_SG \
  --protocol tcp \
  --port 8080 \
  --source-group $TRINO_SG \
  --group-rule-description "Inter-node communication"
```

IAM Role for EC2

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket/*",
        "arn:aws:s3:::your-bucket"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
```
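
Save the policy above as trino-policy.json and attach it to the role created earlier; the inline policy name TrinoS3GlueAccess is an arbitrary choice:

```bash
aws iam put-role-policy \
  --role-name TrinoEC2Role \
  --policy-name TrinoS3GlueAccess \
  --policy-document file://trino-policy.json
```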

Enable HTTPS

# config.properties additions
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/path/to/keystore.jks
http-server.https.keystore.key=password
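
For testing, one way to produce the keystore referenced above is a self-signed certificate generated with keytool; for production, use a certificate from your CA, or terminate TLS at an ALB with an ACM certificate. The hostname and store password below are placeholders matching the config example:

```bash
keytool -genkeypair \
  -alias trino \
  -keyalg RSA -keysize 2048 \
  -validity 365 \
  -dname "CN=trino.example.com" \
  -keystore /path/to/keystore.jks \
  -storepass password
```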

Authentication

# config.properties for basic auth
http-server.authentication.type=PASSWORD

# password-authenticator.properties
password-authenticator.name=file
file.password-file=/opt/trino/etc/password.db
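
PASSWORD authentication requires HTTPS to be enabled (previous section) unless you explicitly allow insecure HTTP. The file authenticator expects bcrypt (or PBKDF2) hashes; one way to create the password file is htpasswd from httpd-tools, with a placeholder username:

```bash
sudo yum install -y httpd-tools
# -B selects bcrypt, -C sets the cost factor, -c creates the file
sudo htpasswd -B -C 10 -c /opt/trino/etc/password.db trino_user
```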

Monitoring and Management

CloudWatch Integration

```bash
# Install CloudWatch agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm

# Configure for Trino logs
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/config.json > /dev/null <<EOF
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/trino/data/var/log/server.log",
            "log_group_name": "trino-logs",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
EOF

# Start CloudWatch agent
sudo systemctl start amazon-cloudwatch-agent
```
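
If the agent does not pick up the config file on start, load it explicitly with the agent's control script:

```bash
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json -s
```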

Auto Scaling

```yaml
# For EKS deployment - HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: trino-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment   # the community Trino Helm chart deploys workers as a Deployment
    name: trino-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
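
Save the manifest (the file name trino-worker-hpa.yaml below is arbitrary) and apply it; CPU-based scaling requires metrics-server to be running in the cluster:

```bash
kubectl apply -f trino-worker-hpa.yaml
kubectl get hpa trino-worker-hpa --watch
```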

Cost Optimization

1. Use Spot Instances for Workers

```bash
# EMR with spot instances
# Tip: prefer Spot for TASK instance groups; CORE nodes store HDFS data, so
# losing them mid-query is more disruptive
aws emr create-cluster \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceType=m5.xlarge,InstanceCount=1 \
    InstanceGroupType=CORE,InstanceType=m5.xlarge,InstanceCount=2,BidPrice=0.10
```

2. Implement Query Resource Groups

# resource-groups.properties
resource-groups.configuration-manager=file
resource-groups.config-file=/opt/trino/etc/resource-groups.json

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "subGroups": [
        {
          "name": "analytics",
          "softMemoryLimit": "50%",
          "hardConcurrencyLimit": 20,
          "maxQueued": 100
        },
        {
          "name": "adhoc",
          "softMemoryLimit": "30%",
          "hardConcurrencyLimit": 5,
          "maxQueued": 100
        }
      ]
    }
  ],
  "selectors": [
    {
      "user": "analytics_.*",
      "group": "global.analytics"
    },
    {
      "group": "global.adhoc"
    }
  ]
}
```

Selectors are required: a query that does not match any selector is rejected, so the catch-all selector above routes everything else to global.adhoc. Adjust the user regex to your own naming conventions.

3. Enable S3 Requester Pays

# In the hive catalog
hive.s3.requester-pays.enabled=true

Troubleshooting

Common Issues

Out of Memory Errors

# Check memory usage
curl http://localhost:8080/v1/cluster/memory

# Increase heap size in jvm.config
-Xmx16G

S3 Access Denied

# Check IAM role
aws sts get-caller-identity

# Verify S3 permissions
aws s3 ls s3://your-bucket/

Connection Timeouts

# Check security groups
aws ec2 describe-security-groups --group-names sg-trino

# Test connectivity
telnet trino-server 8080

Performance Tuning

  1. Query Optimization

-- Use EXPLAIN to analyze query plans
EXPLAIN (TYPE DISTRIBUTED)
SELECT * FROM large_table WHERE date > '2024-01-01';

  2. JVM Tuning

# Advanced JVM settings
-XX:+UnlockDiagnosticVMOptions
-XX:G1NumCollectionsKeepPinned=10000000

  3. Network Optimization

# Enable enhanced networking
aws ec2 modify-instance-attribute \
  --instance-id i-xxxxx \
  --ena-support
Connecting from Knowi

Connection Parameters

  1. Public Endpoint (see the connectivity check after this list):
  • Host: your-alb-endpoint.amazonaws.com or EC2 public IP
  • Port: 8080 (or 8443 for HTTPS)
  • Catalog: Your configured catalog (e.g., hive, postgresql)
  • Username/Password: As configured
  2. VPC Peering:
  • Set up VPC peering between Knowi and Trino VPCs
  • Use internal endpoints for better security and performance
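
Before adding the datasource in Knowi, it can help to confirm the coordinator responds from a host on the allowed network; a minimal check with placeholder endpoint and credentials:

```bash
# Expect a small JSON payload with the Trino version
curl -sk -u trino_user https://your-alb-endpoint.amazonaws.com:8443/v1/info
```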

Best Practices

  1. Use SSL/TLS for production
  2. Implement proper authentication
  3. Set up query resource limits
  4. Monitor query performance
  5. Regular backups of configuration

Next Steps

  1. Set up monitoring dashboards in CloudWatch or Grafana
  2. Implement disaster recovery with multi-region setup
  3. Optimize query performance with proper partitioning
  4. Integrate with AWS Lake Formation for data governance
  5. Set up CI/CD for configuration management

Suggested Knowi CTA

Unlock your AWS data with AI-powered analytics.
Connect Trino to Knowi and start exploring cross-source insights in minutes.
Book a 15-minute demo or Start a free trial today.

Frequently Asked Questions

What is Trino and how does it help with analytics?

Trino is a distributed SQL query engine that allows you to run fast, interactive queries across multiple data sources (like S3, RDS, and Redshift) without moving or duplicating data.

Why pair Trino with Knowi?

Knowi adds a visual analytics and AI layer on top of Trino. You can create dashboards, ask natural-language questions, and generate AI-driven insights, all without writing complex SQL.

Does Knowi require ETL when using Trino?

No. Knowi connects natively to Trino’s federated query engine, so you can analyze data across different sources without additional ETL pipelines.

How secure is a Knowi–Trino integration?

Knowi supports VPC peering, SSL/TLS encryption, and IAM-based authentication. Your data remains in your AWS environment; Knowi only queries it.

Can Knowi handle large datasets queried through Trino?

Yes. Knowi is built for high-volume, real-time analytics, and can visualize results from large, multi-source Trino queries efficiently.

What kinds of visualizations can I build?

From interactive dashboards and time-series charts to map visualizations and AI-generated insights, Knowi offers a wide variety of visualization options out of the box.

How quickly can I get started?

Typically within minutes. Simply connect Knowi to your Trino coordinator endpoint, select your catalogs, and begin building dashboards or using natural-language queries.
