Deploying an OpenCTI OSINT Stack for Cybersecurity Research
Updated 1/29/26 following issues discovered with the original deployment/guide.
Introduction
OpenCTI is an open-source threat intelligence platform that allows security professionals, researchers, and students to aggregate, analyze, and correlate cyber threat data. Combined with tools like SpiderFoot for OSINT gathering, it creates a powerful research environment for cybersecurity work, incident response training, and cybercrime investigations.
This guide walks through deploying OpenCTI in a homelab environment using Docker Swarm, with hard-won lessons from real-world deployment iterations.
Update: Lessons Learned from Initial Deployment
TL;DR: The first version of this guide worked, but barely. Here's what changed and why.
What Went Wrong
The initial deployment had several critical flaws:
- Deployed all connectors simultaneously on first boot
- Used shared NFS storage for Elasticsearch (massive performance bottleneck)
- Under-resourced Elasticsearch (1-2GB heap)
- Let Swarm schedule services randomly across nodes
- No explicit network configuration for Elasticsearch
Result: The platform appeared functional but was fundamentally broken: corrupted vocabulary initialization, < 1 object/second processing, and message queue backlogs of 30k+ that never cleared.
Key Changes Made
Critical fixes:
- Sequential deployment - Deploy core services first, wait for initialization, then enable connectors one at a time
- Dedicated fast storage - 100GB local NVMe disk exclusively for Elasticsearch
- Service co-location - Pin Elasticsearch and OpenCTI to the same powerful node
- Proper network binding - Elasticsearch must bind to 0.0.0.0, not localhost
- Adequate resources - 4GB+ heap for Elasticsearch, explicit memory limits on all services
Performance impact: MITRE connector went from projected 12+ hours to actual 2-3 hours. Processing rate increased from < 1 obj/sec to 8-10 obj/sec.
Bottom line: If you deploy everything at once with default settings, you'll get a corrupted platform that looks like it works but doesn't. Follow the updated deployment process below.
These corrections are integrated throughout the guide. Let's get to the actual deployment.
Why OpenCTI for Cybersecurity Practitioners?
For Security Students:
- Hands-on experience with industry-standard threat intelligence platforms
- Learn MITRE ATT&CK framework interactively
- Practice correlating IOCs, TTPs, and threat actor behaviors
For OSINT Practitioners:
- Centralized repository for organizing reconnaissance data
- Automated enrichment of observables
- Link discovered infrastructure to known threat campaigns
For Cybercrime Investigators:
- Track threat actors and their methodologies
- Map relationships between malware, campaigns, and infrastructure
- Maintain chain of custody for digital evidence
Architecture Decisions
Docker Swarm vs Docker Compose
Docker Swarm provides orchestration, service recovery, rolling updates, and load balancing - valuable experience without Kubernetes complexity. This guide uses Swarm for its production-like capabilities in a homelab setting.
Database Placement
This deployment uses external PostgreSQL and Redis for better performance and easier maintenance, while keeping Elasticsearch and RabbitMQ containerized for simplicity.
Critical: Elasticsearch MUST use local fast storage (NVMe preferred). Shared/NFS storage will cripple performance.
Storage Strategy
- Shared storage (NFS): For file persistence, MinIO data, RabbitMQ data
- Local fast storage: Elasticsearch data directory (100GB dedicated)
- Bind mounts: Configuration files
Security Considerations
Local-Only Deployment
- No external exposure
- Access via VPN or local network only
- Best for malware analysis or sensitive research
Web-Accessible Deployment
- Reverse proxy with TLS (Traefik/Nginx/Caddy)
- Authentication layer (Authelia/Authentik recommended)
- Strong passwords and 2FA
- Monitor access logs
Required External Resources
NIST NVD API Key
Needed by the CVE connector. Technically optional, but highly recommended - without it, the CVE connector will take days instead of hours.
Rate limits:
- Without API key: 5 requests/30 seconds (~4-5 days for full import)
- With API key: 50 requests/30 seconds (~1-2 hours for full import)
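Those two rate limits translate directly into a request budget, which is why the keyed import finishes in hours rather than days. A quick back-of-the-envelope check (illustrative Python, using only the numbers above):

```python
def requests_per_hour(requests_per_window: int, window_seconds: int) -> int:
    """Convert an NVD-style rate limit into an hourly request budget."""
    return requests_per_window * 3600 // window_seconds

print(requests_per_hour(5, 30))   # 600 requests/hour without an API key
print(requests_per_hour(50, 30))  # 6000 requests/hour with an API key
```

The API key buys you a 10x larger budget, not a different API.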
How to obtain:
- Go to https://nvd.nist.gov/developers/request-an-api-key
- Enter your email address
- Check your email for the API key (arrives within minutes)
- No account creation or approval process required
Note: The API key is free and has no usage limits beyond the rate limit. Keep it secure, but it is not considered as sensitive as a password.
AbuseIPDB API Key
Optional but recommended for IP reputation enrichment. Free tier includes 1,000 requests per day.
How to obtain:
- Create free account at https://www.abuseipdb.com/register
- Verify your email address
- Navigate to https://www.abuseipdb.com/account/api
- Click "Create Key"
- Copy the generated API key
Free tier limits:
- 1,000 requests per day
- Sufficient for most homelab OSINT work
- Paid tiers available if you need more
SMTP Configuration (Optional)
For email notifications:
- SMTP__HOSTNAME=smtp.yourdomain.com
- SMTP__PORT=587
- SMTP__USERNAME=<your-username>
- SMTP__PASSWORD=<your-password>
- SMTP__USE_TLS=true
Component Overview
Core Services
- OpenCTI Platform - Main application, GraphQL API, web interface
- Worker - Background job processor (scale to 4-8 replicas)
- RabbitMQ - Message queue for connector/worker communication
- Elasticsearch - Search engine and data store (4-6GB RAM minimum)
- MinIO - S3-compatible object storage
Connectors
- MITRE ATT&CK - Imports TTPs, threat groups, tools (~25k objects)
- CVE - Vulnerability data (~200k+ CVEs, optional)
- AbuseIPDB - IP reputation enrichment
OSINT Tools
- SpiderFoot - Automated reconnaissance (200+ data sources)
Complete Docker Compose Configuration
version: '3.8'

networks:
  app_overlay:
    external: true
  backend:
    driver: overlay
    attachable: true

services:
  spiderfoot:
    image: josaorg/spiderfoot:stable
    hostname: spiderfoot
    volumes:
      - /mnt/cluster-shared-storage/spiderfoot/data:/var/lib/spiderfoot
    networks:
      - app_overlay
      - backend
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 5
        window: 120s
      labels:
        - traefik.enable=true
        - traefik.docker.network=app_overlay
        - traefik.constraint-label=app_overlay
        - traefik.http.routers.spiderfoot.rule=Host(`spiderfoot.yourdomain.tld`)
        - traefik.http.routers.spiderfoot.entrypoints=websecure
        - traefik.http.routers.spiderfoot.tls=true
        - traefik.http.routers.spiderfoot.tls.certresolver=cf
        - traefik.http.routers.spiderfoot.service=spiderfoot
        # Uncomment if exposing externally:
        # - traefik.http.routers.spiderfoot.middlewares=authelia@file
        - traefik.http.services.spiderfoot.loadbalancer.server.port=5001

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.19.9
    hostname: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - xpack.ml.enabled=false
      - ES_JAVA_OPTS=-Xms4g -Xmx4g
      - bootstrap.memory_lock=true
    volumes:
      - /var/lib/elasticsearch:/usr/share/elasticsearch/data  # Local fast storage
      - /etc/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
    networks:
      - backend
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == <YOUR-STRONGEST-NODE>  # Pin to powerful node
      restart_policy:
        condition: on-failure
      resources:
        reservations:
          memory: 6G
        limits:
          memory: 10G

  minio:
    image: minio/minio:latest
    hostname: minio
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ROOT_USER=<SET-MINIO-USERNAME>
      - MINIO_ROOT_PASSWORD=<SET-MINIO-PASSWORD>
    volumes:
      - /mnt/cluster-shared-storage/opencti/minio:/data
    networks:
      - app_overlay
      - backend
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 5
        window: 120s
      labels:
        - traefik.enable=true
        - traefik.docker.network=app_overlay
        - traefik.constraint-label=app_overlay
        - traefik.http.routers.minio.rule=Host(`minio.yourdomain.tld`)
        - traefik.http.routers.minio.entrypoints=websecure
        - traefik.http.routers.minio.tls=true
        - traefik.http.routers.minio.tls.certresolver=cf
        - traefik.http.routers.minio.service=minio
        # Uncomment if exposing externally:
        # - traefik.http.routers.minio.middlewares=authelia@file
        - traefik.http.services.minio.loadbalancer.server.port=9001

  rabbitmq:
    image: rabbitmq:4.2-management
    hostname: rabbitmq
    environment:
      - RABBITMQ_DEFAULT_USER=<SET-RABBITMQ-USERNAME>
      - RABBITMQ_DEFAULT_PASS=<SET-RABBITMQ-PASSWORD>
    volumes:
      - /mnt/cluster-shared-storage/opencti/rabbitmq:/var/lib/rabbitmq
    networks:
      - app_overlay
      - backend
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 5
        window: 120s
      labels:
        - traefik.enable=true
        - traefik.docker.network=app_overlay
        - traefik.constraint-label=app_overlay
        - traefik.http.routers.rabbitmq.rule=Host(`rabbitmq.yourdomain.tld`)
        - traefik.http.routers.rabbitmq.entrypoints=websecure
        - traefik.http.routers.rabbitmq.tls=true
        - traefik.http.routers.rabbitmq.tls.certresolver=cf
        - traefik.http.routers.rabbitmq.service=rabbitmq
        # Uncomment if exposing externally:
        # - traefik.http.routers.rabbitmq.middlewares=authelia@file
        - traefik.http.services.rabbitmq.loadbalancer.server.port=15672

  opencti:
    image: opencti/platform:6.9.10
    hostname: opencti
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096
      - APP__PORT=8080
      - APP__BASE_URL=https://opencti.yourdomain.tld
      - [email protected]
      - APP__ADMIN__PASSWORD=<SET-ADMIN-PASSWORD>
      - APP__ADMIN__TOKEN=<GENERATE-UUID-V4>
      - APP__APP_LOGS__LOGS_LEVEL=info
      - REDIS__HOSTNAME=<YOUR-REDIS-IP>
      - REDIS__PORT=6379
      - REDIS__PASSWORD=<YOUR-REDIS-PASSWORD>
      - ELASTICSEARCH__URL=http://elasticsearch:9200
      - POSTGRES__HOSTNAME=<YOUR-POSTGRES-IP>
      - POSTGRES__PORT=5432
      - POSTGRES__DATABASE=opencti
      - POSTGRES__USERNAME=<DB-USERNAME>
      - POSTGRES__PASSWORD=<DB-PASSWORD>
      - MINIO__ENDPOINT=minio
      - MINIO__PORT=9000
      - MINIO__USE_SSL=false
      - MINIO__ACCESS_KEY=<MATCH-MINIO-ROOT-USER>
      - MINIO__SECRET_KEY=<MATCH-MINIO-ROOT-PASSWORD>
      - MINIO__BUCKET_NAME=opencti-bucket
      - RABBITMQ__HOSTNAME=rabbitmq
      - RABBITMQ__PORT=5672
      - RABBITMQ__USERNAME=<MATCH-RABBITMQ-DEFAULT-USER>
      - RABBITMQ__PASSWORD=<MATCH-RABBITMQ-DEFAULT-PASS>
    volumes:
      - /mnt/cluster-shared-storage/opencti/data:/opt/opencti/data
      - /mnt/cluster-shared-storage/opencti/files:/var/lib/opencti/files
    networks:
      - app_overlay
      - backend
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == <YOUR-STRONGEST-NODE>  # Same as Elasticsearch
      restart_policy:
        condition: on-failure
      resources:
        reservations:
          memory: 2G
        limits:
          memory: 6G
      labels:
        - traefik.enable=true
        - traefik.docker.network=app_overlay
        - traefik.constraint-label=app_overlay
        - traefik.http.routers.opencti.rule=Host(`opencti.yourdomain.tld`)
        - traefik.http.routers.opencti.entrypoints=websecure
        - traefik.http.routers.opencti.tls=true
        - traefik.http.routers.opencti.tls.certresolver=cf
        - traefik.http.routers.opencti.service=opencti
        # Uncomment if exposing externally:
        # - traefik.http.routers.opencti.middlewares=authelia@file
        - traefik.http.services.opencti.loadbalancer.server.port=8080

  worker:
    image: opencti/worker:6.9.10
    environment:
      - OPENCTI_URL=http://opencti:8080
      - OPENCTI_TOKEN=<MATCH-OPENCTI-ADMIN-TOKEN>
      - WORKER_LOG_LEVEL=info
    networks:
      - backend
    deploy:
      replicas: 4
      placement:
        constraints:
          - node.role == worker  # Different nodes than core services
      restart_policy:
        condition: on-failure
      resources:
        reservations:
          memory: 512M
        limits:
          memory: 2G

  # COMMENT OUT CONNECTORS FOR INITIAL DEPLOYMENT
  # Uncomment and deploy one at a time after initialization completes

  # connector-mitre:
  #   image: opencti/connector-mitre:6.9.10
  #   environment:
  #     - OPENCTI_URL=http://opencti:8080
  #     - OPENCTI_TOKEN=<MATCH-OPENCTI-ADMIN-TOKEN>
  #     - CONNECTOR_ID=mitre-attack
  #     - CONNECTOR_SCOPE=marking-definition,identity,attack-pattern,course-of-action,intrusion-set,campaign,malware,tool,report
  #     - MITRE_INTERVAL=7
  #   networks:
  #     - backend
  #   deploy:
  #     replicas: 1
  #     restart_policy:
  #       condition: on-failure

  # connector-cve:
  #   image: opencti/connector-cve:6.9.10
  #   environment:
  #     - OPENCTI_URL=http://opencti:8080
  #     - OPENCTI_TOKEN=<MATCH-OPENCTI-ADMIN-TOKEN>
  #     - CONNECTOR_ID=cve
  #     - CVE_API_KEY=<YOUR-NVD-API-KEY>
  #     - CVE_INTERVAL=6
  #   networks:
  #     - backend
  #   deploy:
  #     replicas: 1
  #     restart_policy:
  #       condition: on-failure

  # connector-abuseipdb:
  #   image: opencti/connector-abuseipdb:6.9.10
  #   environment:
  #     - OPENCTI_URL=http://opencti:8080
  #     - OPENCTI_TOKEN=<MATCH-OPENCTI-ADMIN-TOKEN>
  #     - CONNECTOR_ID=abuseipdb
  #     - ABUSEIPDB_API_KEY=<YOUR-ABUSEIPDB-API-KEY>
  #   networks:
  #     - backend
  #   deploy:
  #     replicas: 1
  #     restart_policy:
  #       condition: on-failure
Deployment Process
1. Prepare Infrastructure
# Initialize Swarm
docker swarm init
# Create network
docker network create --driver overlay --attachable app_overlay
2. Prepare External Services
PostgreSQL:
CREATE DATABASE opencti OWNER opencti_user;
ALTER SYSTEM SET max_connections = 200;
-- Restart PostgreSQL
MySQL (for SpiderFoot):
CREATE DATABASE spiderfoot;
CREATE USER 'spiderfoot_user'@'%' IDENTIFIED BY 'password';
GRANT ALL ON spiderfoot.* TO 'spiderfoot_user'@'%';
Redis:
- Set strong password
- Enable persistence (appendonly yes)
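Those two Redis requirements map to a minimal redis.conf fragment (the password is a placeholder):

```conf
requirepass <YOUR-REDIS-PASSWORD>
appendonly yes
```

Restart Redis (or use CONFIG SET plus CONFIG REWRITE) for the changes to take effect.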
3. Create Dedicated Elasticsearch Storage
CRITICAL: Use local fast storage, not NFS
Elasticsearch performance is heavily I/O dependent. Using network-attached storage (NFS, CIFS, etc.) will result in processing speeds that are 10-20x slower than local storage. This is the single most important performance factor in your deployment.
Storage options (best to worst):
- Local NVMe SSD - Ideal, 5-10x faster than spinning disks
- Local SATA SSD - Good, significantly better than HDD
- Local spinning disk - Acceptable for learning, slow for production
- NFS/Network storage - Avoid at all costs, will cripple performance
Option A: Dedicated Physical Disk or Partition (Recommended)
If you have a spare disk or can partition existing storage:
# Example: Using a dedicated disk (e.g., /dev/sdb)
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /var/lib/elasticsearch
sudo mount /dev/sdb /var/lib/elasticsearch
echo "/dev/sdb /var/lib/elasticsearch ext4 defaults 0 0" | sudo tee -a /etc/fstab
sudo chown -R 1000:1000 /var/lib/elasticsearch
Option B: Loop-Mounted Disk Image (Homelab Friendly)
If you can't dedicate a full disk, create a disk image on your fastest local storage:
# Create a 100GB sparse disk image (adjust size as needed)
# truncate allocates lazily - the file won't consume 100GB until data is written
sudo truncate -s 100G /var/lib/elasticsearch.img
# Format it as ext4
sudo mkfs.ext4 /var/lib/elasticsearch.img
# Create mount point
sudo mkdir -p /var/lib/elasticsearch
# Mount the image as a loop device
sudo mount -o loop /var/lib/elasticsearch.img /var/lib/elasticsearch
# Make it persistent across reboots
echo "/var/lib/elasticsearch.img /var/lib/elasticsearch ext4 loop 0 0" | sudo tee -a /etc/fstab
# Set ownership to elasticsearch user (UID 1000 in the container)
sudo chown -R 1000:1000 /var/lib/elasticsearch
Why 100GB?
- MITRE ATT&CK: ~2-3GB
- CVE database: ~40-50GB (and growing)
- Indices, snapshots, and overhead: ~10-20GB
- Breathing room for search operations: ~20-30GB
Verify the setup:
# Check it's mounted
df -h | grep elasticsearch
# Should show something like:
# /dev/loop0 98G 24K 93G 1% /var/lib/elasticsearch
# Check permissions
ls -ld /var/lib/elasticsearch
# Should show: drwxr-xr-x 2 1000 1000 ...
Create /etc/elasticsearch/elasticsearch.yml:
# Bind to all interfaces so Swarm overlay can reach it
network.host: 0.0.0.0
http.port: 9200
# Single-node setup
discovery.type: single-node
# Disk watermarks
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: "80%"
cluster.routing.allocation.disk.watermark.high: "90%"
cluster.routing.allocation.disk.watermark.flood_stage: "95%"
Important notes:
- The network.host: 0.0.0.0 setting is critical - without it, Elasticsearch only listens on localhost and won't be reachable from other Swarm services
- The disk image file should be on your fastest local storage (ideally NVMe, not a network share)
- If you're using a loop-mounted image, put the .img file on local storage, never on NFS
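The three disk watermarks must be strictly increasing (low < high < flood_stage) or shard allocation won't behave as intended. A tiny sanity check before deploying (illustrative Python; the percentage strings match the config above):

```python
def watermarks_valid(low: str, high: str, flood: str) -> bool:
    """Verify disk watermarks are strictly increasing: low < high < flood_stage."""
    pct = lambda s: float(s.rstrip("%"))
    return pct(low) < pct(high) < pct(flood)

print(watermarks_valid("80%", "90%", "95%"))  # True - matches the config above
```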
4. Deploy Core Services (No Connectors)
# Deploy with connectors commented out
docker stack deploy -c docker-compose.yml osint
# Watch initialization (15-20 minutes)
docker service logs osint_opencti --follow | grep "initialization"
Wait for: [INIT] Platform initialization done
Verify initialization:
curl -k https://opencti.yourdomain.tld/graphql \
-H "Authorization: Bearer <YOUR-TOKEN>" \
-d '{"query":"{ vocabularies { edges { node { name } } } }"}'
Should return 50+ vocabularies. If empty, initialization failed - nuke and restart.
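If you want to script that check, the response shape can be validated with a few lines of Python. This is an illustrative sketch - it assumes the standard GraphQL edges/node envelope shown in the query above, and the sample response is hypothetical:

```python
def vocabulary_count(graphql_response: dict) -> int:
    """Count vocabulary nodes in a GraphQL vocabularies query response."""
    edges = (graphql_response.get("data", {})
                             .get("vocabularies", {})
                             .get("edges", []))
    return len(edges)

# Hypothetical sample response containing a single vocabulary:
sample = {"data": {"vocabularies": {"edges": [{"node": {"name": "attack-motivation-ov"}}]}}}
print(vocabulary_count(sample))  # 1 - a healthy platform returns 50+
print(vocabulary_count({}))      # 0 - treat this as failed initialization
```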
5. Enable Connectors Sequentially
First: MITRE ATT&CK
# Uncomment connector-mitre in compose file
docker stack deploy -c docker-compose.yml osint
docker service logs osint_connector-mitre --follow
# Wait for completion (2-3 hours on this setup)
Second: AbuseIPDB (enrichment only, safe)
# Uncomment connector-abuseipdb
docker stack deploy -c docker-compose.yml osint
Third: CVE (optional, heavy - 1-2 hours)
# Uncomment connector-cve
docker stack deploy -c docker-compose.yml osint
Troubleshooting
Elasticsearch Not Reachable
Symptom: OpenCTI logs show "Cannot connect to Elasticsearch" or similar errors
Check: /etc/elasticsearch/elasticsearch.yml contains network.host: 0.0.0.0
If it says localhost or 127.0.0.1, Elasticsearch is only listening on the container's loopback interface and can't be reached by other services in the Swarm network.
Fix:
sudo nano /etc/elasticsearch/elasticsearch.yml
# Change network.host to 0.0.0.0
docker service update --force osint_elasticsearch
MinIO Connection Failed
Symptom: OpenCTI logs show MinIO authentication errors or "Access Denied"
Check: MINIO__ACCESS_KEY in OpenCTI service exactly matches MINIO_ROOT_USER in MinIO service
This is case-sensitive and must be character-for-character identical - not just similar.
Fix: Update docker-compose.yml to ensure exact match, then redeploy:
docker stack deploy -c docker-compose.yml osint
Slow Processing Despite Resources
Symptom: Connectors running but processing <1-2 objects per second, massive queue backlog
Check:
- Is Elasticsearch on local NVMe/SSD storage? (NFS = performance death)
- Are Elasticsearch and OpenCTI on the same node?
- Worker count adequate? (4-8 replicas optimal)
- Elasticsearch heap size? (Should be 4GB minimum)
Fix: Move Elasticsearch to local fast storage, pin services to same node, increase workers:
# In docker-compose.yml under elasticsearch and opencti services:
deploy:
placement:
constraints:
- node.hostname == your-strongest-node
Corrupted Initialization
Symptoms:
- Missing vocabularies (check via GraphQL query)
- "Unknown entity type" errors in logs
- Permanent queue backlog that never clears
- Connectors appear to work but data doesn't show up properly
Fix: This requires the nuclear option - a complete reset:
# Remove the stack
docker stack rm osint
# Wipe Elasticsearch data
sudo rm -rf /var/lib/elasticsearch/*
# Wipe OpenCTI shared storage
sudo rm -rf /mnt/cluster-shared-storage/opencti/*
# Reset PostgreSQL database
docker exec -it <postgres-container> psql -U postgres
DROP DATABASE opencti;
CREATE DATABASE opencti OWNER opencti_user;
\q
# Redeploy with connectors COMMENTED OUT
docker stack deploy -c docker-compose.yml osint
# Wait for initialization to complete (~15-20 minutes)
# Then enable connectors one at a time
RabbitMQ Queue Buildup
Symptom: Message queues in RabbitMQ management UI growing continuously
Check:
- Are workers actually running? docker service ls | grep worker
- Worker logs showing errors? docker service logs osint_worker --tail 100
- Did you enable all connectors simultaneously on first boot? (Bad)
Fix:
- Scale workers if needed: docker service scale osint_worker=6
- If queue won't clear and vocabularies are corrupted, see "Corrupted Initialization" above
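To judge whether scaling actually helped, sample the queue depth twice in the RabbitMQ management UI and compute the drain rate. Illustrative Python with hypothetical sample numbers:

```python
def drain_rate(depth_before: int, depth_after: int, interval_seconds: float) -> float:
    """Messages cleared per second between two queue-depth samples.
    Negative means the queue is growing."""
    return (depth_before - depth_after) / interval_seconds

# e.g. 30,000 messages dropping to 27,000 over 5 minutes:
print(drain_rate(30_000, 27_000, 300))  # 10.0 msg/sec - the remaining 27k clears in ~45 minutes
```

A rate near zero (or negative) after scaling workers usually points back at Elasticsearch I/O, not worker count.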
Version Mismatch Warnings
Symptom: Connector logs show version mismatch between connector and platform
Impact: Usually harmless - connectors are generally backward compatible within the same major version. Data will still process correctly.
Action: Monitor connector logs for actual errors. If connector completes successfully, ignore the warning. Update to matching versions during next maintenance window if desired.
Critical Checklist
Before deploying, verify:
- [ ] Elasticsearch on local fast storage (not NFS)
- [ ] network.host: 0.0.0.0 in elasticsearch.yml
- [ ] Elasticsearch 4GB+ heap
- [ ] OpenCTI and Elasticsearch pinned to same node
- [ ] Workers on different nodes
- [ ] PostgreSQL max_connections ≥ 200
- [ ] MinIO credentials exactly match between services
- [ ] All connectors commented out for first deploy
Next Steps
- Access https://opencti.yourdomain.tld
- Login with admin credentials
- Verify connectors in Data → Connectors
- Explore MITRE data in Data → Entities → Attack Patterns
- Create first investigation case
- Configure user accounts and retention policies
Resources
- OpenCTI Documentation: https://docs.opencti.io
- MITRE ATT&CK: https://attack.mitre.org
- OpenCTI Community: https://community.filigran.io
Guide updated January 2026 with production deployment lessons. Always check official docs for latest info.