Deploying an OpenCTI OSINT Stack for Cybersecurity Research
Introduction
OpenCTI is an open-source threat intelligence platform that allows security professionals, researchers, and students to aggregate, analyze, and correlate cyber threat data. Combined with tools like SpiderFoot for OSINT gathering, it creates a powerful research environment for cybersecurity work, incident response training, and cybercrime investigations.
This guide walks through deploying OpenCTI in a homelab environment using Docker Swarm, with considerations for different deployment scenarios.
Why OpenCTI for Cybersecurity Practitioners?
For Security Students
- Hands-on experience with industry-standard threat intelligence platforms
- Learn the MITRE ATT&CK framework interactively
- Practice correlating IOCs, TTPs, and threat actor behaviors
- Build investigation case files like you would in a SOC
For OSINT Practitioners
- Centralized repository for organizing reconnaissance data
- Automated enrichment of IP addresses, domains, and other observables
- Link discovered infrastructure to known threat campaigns
- Maintain operational security while conducting investigations
For Cybercrime Investigators
- Track threat actors and their methodologies
- Map relationships between malware, campaigns, and infrastructure
- Maintain chain of custody for digital evidence
- Collaborate with team members on active investigations
Architecture Decisions
Docker Swarm vs Docker Compose
Docker Compose is simpler for single-host deployments but lacks high availability and orchestration features.
Docker Swarm provides:
- Service orchestration across multiple nodes
- Automatic service recovery and rescheduling
- Rolling updates
- Built-in load balancing
- Secrets management
For a homelab learning environment, Swarm offers valuable experience with orchestration concepts without the complexity of Kubernetes, while still providing production-like capabilities.
High-Level Architecture Overview
At a high level:
- OpenCTI Platform serves the UI and GraphQL API
- Workers process background jobs and connector imports
- Connectors pull data from external sources and send tasks through RabbitMQ
- Elasticsearch powers indexing and search
- MinIO provides S3-compatible object storage for files and attachments
- External PostgreSQL + Redis provide persistence and performance isolation
- SpiderFoot provides OSINT scanning with its own external database (MySQL/MariaDB)
Database Placement: External vs Containerized
Containerized databases (running in Docker):
- Simpler initial setup
- Everything managed in one stack
- Easier to tear down and rebuild
- Good for testing and learning
External databases (dedicated PostgreSQL/Redis servers):
- Better performance and resource isolation
- Easier to backup and maintain
- Can be shared across multiple services
- More resilient to stack redeployments
- Recommended for production use
This deployment uses external PostgreSQL and Redis servers while keeping Elasticsearch and RabbitMQ containerized for simplicity.
Storage Strategy
Docker Volumes:
- Simple, managed by Docker
- Good for single-node deployments
- Data persists across container restarts
- Limited portability between hosts
Bind Mounts:
- Direct access to host filesystem
- Easy to back up with standard tools
- Works well for single-node Swarm
Shared Network Storage (NFS/GlusterFS/Ceph):
- Required for multi-node Swarm deployments
- Allows containers to move between nodes
- Centralized backup point
- Potential performance bottleneck if not configured properly
For multi-node Swarm, shared storage is essential since services can be scheduled on any node and need access to persistent data.
Important (Docker Swarm): all bind-mounted paths must exist on every node that may run a service. If a path is missing or not mounted, Swarm will reject the task before the container starts.
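A minimal sketch of preparing those paths, using the /mnt/cluster-shared-storage/1/cybersec_osint layout from the stack file later in this guide; run it on every node (or once on the shared NFS export):
# Create the bind-mount directories used by this stack
sudo mkdir -p /mnt/cluster-shared-storage/1/cybersec_osint/spiderfoot/data
sudo mkdir -p /mnt/cluster-shared-storage/1/cybersec_osint/opencti/{elasticsearch,minio,rabbitmq,data,files}
# Confirm the shared mount is actually present before deploying
df -h /mnt/cluster-shared-storage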
Security Considerations
Local-Only Deployment
For a completely isolated homelab:
- No external network exposure
- Access only via VPN or local network
- Useful for malware analysis or sensitive research
- Eliminates attack surface from internet
Configuration:
- Don’t expose any ports directly
- Access services via internal Docker network names
- Use a jump box or VPN for administration
Web-Accessible Deployment
For remote access to your OSINT platform:
Reverse proxy with authentication:
- Use Traefik, Nginx, or Caddy for TLS termination
- Implement Authelia, Authentik, or similar for SSO
- Enforce strong authentication (2FA recommended)
- Use valid SSL certificates (Let’s Encrypt / Cloudflare)
Network security:
- Prefer VPN for access
- Consider IP allowlisting
- Monitor access logs regularly
Example Traefik labels:
labels:
traefik.enable: "true"
traefik.http.routers.opencti.rule: "Host(`opencti.yourdomain.com`)"
traefik.http.routers.opencti.entrypoints: "websecure"
traefik.http.routers.opencti.tls.certresolver: "cf"
traefik.http.routers.opencti.middlewares: "authelia@file"
Required External Resources
NIST NVD API Key (Required for CVE Connector)
The CVE connector requires an API key from the National Vulnerability Database.
1. Visit https://nvd.nist.gov/developers/request-an-api-key
2. Submit your email address
3. Receive API key (usually within minutes)
4. Add to connector configuration:
- CVE_API_KEY=your-key-here
Without an API key, NVD limits you to 5 requests per 30 seconds; with a key, the limit rises to 50 requests per 30 seconds. The CVE connector needs to download 200,000+ vulnerabilities, which would take days at the unauthenticated rate.
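Once you have the key, a quick sanity check is a single request against the NVD 2.0 API (a sketch; replace the placeholder with your key):
# Expect an HTTP 200 and a small JSON payload if the key is valid
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "apiKey: <YOUR-NVD-API-KEY>" \
  "https://services.nvd.nist.gov/rest/json/cves/2.0?resultsPerPage=1"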
AbuseIPDB API Key (Required for AbuseIPDB Connector)
The AbuseIPDB connector enriches IP observables with abuse reports, reputation signals, and confidence scores.
How to obtain an AbuseIPDB API key:
1. Visit https://www.abuseipdb.com/
2. Create a free account (email verification required)
3. Navigate to Account → API
4. Generate a new API key
5. Add it to the connector configuration:
- ABUSEIPDB_API_KEY=your-api-key-here
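As with the NVD key, you can verify the key before wiring it into the connector (a sketch using the public v2 check endpoint):
# Look up a well-known IP; a valid key returns JSON containing an abuseConfidenceScore
curl -sG https://api.abuseipdb.com/api/v2/check \
  --data-urlencode "ipAddress=1.1.1.1" \
  -H "Key: <YOUR-ABUSEIPDB-API-KEY>" \
  -H "Accept: application/json"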
SMTP Configuration (Optional)
For email notifications and user invitations, configure SMTP. AWS SES, SendGrid, or any standard SMTP server works.
Example (AWS SES):
- SMTP__HOSTNAME=email-smtp.us-east-2.amazonaws.com
- SMTP__PORT=587
- SMTP__USERNAME=your-ses-smtp-user
- SMTP__PASSWORD=your-ses-smtp-password
- [email protected]
- SMTP__USE_SSL=false
- SMTP__USE_TLS=true
AWS SES Note: ensure your SES account is out of sandbox mode or emails will only deliver to verified addresses.
External API Keys & Credentials Summary
| Service | Required | Purpose | How to Obtain |
|---|---|---|---|
| NIST NVD API Key | Yes | CVE ingestion | https://nvd.nist.gov |
| AbuseIPDB API Key | Yes | IP reputation enrichment | https://www.abuseipdb.com |
| SMTP Credentials | Optional | Email notifications | Any SMTP provider |
| MinIO Credentials | Yes | OpenCTI object storage | Defined during MinIO setup |
| OpenCTI Admin Token (UUID) | Yes | Connector auth | Generate with uuidgen |
Important note: This is not a comprehensive list. SpiderFoot, for example, accepts user-supplied API keys for many additional services (Shodan, AbuseIPDB, Censys, and others); some are free and easy to obtain, while others require paid subscriptions.
Quick Pre-Deployment Checklist
Before running docker stack deploy, you should have:
- NVD API key (if using CVE connector)
- AbuseIPDB API key
- OpenCTI admin token (UUIDv4)
- MinIO access and secret keys
- External PostgreSQL, Redis, and MySQL reachable from Swarm nodes
- Shared storage paths created on all Swarm nodes
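A minimal pre-flight sketch covering the last three items (hostnames/IPs are placeholders; adjust to your environment):
# Generate the OpenCTI admin token (UUIDv4)
uuidgen
# Confirm external services are reachable from a Swarm node
nc -zv <YOUR-POSTGRES-IP> 5432
nc -zv <YOUR-REDIS-IP> 6379
nc -zv <YOUR-MYSQL-IP> 3306
# Confirm the shared storage paths exist on this node
ls -ld /mnt/cluster-shared-storage/1/cybersec_osint/opencti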
Component Overview
Core Services
OpenCTI Platform:
- Main application server
- GraphQL API for connectors and integrations
- Web interface for analysts
OpenCTI Worker:
- Background job processor
- Imports data from connectors
- Processes relationships and enrichments
RabbitMQ:
- Message queue
- Distributes work between connectors and workers
- Visibility into processing pipeline via management UI
Elasticsearch:
- Powers OpenCTI search
- Stores and indexes platform data
- Resource-intensive: plan for 2–4GB RAM minimum
MinIO:
- S3-compatible object storage
- Stores files and attachments
- Required dependency for OpenCTI
Security note: MinIO only needs to be exposed publicly during setup, and only if you create the bucket manually (which is usually unnecessary). After bucket creation, restrict or remove external access, or protect it with SSO/VPN.
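If you do want to create the bucket manually or verify the credentials, the MinIO client (mc) works well; a sketch assuming the credentials from the stack file:
# Run from a host that can reach MinIO (substitute a node IP if you're outside the Docker network)
mc alias set osint http://minio:9000 <SET-MINIO-USERNAME> <SET-MINIO-PASSWORD>
mc mb osint/opencti-bucket
mc ls osint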
Connectors
MITRE ATT&CK:
- Imports ATT&CK tactics, techniques, and procedures
- ~25,000 objects on first sync
CVE Connector:
- Imports vulnerability data from NVD
- First sync can take 30–120 minutes depending on configuration
AbuseIPDB:
- Enriches IP observables
- Requires AbuseIPDB API key
OSINT Tools
SpiderFoot:
- Automated reconnaissance (DNS, WHOIS, feeds, etc.)
- Integrates with 200+ data sources
Note: the historically referenced spiderfoot/spiderfoot image may not exist or be consistently available. The josaorg/spiderfoot image is a reliable alternative for Swarm deployments.
Complete Docker Swarm Stack Configuration
This configuration is intended for Docker Swarm deployments using docker stack deploy, not standalone Docker Compose.
version: '3.8'
networks:
app_overlay:
external: true
backend:
driver: overlay
attachable: true
services:
spiderfoot:
image: josaorg/spiderfoot:stable
hostname: spiderfoot
environment:
- SPIDERFOOT_ACCEPT_TOS=yes
- SPIDERFOOT_DBM=mysql
- SPIDERFOOT_DB_HOST=<YOUR-MYSQL-IP>
- SPIDERFOOT_DB_PORT=3306
- SPIDERFOOT_DB_NAME=spiderfoot
- SPIDERFOOT_DB_USER=spiderfoot_user
- SPIDERFOOT_DB_PASSWORD=<YOUR-SPIDERFOOT-DB-PASSWORD>
volumes:
- /mnt/cluster-shared-storage/1/cybersec_osint/spiderfoot/data:/var/lib/spiderfoot
networks:
- app_overlay
- backend
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
labels:
traefik.enable: "true"
traefik.swarm.lbswarm: "true"
traefik.http.routers.spiderfoot.rule: Host(`spiderfoot.yourdomain.com`)
traefik.http.routers.spiderfoot.entrypoints: websecure
traefik.http.routers.spiderfoot.tls: "true"
traefik.http.routers.spiderfoot.tls.certresolver: cf
# traefik.http.routers.spiderfoot.middlewares: authelia@file
traefik.http.services.spiderfoot.loadbalancer.server.port: "5001"
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.1
hostname: elasticsearch
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- xpack.ml.enabled=false
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
volumes:
- /mnt/cluster-shared-storage/1/cybersec_osint/opencti/elasticsearch:/usr/share/elasticsearch/data
networks:
- backend
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
# S3-compatible storage for OpenCTI (required dependency)
minio:
image: minio/minio:latest
hostname: minio
command: server /data --console-address ":9001"
environment:
- MINIO_ROOT_USER=<SET-MINIO-USERNAME>
- MINIO_ROOT_PASSWORD=<SET-MINIO-PASSWORD>
volumes:
- /mnt/cluster-shared-storage/1/cybersec_osint/opencti/minio:/data
networks:
- app_overlay
- backend
deploy:
labels:
traefik.enable: "true"
traefik.swarm.lbswarm: "true"
traefik.http.routers.minio.rule: "Host(`minio.yourdomain.com`)"
traefik.http.routers.minio.entrypoints: "websecure"
traefik.http.routers.minio.tls: "true"
traefik.http.routers.minio.tls.certresolver: "cf"
# traefik.http.routers.minio.middlewares: "authelia@file"
traefik.http.services.minio.loadbalancer.server.port: "9001"
rabbitmq:
image: rabbitmq:3.13-management
hostname: rabbitmq
environment:
- RABBITMQ_DEFAULT_USER=<SET-RABBITMQ-USERNAME>
- RABBITMQ_DEFAULT_PASS=<SET-RABBITMQ-PASSWORD>
volumes:
- /mnt/cluster-shared-storage/1/cybersec_osint/opencti/rabbitmq:/var/lib/rabbitmq
networks:
- app_overlay
- backend
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
labels:
traefik.enable: "true"
traefik.swarm.lbswarm: "true"
traefik.http.routers.rabbitmq.rule: Host(`rabbitmq.yourdomain.com`)
traefik.http.routers.rabbitmq.entrypoints: websecure
traefik.http.routers.rabbitmq.tls: "true"
traefik.http.routers.rabbitmq.tls.certresolver: cf
# traefik.http.routers.rabbitmq.middlewares: authelia@file
traefik.http.services.rabbitmq.loadbalancer.server.port: "15672"
opencti:
image: opencti/platform:6.0.6
hostname: opencti
environment:
- NODE_OPTIONS=--max-old-space-size=4096
- APP__PORT=8080
- APP__BASE_URL=https://opencti.yourdomain.com
- [email protected]
- APP__ADMIN__PASSWORD=<SET-OPENCTI-PASSWORD>
- APP__ADMIN__TOKEN=<SET-OPENCTI-TOKEN-UUID>
- APP__APP_LOGS__LOGS_LEVEL=info
# Optional but recommended for /health endpoint
- APP__HEALTH_ACCESS_KEY=<SET-STRONG-STRING>
# External Redis configuration
- REDIS__HOSTNAME=<YOUR-REDIS-IP>
- REDIS__PORT=6379
- REDIS__PASSWORD=<YOUR-REDIS-PASS>
# Elasticsearch configuration
- ELASTICSEARCH__URL=http://elasticsearch:9200
# External Postgres configuration
- POSTGRES__HOSTNAME=<YOUR-POSTGRES-IP>
- POSTGRES__PORT=5432
- POSTGRES__DATABASE=opencti
- POSTGRES__USERNAME=opencti_user
- POSTGRES__PASSWORD=<YOUR-OPENCTI-DB-PASSWORD>
# S3 bucket configuration (MinIO)
- MINIO__ENDPOINT=minio
- MINIO__PORT=9000
- MINIO__USE_SSL=false
- MINIO__ACCESS_KEY=<MATCH-MINIO-USERNAME>
- MINIO__SECRET_KEY=<MATCH-MINIO-PASSWORD>
- MINIO__BUCKET_NAME=opencti-bucket
# RabbitMQ configuration
- RABBITMQ__HOSTNAME=rabbitmq
- RABBITMQ__PORT=5672
- RABBITMQ__PORT_MANAGEMENT=15672
- RABBITMQ__MANAGEMENT_SSL=false
- RABBITMQ__USERNAME=<MATCH-RABBITMQ-USERNAME>
- RABBITMQ__PASSWORD=<MATCH-RABBITMQ-PASSWORD>
# SMTP configuration (AWS SES)
- SMTP__HOSTNAME=email-smtp.us-east-2.amazonaws.com
- SMTP__PORT=587
- SMTP__USERNAME=<YOUR-SES-USERNAME-IF-APPLICABLE>
- SMTP__PASSWORD=<YOUR-SES-PASSWORD-IF-APPLICABLE>
- [email protected]
- SMTP__USE_SSL=false
- SMTP__USE_TLS=true
- PROVIDERS__LOCAL__STRATEGY=LocalStrategy
volumes:
- /mnt/cluster-shared-storage/1/cybersec_osint/opencti/data:/opt/opencti/data
- /mnt/cluster-shared-storage/1/cybersec_osint/opencti/files:/var/lib/opencti/files
networks:
- app_overlay
- backend
deploy:
replicas: 1
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
labels:
traefik.enable: "true"
traefik.swarm.lbswarm: "true"
traefik.http.routers.opencti.rule: Host(`opencti.yourdomain.com`)
traefik.http.routers.opencti.entrypoints: websecure
traefik.http.routers.opencti.tls: "true"
traefik.http.routers.opencti.tls.certresolver: cf
# traefik.http.routers.opencti.middlewares: authelia@file
traefik.http.services.opencti.loadbalancer.server.port: "8080"
worker:
image: opencti/worker:6.0.6
hostname: worker
environment:
- OPENCTI_URL=http://opencti:8080
- OPENCTI_TOKEN=<MATCH-OPENCTI-TOKEN-UUID>
- WORKER_LOG_LEVEL=info
networks:
- app_overlay
- backend
deploy:
replicas: 3
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
connector-mitre:
image: opencti/connector-mitre:6.0.6
hostname: connector-mitre
environment:
- OPENCTI_URL=http://opencti:8080
- OPENCTI_TOKEN=<MATCH-OPENCTI-TOKEN-UUID>
- CONNECTOR_ID=mitre-attack
- CONNECTOR_NAME=MITRE ATT&CK
- CONNECTOR_SCOPE=marking-definition,identity,attack-pattern,course-of-action,intrusion-set,campaign,malware,tool,report,external-reference-as-report
- CONNECTOR_LOG_LEVEL=info
- MITRE_INTERVAL=7
networks:
- app_overlay
- backend
deploy:
replicas: 1
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
connector-cve:
image: opencti/connector-cve:6.3.8
hostname: connector-cve
environment:
- OPENCTI_URL=http://opencti:8080
- OPENCTI_TOKEN=<MATCH-OPENCTI-TOKEN-UUID>
- CONNECTOR_ID=cve
- CONNECTOR_NAME=Common Vulnerabilities and Exposures
- CONNECTOR_SCOPE=identity,vulnerability
- CONNECTOR_LOG_LEVEL=info
- CVE_INTERVAL=6
- CVE_API_KEY=<YOUR-NVD-API-KEY>
# Optional settings:
- CVE_BASE_URL=https://services.nvd.nist.gov/rest/json/cves
- CVE_MAX_DATE_RANGE=120
- CVE_MAINTAIN_DATA=true
- CVE_PULL_HISTORY=false
networks:
- app_overlay
- backend
deploy:
replicas: 1
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
connector-abuseipdb:
image: opencti/connector-abuseipdb:6.0.6
hostname: connector-abuseipdb
environment:
- OPENCTI_URL=http://opencti:8080
- OPENCTI_TOKEN=<MATCH-OPENCTI-TOKEN-UUID>
- CONNECTOR_ID=abuseipdb
- CONNECTOR_NAME=AbuseIPDB
- CONNECTOR_SCOPE=IPv4-Addr
- CONNECTOR_AUTO=true
- CONNECTOR_LOG_LEVEL=info
- ABUSEIPDB_API_KEY=<YOUR-ABUSEIPDB-API-KEY>
- ABUSEIPDB_MAX_TLP=TLP:AMBER
networks:
- app_overlay
- backend
deploy:
replicas: 1
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 5
window: 120s
Save the configuration above as osint-stack.yml.
Version Compatibility (Important Exception)
In general, OpenCTI connectors should match the platform version exactly. Mismatched versions can cause GraphQL schema errors, failed imports, or connector crashes.
Exception: CVE Connector
In this deployment, the CVE connector version 6.0.6 consistently failed. The configuration uses opencti/connector-cve:6.3.8, which is confirmed working with OpenCTI platform 6.0.6 for CVE ingestion. You may see “Unknown type” schema errors in the connector logs (health/status noise), but data ingestion can still succeed.
Deployment Process
1. Prepare Infrastructure
Single-node:
docker swarm init
Multi-node:
docker swarm init --advertise-addr <manager-ip>
docker swarm join --token <worker-token> <manager-ip>:2377
Create overlay network:
docker network create --driver overlay --attachable app_overlay
2. Prepare External Services
PostgreSQL:
- Database: opencti
- User: opencti_user with full access
MySQL/MariaDB:
- Database: spiderfoot
- User: spiderfoot_user with full access
Redis:
- Set a strong password
- Persistence recommended
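A sketch of the provisioning commands, assuming admin access to each server (adjust hosts to your environment; the grants can be tightened for production):
# PostgreSQL: database and owner for OpenCTI
psql -h <YOUR-POSTGRES-IP> -U postgres -c "CREATE USER opencti_user WITH PASSWORD '<YOUR-OPENCTI-DB-PASSWORD>';"
psql -h <YOUR-POSTGRES-IP> -U postgres -c "CREATE DATABASE opencti OWNER opencti_user;"
# MySQL/MariaDB: database and user for SpiderFoot
mysql -h <YOUR-MYSQL-IP> -u root -p -e "CREATE DATABASE spiderfoot; CREATE USER 'spiderfoot_user'@'%' IDENTIFIED BY '<YOUR-SPIDERFOOT-DB-PASSWORD>'; GRANT ALL PRIVILEGES ON spiderfoot.* TO 'spiderfoot_user'@'%'; FLUSH PRIVILEGES;"
# Redis: confirm the password works
redis-cli -h <YOUR-REDIS-IP> -a '<YOUR-REDIS-PASS>' ping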
3. Deploy the Stack
docker stack deploy -c osint-stack.yml osint
Monitor:
docker service ls
docker service logs osint_opencti --follow
4. Initial Platform Setup
OpenCTI performs a one-time initialization on first boot:
- Creates schema
- Initializes indices
- Seeds default vocabularies
This typically takes 5–15 minutes.
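If you set APP__HEALTH_ACCESS_KEY in the stack file, you can poll the health endpoint to know when initialization has finished (a sketch; the query parameter follows the OpenCTI documentation as I understand it):
# Returns HTTP 200 once the platform reports healthy
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://opencti.yourdomain.com/health?health_access_key=<SET-STRONG-STRING>"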
5. Connector Synchronization
MITRE ATT&CK first sync:
- ~24,000–25,000 objects
- 10–30 minutes typical
CVE connector:
- 30–120 minutes depending on config and API rate limits
Monitor in the UI under Data → Connectors and via RabbitMQ management UI.
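Queue depth is also visible from the command line if you prefer not to open the management UI (a sketch; run on the node hosting the RabbitMQ task, and note the container name depends on your stack name):
# List queues and message counts inside the RabbitMQ container
docker exec $(docker ps -q -f name=osint_rabbitmq) rabbitmqctl list_queues name messages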
Troubleshooting Common Issues
Elasticsearch indexing failures: increase heap to 2g+.
MinIO connection issues: ensure OpenCTI and MinIO share a network and credentials match exactly.
CVE connector errors: if 6.0.6 fails, use 6.3.8 with OpenCTI 6.0.6 as a known working workaround; ignore schema status errors if ingestion succeeds.
Lessons Learned: Real-World Gotchas
After getting this stack running, here are the pain points that were not obvious from the documentation.
MinIO Credential Mismatch Hell
The problem: OpenCTI could not connect to MinIO despite both services running fine.
The root cause: MinIO uses MINIO_ROOT_USER and MINIO_ROOT_PASSWORD, but OpenCTI uses MINIO__ACCESS_KEY and MINIO__SECRET_KEY. These must match exactly, but the different naming makes it easy to overlook.
The fix:
- MINIO__ACCESS_KEY=<SAME-AS-MINIO_ROOT_USER>
- MINIO__SECRET_KEY=<SAME-AS-MINIO_ROOT_PASSWORD>
Lesson: when services won’t talk to each other, verify that credential values match even when variable names differ.
Docker Swarm DNS Resolution
The problem: services could not find each other by hostname (getaddrinfo ENOTFOUND minio).
The root cause: services on different networks can’t resolve each other. In Swarm, services must share at least one network for DNS discovery.
The fix: put MinIO (and any shared dependency) on both networks:
networks:
- app_overlay
- backend
Lesson: in Swarm, network isolation affects DNS. Check network attachments before anything else.
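Two quick checks when you suspect this (a sketch; service and container names assume the osint stack name used in this guide):
# Which networks is the MinIO service attached to?
docker service inspect osint_minio --format '{{json .Spec.TaskTemplate.Networks}}'
# Can a task on the same networks resolve the hostname? (use nslookup or ping if getent isn't in the image)
docker exec $(docker ps -q -f name=osint_opencti) getent hosts minio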
Elasticsearch Needs Real Resources
The problem: MITRE sync failed with DATABASE_ERROR: Update indexing fail.
The root cause: Elasticsearch heap too small (1GB) for bulk indexing.
The fix:
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
Lesson: initial indexing is heavy; plan 2–4GB heap minimum.
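You can confirm the new heap size took effect and that the cluster is healthy from inside the container (a sketch; the official Elasticsearch image ships with curl):
# Cluster status should be green or yellow (yellow is normal for single-node)
docker exec $(docker ps -q -f name=osint_elasticsearch) curl -s "localhost:9200/_cluster/health?pretty"
# Verify the configured heap
docker exec $(docker ps -q -f name=osint_elasticsearch) curl -s "localhost:9200/_nodes/jvm?pretty" | grep heap_max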
“It’s Working” Doesn’t Mean It’s Done
The problem: UI looked healthy but no data appeared for 10+ minutes.
The root cause: OpenCTI does staged initialization and connectors/workers process queues afterward.
The fix: watch logs and queues; budget 30–60 minutes on first boot.
Lesson: the first sync is always the longest. Don’t confuse “UI is up” with “platform is populated.”
CVE Connector Without API Key Is Pointless
The problem: CVE sync would take days without an NVD API key.
The fix: get the free NVD key and use it.
Lesson: some connectors “work” without API keys but are practically unusable.
Deployment Order Matters (Kind Of)
The problem: intermittent startup dependency failures.
The root cause: Swarm starts services concurrently.
The fix: restart policies handle this well; investigate only if failures persist after retries.
Lesson: don’t panic on first-boot connection errors; let Swarm retry.
Shared Storage Permission Surprises
The problem: services failed to write to NFS despite correct paths.
The root cause: UID/GID mismatch (e.g., Elasticsearch runs as UID 1000).
The fix: align ownership or relax permissions for homelab use.
Lesson: test writes to shared storage before deployment.
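A simple write-test sketch (Elasticsearch runs as UID 1000 in the official image; other services may differ):
# On each node, confirm UID 1000 can write to the Elasticsearch data path
sudo -u "#1000" touch /mnt/cluster-shared-storage/1/cybersec_osint/opencti/elasticsearch/.write_test && echo "write OK"
# If it fails, align ownership (acceptable for a homelab; tighten for production)
sudo chown -R 1000:1000 /mnt/cluster-shared-storage/1/cybersec_osint/opencti/elasticsearch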
Workers Are the Bottleneck
The problem: data imports but appears slowly in the UI.
The root cause: insufficient worker capacity.
The fix: scale workers:
worker:
deploy:
replicas: 3
Lesson: more workers = faster processing; they’re stateless and scale well.
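You can also scale a running stack without redeploying (a sketch; the service name assumes the osint stack name):
# Bump worker replicas on the fly and watch tasks spread across nodes
docker service scale osint_worker=5
docker service ps osint_worker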
The “It Worked Yesterday” Phenomenon
The problem: a service that worked fails after restart.
The root cause: NFS mount missing or unstable.
The fix:
df -h | grep cluster-shared
mount -a
Lesson: shared storage is a common single point of failure. Monitor it and consider automated remount/alerting.
Conclusion
Deploying OpenCTI in a homelab provides hands-on experience with enterprise-grade threat intelligence platforms while building practical skills for cybersecurity careers. Docker Swarm lets you start small and scale as needed, and OpenCTI’s connector architecture makes it easy to tailor to your use case.
This guide reflects a working deployment as of January 2026. Always check official documentation for the latest configuration options and security recommendations. Suggestions, questions or other comments? Drop them below.