How I Fixed WebSocket Connection Failures in My Dockerized Rails/React Application
The Journey from Socket.IO to ActionCable: A Debugging Story
When I opened my browser console and saw endless WebSocket connection errors flooding the screen, I knew I was in for a deep debugging session. My ERP system's real-time features were completely broken, with "Failed to establish real-time connection" notifications appearing every few seconds. What started as a simple connection error investigation turned into a comprehensive architectural overhaul that taught me valuable lessons about distributed systems debugging.
Discovery: The Cascade of Connection Failures
The browser console painted a clear picture of failure:
WebSocket connection to 'ws://localhost:3002/socket.io/?EIO=4&transport=websocket' failed:
Error during WebSocket handshake: Unexpected response code: 404
Every five seconds, my React frontend attempted to establish a Socket.IO connection, only to receive 404 errors. I started by checking which containers were running and their network configuration:
docker ps --format "table {{.Names}}\t{{.Ports}}" | grep -E "backend|frontend"
# Output:
# pcvn-erp-frontend-prod 0.0.0.0:3003->80/tcp, [::]:3003->80/tcp
# pcvn-erp-backend-prod 0.0.0.0:3002->3002/tcp, [::]:3002->3002/tcp
The Rails backend logs confirmed these attempts were reaching the server but finding no handler:
docker logs pcvn-erp-backend-prod --tail 20 | grep -i "socket.io"
# Output:
# [244637a1-2baa-4472-8966-ea52abe823f8] Started GET "/socket.io/?EIO=4&transport=websocket" for 172.20.0.1
# [244637a1-2baa-4472-8966-ea52abe823f8] ActionController::RoutingError (No route matches [GET] "/socket.io")
Investigation Phase 1: Understanding the Backend Infrastructure
My first instinct was to check what WebSocket endpoints actually existed in my Rails backend:
docker exec pcvn-erp-backend-prod rails routes | grep -i cable
# Output: (empty)
The empty output was revealing - ActionCable wasn't mounted at all. However, when I dug deeper, I discovered a fully configured ActionCable infrastructure waiting to be activated:
docker exec pcvn-erp-backend-prod cat config/cable.yml
# Output:
# production:
#   adapter: solid_cable
#   connects_to:
#     database:
#       writing: cable
#   polling_interval: 0.1.seconds
#   message_retention: 1.day
I even found implemented ActionCable channels:
docker exec pcvn-erp-backend-prod cat app/channels/production_updates_channel.rb
The channel included sophisticated logic for handling production updates, user-specific streams, and department notifications. All this functionality existed but remained inaccessible because ActionCable wasn't mounted in the routes.
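For illustration, the heart of that channel looked something like the sketch below - a simplified reconstruction based on the stream names that show up in the logs later; the real file also handles the department notifications:
# app/channels/production_updates_channel.rb (simplified reconstruction)
class ProductionUpdatesChannel < ApplicationCable::Channel
  def subscribed
    # Shared stream every subscriber receives
    stream_from "production_updates"
    # Per-user stream for targeted updates
    stream_from "production_updates_user_#{current_user.id}" if current_user
  end

  def unsubscribed
    stop_all_streams
  end
end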
The First Major Discovery: Protocol Mismatch
After mounting ActionCable by adding mount ActionCable.server => "/cable" to my routes file and rebuilding the Docker image, I still faced connection failures. This led me to the fundamental realization: my frontend was using Socket.IO while my backend provided ActionCable. These protocols are incompatible - they use different handshake mechanisms, message framing, and transport negotiation.
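For reference, the mount itself is a one-line addition - a minimal sketch, with the existing API routes elided:
# backend/config/routes.rb
Rails.application.routes.draw do
  mount ActionCable.server => "/cable"
  # ... existing API routes ...
end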
I located the Socket.IO implementation in my frontend:
cat erp-frontend/src/services/websocket.ts
The service was using the Socket.IO client library to establish connections, while my Rails backend expected ActionCable protocol. This explained why even with both endpoints configured, they couldn't communicate.
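A quick way to see the mismatch from the host is to hit both paths with plain HTTP - the port and paths come from the logs above, and the 426 is what the now-mounted ActionCable endpoint returns to a non-WebSocket request:
curl -i "http://localhost:3002/socket.io/?EIO=4&transport=websocket"
# => HTTP/1.1 404 Not Found (Rails has no Socket.IO handler)
curl -i http://localhost:3002/cable
# => HTTP/1.1 426 Upgrade Required (ActionCable is listening and demands a WebSocket upgrade)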
Investigation Phase 2: The Missing Database Tables
After deciding to replace Socket.IO with an ActionCable client in my frontend, I encountered a new error when connections were attempted:
docker logs pcvn-erp-backend-prod --tail 30
# Output:
# Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: websocket)
# There was an exception - NoMethodError(undefined method `decode' for JwtService:Class)
Progress! The WebSocket upgrade was succeeding, but authentication was failing. Before fixing that, I discovered another critical issue - the solid_cable adapter needed database tables that didn't exist:
docker exec pcvn-erp-backend-prod bundle exec rails runner "puts ActiveRecord::Base.connection.tables.grep(/cable/).inspect" RAILS_ENV=production
# Output: []
I had to create these tables by loading the cable schema:
docker exec pcvn-erp-backend-prod bundle exec rails runner "load 'db/cable_schema.rb'" RAILS_ENV=production
# Output: -- create_table("solid_cable_messages", {:force=>:cascade})
# -> 0.0879s
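To double-check that the load actually produced the table, the same rails runner trick works for columns:
docker exec pcvn-erp-backend-prod bundle exec rails runner "puts ActiveRecord::Base.connection.columns('solid_cable_messages').map(&:name).inspect" RAILS_ENV=production
# Prints the column names if the table now exists (exact names vary by solid_cable version)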
The Docker Network Configuration Issues
Throughout my debugging, I repeatedly encountered Docker network configuration errors that prevented containers from starting:
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d
# ERROR: Network "pcvn-fullstack_pcvn-prod-network" needs to be recreated - option "com.docker.network.enable_ipv6" has changed
This error appears whenever a network option in the Compose configuration no longer matches the options the existing network was created with - Docker refuses to reuse the network. I had to completely remove and recreate everything:
# First, check what's running
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"
# Stop all containers
docker-compose -f docker-compose.prod.yml down
# Output:
# Stopping pcvn-erp-nginx-prod ... done
# Stopping pcvn-erp-frontend-prod ... done
# Stopping pcvn-erp-sidekiq-prod ... done
# Stopping pcvn-erp-backend-prod ... done
# Stopping pcvn-erp-db-prod ... done
# Stopping pcvn-erp-redis-prod ... done
# Removing pcvn-erp-nginx-prod ... done
# Removing pcvn-erp-frontend-prod ... done
# Removing pcvn-erp-sidekiq-prod ... done
# Removing pcvn-erp-backend-prod ... done
# Removing pcvn-erp-db-prod ... done
# Removing pcvn-erp-redis-prod ... done
# Removing network pcvn-fullstack_pcvn-prod-network
# Verify network removal
docker network ls | grep pcvn
# (empty output confirms network is gone)
# Create fresh network and start containers in correct dependency order
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d
# Output:
# Creating network "pcvn-fullstack_pcvn-prod-network" with driver "bridge"
# Creating pcvn-erp-redis-prod ... done
# Creating pcvn-erp-db-prod ... done
# Creating pcvn-erp-backend-prod ... done
# Creating pcvn-erp-sidekiq-prod ... done
# Creating pcvn-erp-frontend-prod ... done
# Creating pcvn-erp-nginx-prod ... done
The order was critical - database and Redis started first, then the backend that depended on them, followed by frontend and nginx.
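That ordering isn't luck - it comes from depends_on declarations in the Compose file. A minimal sketch of what enforces it (service names assumed to match the container names above; the simple form controls start order only, not readiness):
# docker-compose.prod.yml (dependency declarations, sketch)
services:
  backend:
    depends_on:
      - db
      - redis
  frontend:
    depends_on:
      - backend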
The Thruster Proxy Discovery
Even with tables created and ActionCable mounted, connections still failed. Examining the Docker configuration revealed another layer of complexity:
# Check how the backend container starts
docker inspect pcvn-erp-backend-prod | grep -A 5 '"Cmd"'
# Output:
# "Cmd": [
# "./bin/thrust",
# "./bin/rails",
# "server"
# ]
# Examine the Dockerfile to understand the startup command
cat backend/Dockerfile | tail -20
# Output revealed: CMD ["./bin/thrust", "./bin/rails", "server"]
My Rails application was running behind Thruster, the lightweight HTTP proxy from 37signals that newer Rails Dockerfiles place in front of Puma. In my setup it was not passing WebSocket upgrade requests through, so I bypassed it with a Docker Compose override:
# docker-compose.prod.override.yml
version: '3.8'
services:
  backend:
    command: ["bundle", "exec", "puma", "-C", "config/puma.rb", "-p", "3002", "-e", "production", "--preload"]
    environment:
      RAILS_ENV: production
      WEB_CONCURRENCY: 2
      RAILS_MAX_THREADS: 5
      ACTION_CABLE_ALLOWED_REQUEST_ORIGINS: "http://localhost:3003"
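Before recreating anything, it's worth confirming that the override actually wins the merge - docker-compose can print the effective configuration:
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml config | grep -A 2 "command:"
# The merged output should show the puma command, not ./bin/thrust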
The Puma Configuration Challenge
The next issue emerged when I discovered Puma wasn't preloading the application, which ActionCable requires in clustered mode:
docker exec pcvn-erp-backend-prod ps aux | grep puma
# Output showed Puma running without preload flag
I updated the Puma configuration file to include critical settings:
# backend/config/puma.rb additions
workers ENV.fetch("WEB_CONCURRENCY", 2).to_i
preload_app!

before_fork do
  # Drop database connections before forking so workers don't inherit sockets
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end

on_worker_boot do
  # Each forked worker re-establishes its own connection pool
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
  puts "Worker booted with PID: #{Process.pid}"
  puts "ActionCable enabled: #{defined?(ActionCable) ? 'Yes' : 'No'}"
  puts "Solid Cable configured: #{Rails.application.config.action_cable.cable.present? ? 'Yes' : 'No'}"
end
The Origin Security Layer
After rebuilding with proper Puma configuration, I encountered origin checking failures:
docker logs pcvn-erp-backend-prod --tail 20
# Output: Request origin not allowed: http://localhost:3003
I created an ActionCable initializer to configure allowed origins:
# backend/config/initializers/action_cable.rb
Rails.application.configure do
  config.action_cable.allowed_request_origins = [
    'http://localhost:3003',
    'http://localhost:3002',
    /http:\/\/localhost:.*/
  ]
end
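After rebuilding, the effective setting can be verified from inside the container:
docker exec pcvn-erp-backend-prod bundle exec rails runner "puts Rails.application.config.action_cable.allowed_request_origins.inspect" RAILS_ENV=production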
The Authentication Fix
The final hurdle was an authentication method mismatch. I examined the JWT service to understand what methods were available:
docker exec pcvn-erp-backend-prod cat app/services/jwt_service.rb | grep "def "
# Output:
# def encode(user_id:)
# def encode_access_token(user)
# def encode_refresh_token(user)
# def decode_token(token) # <-- The actual method name
# def valid_token?(token)
# def get_user_from_token(token)
The error showed the connection class was calling JwtService.decode, a method that didn't exist; the service actually exposed decode_token and a higher-level get_user_from_token helper. I corrected the ActionCable connection class to use the helper:
# backend/app/channels/application_cable/connection.rb
def find_verified_user
  token = request.params[:token]
  if token
    user = JwtService.get_user_from_token(token) # Changed from JwtService.decode
    user ? user : reject_unauthorized_connection
  else
    reject_unauthorized_connection
  end
end
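For context, that method lives inside the standard connection class - the surrounding structure looks roughly like this, assuming the conventional identified_by setup:
# backend/app/channels/application_cable/connection.rb (surrounding structure, sketch)
module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      # Runs once per WebSocket handshake; rejection closes the socket
      self.current_user = find_verified_user
    end

    private

    # ... find_verified_user as shown above ...
  end
end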
The Final Rebuild and Deployment
After making all the fixes, I needed to rebuild the Docker images to include all changes. The rebuild process was complicated by persistent network configuration issues:
# First attempt to rebuild
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d --build backend
# ERROR: Network "pcvn-fullstack_pcvn-prod-network" needs to be recreated - option "com.docker.network.enable_ipv4" has changed
# Complete teardown was necessary
docker-compose -f docker-compose.prod.yml down
# Output:
# Stopping pcvn-erp-nginx-prod ... done
# Stopping pcvn-erp-frontend-prod ... done
# Stopping pcvn-erp-sidekiq-prod ... done
# Stopping pcvn-erp-backend-prod ... done
# Stopping pcvn-erp-db-prod ... done
# Stopping pcvn-erp-redis-prod ... done
# Removing all containers and network...
# Rebuild backend with all fixes
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d --build backend
# Output:
# Creating network "pcvn-fullstack_pcvn-prod-network" with driver "bridge"
# Building backend
# [+] Building 48.8s (20/20) FINISHED
# => [internal] load build definition from Dockerfile
# => [build 4/5] COPY . .
# => [build 5/5] RUN bundle exec bootsnap precompile app/ lib/
# => exporting to image
# Creating pcvn-erp-redis-prod ... done
# Creating pcvn-erp-db-prod ... done
# Creating pcvn-erp-backend-prod ... done
# Bring up remaining services
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d
# Output:
# Creating pcvn-erp-sidekiq-prod ... done
# Creating pcvn-erp-frontend-prod ... done
# Creating pcvn-erp-nginx-prod ... done
# Verify all containers are running
docker ps --format "table {{.Names}}\t{{.Status}}"
# Output:
# NAMES STATUS
# pcvn-erp-nginx-prod Up 2 minutes
# pcvn-erp-frontend-prod Up 2 minutes
# pcvn-erp-backend-prod Up 3 minutes
# pcvn-erp-sidekiq-prod Up 2 minutes
# pcvn-erp-db-prod Up 3 minutes
# pcvn-erp-redis-prod Up 3 minutes
Verification: The Moment of Success
After all services were running with the corrected configuration:
# Check if Puma started with correct configuration
docker logs pcvn-erp-backend-prod --tail 30
# Output:
# [1] Puma starting in cluster mode...
# [1] * Puma version: 6.6.0 ("Return to Forever")
# [1] * Ruby version: ruby 3.2.1
# [1] * Environment: production
# [1] * Workers: 2
# [1] * Preloading application
# [1] * Listening on http://0.0.0.0:3002
# Worker booted with PID: 10
# ActionCable enabled: Yes
# Solid Cable configured: Yes
# Monitor real-time connection attempts
docker logs -f pcvn-erp-backend-prod --tail 10
# Output:
# [cbdda168-598c-458b-9272-3d25eee7f5e8] Started GET "/cable?token=[FILTERED]" [WebSocket]
# [cbdda168-598c-458b-9272-3d25eee7f5e8] Successfully upgraded to WebSocket
# ProductionUpdatesChannel is streaming from production_updates
# ProductionUpdatesChannel is streaming from production_updates_user_e147e167-6a24-4a5c-9786-de97745e8a3f
# Check network connectivity between containers
docker exec pcvn-erp-frontend-prod ping -c 1 backend
# Output: 1 packets transmitted, 1 received, 0% packet loss
# Verify the ActionCable endpoint is accessible
docker exec pcvn-erp-frontend-prod curl -I http://backend:3002/cable
# Output: HTTP/1.1 426 Upgrade Required
The browser console confirmed success with "ActionCable WebSocket connected successfully" messages, and my application finally displayed the green "Connected" status indicator.
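As a last smoke test, a payload can be pushed from the backend container onto the stream name seen in the logs above:
docker exec pcvn-erp-backend-prod bundle exec rails runner 'ActionCable.server.broadcast("production_updates", { event: "ping", sent_at: Time.current })' RAILS_ENV=production
# Any subscribed browser tab should receive the payload in its channel callback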
Lessons I Learned from This Journey
This debugging experience taught me several crucial lessons about distributed systems:
Layer-by-layer debugging is essential: Each fix revealed the next issue, like peeling an onion. I had to solve the protocol mismatch, create database tables, bypass the proxy, configure the server, set security permissions, and fix authentication - in that exact order.
Docker adds complexity to debugging: Changes to files on my host didn't affect running containers until I rebuilt images. Understanding the distinction between build-time and runtime in Docker was crucial.
Partial implementations create confusion: Finding fully implemented ActionCable channels that weren't mounted was initially puzzling. The infrastructure existed but wasn't accessible - like having a complete electrical system with no power switch.
Protocol compatibility is non-negotiable: Socket.IO and ActionCable both use WebSocket transport but speak different languages. No amount of configuration could make them communicate without replacing one with the other.
Production configurations often differ significantly: Thruster proxy, Puma clustering, origin checking, and authentication - each added layers that didn't exist in development but were critical in production.
The systematic approach I took - checking routes, examining configurations, understanding protocols, tracing through Docker's build process, and methodically addressing each error - transformed what seemed like a simple connection problem into a comprehensive understanding of my application's real-time architecture.
This experience reinforced my belief that effective debugging requires understanding multiple abstraction layers simultaneously. In this case, I had to navigate the application protocol layer (Socket.IO vs ActionCable), the framework layer (Rails routing and channels), the server layer (Puma configuration), the proxy layer (Thruster), the containerization layer (Docker), and the security layer (origin checking and authentication). Each layer had its own configuration language and constraints, and making them work together required careful translation between these different contexts.
The WebSocket connection that finally succeeded represented more than just fixing bugs - it was the culmination of understanding how modern web applications layer complexity, how each abstraction both enables and constrains functionality, and how systematic investigation can unravel even the most tangled technical challenges.
CLI Commands Reference: My Debugging Toolkit
Throughout this debugging journey, I relied on specific Docker and Rails commands to investigate and resolve issues. Here's my consolidated reference of the most useful commands that helped me diagnose and fix the WebSocket problems:
Docker Container Management
# View running containers with their ports
docker ps --format "table {{.Names}}\t{{.Ports}}" | grep -E "backend|frontend"
# Inspect container startup command
docker inspect pcvn-erp-backend-prod | grep -A 5 '"Cmd"'
# Check running processes inside container
docker exec pcvn-erp-backend-prod ps aux | grep puma
# View container status in table format
docker ps --format "table {{.Names}}\t{{.Status}}"
Docker Network Troubleshooting
# Complete teardown when network configuration conflicts occur
docker-compose -f docker-compose.prod.yml down
# List Docker networks
docker network ls | grep pcvn
# Remove specific network
docker network rm pcvn-fullstack_pcvn-prod-network
# Recreate everything with fresh network
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d
# Rebuild specific service
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d --build backend
Rails and ActionCable Investigation
# Check Rails routes for WebSocket endpoints
docker exec pcvn-erp-backend-prod rails routes | grep -i cable
# Execute Rails code to inspect database tables
docker exec pcvn-erp-backend-prod bundle exec rails runner "puts ActiveRecord::Base.connection.tables.grep(/cable/).inspect" RAILS_ENV=production
# Load schema file directly
docker exec pcvn-erp-backend-prod bundle exec rails runner "load 'db/cable_schema.rb'" RAILS_ENV=production
# Check migration status
docker exec pcvn-erp-backend-prod bundle exec rails db:migrate:status RAILS_ENV=production | head -30
Log Analysis and Real-time Monitoring
# View recent logs with grep filtering
docker logs pcvn-erp-backend-prod --tail 30 | grep -i "cable\|websocket"
# Follow logs in real-time
docker logs -f pcvn-erp-backend-prod --tail 10
# Search for specific error patterns
docker logs pcvn-erp-backend-prod 2>&1 | grep -i "actioncable\|solid_cable\|cable" | head -20
File Inspection Inside Containers
# Check if file exists in container
docker exec pcvn-erp-backend-prod ls -la config/initializers/action_cable.rb
# View file contents
docker exec pcvn-erp-backend-prod cat config/cable.yml
# Search for files by pattern
docker exec pcvn-erp-backend-prod find app -name "*jwt*" -type f 2>/dev/null
# Grep specific methods in a file
docker exec pcvn-erp-backend-prod cat app/services/jwt_service.rb | grep "def "
Environment and Configuration Verification
# Check environment variables
docker exec pcvn-erp-backend-prod printenv | grep -E "DATABASE|DB_|POSTGRES"
# Verify ActionCable configuration
docker exec pcvn-erp-backend-prod bundle exec rails runner "puts Rails.application.config.action_cable.allowed_request_origins.inspect" RAILS_ENV=production
# Test connectivity between containers
docker exec pcvn-erp-frontend-prod ping -c 1 backend
# Check if endpoint is accessible
docker exec pcvn-erp-frontend-prod curl -I http://backend:3002/cable
Process Management and Container Restarts
# Restart specific service
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml restart backend
# Stop and remove everything, then rebuild
docker-compose -f docker-compose.prod.yml down && \
docker-compose -f docker-compose.prod.yml -f docker-compose.prod.override.yml up -d --build
These commands formed the backbone of my debugging methodology. The combination of Docker container inspection, Rails console access, log analysis, and network troubleshooting provided the visibility I needed to understand each layer of the problem. Having this toolkit allowed me to systematically investigate issues from the network layer up through the application layer, ultimately leading to a successful resolution of the WebSocket connection failures.
If you enjoyed this article, you can also find it published on LinkedIn and Medium.