Debugging WebSocket Connection Failures in a Dockerized Rails/React Application
The Problem
I recently encountered a frustrating issue with my production management ERP system built with Rails and React. The application displayed persistent "Failed to establish real-time connection" errors, and the WebSocket connections for ActionCable refused to establish. After hours of debugging, I discovered the root cause was unexpected: an overly aggressive Application Performance Monitoring (APM) system was flooding the nginx reverse proxy with requests, triggering rate limiting that blocked all legitimate traffic, including WebSocket connections.
Initial Symptoms
The application exhibited several concerning behaviors:
- Red error notifications stating "Failed to establish real-time connection" appeared on every page
- An orange "Reconnecting..." status bar persistently showed at the top of the interface
- HTTP 503 "Service Temporarily Unavailable" errors appeared during login
- Real-time features powered by ActionCable WebSockets were completely non-functional
Investigation Process
Step 1: Examining the Container Architecture
First, I needed to understand the Docker container setup to see how requests were being routed:
docker ps
Output revealed four containers:
CONTAINER ID   IMAGE                      PORTS                   NAMES
abc123...      pcvn-fullstack-frontend    0.0.0.0:8080->80/tcp    pcvn-erp-frontend
def456...      pcvn-fullstack-backend     3000/tcp                pcvn-erp-backend
ghi789...      redis:7-alpine             6379/tcp                pcvn-redis
jkl012...      postgres:15-alpine         5432/tcp                pcvn-postgres
This showed that nginx in the frontend container was serving as the reverse proxy, routing requests to the Rails backend.
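Since the proxy addresses the backend by container name, it's also worth confirming that the two containers actually sit on the same Docker network before digging deeper. A quick check from the host (the exact network name depends on your compose project):

docker inspect -f '{{json .NetworkSettings.Networks}}' pcvn-erp-frontend
docker inspect -f '{{json .NetworkSettings.Networks}}' pcvn-erp-backend

Both commands should list the same compose network; if they don't, name-based resolution of pcvn-erp-backend from inside the nginx container will fail regardless of the proxy configuration.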
Step 2: Checking Nginx Configuration
I examined the nginx configuration to understand how requests were being routed:
docker exec pcvn-erp-frontend cat /etc/nginx/conf.d/default.conf
The configuration revealed proper WebSocket handling for /cable:
location /cable {
proxy_pass http://pcvn-erp-backend:3000/cable;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# ... other headers
}
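For context, the Upgrade and Connection headers are what let nginx pass the WebSocket handshake through to Rails. Many configurations also derive Connection from $http_upgrade via a map block and raise proxy_read_timeout, since nginx's 60-second default will drop idle cable connections. A minimal sketch of that common pattern, not this project's exact config (the backend port is assumed from the container listing above):

# Sketch: typical WebSocket proxy pattern; map lives at the http level
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

location /cable {
    proxy_pass http://pcvn-erp-backend:3000/cable;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_read_timeout 3600s;   # ActionCable connections are long-lived
}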
Step 3: Discovering the Rate Limiting Issue
When checking nginx logs for errors, I discovered the smoking gun:
docker logs pcvn-erp-frontend --tail 100 2>&1 | grep -E "error|503"
Output showed hundreds of rate limiting errors:
2025/08/07 21:12:05 [error] limiting requests, excess: 50.500 by zone "api"
2025/08/07 21:12:05 [error] limiting requests, excess: 51.000 by zone "api"
The nginx configuration included rate limiting:
docker exec pcvn-erp-frontend grep -r "limit_req" /etc/nginx/
Output:
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;
limit_req zone=api burst=50 nodelay;
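Reading those two directives together explains the log entries: each client IP gets a budget of 100 requests per second plus a burst allowance of 50 that is admitted immediately (nodelay); anything beyond that is rejected with a 503, and the "excess: 50.500" figures mean the burst bucket was already full. A quick way to confirm which zone is doing the rejecting, assuming the same log format as above (the error log goes to stderr, hence 2>&1):

docker logs pcvn-erp-frontend 2>&1 | grep -o 'by zone "[^"]*"' | sort | uniq -c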
Step 4: Identifying the Traffic Source
I examined the Rails backend logs to understand what was generating so much traffic:
docker logs pcvn-erp-backend --tail 50
The logs showed continuous requests to APM endpoints:
ActionController::RoutingError (No route matches [POST] "/api/apm/transactions")
ActionController::RoutingError (No route matches [POST] "/api/apm/metrics")
These requests were happening hundreds of times per second, overwhelming the rate limiter.
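To put a rough number on the traffic, you can count APM-related log lines over a short window. This is only an approximation, since each request produces more than one log line, but it makes the order of magnitude obvious:

# Rough request-rate check over the last 10 seconds
docker logs pcvn-erp-backend --since 10s 2>&1 | grep -c "/api/apm/"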
Step 5: Finding the APM Configuration
I searched for APM-related code in the React application:
find ./erp-frontend/src -name "*.ts" -o -name "*.tsx" | xargs grep -l "apm"
This revealed monitoring files:
./erp-frontend/src/monitoring/index.tsx
./erp-frontend/src/monitoring/apm.ts
./erp-frontend/src/monitoring/metrics.ts
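I won't reproduce apm.ts here, but the failure mode is typical of hand-rolled browser APM: a reporter that fires one POST per fetch, route change, or render tick, with no batching or sampling. A deliberately simplified sketch of that anti-pattern (the endpoint mirrors the log entries above; everything else is hypothetical):

// Hypothetical, unbatched APM reporter: every transaction becomes an immediate POST
function reportTransaction(name: string, durationMs: number): void {
  void fetch('/api/apm/transactions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name, durationMs, ts: Date.now() }),
  });
}

// Wired into a fetch interceptor or a React render hook, request volume
// scales with application activity and can easily blow past the proxy's
// 100 r/s budget.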
I found the initialization in App.tsx:
grep -A 5 "initializeMonitoring" erp-frontend/src/App.tsx
Output showed:
initializeMonitoring({
enableErrorTracking: true,
enableAPM: true, // This was the culprit!
enableAnalytics: true,
enableLogging: true,
});
The Solution
The fix was surprisingly simple: disable the APM system that was flooding the server.
# Disable APM in App.tsx
sed -i 's/enableAPM: true,/enableAPM: false,/' erp-frontend/src/App.tsx
# Verify the change
grep enableAPM erp-frontend/src/App.tsx
Output confirmed:
enableAPM: false,
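Hard-coding false works, but a more durable variant is to drive the flag from the build environment so APM can be re-enabled deliberately in environments that can absorb the traffic. A sketch assuming a Vite-style build (the variable name is hypothetical; a CRA build would read process.env.REACT_APP_ENABLE_APM instead):

// Hypothetical env-driven flag instead of a hard-coded boolean
initializeMonitoring({
  enableErrorTracking: true,
  enableAPM: import.meta.env.VITE_ENABLE_APM === 'true',
  enableAnalytics: true,
  enableLogging: true,
});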
Then I rebuilt and restarted the containers:
docker-compose down
docker-compose up --build -d
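After the rebuild, the same log checks from Steps 3 and 4 make a quick before/after comparison:

# Both should now come back empty (or nearly so)
docker logs pcvn-erp-frontend --tail 100 2>&1 | grep -E "error|503"
docker logs pcvn-erp-backend --tail 50 2>&1 | grep "/api/apm/"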
Results
After disabling the APM system:
- The flood of /api/apm/transactions and /api/apm/metrics requests stopped completely
- Rate limiting was no longer triggered (no more "excess: 50.500" errors)
- The 503 errors disappeared from the login page
- WebSocket connections could finally establish successfully
- Real-time features powered by ActionCable started working properly
- The "Failed to establish real-time connection" errors vanished
Key Lessons Learned
This debugging journey taught me several valuable lessons about distributed systems and monitoring:
- Monitoring systems can become the problem they're meant to solve: the APM system designed to track performance was actually destroying performance by generating excessive traffic.
- Rate limiting can mask root causes: the 503 errors initially appeared to be backend failures, but were actually nginx protecting itself from what looked like a DoS attack from the APM system.
- Follow the chain of causation: what started as WebSocket connection failures traced back through rate limiting to an overly aggressive monitoring configuration.
- Sometimes the fix is subtraction, not addition: instead of adding more configuration or code, the solution was to remove the problematic component entirely.
- Default configurations need scrutiny: the monitoring system was enabled by default with aggressive settings that weren't suitable for the application's architecture.
Conclusion
Debugging this issue reminded me that in complex distributed systems, problems often manifest far from their root causes. A WebSocket connection failure was ultimately caused by a monitoring system configuration issue, with rate limiting as the intermediary. By methodically tracing through the layers of the stack - from the frontend symptoms through the reverse proxy to the actual traffic patterns - I was able to identify and resolve the underlying problem. Sometimes the most powerful optimization is knowing what to turn off.
If you enjoyed this article, you can also find it published on LinkedIn and Medium.