Debugging a Complex Docker Application: From Complete Failure to Full Recovery
A Real-World Case Study in Systematic Container Troubleshooting
When I encountered a completely broken containerized application, the path to recovery required methodical investigation, deep understanding of container orchestration, and careful attention to how modern web applications communicate. This guide documents the systematic debugging process I used to transform a non-functional PCVN ERP system into a fully operational production management platform.
Initial State: Multiple Cascading Failures
The application consisted of four Docker containers that should work in harmony:
- Frontend: React/Vite application served by nginx
- Backend: Rails 8 API server
- PostgreSQL: Primary database
- Redis: Caching and WebSocket support
However, the system had failed completely: neither the frontend nor the backend container could start properly.
Problem 1: Frontend Container Restart Loop
Symptom
I discovered the frontend container was stuck in an endless restart cycle, leaving the web interface unreachable.
Investigation Commands
# Check container status
docker ps -a
# Examine container logs
docker logs pcvn-erp-frontend --tail 20
# Output revealed:
# nginx: [emerg] host not found in upstream "backend" in /etc/nginx/nginx.conf:88
Root Cause
After investigating, I found that the nginx configuration was attempting to proxy API requests to a hostname called "backend" that didn't exist in Docker's DNS resolution. The configuration contained:
proxy_pass http://backend:3000;
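The mismatch can be confirmed by listing which containers are actually attached to the Docker network; a hostname only resolves if a container or alias with that name is attached. This is a sketch using the network name that appears in the docker run commands later in this write-up:
# List the container names attached to the network
docker network inspect pcvn-fullstack_pcvn-network \
  --format '{{range .Containers}}{{.Name}} {{end}}'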
Solution
I modified the nginx configuration to use Docker Desktop's special hostname that bridges to the host machine:
# Create corrected nginx configuration
sed 's/proxy_pass http:\/\/backend:3000;/proxy_pass http:\/\/host.docker.internal:3000;/g' \
nginx.conf > nginx-temp.conf
# Update Dockerfile to use corrected configuration
sed -i 's/COPY nginx.conf/COPY nginx-temp.conf/' Dockerfile
# Rebuild frontend image
docker build -t pcvn-fullstack-frontend ./erp-frontend
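Before redeploying, it is worth confirming the rewritten configuration actually parses, since nginx ships a built-in config test. One caveat: host.docker.internal resolves automatically under Docker Desktop, but a plain Linux engine needs it mapped explicitly with --add-host=host.docker.internal:host-gateway. The port mapping below is an assumption (nginx on 80, host on 8080):
# Validate the nginx configuration baked into the new image
docker run --rm pcvn-fullstack-frontend nginx -t
# Recreate the frontend container from the fixed image
docker rm -f pcvn-erp-frontend
docker run -d --name pcvn-erp-frontend -p 8080:80 pcvn-fullstack-frontend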
Problem 2: Backend Container Exit Code 127
Symptom
I noticed the backend container repeatedly failed with exit code 127 (command not found).
Investigation Commands
# Check container exit status
docker ps -a | grep backend
# Output: Exited (127)
# Examine what command is failing
docker inspect pcvn-erp-backend --format='{{.Config.Cmd}}'
# Output: [./bin/thrust ./bin/rails server]
# Search for the Rails executable
docker run --rm pcvn-fullstack-backend find / -name rails -type f 2>/dev/null
# Found: /rails/bin/rails
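Because Docker hands CMD to the image's ENTRYPOINT as arguments, an exit code 127 can originate in either one, so both are worth inspecting:
# Check the entrypoint too; CMD is passed to it as arguments
docker inspect pcvn-erp-backend --format='{{.Config.Entrypoint}}'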
Root Cause
Through careful examination, I discovered the container's entrypoint script was trying to execute rails without the correct path. The Rails executable existed at /rails/bin/rails, but the command was just rails server without the path prefix.
Solution
I specified the correct path when starting the container:
docker run -d \
--name pcvn-erp-backend \
--network pcvn-fullstack_pcvn-network \
-p 3000:3000 \
-e RAILS_ENV=production \
-e DATABASE_URL="postgresql://user:pass@pcvn-postgres:5432/db" \
pcvn-fullstack-backend \
./bin/rails server -b 0.0.0.0 # Correct path to Rails
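With the container recreated, a quick check confirms the server actually booted. The health path below assumes the default /up endpoint that freshly generated Rails 7.1+ apps mount; adjust it if the app removed that route:
# Confirm the server booted and is answering
docker logs pcvn-erp-backend --tail 10
curl -I http://localhost:3000/up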
Problem 3: Database Configuration Mount Conflict
Symptom
The backend container failed to start with a mount error about /rails/config/database.yml.
Investigation Commands
# Inspect mount configuration
docker inspect pcvn-erp-backend --format='{{json .Mounts}}' | python3 -m json.tool
# Check if source file exists
ls -la backend/config/database.yml*
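Since the container would not start, the same check can be run against the image itself to see what exists at the mount target (path taken from the find output in Problem 2):
# Inspect the mount target inside the image, without starting the app
docker run --rm pcvn-fullstack-backend ls -ld /rails/config/database.yml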
Root Cause
I found that Docker was trying to mount a file (database.yml.docker) to a location that had become a directory due to a corrupted container state.
Solution
I removed the corrupted container and created a fresh one without the problematic mount:
docker rm pcvn-erp-backend
# Then recreate without the conflicting mount
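Re-running the mount inspection from above against the recreated container confirms it came up clean; an empty list, or only the volumes you intended, means the conflict is gone:
# Verify no stale mounts remain on the recreated container
docker inspect pcvn-erp-backend --format='{{json .Mounts}}' | python3 -m json.tool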
Problem 4: Hardcoded Development URLs in Production Build
Symptom
I discovered the frontend JavaScript contained hardcoded localhost:3000 URLs instead of configurable API endpoints.
Investigation Commands
# Search for hardcoded URLs in compiled JavaScript
docker exec pcvn-erp-frontend sh -c \
'strings /usr/share/nginx/html/assets/*.js | grep localhost:3000'
# Output: Multiple instances of "http://localhost:3000/api/v1"
Root Cause
After investigation, I realized the frontend was built without proper environment variables, causing Vite to use fallback development URLs:
// The code had fallbacks like:
API_URL: import.meta.env.VITE_API_URL || "http://localhost:3000/api/v1"
Solution
While a proper fix requires rebuilding with correct environment variables, I noted that the hardcoded localhost:3000 actually works for local development since the backend is exposed on that port.
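For the record, the proper fix is to make the URL available when the bundle is compiled, because Vite inlines VITE_-prefixed variables at build time. A sketch, assuming the Dockerfile declares a matching build argument and forwards it to the build step:
# Pass the real API URL into the image build
docker build \
  --build-arg VITE_API_URL=http://localhost:3000/api/v1 \
  -t pcvn-fullstack-frontend ./erp-frontend
# ...which relies on the Dockerfile containing something like:
#   ARG VITE_API_URL
#   ENV VITE_API_URL=$VITE_API_URL
#   RUN npm run build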
Problem 5: API Route Verification
Symptom
I needed to verify whether backend routes matched frontend expectations.
Investigation Commands
# Test if API base path exists
curl -I http://localhost:3000/api/v1
# Result: 404 (expected - no route for base path)
# Check actual Rails routes
docker exec pcvn-erp-backend ./bin/rails routes | head -20
# Confirmed routes exist at /api/v1/*
# Test specific endpoint
curl -I http://localhost:3000/api/v1/auth/me
# Result: 401 Unauthorized (good - endpoint exists)
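Rails can also filter the route table directly, which keeps the output manageable on a large app; the -g flag greps routes by path or name:
# Show only the routes under the API namespace
docker exec pcvn-erp-backend ./bin/rails routes -g api/v1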
Root Cause
The initial 404 was misleading - I learned that Rails doesn't create routes for API namespace roots, only for actual endpoints beneath them.
Final System Architecture
After resolving all issues, I achieved a working architecture consisting of:
Browser (http://localhost:8080)
↓
Frontend Container (nginx)
↓ (API calls to localhost:3000)
Host Machine Network
↓
Backend Container (Rails on port 3000)
↓
PostgreSQL & Redis Containers (via Docker network)
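A short smoke test exercises each hop in that diagram. The Redis container name and the /up health endpoint are assumptions based on the naming and defaults discussed above; substitute the real values:
# Frontend served by nginx
curl -I http://localhost:8080
# Backend answering directly on the host port
curl -I http://localhost:3000/up
# Backend can reach PostgreSQL over the Docker network
docker exec pcvn-erp-backend ./bin/rails runner 'puts ActiveRecord::Base.connection.active?'
# Redis responds (container name assumed)
docker exec pcvn-redis redis-cli ping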
Key Debugging Principles I Applied
- Read error messages carefully - Exit code 127 specifically means "command not found"
- Trace the full request path - Understanding how requests flow from browser through containers
- Verify assumptions - Don't assume configuration matches documentation
- Check multiple layers - Problems can exist at the Docker, application, or network level
- Understand the build vs runtime distinction - Some issues are baked in during build, others occur at runtime
Critical Commands for Container Debugging
# Container status and health
docker ps -a
docker logs [container] --tail 50
# Container configuration
docker inspect [container] --format='{{.State.ExitCode}}'
docker inspect [container] --format='{{.Config.Cmd}}'
docker inspect [container] --format='{{json .Mounts}}' | python3 -m json.tool
# Network testing
docker exec [container] ping [other-container]
docker network ls
docker network inspect [network]
# File system exploration
docker exec [container] ls -la /path
docker exec [container] find / -name [filename] 2>/dev/null
# Application-specific debugging
docker exec backend-container ./bin/rails routes
docker exec frontend-container cat /etc/nginx/nginx.conf
Outcome
Through systematic debugging and understanding of container orchestration, I successfully:
- Resolved nginx hostname resolution issues
- Fixed Rails executable path problems
- Eliminated mount point conflicts
- Verified API endpoint alignment
- Established proper container networking
The result is a fully functional containerized application with all services running healthy and communicating properly.
Lessons for Production Deployments
From this experience, I learned several critical lessons:
- Environment variables must be provided at build time for frontend applications
- Container entrypoints need explicit paths to executables
- Docker Compose networking differs from direct Docker commands
- CORS configuration is critical for multi-port deployments
- Health checks should verify actual application functionality, not just process existence
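On that last point, here is a minimal sketch of a health check that exercises the application rather than just the process. It assumes curl is present in the image and that the default Rails /up endpoint is mounted:
# Recreate the backend with an application-level health check
docker run -d \
  --name pcvn-erp-backend \
  --health-cmd 'curl -fsS http://localhost:3000/up || exit 1' \
  --health-interval 30s \
  --health-retries 3 \
  pcvn-fullstack-backend ./bin/rails server -b 0.0.0.0
# docker ps will then report the container as healthy or unhealthy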
This systematic approach to debugging containerized applications taught me the importance of understanding not just the surface errors, but the underlying architecture and communication patterns that make modern web applications function. Each problem I encountered became an opportunity to deepen my understanding of how containers, networking, and application layers interact in production environments.
If you enjoyed this article, you can also find it published on LinkedIn and Medium.