Docker Stack Recovery: A Troubleshooting Journey
The Challenge
I discovered a partially failed Docker deployment with critical services down. The frontend container was trapped in a restart loop while the backend had mysteriously stopped running. This guide documents my systematic approach to diagnosing and recovering the entire stack to full operational status.
Initial Assessment
When checking the container status, I found a concerning situation that required immediate attention:
$ docker ps -a
Output revealed:
- Frontend: Continuously restarting every 50 seconds
- Backend: Exited 11 hours ago with code 127
- PostgreSQL: Running healthy on port 5433
- Redis: Running healthy on port 6379
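A trimmed view makes this kind of status snapshot easier to scan; docker ps accepts a Go-template format string, so you can limit the output to names, status, and ports:
$ docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"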
Step 1: Diagnosing the Frontend Restart Loop
I investigated why the frontend kept crashing by examining its logs:
$ docker logs pcvn-erp-frontend --tail 20
Output:
[emerg] host not found in upstream "backend" in /etc/nginx/nginx.conf:56
The nginx configuration couldn't resolve the "backend" hostname because the backend container was down.
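To confirm this was a name-resolution problem rather than a typo in the config, you can check whether "backend" resolves on the stack's network. Docker's embedded DNS only answers for running containers, so the lookup fails while the backend is down. The network name below is a guess based on typical Compose naming; substitute your own from docker network ls:
$ docker network ls
$ docker run --rm --network pcvn-erp_default busybox nslookup backend   # network name is a placeholder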
Step 2: Investigating Backend Failure
Exit code 127 typically indicates "command not found." I checked the backend logs:
$ docker logs pcvn-erp-backend --tail 20
Output:
ActionController::RoutingError (No route matches [POST] "/api/apm/transactions")
{"level":"INFO","msg":"Request","path":"/api/apm/transactions","status":404}
The Rails application had been running successfully earlier, just returning 404s for monitoring routes it doesn't define; nothing in the logs pointed to an application crash.
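Because the logs only show what the application was doing, not why the process exited, a useful cross-check is to ask Docker for the container's recorded state, which includes the exit code, whether it was OOM-killed, and when it finished:
$ docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}} {{.State.FinishedAt}}' pcvn-erp-backend
With no crash in sight, the sensible next move was simply to start the container again.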
Step 3: Reviving the Backend
I restarted the stopped backend container:
$ docker start pcvn-erp-backend
$ docker ps | grep backend
Result: Backend started successfully and showed healthy status on port 3000.
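Since the image defines a health check (docker ps reports "healthy"), the health state can also be queried directly instead of grepping the process list:
$ docker inspect --format '{{.State.Health.Status}}' pcvn-erp-backend
Output: healthy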
Step 4: Fixing the Frontend
With the backend running, I restarted the frontend to resolve the nginx upstream error:
$ docker restart pcvn-erp-frontend
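To make sure the restart actually cleared the [emerg] error instead of kicking off another loop, it's worth tailing the fresh log output for a few seconds:
$ docker logs -f --since 1m pcvn-erp-frontend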
Step 5: Verifying Full Recovery
I confirmed all containers were healthy:
$ docker ps
All services showing healthy:
- Frontend: 0.0.0.0:8080->80/tcp
- Backend: 0.0.0.0:3000->80/tcp
- PostgreSQL: 0.0.0.0:5433->5432/tcp
- Redis: 6379/tcp
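The same verification can be expressed as a filter; if all four container names come back, the stack is green (this relies on every container defining a health check, which the docker ps output above suggests is the case):
$ docker ps --filter "health=healthy" --format '{{.Names}}'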
Step 6: Testing Application Accessibility
I verified the frontend was responding:
$ curl -I http://localhost:8080
Output: HTTP/1.1 200 OK
I tested the backend health endpoint:
$ curl -I http://localhost:3000/up
Output: HTTP/1.1 200 OK
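The 200 from port 8080 only proves that nginx is serving again; it doesn't exercise the proxy hop to the backend that failed in the first place. Requesting any API route through the frontend covers that path; /api/status below is a placeholder, so substitute a route your backend actually serves. Even a 404 from Rails confirms the upstream is reachable, whereas a 502 from nginx would mean it still isn't:
$ curl -i http://localhost:8080/api/status   # /api/status is a placeholder route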
Key Learnings
Container Dependencies Matter
The frontend's nginx configuration depended on the backend being available. When the backend stopped, the frontend couldn't resolve the hostname and entered a restart loop.
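If the stack is defined in a Compose file (this recovery used only raw docker commands, so treat this as an aside), that dependency can be made explicit with depends_on and condition: service_healthy, and a single command will start everything in order and block until the health checks pass:
$ docker compose up -d --wait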
Exit Codes Tell Stories
- Exit 0: Clean shutdown
- Exit 127: Command not found
- Exit 1: General errors
Docker DNS Resolution
Containers communicate using service names through Docker's internal DNS. When a container stops, its hostname becomes unresolvable.
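You can see the embedded DNS server that provides this from inside any container on a user-defined network (the default for Compose projects); it shows up as 127.0.0.11 in resolv.conf:
$ docker exec pcvn-erp-backend cat /etc/resolv.conf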
Health Checks Are Essential
Health status indicators helped me quickly identify which services were truly operational versus just "running."
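If an image doesn't ship its own health check, one can be attached at run time. This is only a sketch: the image name, internal port, path, and the presence of curl inside the image are all assumptions to adapt to your setup:
$ docker run -d --name backend-demo \
    --health-cmd "curl -fsS http://localhost/up || exit 1" \
    --health-interval 30s --health-timeout 5s --health-retries 3 \
    your-backend-image   # image name, port, and path are placeholders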
Recovery Checklist
When encountering Docker deployment failures:
1. Assess the situation: docker ps -a
2. Check the logs of failed containers: docker logs [container-name] --tail 50
3. Identify dependencies: map which containers depend on others
4. Start stopped containers: docker start [container-name]
5. Restart dependent containers: docker restart [dependent-container]
6. Verify health status: docker ps
7. Test connectivity: curl -I http://localhost:[port]/health
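For unattended checks, the last two steps can be combined into a small wait loop; the container name and endpoint below match this stack, so adjust them for yours:
$ until [ "$(docker inspect -f '{{.State.Health.Status}}' pcvn-erp-backend)" = "healthy" ]; do sleep 5; done
$ curl -I http://localhost:3000/up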
Conclusion
What appeared as a complex deployment failure was actually a simple cascade effect. The backend stopped, causing the frontend to lose its upstream connection and enter a restart loop. By methodically checking each component and understanding the dependency chain, I recovered the entire stack without data loss or configuration changes.
This experience reinforced the importance of understanding container orchestration, service dependencies, and systematic troubleshooting. Sometimes the solution is as simple as starting a stopped container and letting the system heal itself.
Tech Stack: Docker, nginx, Ruby on Rails, PostgreSQL, Redis
Environment: WSL 2 Ubuntu on Windows 11
If you enjoyed this article, you can also find it published on LinkedIn and Medium.