Debugging a Railway Deployment: My Journey from PORT Errors to Production

The Problem Discovery

Railway reported "Deployment Successful", but the application kept crashing on startup with this error:

Error: Invalid value for '--port': '$PORT' is not a valid integer.

This error repeated endlessly in the logs, preventing my Document Q&A application from serving users. I knew I had to dig deeper.

Initial Investigation

I started by checking the deployment status and logs:

railway status
railway logs --tail 20

The logs revealed that the PORT environment variable wasn't being expanded. uvicorn was receiving the literal string $PORT as its --port argument instead of the port number Railway assigns dynamically.

Uncovering the First Issue: Dockerfile CMD Format

I examined my Dockerfile and found the problem:

cat Dockerfile.complete
CMD ["uvicorn", "backend.api:app", "--host", "0.0.0.0", "--port", "8000"]

The issue was twofold:

  1. The port was hardcoded to 8000 instead of using Railway's dynamic PORT
  2. The exec form (JSON array) runs the command without a shell, so a variable such as $PORT is never expanded and would reach uvicorn as a literal string
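
The difference between the two CMD forms can be reproduced outside Docker. The sketch below is an analogy, not Docker itself: passing an argument list to subprocess.run mimics exec form (no shell involved), while going through sh -c mimics shell form, which Docker runs via /bin/sh -c on Linux images. It assumes a POSIX system with sh and echo available, and the helper names are hypothetical.

```python
import os
import subprocess


def run_exec_form(env):
    """Mimic exec form: argv is passed as-is with no shell, so $PORT stays literal."""
    result = subprocess.run(
        ["echo", "port=$PORT"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_shell_form(env):
    """Mimic shell form: sh expands ${PORT:-8000} before echo runs."""
    result = subprocess.run(
        ["sh", "-c", "echo port=${PORT:-8000}"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Start from the current environment, minus any PORT that happens to be set.
base_env = {k: v for k, v in os.environ.items() if k != "PORT"}
with_port = {**base_env, "PORT": "8080"}

print(run_exec_form(with_port))   # port=$PORT
print(run_shell_form(with_port))  # port=8080
print(run_shell_form(base_env))   # port=8000
```

The first line of output is the same kind of value that produced the "'$PORT' is not a valid integer" error: nothing ever substituted the variable.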

I created a new Dockerfile with shell form CMD to allow variable expansion:

# Dockerfile.railway
CMD uvicorn backend.api:app --host 0.0.0.0 --port ${PORT:-8000}

The ${PORT:-8000} syntax is shell parameter expansion: it uses Railway's PORT when that variable is set and falls back to 8000 when it is unset or empty, which keeps local testing working.
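
An alternative that avoids depending on shell expansion is to read PORT inside the application and start the server from Python. This is a sketch of that idea, not what I deployed: resolve_port is a hypothetical helper, and its default of 8000 simply mirrors the fallback used in the Dockerfile.

```python
import os


def resolve_port(environ=None, default=8000):
    """Return the port from PORT, falling back to `default` when it is unset or empty."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PORT", "").strip()
    if not raw:
        return default
    if not (raw.isascii() and raw.isdigit()):
        # Catches the literal "$PORT" string that an unexpanded command would pass along.
        raise ValueError(f"PORT must be an integer, got {raw!r}")
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port


print(resolve_port({"PORT": "8080"}))  # 8080
print(resolve_port({}))                # 8000
```

With a helper like this, an entry point could call uvicorn.run("backend.api:app", host="0.0.0.0", port=resolve_port()), and the Dockerfile CMD could stay in exec form because no variable needs expanding.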

Building and Pushing to Docker Hub

I rebuilt the image with the fixed configuration:

docker build --no-cache -t ltphongssvnclaudepromax/docqa-complete:railway -f Dockerfile.railway .
docker push ltphongssvnclaudepromax/docqa-complete:railway

I then configured Railway to use this Docker Hub image:

railway variables --set "DOCKER_IMAGE=ltphongssvnclaudepromax/docqa-complete:railway"
railway up --detach

Discovering the Second Issue: Configuration Override

Despite setting the DOCKER_IMAGE variable, the PORT error persisted. I investigated further:

railway variables  # Showed DOCKER_IMAGE was set correctly
ls -la railway.* .railway* 2>/dev/null
cat railway.json

I discovered a railway.json file that was forcing Nixpacks builds:

{
  "build": {
    "builder": "NIXPACKS"
  },
  "deploy": {
    "startCommand": "uvicorn backend.api:app --host 0.0.0.0 --port $PORT"
  }
}

This configuration file was overriding my Docker image setting. I removed it:

rm railway.json

The Third Issue: Root Dockerfile Detection

Even after removing railway.json, the error continued. I realized Railway might be detecting a Dockerfile in the root:

ls -la Dockerfile*
cat Dockerfile

Indeed, there was a root Dockerfile with the same hardcoded port. Railway builds from a Dockerfile in the project root when one is present, and as far as I could tell the DOCKER_IMAGE variable had no effect on that choice. I replaced the root Dockerfile with my fixed version:

cp Dockerfile.railway Dockerfile
railway up --detach
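
A small pre-deploy check would have surfaced both the stray railway.json and the root Dockerfile before I pushed anything. The sketch below is a hypothetical helper, not a Railway tool. The list of file names is my assumption about what can influence the build: railway.json and the root Dockerfile come from this session, and railway.toml is included because Railway also accepts it for configuration. The CMD check is a simple regex heuristic, and the Dockerfile content in the demo is made up for illustration.

```python
import re
import tempfile
from pathlib import Path

# File names assumed to influence how Railway builds a project (not exhaustive).
BUILD_FILES = ("railway.json", "railway.toml", "Dockerfile")

# Exec-form CMD/ENTRYPOINT (JSON array) containing a "$" that no shell will expand.
EXEC_FORM_WITH_VAR = re.compile(r"^\s*(CMD|ENTRYPOINT)\s*\[.*\$.*\]\s*$", re.IGNORECASE)


def find_build_files(root):
    """Return the names of build-related files present directly under `root`."""
    return [name for name in BUILD_FILES if (Path(root) / name).is_file()]


def unexpanded_vars(dockerfile_text):
    """Return exec-form CMD/ENTRYPOINT lines that reference a shell variable."""
    return [
        line.strip()
        for line in dockerfile_text.splitlines()
        if EXEC_FORM_WITH_VAR.match(line)
    ]


# Demo against a throwaway directory with illustrative file contents.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "railway.json").write_text('{"build": {"builder": "NIXPACKS"}}')
    (Path(tmp) / "Dockerfile").write_text(
        'CMD ["uvicorn", "backend.api:app", "--port", "$PORT"]\n'
    )
    found = find_build_files(tmp)
    flagged = unexpanded_vars((Path(tmp) / "Dockerfile").read_text())

print(found)    # files that may override the intended build
print(flagged)  # exec-form lines whose variables will not be expanded
```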

Success and Verification

Finally, the logs showed success:

railway logs --tail 20
# Output:
# INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)

I verified the deployment was working:

curl https://enterprise-document-qa-production.thanhphongle.net
# Returned the React app HTML

curl https://enterprise-document-qa-production.thanhphongle.net/api/health
# Initially returned HTML instead of JSON; the cause turned out to be an API route ordering issue
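
The route ordering issue comes from first-match routing: routes are tried in the order they were registered, so a catch-all registered before /api/health swallows the request. The toy router below models that behavior with the standard library only. It is not FastAPI's implementation, and the handler names and the JSON body are placeholders.

```python
def make_router(routes):
    """Return a dispatch function that picks the first registered route that matches."""
    def dispatch(path):
        for pattern, handler in routes:
            # A pattern ending in "*" is a catch-all prefix match; otherwise match exactly.
            if pattern.endswith("*"):
                matched = path.startswith(pattern[:-1])
            else:
                matched = path == pattern
            if matched:
                return handler()
        return "404 Not Found"
    return dispatch


def serve_spa():
    return "<!doctype html>..."  # stands in for the React index.html


def health():
    return '{"status": "ok"}'  # placeholder health payload


# Catch-all first: /api/health never reaches its own handler.
broken = make_router([("/*", serve_spa), ("/api/health", health)])
# API routes first, catch-all last.
fixed = make_router([("/api/health", health), ("/*", serve_spa)])

print(broken("/api/health"))  # <!doctype html>...
print(fixed("/api/health"))   # {"status": "ok"}
print(fixed("/dashboard"))    # <!doctype html>...
```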

Lessons I Learned

This debugging journey taught me several critical lessons about Railway deployments:

  1. Environment variable expansion requires shell form CMD - The exec form (JSON array) in Dockerfiles runs without a shell, so it doesn't expand variables
  2. Railway's configuration hierarchy - A railway.json in the project overrides settings made elsewhere, in my case forcing a Nixpacks build
  3. Dockerfile detection takes precedence - Railway builds from a root Dockerfile when one is present, and my DOCKER_IMAGE variable did not change that
  4. Route ordering matters in FastAPI - Routes are matched in registration order, so catch-all routes must be defined last

My systematic approach—checking logs, examining configurations, testing hypotheses—helped me identify and fix three separate but related issues. Each fix brought me closer to a working deployment.

CLI Commands Reference

Here's my debugging toolkit from this session:

Deployment Status and Logs

railway status                    # Check deployment status
railway logs --tail 20           # View recent logs
railway variables                # List all environment variables

Docker Operations

docker build --no-cache -t <tag> -f <dockerfile> .  # Build image
docker push <repository>/<image>:<tag>              # Push to registry
docker tag <source> <target>                        # Tag image

Railway Configuration

railway variables --set "KEY=value"     # Set environment variable
railway up --detach                     # Deploy to Railway
railway domain add <custom-domain>      # Add custom domain

File Investigation

cat <file>                              # View file contents
ls -la <pattern>                        # List files with details
grep -n <pattern> <file>                # Find line numbers
wc -l <file>                           # Count lines in file

Testing Deployment

curl -I <url>                          # Check HTTP headers
curl <url>                             # Get full response
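
Because the health endpoint initially returned HTML with a successful status, a plain curl was not enough to show that something was wrong. A stricter check could treat the response as healthy only if the Content-Type is JSON and the body parses. The sketch below uses hypothetical function names, and check_health is defined but not called because it needs network access.

```python
import json
import urllib.request


def is_json_response(content_type, body):
    """True only if the header says JSON and the body actually parses as JSON."""
    if "application/json" not in (content_type or "").lower():
        return False
    try:
        json.loads(body)
    except ValueError:
        return False
    return True


def check_health(url, timeout=10):
    """Fetch `url` and report whether it answered with valid JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read().decode("utf-8", errors="replace")
    return is_json_response(content_type, body)


print(is_json_response("application/json", '{"status": "ok"}'))          # True
print(is_json_response("text/html; charset=utf-8", "<!doctype html>"))   # False
```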

This debugging experience reinforced my belief that systematic investigation and understanding the underlying systems are key to solving deployment issues. Each error message was a clue leading to the root cause, and patience with methodical troubleshooting paid off with a successful production deployment.


If you enjoyed this article, you can also find it published on LinkedIn and Medium.