Debugging a Rails Authorization System: My Journey from 403 Errors to Resolution

The Crisis: When Everything Returns 403

I recently found myself staring at a production management system that was completely inaccessible. Every API endpoint I tried returned either 403 Forbidden or 500 Internal Server Error. The browser console was flooded with errors, and an application that had been running in Docker for 8 hours suddenly seemed completely broken.

The initial symptoms were overwhelming:

  • WebSocket connections failing with 404 errors
  • Multiple API endpoints returning 403 Forbidden
  • Critical dashboard endpoints throwing 500 errors
  • Authentication appearing to work (200 OK on login) but authorization failing everywhere

Starting the Investigation

My first instinct was to verify what was actually running. I needed to know whether this was an infrastructure problem or an application-level issue.

docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Image}}"

The output (IMAGE column omitted here) showed all containers were healthy and had been running for 8 hours:

NAMES                    STATUS                 PORTS
pcvn-erp-backend-prod    Up 8 hours (healthy)   0.0.0.0:3002->3002/tcp
pcvn-erp-frontend-prod   Up 8 hours (healthy)   0.0.0.0:3003->80/tcp
pcvn-erp-db-prod         Up 8 hours (healthy)   0.0.0.0:5432->5432/tcp

This told me the Docker infrastructure was fine. The problem had to be within the application logic itself.

Diving Into the Backend Logs

Next, I examined the backend container logs to understand what was happening when requests were made:

docker logs pcvn-erp-backend-prod --tail 50

The logs revealed three distinct patterns:

  1. Authentication was working: POST /api/v1/auth/login returned Completed 200 OK
  2. Authorization was failing: Filter chain halted as :authorize_production_access! rendered or redirected
  3. Code errors existed: NameError (uninitialized constant Api::V1::MetricsController::ProductionStagesOrder)
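
When the log is long, the three patterns above can be picked out mechanically. Here is a minimal Ruby sketch, assuming the log lines look like the excerpts quoted in this article. The classify helper and the category names are hypothetical and not part of the application:

```ruby
# Classify Rails log lines into the three patterns seen in the backend logs.
# The regexes are assumptions based on the excerpts quoted in this article.
PATTERNS = {
  auth_ok:      /Completed 200 OK/,
  authz_halted: /Filter chain halted as :(\w+[!?]?)/,
  code_error:   /NameError \(uninitialized constant ([\w:]+)\)/
}.freeze

# Returns [category, detail]; detail is the filter or constant name when captured.
def classify(line)
  PATTERNS.each do |category, regex|
    match = regex.match(line)
    return [category, match[1]] if match
  end
  [:other, nil]
end

sample = [
  "Completed 200 OK",
  "Filter chain halted as :authorize_production_access! rendered or redirected",
  "NameError (uninitialized constant Api::V1::MetricsController::ProductionStagesOrder)"
]

sample.each { |line| p classify(line) }
```

Grouping lines this way makes it easier to see that the 403s and the 500s are two separate problems with different causes.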

Tracing the Authorization Logic

I needed to understand how the authorization system was supposed to work. I examined the controller structure:

docker exec pcvn-erp-backend-prod cat app/controllers/api/v1/production_orders_controller.rb | head -30

This revealed that the controller expected specific roles: production managers, line supervisors, and regular employees. The authorization methods were being called, but something was preventing them from working correctly.

The First Dead End

I searched for the authorization method in what I thought was the obvious place:

docker exec pcvn-erp-backend-prod grep -n "authorize_production_access" app/controllers/application_controller.rb

Nothing. The method wasn't where I expected it to be. This led me to search more broadly:

docker exec pcvn-erp-backend-prod bash -c "grep -r 'def authorize_production_access' app/ 2>/dev/null"

The methods were defined as private methods inside each individual controller, not in a shared location as I had assumed.

Understanding the Authorization Implementation

I extracted the actual authorization logic:

docker exec pcvn-erp-backend-prod sed -n '/def authorize_production_access!/,/^      def \|^    end/p' app/controllers/api/v1/production_orders_controller.rb

The output was revealing:

def authorize_production_access!
  unless can_access_production?
    render json: { error: "Forbidden" }, status: :forbidden
  end
end
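
The "Filter chain halted" log line follows from how Rails treats a before_action that renders: once a response has been produced, the remaining callbacks and the action itself are skipped. The plain-Ruby sketch below imitates that behaviour outside Rails. FakeController and its methods are hypothetical stand-ins, not the application's code and not Rails internals:

```ruby
# Plain-Ruby imitation of a before_action chain that halts on render.
# FakeController is a hypothetical stand-in used only for illustration.
class FakeController
  attr_reader :response, :log

  def initialize(allowed:)
    @allowed = allowed
    @response = nil
    @log = []
  end

  # Same shape as the guard extracted from the real controller.
  def authorize_production_access!
    render(json: { error: "Forbidden" }, status: :forbidden) unless @allowed
  end

  def index
    render(json: { orders: [] }, status: :ok)
  end

  def process_action(action, filters: [:authorize_production_access!])
    filters.each do |filter|
      send(filter)
      next unless @response # a filter that rendered halts the chain
      @log << "Filter chain halted as :#{filter} rendered or redirected"
      return @response
    end
    send(action)
    @response
  end

  private

  def render(json:, status:)
    @response = { status: status, body: json }
  end
end

p FakeController.new(allowed: false).process_action(:index)[:status] # :forbidden
p FakeController.new(allowed: true).process_action(:index)[:status]  # :ok
```

In other words, the log message reports that the guard did its job. It says nothing about why the guard decided to refuse.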

This led me to find the actual permission check:

docker exec pcvn-erp-backend-prod sed -n '198,210p' app/controllers/api/v1/production_orders_controller.rb

def can_access_production?
  current_user.has_role?("admin") || current_user.has_role?("production_manager") || current_user.has_role?("line_supervisor")
end

The Root Cause Discovery

Now I understood the authorization flow, but I needed to know why it was failing. I checked the user's actual state:

docker exec pcvn-erp-backend-prod bundle exec rails runner "u = User.find_by(email: 'admin@pcvn.com'); puts \"Role ID: #{u.role_id || 'nil'}\"; puts \"Role: #{u.role&.name || 'No role assigned'}\""

The output was the smoking gun:

Role ID: nil
Role: No role assigned

My admin user had no role assigned at all! The authorization system was working correctly - it was just that nobody had any permissions.
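
A small model shows why a missing role fails every check. The application's has_role? implementation is not shown in this article, so the version below is an assumption: a name comparison on a single optional role, consistent with the role_id and role&.name calls above. User and Role here are plain Structs, not the ActiveRecord models:

```ruby
# Assumed shape of the role check; the real has_role? was not inspected here.
Role = Struct.new(:name)

User = Struct.new(:email, :role) do
  # Safe navigation yields nil when no role is assigned, so the comparison is false.
  def has_role?(name)
    role&.name == name
  end

  # Same predicate as the controller's can_access_production?
  def can_access_production?
    %w[admin production_manager line_supervisor].any? { |r| has_role?(r) }
  end
end

no_role = User.new("admin@pcvn.com", nil)
admin   = User.new("admin@pcvn.com", Role.new("admin"))

puts no_role.can_access_production? # false
puts admin.can_access_production?   # true
```

Under that assumption, a nil role never raises. It quietly answers false to every question, which is why the symptom was a clean 403 rather than a crash.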

Checking the Role System

I investigated whether roles even existed in the database:

docker exec pcvn-erp-backend-prod bundle exec rails runner "puts 'Total roles: ' + Role.count.to_s"
Total roles: 0

The database had zero roles. The entire role-based access control system was built and functional, but the actual role data was never created.

The Failed First Attempt

I tried to create the roles:

docker exec pcvn-erp-backend-prod bundle exec rails runner "['admin', 'production_manager'].each {|name| Role.create!(name: name)}"

The command failed with confusing errors suggesting the roles already existed, yet the count was still zero. That pushed me to look at what the Role model actually required:

docker exec pcvn-erp-backend-prod bundle exec rails runner "r = Role.new(name: 'admin'); puts 'Valid?: ' + r.valid?.to_s; unless r.valid?; puts 'Errors:'; r.errors.full_messages.each {|e| puts '  - ' + e}; end"

The real problem emerged:

Valid?: false
Errors:
  - Description can't be blank

The Successful Fix

Armed with the knowledge that roles required both name and description, I created them properly:

docker exec pcvn-erp-backend-prod bundle exec rails runner "[['admin', 'System administrator with full access'], ['production_manager', 'Manages production operations'], ['line_supervisor', 'Supervises production lines'], ['employee', 'Regular employee']].each {|name, desc| Role.create!(name: name, description: desc)}"

Success! The roles were created with UUIDs as primary keys (two of the four shown here):

c7af64d2-5520-4a02-b12d-14fd507fdf26: admin - System administrator with full access
125aed27-0621-4751-803c-bb7318baf382: production_manager - Manages production operations
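
Because the missing rows were the whole problem, it may be worth making role creation repeatable, for example in a seeds file. In a Rails app that would typically be Role.find_or_create_by!(name: ...) with a block that sets the description. The sketch below shows the same idea in plain Ruby so it runs without a database. RoleStore and its validation rule are hypothetical stand-ins modelled on the "Description can't be blank" error above:

```ruby
# In-memory stand-in for the roles table, used to illustrate idempotent seeding.
# RoleStore is hypothetical; a Rails app would use the Role model instead.
class RoleStore
  RoleRow = Struct.new(:name, :description)

  def initialize
    @rows = {}
  end

  # Returns the existing row, or validates and creates a new one.
  def find_or_create!(name:, description:)
    @rows.fetch(name) do
      raise ArgumentError, "Description can't be blank" if description.to_s.strip.empty?
      @rows[name] = RoleRow.new(name, description)
    end
  end

  def count
    @rows.size
  end
end

ROLE_SEEDS = {
  "admin"              => "System administrator with full access",
  "production_manager" => "Manages production operations",
  "line_supervisor"    => "Supervises production lines",
  "employee"           => "Regular employee"
}.freeze

store = RoleStore.new
2.times do # running the seed twice must not duplicate rows
  ROLE_SEEDS.each { |name, desc| store.find_or_create!(name: name, description: desc) }
end
puts store.count # 4
```

Keeping names and descriptions together in one table of seeds also means the validation requirement is satisfied by construction rather than rediscovered at the console.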

Assigning the Admin Role

Finally, I assigned the admin role to my user:

docker exec pcvn-erp-backend-prod bundle exec rails runner "u = User.find_by(email: 'admin@pcvn.com'); r = Role.find_by(name: 'admin'); u.update!(role: r); puts \"Has admin access: #{u.has_role?('admin')}\""
Has admin access: true

Verification

I verified the fix was working by creating a test JWT token and making a direct API call:

docker exec pcvn-erp-backend-prod bundle exec rails runner "require 'net/http'; u = User.find_by(email: 'admin@pcvn.com'); token = JWT.encode({user_id: u.id}, Rails.application.credentials.secret_key_base); uri = URI('http://localhost:3002/api/v1/production_orders'); req = Net::HTTP::Get.new(uri); req['Authorization'] = \"Bearer #{token}\"; res = Net::HTTP.new(uri.host, uri.port).request(req); puts \"Status: #{res.code} #{res.message}\""
Status: 200 OK
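
It helps to see what the test token actually contains. The one-liner above relies on the jwt gem's default HS256 algorithm. The sketch below builds a token of the same form with only the Ruby standard library so the structure is visible. The secret and user id are made-up placeholders, and the helper names are hypothetical:

```ruby
require "json"
require "openssl"

# URL-safe Base64 without padding, as used by JWT segments.
def b64url(data)
  [data].pack("m0").tr("+/", "-_").delete("=")
end

def b64url_decode(str)
  padded = str + "=" * ((4 - str.length % 4) % 4)
  padded.tr("-_", "+/").unpack1("m")
end

# Build an HS256-signed token: header.payload.signature
def encode_hs256(payload, secret)
  header = { alg: "HS256", typ: "JWT" }
  signing_input = [header, payload].map { |part| b64url(JSON.generate(part)) }.join(".")
  signature = OpenSSL::HMAC.digest("SHA256", secret, signing_input)
  "#{signing_input}.#{b64url(signature)}"
end

# Verify the signature and return the payload hash, or nil if it does not match.
def decode_hs256(token, secret)
  header_b64, payload_b64, signature_b64 = token.split(".")
  expected = b64url(OpenSSL::HMAC.digest("SHA256", secret, "#{header_b64}.#{payload_b64}"))
  return nil unless OpenSSL.secure_compare(expected, signature_b64.to_s)
  JSON.parse(b64url_decode(payload_b64))
end

secret = "placeholder-secret"                       # stands in for secret_key_base
token  = encode_hs256({ user_id: "123" }, secret)   # placeholder user id

p decode_hs256(token, secret)         # the payload hash with the user_id
p decode_hs256(token, "wrong-secret") # nil
```

Note that the payload in this sketch carries only a user_id. A token shaped like that identifies the user but says nothing about roles, so the server still has to look the role up on each request.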

Reflections on the Journey

This debugging experience taught me several valuable lessons about systematic troubleshooting:

  1. Start with infrastructure verification - I first confirmed all Docker containers were healthy before diving into application logic.

  2. Follow the error messages systematically - The "Filter chain halted" message led me directly to the authorization system.

  3. Never assume, always verify - I initially assumed authorization methods would be in ApplicationController, but searching thoroughly revealed they were implemented differently.

  4. Understand the complete flow - Tracing from the controller through the authorization methods to the actual role checking logic revealed the complete picture.

  5. Check data prerequisites - The code was working perfectly; it was the missing data (roles) that caused the failures.

  6. Read validation errors carefully - The first attempt to create roles failed because I didn't know about the description field requirement.

The entire authorization system was like a perfectly functional lock with no keys manufactured yet. Once I created the keys (roles) and gave one to the admin user, everything worked as designed.

Technical Takeaways

Working through this issue reinforced several technical best practices:

  • Always check both code AND data when debugging authorization issues
  • Docker container health doesn't guarantee application functionality
  • Rails validation errors need careful attention - they tell you exactly what's wrong
  • JWTs that embed role or permission claims are snapshots in time - after fixing permissions, users need new tokens
  • Systematic debugging beats random attempts every time

The most satisfying part was that fixing the authorization failures required no code changes, no configuration updates, and no Docker container modifications. It was purely a matter of understanding what data the existing system expected and providing it. Sometimes the most complex-seeming problems have surprisingly simple solutions when approached methodically.


If you enjoyed this article, you can also find it published on LinkedIn and Medium.