Job Management

This guide covers how to monitor and manage background jobs in Cascadia PLM.

Overview

Cascadia uses a RabbitMQ-backed background job system for asynchronous processing. Jobs handle tasks such as:

  • Workflow Notifications: Sending email/webhook notifications on workflow transitions
  • Bulk Operations: Processing large data sets asynchronously
  • Scheduled Tasks: Recurring maintenance operations

Administrators can monitor job status, retry failed jobs, and cancel pending jobs from the Admin dashboard.

Accessing Job Management

  1. Navigate to Admin > Jobs from the main navigation
  2. The dashboard displays real-time job statistics and a filterable job list

Job Dashboard

Statistics Cards

The top of the page shows count cards for:

Metric    | Description
Total     | All jobs in the system
Pending   | Jobs waiting to be queued
Queued    | Jobs in the RabbitMQ queue
Running   | Jobs currently being processed
Completed | Successfully finished jobs
Failed    | Jobs that encountered errors

Auto-Refresh

Toggle Auto-refresh to poll for updates every 5 seconds. This is useful when monitoring jobs that are actively being processed.

Job List

The data grid displays all jobs with the following columns:

Column   | Description
Type     | Job type identifier (e.g., notification.workflow.transition)
Status   | Current status with color-coded badge
Priority | Job priority level (Low, Normal, High, Critical)
Progress | Visual progress bar with percentage
Attempts | Current attempt count vs. maximum (e.g., "2/3")
Created  | Timestamp when the job was submitted
Error    | Error message if failed (hover for full text)

Status Colors

Status    | Color | Description
pending   | Gray  | Waiting to be queued
queued    | Blue  | In RabbitMQ queue
running   | Amber | Currently processing
completed | Green | Finished successfully
failed    | Red   | Encountered an error
cancelled | Gray  | Manually cancelled

Filtering

Use the filter controls to narrow down jobs by:

  • Status: Show only jobs in a specific state
  • Type: Filter by job type identifier

Job Actions

Click the ... menu on any job row to access available actions:

View Details

Opens a modal with comprehensive job information:

Basic Information:

  • Type, Status, Priority
  • Attempt count and maximum attempts
  • Creation and completion timestamps

Payload Section:

  • JSON display of the job's input data
  • Useful for debugging what parameters were passed

Result Section:

  • JSON display of the job's output data (completed jobs only)
  • Shows what the job produced

Logs Section:

  • Structured log entries with timestamps
  • Color-coded by level (debug, info, warn, error)
  • Includes any additional data attached to log entries

Error Section (failed jobs):

  • Full error message displayed in a code block
  • Stack trace if available

Retry Job

Available only for failed jobs.

  1. Click Retry Job from the row menu
  2. The job is reset:
    • Status changed to pending
    • Attempt counter reset to 0
    • Error and result cleared
  3. The job is re-queued for processing

Use this when:

  • A transient error caused the failure (network timeout, service unavailable)
  • You've fixed the underlying issue
  • You want to force immediate reprocessing
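The reset performed by Retry Job can be modelled in a few lines. This is only an illustrative sketch of the steps listed above, not Cascadia's implementation; the `Job` dataclass and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Job:
    # Hypothetical in-memory job record; the field names are assumptions.
    status: str
    attempts: int = 0
    max_attempts: int = 3
    error: Optional[str] = None
    result: Optional[Any] = None


def reset_for_retry(job: Job) -> Job:
    """Apply the manual-retry reset. Only failed jobs qualify."""
    if job.status != "failed":
        raise ValueError(f"cannot retry a job in status {job.status!r}")
    job.status = "pending"  # status changed to pending
    job.attempts = 0        # attempt counter reset to 0
    job.error = None        # error cleared
    job.result = None       # result cleared
    return job
```

After the reset the job looks like a newly submitted one, so it gets its full number of attempts again.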

Cancel Job

Available only for pending or queued jobs.

  1. Click Cancel Job from the row menu
  2. Confirm the cancellation in the dialog
  3. The job is marked as cancelled and given a completion timestamp

Use this when:

  • A job was submitted incorrectly
  • The task is no longer needed
  • You need to prevent processing before it starts

Note: You cannot cancel jobs that are already running. Wait for them to complete or fail.

Job Lifecycle

┌─────────┐     ┌────────┐     ┌─────────┐     ┌───────────┐
│ Pending │ ──→ │ Queued │ ──→ │ Running │ ──→ │ Completed │
└─────────┘     └────────┘     └─────────┘     └───────────┘
     │              │               │
     │              │               ↓
     │              │           ┌────────┐
     │              │           │ Failed │ ←─── (can retry)
     │              │           └────────┘
     └──────┬───────┘
            ↓
      ┌───────────┐
      │ Cancelled │
      └───────────┘
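The lifecycle diagram can also be read as a transition table. The sketch below is an illustrative model only: the names `TRANSITIONS`, `can_transition`, and `available_actions` are hypothetical, and the failed to pending edge reflects the manual Retry Job action (automatic retries are not modelled here).

```python
# Allowed status transitions, read off the lifecycle diagram.
# The failed -> pending edge models a manual retry.
TRANSITIONS = {
    "pending": {"queued", "cancelled"},
    "queued": {"running", "cancelled"},
    "running": {"completed", "failed"},
    "failed": {"pending"},
    "completed": set(),
    "cancelled": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram allows moving from current to target."""
    return target in TRANSITIONS.get(current, set())


def available_actions(status: str) -> list[str]:
    """Row-menu actions for a job in the given status."""
    actions = ["view_details"]
    if status == "failed":
        actions.append("retry")   # retry is offered for failed jobs only
    if status in ("pending", "queued"):
        actions.append("cancel")  # running jobs cannot be cancelled
    return actions
```

For example, `can_transition("running", "cancelled")` is false, which matches the note that running jobs must be left to complete or fail.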

Automatic Retries

When a job fails, the system automatically schedules retries based on the job type's configuration:

  1. Exponential Backoff: Each retry waits longer (e.g., 30s, 2min, 10min)
  2. Max Attempts: Jobs stop retrying after reaching the configured limit
  3. Retry Delays: Configurable per job type

Failed jobs that have exhausted all retry attempts remain in failed status until manually retried or deleted.
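As a rough illustration of the retry schedule, the sketch below picks the wait before the next attempt from a per-job-type delay table. The function name, its parameters, and the delay table are assumptions for illustration; the values simply mirror the example delays above (30s, 2min, 10min), and the real configuration format is not documented in this guide.

```python
from typing import Optional, Sequence

# Delays in seconds, matching the example above (30s, 2min, 10min).
EXAMPLE_DELAYS = (30, 120, 600)


def next_retry_delay(
    attempts_made: int,
    max_attempts: int,
    delays: Sequence[int] = EXAMPLE_DELAYS,
) -> Optional[int]:
    """Return seconds to wait before the next attempt, or None if exhausted."""
    if not delays:
        raise ValueError("delays must not be empty")
    if attempts_made >= max_attempts:
        return None  # job stays failed until it is retried manually
    # The first failure uses the first delay; reuse the last delay
    # if the table is shorter than the number of attempts.
    index = min(max(attempts_made, 1) - 1, len(delays) - 1)
    return delays[index]
```

With a maximum of 3 attempts, a job waits 30 seconds after its first failure, 2 minutes after its second, and is not rescheduled after its third.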

Job Priorities

Jobs are processed in priority order within the queue:

Priority | RabbitMQ Value | Use Case
Critical | 9              | System alerts, security notifications
High     | 6              | User-facing notifications, time-sensitive tasks
Normal   | 3              | Standard background work
Low      | 1              | Maintenance, cleanup, reports

Higher-priority jobs are processed before lower-priority jobs when workers are busy.
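The effect of these values on ordering can be simulated locally. The sketch below is not how RabbitMQ is implemented; it is a small heap-based model using the values from the table, and it assumes jobs of equal priority are handled in submission order. The function and job names are hypothetical.

```python
import heapq
from itertools import count

# Priority values taken from the table above.
PRIORITY_VALUES = {"critical": 9, "high": 6, "normal": 3, "low": 1}


def drain_in_priority_order(jobs):
    """Yield job names, highest priority first, FIFO among equal priorities.

    jobs is an iterable of (name, priority) pairs.
    """
    sequence = count()  # tie-breaker that preserves submission order
    heap = []
    for name, priority in jobs:
        # heapq is a min-heap, so negate the value to pop the highest first.
        heapq.heappush(heap, (-PRIORITY_VALUES[priority], next(sequence), name))
    while heap:
        yield heapq.heappop(heap)[2]
```

Submitting a low-priority cleanup job before a critical alert still results in the alert being taken first, as long as both are waiting in the queue at the same time.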

Monitoring Best Practices

Regular Checks

  1. Monitor Failed Jobs: Review failed jobs daily to identify recurring issues
  2. Check Queue Depth: A growing queue may indicate worker capacity issues
  3. Review Logs: Use job logs to debug failures without checking server logs
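To spot recurring issues during a daily review, failed jobs can be grouped by type. The sketch below assumes each job is available as a dictionary with `type` and `status` keys; that shape is an assumption, since the job object format is not specified in this guide, and `bulk.import` is a made-up type name.

```python
from collections import Counter


def failed_counts_by_type(jobs):
    """Return (job type, failure count) pairs, most frequent first."""
    failed_types = (job["type"] for job in jobs if job.get("status") == "failed")
    return Counter(failed_types).most_common()


# Example input using the assumed job shape.
sample_jobs = [
    {"type": "notification.workflow.transition", "status": "failed"},
    {"type": "bulk.import", "status": "failed"},
    {"type": "notification.workflow.transition", "status": "failed"},
    {"type": "bulk.import", "status": "completed"},
]
```

A job type that dominates this list day after day is a good candidate for closer inspection of its payloads and logs.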

Troubleshooting Failures

  1. View job details to see the full error message
  2. Check the payload to verify correct input data
  3. Review logs for step-by-step execution history
  4. Check the RabbitMQ management UI (http://localhost:15672 on a default local setup) for queue health

Performance Issues

If jobs are processing slowly:

  1. Increase worker concurrency: Set the WORKER_CONCURRENCY environment variable (default: 5)
  2. Add more workers: Scale horizontally with additional worker containers
  3. Check job types: Some job types may have rate limits or concurrency caps

API Reference

Administrators can also manage jobs programmatically:

List Jobs

GET /api/admin/jobs?status=failed&limit=50

Query parameters:

  • status - Filter by job status
  • type - Filter by job type
  • limit - Max results (default: 100, max: 500)
  • offset - Pagination offset
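A small helper can assemble the query string from these parameters. This is a sketch under stated assumptions: the base URL is a placeholder, the helper name is hypothetical, and rejecting a limit below 1 is an assumption (only the maximum of 500 is documented). Authentication is not covered in this guide and is left out.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_jobs_url(
    base_url: str,
    status: Optional[str] = None,
    job_type: Optional[str] = None,
    limit: int = 100,
    offset: int = 0,
) -> str:
    """Build the List Jobs URL from the documented query parameters."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if offset < 0:
        raise ValueError("offset must not be negative")
    params = {}
    if status:
        params["status"] = status
    if job_type:
        params["type"] = job_type
    params["limit"] = limit
    if offset:
        params["offset"] = offset  # omitted for the first page
    return f"{base_url.rstrip('/')}/api/admin/jobs?{urlencode(params)}"
```

To page through results, increase `offset` by `limit` on each call until fewer than `limit` jobs come back.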

Get Job Details

GET /api/admin/jobs/{jobId}

Returns the full job object, including logs.

Retry Job

POST /api/admin/jobs/{jobId}/retry

Resets and re-queues a failed job.

Cancel Job

POST /api/admin/jobs/{jobId}/cancel

Cancels a pending or queued job.
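The retry and cancel endpoints differ only in their final path segment, so one helper can prepare either request. The sketch below builds a `urllib.request.Request` without sending it; the base URL, the helper name, and the job ID are placeholders, and any authentication headers the API may require are not shown because they are not documented here.

```python
from urllib.parse import quote
from urllib.request import Request


def build_job_action_request(base_url: str, job_id: str, action: str) -> Request:
    """Prepare a POST request for the retry or cancel endpoint."""
    if action not in ("retry", "cancel"):
        raise ValueError(f"unsupported action: {action!r}")
    # Percent-encode the job ID so it is safe as a single path segment.
    safe_id = quote(job_id, safe="")
    url = f"{base_url.rstrip('/')}/api/admin/jobs/{safe_id}/{action}"
    return Request(url, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request. Remember that retry applies only to failed jobs and cancel only to pending or queued jobs.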

See Also