Troubleshooting

This guide covers common issues and their solutions.

Component-Based Setup

Component Not Found

Error:

Component 'radiuss/radiuss-shared-ci/dane-pipeline' not found

Possible Causes:

  1. Using GitLab version < 17.0

  2. Component not published to CI/CD Catalog

  3. Incorrect component path

  4. Network/permissions issue

Solutions:

  1. Check GitLab version at https://lc.llnl.gov/gitlab/help

  2. Verify component exists in catalog: https://lc.llnl.gov/gitlab/explore/catalog

  3. Check component path format: $CI_SERVER_FQDN/radiuss/radiuss-shared-ci/component-name@version

  4. Try writing the full server path explicitly in the component path instead of using $CI_SERVER_FQDN
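
The version requirement in cause 1 can also be checked from a shell. The following is a minimal sketch, not part of RADIUSS Shared CI: version_at_least is a hypothetical helper name, GITLAB_TOKEN is an assumed variable holding a personal access token, and the sed expression assumes the GitLab /api/v4/version endpoint returns compact JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is >= minimum version $2.
# Relies on `sort -V` (version sort, available in GNU coreutils).
version_at_least() {
  local have="$1" need="$2"
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example usage (assumes GITLAB_TOKEN is set and the API is reachable):
#   have=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
#            https://lc.llnl.gov/gitlab/api/v4/version |
#          sed -E 's/.*"version":"([^"]+)".*/\1/')
#   version_at_least "$have" "17.0" || echo "GitLab $have predates 17.0"
```

The comparison is kept separate from the network call so it can be reused with a version string copied from the help page.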

Input Validation Errors

Error:

Required input 'github_project_name' not provided

Solution:

Add the missing input to the component’s inputs: section:

- component: .../base-pipeline@v2026.02.2
  inputs:
    github_project_name: "my-project"  # Add this
    github_project_org: "LLNL"
    github_token: $GITHUB_STATUS_TOKEN

Error:

Input 'job_cmd' has invalid type

Solution:

Check that the input matches the expected type. For job_cmd, provide a string. Use only one of the following forms:

inputs:
  job_cmd: $JOB_CMD              # Correct: variable reference
  # job_cmd: "./script.sh"       # Also correct: string
  # job_cmd:                     # Incorrect: structured data
  #   value: "./script.sh"

Version Mismatch

Issue: Different component versions are used in the same pipeline

Solution:

Use the same version for all components:

include:
  - component: .../base-pipeline@v2026.02.2
  - component: .../utility-draft-pr-filter@v2026.02.2  # Same version

dane-build-and-test:
  trigger:
    include:
      - component: .../dane-pipeline@v2026.02.2  # Same version
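
A quick way to spot mixed versions is to list every version referenced by component: entries. The sketch below is illustrative only; component_versions is a hypothetical helper, and it assumes each component reference sits on one line and ends in @version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct component version referenced
# in the given YAML files, one per line.
component_versions() {
  # Select "component:" entries that carry a version, drop trailing
  # comments, keep the text after the last "@", strip quotes, deduplicate.
  grep -hE '^[[:space:]]*-?[[:space:]]*component:.*@' "$@" |
    sed -E 's/[[:space:]]*#.*$//; s/.*@//; s/[[:space:]]+$//' |
    tr -d "\"'" |
    sort -u
}

# Example usage (file names are illustrative):
#   if [ "$(component_versions .gitlab-ci.yml | wc -l)" -gt 1 ]; then
#     echo "Mixed component versions:"
#     component_versions .gitlab-ci.yml
#   fi
```

More than one line of output means at least two different versions are in use.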

Pipeline Issues

Pipeline Doesn’t Start

Possible Causes:

  1. Mirror not set up: Verify your GitHub project is mirrored to LC GitLab

  2. YAML syntax errors: Validate with the CI Lint tool at https://lc.llnl.gov/gitlab/<org>/<project>/-/ci/lint

  3. Wrong project name: Ensure GITHUB_PROJECT_NAME matches your GitHub repository name exactly

Machine Check Failures

Machine is down: This is expected. The pipeline skips that machine gracefully, and no action is required.

Lorenz file missing: The machine availability check depends on the Lorenz status file. Contact LC support if this persists.

Template Not Found in Child Pipeline

Error:

Job extends template that doesn't exist: .job_on_dane

Cause:

The machine pipeline component is not included in the child pipeline trigger.

Solution:

Include the machine component in the trigger section, not in the parent pipeline:

# Parent pipeline (.gitlab-ci.yml)
dane-build-and-test:
  trigger:
    include:
      - component: .../dane-pipeline@v2026.02.2  # Include here
      - local: '.gitlab/jobs/dane.yml'

# Job file (.gitlab/jobs/dane.yml)
my-job:
  extends: .job_on_dane  # Template now available

Stage Not Defined

Error:

'prerequisites' is not a defined stage

Cause:

Components deliberately leave stages undefined so that each project can customize them.

Solution:

Define required stages in .gitlab-ci.yml:

stages:
  - prerequisites           # Required for machine checks
  - build-and-test          # Required for build/test jobs
  - performance-measurements # Required if using perf pipeline
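
To confirm the stages are present before pushing, a simple text check can help. This is a rough sketch under stated assumptions: missing_stages is a hypothetical helper, it matches unquoted list items anywhere in the file (not only under stages:), and it does not parse YAML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each required stage name that does not
# appear as an unquoted list item in the given file.
# Usage: missing_stages FILE STAGE...
missing_stages() {
  local file="$1"; shift
  local stage
  for stage in "$@"; do
    grep -qE "^[[:space:]]*-[[:space:]]*${stage}([[:space:]]|#|\$)" "$file" ||
      echo "$stage"
  done
}

# Example usage:
#   missing_stages .gitlab-ci.yml prerequisites build-and-test
```

Empty output means every listed stage name was found.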

Missing ASSOCIATED_CHILD_PIPELINE

Issue:

The machine check failure status on GitHub is never updated after the machine comes back up.

Cause:

The machine check requires the ASSOCIATED_CHILD_PIPELINE variable (added in v2026.02.0).

Solution:

Add the variable to the machine check job:

dane-up-check:
  extends: [.dane, .machine-check]
  variables:
    ASSOCIATED_CHILD_PIPELINE: "dane-build-and-test"  # Add this

Machine/Allocation Issues

Allocation Failures

Error:

sbatch: error: Batch job submission failed

Common Causes:

  1. Invalid allocation parameters

  2. No access to specified partition/queue/reservation

  3. Resource unavailable

  4. Time limit exceeded

Solutions:

Check allocation parameters:

# SLURM (Dane, Matrix)
inputs:
  shared_alloc: "--reservation=ci --exclusive --nodes=1 --time=30"
  job_alloc: "--reservation=ci --nodes=1"

# Flux (Tioga, Tuolumne, Corona)
inputs:
  shared_alloc: "--queue=pci --exclusive --nodes=1 --time-limit=30m"
  job_alloc: "--nodes=1 --begin-time=+5s"

Verify access to CI resources:

# SLURM
sinfo -p partition_name
scontrol show reservation=ci

# Flux
flux resource list

Service User Issues

Issue: Jobs fail with service user permission errors

Solutions:

  1. Verify the service user is configured: the LLNL_SERVICE_USER variable must be set

  2. Check that the service user has machine access

  3. Check allocation permissions for the service user

Issue: Working directory creation fails

Solution:

The service user needs write access to the default build directory. Alternatively, set a custom path:

variables:
  CUSTOM_CI_BUILD_DIR: "/usr/workspace/your_project/ci"

GitHub Integration Issues

Token Not Working

Issue: GitHub status not updating

Solutions:

  1. Verify token scope: Requires repo:status permission

  2. Check token validity: Token may have expired

  3. Verify variable name: The name set in the GitLab UI must match the name used in the configuration

To test token:

curl -H "Authorization: token YOUR_TOKEN" \
  https://api.github.com/repos/LLNL/your-project/commits/main/status
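
The HTTP status code of that request usually narrows down the problem. The sketch below maps common codes to hints; explain_github_status is a hypothetical helper and the hint wording is an interpretation, not GitHub's own. A 200 shows the token was accepted for reading, which does not by itself prove it can write commit statuses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code from the GitHub API to a hint.
explain_github_status() {
  case "$1" in
    200) echo "token accepted" ;;
    401) echo "bad credentials: token is invalid or expired" ;;
    403) echo "forbidden: check token permissions or rate limits" ;;
    404) echo "not found: check the repository name and token access" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Example usage (YOUR_TOKEN and the repository path are placeholders):
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#            -H "Authorization: token YOUR_TOKEN" \
#            https://api.github.com/repos/LLNL/your-project/commits/main/status)
#   explain_github_status "$code"
```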

Job Execution Issues

Job Command Fails

Issue: JOB_CMD execution fails

Debugging Steps:

  1. Check command syntax:

    variables:
      JOB_CMD:
        value: "./scripts/build-and-test.sh"
        expand: false  # Prevents variable expansion
    
  2. Verify script exists and is executable:

    ls -l scripts/build-and-test.sh
    # Should show: -rwxr-xr-x
    
  3. Test script locally:

    ./scripts/build-and-test.sh
    
  4. Check script dependencies:

    • Required modules loaded

    • Environment variables set

    • Paths correct

  5. Review job logs: Check GitLab job output for error messages
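
Step 2 above can be wrapped in a small preflight check that runs before the job command. This is a minimal sketch; check_job_script is a hypothetical helper and is not provided by RADIUSS Shared CI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report why a job script might not be runnable.
# Prints a one-line diagnosis and returns non-zero on a problem.
check_job_script() {
  local script="$1"
  if [ ! -e "$script" ]; then
    echo "missing: $script"
    return 1
  elif [ ! -x "$script" ]; then
    echo "not executable: $script (try: chmod +x $script)"
    return 1
  fi
  echo "ok: $script"
}

# Example usage:
#   check_job_script scripts/build-and-test.sh && ./scripts/build-and-test.sh
```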

Shared Allocation Issues

Issue: Jobs waiting indefinitely

Possible Causes:

  1. The shared allocation failed but jobs are still queued

  2. The allocation time limit was too short and expired before the jobs started

Solutions:

  1. Check allocation status in logs

  2. Increase time limit:

    shared_alloc: "--nodes=1 --time=60"  # Increase from 30
    
  3. Disable shared allocation:

    shared_alloc: "OFF"
    job_alloc: "--exclusive --nodes=1 --time=20"
    

Legacy Setup Issues

Include File Not Found

Error:

Include file 'pipelines/dane.yml' not found

Solutions:

  1. Check that ref: points to a valid version tag

  2. Verify the file path is correct (pipelines/, not pipeline/)

Common Error Patterns

This section lists error messages and their typical causes.

GitLab YAML Errors

Error: jobs:job_name config should implement a script: or a trigger: keyword

Cause: Job missing required execution method.

Solution: Add extends: with a template that provides a script, or define script: or trigger: directly:

my-job:
  extends: .job_on_dane  # Provides script from template

Error: Included file could not be parsed

Cause: YAML syntax error in included file.

Solution: Check YAML formatting:

  • Consistent indentation (spaces, not tabs)

  • Proper quoting of special characters

  • Valid structure
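
Tab indentation is hard to see in most editors. As an illustration, the sketch below lists lines whose leading whitespace contains a tab; find_tab_indents is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "LINE:content" for every line whose
# indentation contains a tab. Returns non-zero when none are found.
find_tab_indents() {
  grep -n "^ *$(printf '\t')" "$1"
}

# Example usage:
#   find_tab_indents .gitlab/jobs/dane.yml && echo "Replace tabs with spaces"
```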

Error: This GitLab CI configuration is invalid

Cause: Various YAML validation errors.

Solution: Check CI Lint: https://lc.llnl.gov/gitlab/<org>/<project>/-/ci/lint

Variable Errors

Error: variable is not defined

Cause: Referenced variable not set.

Solution: Define variable in .gitlab-ci.yml or custom-variables.yml:

variables:
  MY_VARIABLE: "value"

Issue: Variable shows as literal $VAR in job

Cause: Variable not available in child pipeline context.

Solution: Pass explicitly through component inputs or job variables.

Issue: Variable expands too early

Cause: With expand: true (the default), the variable is expanded in the parent pipeline.

Solution: Use expand: false for late expansion:

variables:
  JOB_CMD:
    value: "./script.sh"
    expand: false

Machine-Specific Patterns

Dane/Matrix (SLURM):

sbatch: error: invalid partition specified
sbatch: error: invalid account name specified
sbatch: error: invalid reservation name specified

Solution: Check SLURM allocation parameters, verify access to resources.

Tioga/Tuolumne/Corona (Flux):

flux-submit: Invalid --queue specified
flux-submit: Unknown option --time

Solution: Use Flux syntax: --time-limit= not --time=
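
When porting a configuration from a SLURM machine to a Flux machine, the allocation string can be screened for the SLURM-style flag before it reaches the scheduler. A minimal sketch, assuming only the --time= case described above; check_flux_alloc is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag a SLURM-style --time= option in an
# allocation string intended for Flux.
check_flux_alloc() {
  case " $1" in
    *" --time="*)
      echo "Flux expects --time-limit=, found --time="
      return 1
      ;;
  esac
  echo "ok"
}

# Example usage:
#   check_flux_alloc "--queue=pci --exclusive --nodes=1 --time-limit=30m"
```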

Getting More Help

If your issue isn’t covered here:

  1. Check logs: Review complete GitLab job output

  2. Check examples: See examples/ directory in repository

  3. Search issues: https://github.com/LLNL/radiuss-shared-ci/issues

  4. Ask for help: Open a new issue with:

    • GitLab version

    • RADIUSS Shared CI version

    • Minimal reproduction example

    • Complete error message

    • Relevant configuration files

See Also