Core Concepts

This section explains the fundamental concepts behind RADIUSS Shared CI, helping you understand how the system works before diving into setup and configuration.

What is RADIUSS Shared CI?

RADIUSS Shared CI is a templated CI infrastructure for GitLab designed specifically for LLNL open-source projects hosted on GitHub that need to run tests on Livermore Computing (LC) systems.

Rather than each project maintaining its own complex CI configuration, RADIUSS Shared CI provides a shared, reusable framework that projects can consume and customize. This approach:

  • Reduces maintenance burden - Bug fixes and improvements benefit all users

  • Ensures consistency - Common patterns across RADIUSS projects

  • Simplifies onboarding - New projects can adopt proven CI practices quickly

  • Enables specialization - Projects focus on their build/test scripts, not CI internals

The infrastructure is project-agnostic: you provide a command to build and test your project, and RADIUSS Shared CI helps you with orchestration, machine scheduling, GitHub integration, and reporting.

Component Architecture Model

Starting with v2025.12.0, RADIUSS Shared CI uses GitLab CI Components as its primary architecture (requires GitLab 17.0+).

What are Components?

GitLab CI Components are reusable, versioned, parameterized CI templates that can be consumed from a catalog. Think of them as “functions” for CI configuration:

  • Inputs: Type-checked parameters you provide

  • Outputs: Job templates and variables your pipeline can use

  • Versioning: Pin to specific versions (e.g., @v2026.02.2)

  • Discovery: Browse available components in GitLab’s CI/CD Catalog
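To make the "function" analogy concrete, here is a minimal sketch of what a component definition can look like on the authoring side. It is not the actual RADIUSS Shared CI source: the file name, the input defaults, the runner tag, and the variable names are assumptions chosen for illustration. Only the spec/inputs header and the $[[ inputs.* ]] interpolation are standard GitLab syntax.

```yaml
# templates/dane-pipeline.yml (hypothetical file in a component project)
spec:
  inputs:
    shared_alloc:
      type: string              # GitLab rejects values of the wrong type
      description: "Scheduler flags for the allocation shared by all jobs"
      default: "--nodes=1 --time=30"   # assumed default
    job_alloc:
      type: string
      description: "Scheduler flags for each individual job"
      default: "--nodes=1"             # assumed default
---
# "Output": a hidden job template that consuming pipelines can extend
.job_on_dane:
  tags: [dane]                  # hypothetical runner tag
  variables:
    SHARED_ALLOC: "$[[ inputs.shared_alloc ]]"   # hypothetical variable names
    JOB_ALLOC: "$[[ inputs.job_alloc ]]"
```

The inputs play the role of function parameters, and the hidden job (.job_on_dane) is the value the "function" hands back to your pipeline.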

Components vs Include-Based Approach

The legacy approach used GitLab’s include: project: with file paths. Components provide several advantages:

Feature          Legacy (Include-Based)                Components (Modern)
---------------  ------------------------------------  ------------------------------
Versioning       ref: 'v2026.02.2' in each include     @v2026.02.2 in component path
Type Safety      No validation, runtime errors         Input validation at parse time
Documentation    Scattered in files/docs               Self-documenting in catalog
Reusability      Some template files had to be copied  Direct component consumption
Discoverability  Search GitHub/docs                    Browse GitLab catalog UI
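The versioning difference is easiest to see with both forms next to each other. The following is a sketch only: the project path radiuss/radiuss-shared-ci, the legacy file path, and the component name dane-pipeline are assumptions, so check the catalog entry for the real values.

```yaml
# Legacy form (commented out): a file pulled from the shared project,
# pinned with a separate `ref` on every include.
# include:
#   - project: 'radiuss/radiuss-shared-ci'   # assumed project path
#     ref: 'v2026.02.2'
#     file: 'pipelines/dane.yml'             # hypothetical file path

# Component form: the version is part of the component path itself.
include:
  - component: $CI_SERVER_FQDN/radiuss/radiuss-shared-ci/dane-pipeline@v2026.02.2
```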

Parent vs Child Pipelines

RADIUSS Shared CI uses GitLab’s parent-child pipeline pattern to organize work efficiently:

Parent Pipeline (.gitlab-ci.yml)
├── Stage: prerequisites
│   ├── dane-up-check        (Machine availability check)
│   ├── matrix-up-check
│   └── tioga-up-check
│
└── Stage: build-and-test
    ├── Child: dane-build-and-test
    │   ├── Job: gcc-build
    │   ├── Job: clang-build
    │   └── Job: intel-build
    │
    ├── Child: matrix-build-and-test
    │   └── Job: clang-cuda-build
    │
    └── Child: tioga-build-and-test
        └── Job: rocm-build

Parent pipeline (defined in .gitlab-ci.yml):

  • Orchestrates the overall workflow

  • Checks machine availability

  • Triggers child pipelines

  • Reports to GitHub

Child pipelines (triggered by parent):

  • Run machine-specific jobs

  • Execute within shared allocations (SLURM/Flux)

  • Each reports independently to GitHub

  • Inherit variables from parent

This separation provides:

  • Clarity: Each machine’s jobs grouped together

  • Scalability: Add machines without cluttering main config

  • Parallelism: Child pipelines run concurrently

  • GitHub integration: One status per machine
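A parent pipeline following this pattern might look roughly like the sketch below. It uses standard GitLab keywords (trigger, strategy, forward), but the local file path is hypothetical, and the .dane and .machine-check templates are assumed to come from the shared components rather than being defined here.

```yaml
stages:
  - prerequisites
  - build-and-test

# Availability check; the extended templates are assumed to be provided
# by the RADIUSS Shared CI components.
dane-up-check:
  stage: prerequisites
  extends: [.dane, .machine-check]
  variables:
    ASSOCIATED_CHILD_PIPELINE: "dane-build-and-test"

# Trigger job: starts the child pipeline once the check has run.
dane-build-and-test:
  stage: build-and-test
  needs: [dane-up-check]
  trigger:
    include:
      - local: '.gitlab/jobs/dane.yml'   # hypothetical path to the machine's jobs
      # the machine's pipeline component would also be included here
    strategy: depend                     # parent job mirrors the child's result
    forward:
      pipeline_variables: true           # pass pipeline variables to the child
```

With strategy: depend, the trigger job in the parent only succeeds if the child pipeline does, which keeps the per-machine result visible at the top level.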

Machine Abstraction Layer

RADIUSS Shared CI provides components for different LC machines, abstracting their scheduler differences to some extent:

SLURM (Dane, Matrix):

Uses salloc for shared allocations and srun for jobs.

Flux (Corona, Tioga, Tuolumne):

Uses flux alloc for shared allocations and flux run for jobs.

Each machine component handles:

  • Scheduler-specific allocation commands

  • Job execution within allocations

  • Resource management

  • Reproducer generation for local testing

As a user, you interact primarily through allocation parameters (e.g., shared_alloc, job_alloc), whose values may still be scheduler-specific. The component handles the remaining scheduler details, such as how allocations are shared.
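The two fragments below illustrate how allocation parameters can remain scheduler-specific even though the input names are the same. This is a sketch under assumptions: the component paths and names are hypothetical, each fragment would live wherever that machine's component is included, and the flag values are only examples of options that salloc/srun and flux alloc/flux run accept.

```yaml
include:
  # SLURM machine (Dane): values are passed to salloc / srun
  - component: $CI_SERVER_FQDN/radiuss/radiuss-shared-ci/dane-pipeline@v2026.02.2
    inputs:
      shared_alloc: "--exclusive --nodes=1 --time=30"        # minutes, SLURM style
      job_alloc: "--nodes=1 --time=10"

  # Flux machine (Tioga): values are passed to flux alloc / flux run
  - component: $CI_SERVER_FQDN/radiuss/radiuss-shared-ci/tioga-pipeline@v2026.02.2
    inputs:
      shared_alloc: "--exclusive --nodes=1 --time-limit=30m" # Flux duration syntax
      job_alloc: "--nodes=1 --time-limit=10m"
```

Note that the time limit is spelled differently for each scheduler (--time versus --time-limit), which is the kind of difference the input values still expose.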

Template Extension Pattern

RADIUSS Shared CI components export job templates that you extend to define your specific jobs:

# Component exports .job_on_dane template
# You extend it with your custom variables

gcc-build:
  extends: .job_on_dane
  variables:
    SPEC: "gcc@11.0.0"

clang-build:
  extends: .job_on_dane
  variables:
    SPEC: "clang@14.0.0"

Each template includes:

  • Scheduler commands: Allocation and job execution

  • Reproducer logic: Generates commands for local testing

  • Machine tags: Ensures jobs run on correct runners

  • Customization hooks: .custom_job for project-specific setup

This pattern enables:

  • DRY principle: Define once, reuse many times

  • Consistency: All jobs follow same structure

  • Flexibility: Override only what you need
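As an illustration of "override only what you need", the sketch below adds a project-wide setting through .custom_job and relaxes one job. It assumes that the exported job templates pick up whatever .custom_job defines; the JUnit file name and the BUILD_TYPE variable are hypothetical, while artifacts:reports:junit and allow_failure are standard GitLab keywords.

```yaml
# Project-specific hook, assumed to be inherited by every exported job template
.custom_job:
  artifacts:
    reports:
      junit: junit.xml          # hypothetical report file written by your tests

# A job that overrides only what differs from the template
gcc-debug:
  extends: .job_on_dane
  variables:
    SPEC: "gcc@11.0.0"
    BUILD_TYPE: "Debug"         # hypothetical variable read by your build script
  allow_failure: true           # a failure here does not fail the pipeline
```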

Key Workflows

GitHub to GitLab Integration

RADIUSS Shared CI bridges GitHub (where code lives) and GitLab (where CI runs):

  1. Mirroring: GitHub repository mirrored to LC GitLab (a GitLab feature)

  2. Trigger: Push/PR triggers GitLab pipeline (after mirroring delay)

  3. Execution: Pipeline runs on LC machines

  4. Reporting: Status posted back to GitHub PR/commit (RADIUSS Shared CI adds custom statuses)
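The reporting step can be pictured as a call to GitHub's commit status API. The job below is a hand-written sketch, not the RADIUSS Shared CI implementation: the job name, the GITHUB_TOKEN variable, and the status text are assumptions, and LLNL/my-project is the placeholder repository used elsewhere on this page.

```yaml
report-status:
  stage: .post                  # built-in stage that runs after all others
  script:
    - |
      # Post a commit status for the SHA that GitLab just tested
      curl --fail --silent --request POST \
        --header "Authorization: Bearer ${GITHUB_TOKEN}" \
        --header "Accept: application/vnd.github+json" \
        --data "{\"state\": \"success\", \"context\": \"dane-build-and-test\", \"description\": \"Pipeline passed\", \"target_url\": \"${CI_PIPELINE_URL}\"}" \
        "https://api.github.com/repos/LLNL/my-project/statuses/${CI_COMMIT_SHA}"
```

The context field is what appears as the check name on the pull request, which is how one status per machine can be shown.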

Machine Availability Checks

Before running expensive builds, RADIUSS Shared CI checks whether machines are available using the Lorenz status system:

dane-up-check:
  extends: [.dane, .machine-check]
  variables:
    ASSOCIATED_CHILD_PIPELINE: "dane-build-and-test"

If a machine is down:

  • The check job reports “unavailable” to GitHub, so users see a clear status there.

  • The child pipeline is skipped, so no time is wasted on allocations that would fail.

Shared Allocations

For machines using SLURM/Flux, RADIUSS Shared CI can create a shared allocation for all jobs on that machine:

Shared Allocation (30 minutes, 1 node)
├── Job 1: gcc-build (5 minutes)
├── Job 2: clang-build (5 minutes)
├── Job 3: intel-build (5 minutes)
└── Job 4: gcc-debug (5 minutes)

This approach has trade-offs:

  • Simpler configuration: One allocation to manage

  • Harder to understand: Scheduler behavior is less transparent

Set shared_alloc: "OFF" to disable for machines where this doesn’t make sense.

Warning

On machines using Flux, a manually triggered job is scheduled in the default queue instead of the CI queue, which can lead to long wait times before the job starts. The workaround is to run the shared allocation job first, and only then trigger the desired job. We are working on a better solution for this issue.

Key Terminology

See also

Glossary — full definitions for all terms used in this documentation.

Architecture Diagram

┌──────────────────────────────────────────────────────────────┐
│ GitHub Repository (LLNL/my-project)                          │
│  - Source code                                               │
│  - Pull requests                                             │
│  - Commit statuses ←────────────────────┐                    │
└────────────┬────────────────────────────│────────────────────┘
             │ (mirrored)                 │ (status reports)
             ↓                            │
┌───────────────────────────────────────┐ │
│ LC GitLab (lc.llnl.gov/gitlab)        │ │
│                                       │ │
│  .gitlab-ci.yml (Parent Pipeline)     │ │
│  ├─ Include: base-pipeline component  │ │
│  ├─ Include: utility components       │ │
│  └─ Machine pipeline triggers         │ │
│                                       │ │
│  ┌─────────────────────────────────┐  │ │
│  │ dane-up-check ──→ Available?    │────┤
│  └─────────────────────────────────┘  │ │
│           ↓ (if available)            │ │
│  ┌─────────────────────────────────┐  │ │
│  │ Child: dane-build-and-test      │  │ │
│  │  ├─ Component: dane-pipeline    │  │ │
│  │  ├─ Local: custom-jobs.yml      │  │ │
│  │  └─ Local: jobs/dane.yml        │  │ │
│  │                                 │  │ │
│  │     Jobs run on Dane:           │  │ │
│  │     ├─ gcc-build ────────────────────┤
│  │     ├─ clang-build ──────────────────┤
│  │     └─ intel-build ──────────────────┤
│  └─────────────────────────────────┘  │ │
│                                       │ │
│  (Similar for matrix, tioga, etc.)    │ │
└───────────────────────────────────────┘ │
                                          │
┌─────────────────────────────────────────┘
│
│  Status Update: "dane-build-and-test: ✓ Success"
└──→ GitHub PR shows check results

Next Steps

Now that you understand the core concepts, you can:

See also