Connecting a CoreWeave Slurm Cluster

This guide connects a CoreWeave Slurm cluster (SUNK - Slurm on Kubernetes) to ACTIVATE so your users can submit batch jobs, open desktop sessions, and run workflows on it.

The setup has two parts:

Identity - point the cluster's identity cache (nsscache) at ACTIVATE's SCIM API and add an SSH authorized-keys command, so ACTIVATE users, groups, and SSH keys resolve as real Linux accounts on the cluster.
Connection - register the cluster in ACTIVATE as an existing cluster.

Connecting CoreWeave's Kubernetes API instead?

If you want to manage the cluster's Kubernetes workloads through ACTIVATE rather than submit Slurm jobs, see Connecting CoreWeave (Kubernetes).

Prerequisites

Organization admin permissions in ACTIVATE.
SCIM provisioning enabled for your organization, plus a bearer token. Follow SCIM Provisioning first and keep the token and endpoint URL handy.
kubectl access to the cluster's tenant-slurm namespace (via the kubeconfig from CoreWeave).
POSIX UIDs/GIDs and SSH public keys configured on your ACTIVATE users and groups - these are what get synchronized onto the cluster.
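
Before starting, it can help to confirm that your kubeconfig actually grants the access this guide relies on. The following is a minimal sketch, not an official check: the list of verbs and resources is a representative subset chosen for illustration, and check_access is a hypothetical helper name.

```shell
# Sketch: report whether the current kubeconfig can perform the operations
# used later in this guide. The verb/resource list is an illustrative subset.
NAMESPACE="${NAMESPACE:-tenant-slurm}"

check_access() {
    missing=0
    while read -r verb resource; do
        # kubectl auth can-i exits non-zero when the answer is "no".
        if kubectl auth can-i "$verb" "$resource" -n "$NAMESPACE" </dev/null >/dev/null 2>&1; then
            echo "ok: $verb $resource"
        else
            echo "missing: $verb $resource"
            missing=1
        fi
    done <<EOF
get configmaps
update configmaps
get secrets
update secrets
delete pods
EOF
    return "$missing"
}

# Usage: check_access
```

Any line reported as missing points at a permission to request from whoever manages access to the cluster.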

Point nsscache at ACTIVATE's SCIM API

CoreWeave's SUNK clusters resolve Linux identity through nsscache, which periodically syncs passwd, group, shadow, and sshkey maps from a source. We configure that source to be ACTIVATE's SCIM API, reading POSIX identity from the CoreWeave extension attributes.

Edit the nsscache-conf ConfigMap in the tenant-slurm namespace:

kubectl edit cm/nsscache-conf -n tenant-slurm

Update it to match the following. The two settings to get right are scim_base_url (your organization's SCIM endpoint) and scim_users_parameters (which must request the CoreWeave user extension):

apiVersion: v1
data:
  nsscache.conf: |
    [DEFAULT]
    cache=files
    files_cache_filename_suffix=cache
    files_dir=/etc/nsscache
    maps=passwd,shadow,group,sshkey
    scim_base_url=https://<platform-host>/api/organizations/<organization>/scim/v2
    scim_groups_endpoint=Groups
    scim_groups_parameters=excludeInactiveUsers=true
    scim_users_endpoint=Users
    scim_users_parameters=attributes=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser
    source=scim
    timestamp_dir=/var/lib/nsscache
    [group]
    scim_path_gid=sunkPosixGroupId
    scim_path_groupname=sunkPosixGroupName
    scim_path_username=members/sunkPosixUsername
    [passwd]
    scim_default_shell=/bin/bash
    scim_override_home_directory=/mnt/home/%%u
    scim_path_gid=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixGroupId
    scim_path_home_directory=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPreferredHomeDirectory
    scim_path_login_shell=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkLoginShell
    scim_path_uid=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUserId
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
    [shadow]
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
    [sshkey]
    scim_path_ssh_keys=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkSshKeys
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
  nsswitch.conf: |
    group: files cache
    passwd: files cache
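
An easy mistake is to save the config with the template placeholders still in place. As a rough sketch, the helper below scans a local copy of nsscache.conf for the <platform-host> and <organization> placeholders; check_placeholders is a hypothetical name and the local file path is an assumption.

```shell
# Sketch: flag template placeholders left in a local copy of nsscache.conf.
# Prints the offending lines with line numbers; returns 1 if any were found.
check_placeholders() {
    if grep -nE '<(platform-host|organization)>' "$1"; then
        return 1
    fi
    return 0
}

# Usage (pull the live config into a local file first):
#   kubectl get configmap nsscache-conf -n tenant-slurm \
#     -o 'jsonpath={.data.nsscache\.conf}' > nsscache.conf
#   check_placeholders nsscache.conf && echo "no placeholders left"
```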

What the key settings do:

scim_base_url - your organization's SCIM endpoint, shown on the SCIM Provisioning page (https://<platform-host>/api/organizations/<organization>/scim/v2).
scim_users_parameters=attributes=...CoreWeaveUser - requests the CoreWeave user extension. ACTIVATE omits that block by default, so without this parameter the POSIX UID/GID, shell, home directory, and SSH keys would be missing.
scim_groups_parameters=excludeInactiveUsers=true - drops disabled ACTIVATE accounts from group membership, so deactivated users stop resolving on the cluster.
scim_override_home_directory=/mnt/home/%%u - forces home directories under /mnt/home. This overrides the sunkPreferredHomeDirectory value from SCIM; set it to wherever home directories are mounted on your cluster.
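
To see what the cluster will receive, you can query the SCIM endpoint yourself with the same parameters. This sketch assumes the request URL is composed as scim_base_url, then scim_users_endpoint, then scim_users_parameters as a query string; that composition is inferred from the setting names rather than confirmed. SCIM_TOKEN and both function names are hypothetical.

```shell
# Sketch: build and query the Users URL implied by the settings above.
COREWEAVE_USER_SCHEMA="urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser"

scim_users_url() {
    # Strip one trailing slash so the join never produces "//".
    base="${1%/}"
    printf '%s/Users?attributes=%s\n' "$base" "$COREWEAVE_USER_SCHEMA"
}

scim_fetch_users() {
    # SCIM_TOKEN is an assumed variable holding the bearer token.
    curl -fsS -H "Authorization: Bearer ${SCIM_TOKEN:?set SCIM_TOKEN first}" \
        "$(scim_users_url "$1")"
}

# Usage:
#   export SCIM_TOKEN='<your-scim-token>'
#   scim_fetch_users 'https://<platform-host>/api/organizations/<organization>/scim/v2'
```

If the response has no CoreWeaveUser block on each user, the attributes parameter is the first thing to check.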

Provide the bearer token

The SCIM API requires a bearer token on every request. On a SUNK cluster, nsscache reads it from the nsscache-scim-secret Secret in the tenant-slurm namespace - not from the ConfigMap above. This Secret is provisioned with the cluster; update it with the token you minted in SCIM Provisioning:

kubectl edit secret nsscache-scim-secret -n tenant-slurm

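Secret values are base64-encoded, so encode the token before pasting it into the Secret's data field. Strip newlines from the output, because some base64 implementations wrap long values across several lines, which breaks the YAML value:

printf '%s' '<your-scim-token>' | base64 | tr -d '\n'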
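
A quick round trip catches the most common encoding mistakes, such as a trailing newline baked into the token. This is a small sketch with hypothetical helper names; it does not touch the cluster.

```shell
# Sketch: encode a token for a Secret's data field and decode it again.
encode_token() {
    # printf avoids the trailing newline that echo would add;
    # tr removes any line wrapping added by base64.
    printf '%s' "$1" | base64 | tr -d '\n'
}

decode_token() {
    printf '%s' "$1" | base64 -d
}

# Usage:
#   encoded="$(encode_token '<your-scim-token>')"
#   [ "$(decode_token "$encoded")" = '<your-scim-token>' ] && echo "round trip ok"
```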

Configure the authorized keys command

So that sshd can authorize logins using each user's ACTIVATE SSH keys, install an AuthorizedKeysCommand that fetches them through the pw CLI.

Save the following as slurm-nsscache-authorized-keys-command.yaml, setting PLATFORM_HOST to your platform URL:

apiVersion: v1
kind: ConfigMap
metadata:
  name: slurm-nsscache-authorized-keys-command
  namespace: tenant-slurm
data:
  # Filename kept as .py for drop-in compatibility with the existing
  # AuthorizedKeysCommand path in sshd_config. The shebang determines the
  # interpreter, so bash content here is fine.
  nsscache-authorized-keys-command.py: |
    #!/bin/bash
    # AuthorizedKeysCommand: fetch the user's SSH public keys via the pw CLI.
    # Installs pw on first invocation; subsequent calls reuse the cached binary.
    set -e
 
    PLATFORM_HOST="${PLATFORM_HOST:-https://<platform-host>}"
    PW_INSTALL_DIR="${PW_INSTALL_DIR:-/usr/local/bin}"
    PW_BIN="$PW_INSTALL_DIR/pw"
    # /tmp is always writable, even by `nobody`. The lock only needs to exist
    # during one install attempt, so ephemeral storage is fine.
    INSTALL_LOCK="${PW_INSTALL_LOCK:-/tmp/pw-install.lock}"
 
    locate_pw() {
        if [ -x "$PW_BIN" ]; then
            return
        fi
        local found
        found="$(command -v pw 2>/dev/null || true)"
        if [ -n "$found" ] && [ -x "$found" ]; then
            PW_BIN="$found"
        fi
    }
 
    locate_pw
    if [ ! -x "$PW_BIN" ]; then
        # flock prevents concurrent sshd invocations from racing the install.
        (
            flock -x 9
            if [ ! -x "$PW_INSTALL_DIR/pw" ] && ! command -v pw >/dev/null 2>&1; then
                # Send install output to stderr so it doesn't end up in the
                # keys stream sshd reads from stdout.
                curl -fsSL https://activate.parallel.works/cli/install.sh \
                    | bash -s -- --to "$PW_INSTALL_DIR" 1>&2
            fi
        ) 9>"$INSTALL_LOCK"
        locate_pw
    fi
 
    if [ ! -x "$PW_BIN" ]; then
        echo "pw CLI not found and install failed" >&2
        exit 1
    fi
 
    # Validate username contains only safe characters.
    if [[ ! "$1" =~ ^[a-zA-Z0-9._-]+$ ]]; then
        exit 1
    fi
 
    exec "$PW_BIN" ssh-public-keys --platform-host "$PLATFORM_HOST" "$1"

Apply it:

kubectl apply -f slurm-nsscache-authorized-keys-command.yaml

A running login pod won't pick up the new ConfigMap until it restarts. Delete the Slurm login pod so it's recreated with the updated command mounted (find it with kubectl get pods -n tenant-slurm):

kubectl delete pod -n tenant-slurm <login-pod>
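
If you prefer to script this step, the sketch below lists candidate login pods and waits for a pod to become Ready. It assumes the login pod's name contains "login", which may not hold on your cluster; both function names are hypothetical.

```shell
# Sketch: locate the login pod and wait for its replacement to be Ready.
find_login_pods() {
    # Assumption: the login pod has "login" in its name. Adjust as needed.
    kubectl get pods -n tenant-slurm -o name | grep -i 'login'
}

wait_for_pod() {
    # $1 is a name as printed by find_login_pods, e.g. pod/<login-pod>.
    kubectl wait --for=condition=Ready "$1" -n tenant-slurm --timeout="${2:-300s}"
}

# Usage:
#   find_login_pods
#   kubectl delete pod -n tenant-slurm <login-pod>
#   wait_for_pod pod/<login-pod>
```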

On the first SSH login after the pod comes back, the script installs the pw CLI if it isn't already present, then calls pw ssh-public-keys to return the user's keys for sshd to authorize.
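
The script rejects any username containing characters outside letters, digits, dot, underscore, and hyphen before calling pw. The sketch below expresses the same rule as a POSIX case pattern instead of the script's bash regex, so you can try candidate usernames locally; is_safe_username is a hypothetical name.

```shell
# Sketch: same acceptance rule as the script's ^[a-zA-Z0-9._-]+$ check.
is_safe_username() {
    case "$1" in
        # Reject the empty string and anything with a character outside the set.
        ''|*[!a-zA-Z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   is_safe_username 'mcquade' && echo accepted
#   is_safe_username 'a;b'     || echo rejected
```

Usernames that fail this check never reach pw, so sshd receives no keys for them.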

Confirm it works by running the command inside the recreated login pod with a username:

kubectl exec -it -n tenant-slurm <login-pod> -- \
  /usr/local/share/nsscache-authorized-keys-command.py <username>

When everything is wired up correctly, it prints that user's authorized SSH public keys - for example, nsscache-authorized-keys-command.py mcquade returns mcquade's keys.
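
To check the output mechanically, you can pipe it through a filter that accepts only lines shaped like OpenSSH public keys. This is a rough sketch: it assumes the command prints plain keys with no authorized_keys options prefix, and looks_like_public_keys is a hypothetical name.

```shell
# Sketch: succeed only if stdin has at least one key line and nothing else.
looks_like_public_keys() {
    awk '
        NF == 0 { next }   # ignore blank lines
        $1 ~ "^(ssh-rsa|ssh-ed25519|ecdsa-sha2-nistp(256|384|521)|sk-ssh-ed25519@openssh[.]com|sk-ecdsa-sha2-nistp256@openssh[.]com)$" && $2 ~ "^[A-Za-z0-9+/=]+$" { ok++; next }
        { bad++ }
        END { exit !(ok > 0 && bad == 0) }
    '
}

# Usage:
#   kubectl exec -n tenant-slurm <login-pod> -- \
#     /usr/local/share/nsscache-authorized-keys-command.py <username> \
#     | looks_like_public_keys && echo "keys look valid"
```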

Register the cluster in ACTIVATE

With identity resolving on the cluster, connect it like any other on-premises cluster:

Follow Configuring Existing Clusters to create the cluster definition.
Set the Scheduler Type to Slurm.
Enter the Cluster Login Node (the cluster's login/jump host) and your Username.

You can use the __USER__ token in any field; ACTIVATE replaces it with the logged-in user's username automatically.

Verify

First, confirm the identity cache is populating. nsscache writes each synced map into a slurm-nsscache-<map> Secret in tenant-slurm. Decode the passwd cache to check that your ACTIVATE users are landing on the cluster with the expected UID, GID, shell, and home directory:

kubectl get secret slurm-nsscache-passwd -n tenant-slurm -o yaml \
  | yq '.data."passwd.cache"' | base64 -d
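
The decoded cache uses the standard passwd format (name, password placeholder, UID, GID, comment, home directory, shell). As a small sketch, the filter below turns each line into a labeled summary and flags malformed lines; summarize_passwd is a hypothetical name and the sample values in the usage note are made up.

```shell
# Sketch: summarize passwd-format lines read from stdin.
summarize_passwd() {
    awk -F: '
        # A valid entry has seven fields with numeric UID and GID.
        NF != 7 || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
            printf "malformed line %d: %s\n", NR, $0
            bad = 1
            next
        }
        { printf "%s uid=%s gid=%s home=%s shell=%s\n", $1, $3, $4, $6, $7 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage: append "| summarize_passwd" to the command above. A line such as
#   mcquade:x:1001:1001::/mnt/home/mcquade:/bin/bash
# is printed as
#   mcquade uid=1001 gid=1001 home=/mnt/home/mcquade shell=/bin/bash
```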

The group, shadow, and sshkey maps populate the corresponding slurm-nsscache-group, slurm-nsscache-shadow, and slurm-nsscache-sshkey Secrets (keyed group.cache, shadow.cache, and sshkey.cache). An empty or stale cache usually means the SCIM URL, the bearer token, or the attributes parameter is wrong.
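
To check all four maps in one pass, a sketch along these lines counts the lines in each cache without printing their contents. It uses kubectl's jsonpath output instead of yq, relying on the Secret and key names described above; the function names are hypothetical.

```shell
# Sketch: decode one nsscache map, or report line counts for all of them.
dump_nsscache_map() {
    map="$1"
    # The backslash escapes the dot in the key name, e.g. passwd.cache.
    kubectl get secret "slurm-nsscache-$map" -n tenant-slurm \
        -o "jsonpath={.data.$map\.cache}" | base64 -d
}

report_nsscache_maps() {
    for map in passwd group shadow sshkey; do
        lines="$(dump_nsscache_map "$map" | wc -l | tr -d ' ')"
        printf '%s.cache: %s line(s)\n' "$map" "$lines"
    done
}

# Usage: report_nsscache_maps
```

A count of 0 for a map is the signal to revisit the settings for that map's section in nsscache.conf.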

Then confirm the end-to-end connection:

From the Sessions tab, power on the cluster and confirm the connection succeeds.
Confirm your account resolves on the cluster (id <username> should show the POSIX UID/GID synced from ACTIVATE).
Submit a test job and confirm it runs. See Submitting Jobs via Slurm.
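
For the test job, something as small as the following is usually enough. It is a sketch that assumes the cluster has a default partition and that one node for two minutes is available; the job name, file names, and write_test_job helper are all hypothetical.

```shell
# Sketch: write a minimal batch script that reports where it ran and as whom.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=activate-smoke-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:02:00
#SBATCH --output=activate-smoke-test-%j.out
# Print the compute node's hostname and the identity the job ran under.
hostname
id
EOF
}

# Usage (on the login node):
#   write_test_job smoke.sbatch
#   sbatch smoke.sbatch
#   squeue -u "$USER"
```

The id line in the job's output file should show the same UID and GID that ACTIVATE synchronized for your account.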