Connecting a CoreWeave Slurm Cluster
This guide connects a CoreWeave Slurm cluster (SUNK - Slurm on Kubernetes) to ACTIVATE so your users can submit batch jobs, open desktop sessions, and run workflows on it.
There are two sides to it:
- Identity - point the cluster's identity cache (nsscache) at ACTIVATE's SCIM API and add an SSH authorized-keys command, so ACTIVATE users, groups, and SSH keys resolve as real Linux accounts on the cluster.
- Connection - register the cluster in ACTIVATE as an existing cluster.
Connecting CoreWeave's Kubernetes API instead?
If you want to manage the cluster's Kubernetes workloads through ACTIVATE rather than submit Slurm jobs, see Connecting CoreWeave (Kubernetes).
Prerequisites
- Organization admin permissions in ACTIVATE.
- SCIM provisioning enabled for your organization, plus a bearer token. Follow SCIM Provisioning first and keep the token and endpoint URL handy.
- kubectl access to the cluster's tenant-slurm namespace (via the kubeconfig from CoreWeave).
- POSIX UIDs/GIDs and SSH public keys configured on your ACTIVATE users and groups - these are what get synchronized onto the cluster.
Point nsscache at ACTIVATE's SCIM API
CoreWeave's SUNK clusters resolve Linux identity through nsscache, which periodically syncs passwd, group, shadow, and sshkey maps from a source. We configure that source to be ACTIVATE's SCIM API, reading POSIX identity from the CoreWeave extension attributes.
Edit the nsscache-conf ConfigMap in the tenant-slurm namespace:
kubectl edit cm/nsscache-conf -n tenant-slurm
Update it to match the following. The two values you must set for your organization are scim_base_url (your SCIM endpoint) and scim_users_parameters (which requests the CoreWeave user extension):
apiVersion: v1
data:
  nsscache.conf: |
    [DEFAULT]
    cache=files
    files_cache_filename_suffix=cache
    files_dir=/etc/nsscache
    maps=passwd,shadow,group,sshkey
    scim_base_url=https://<platform-host>/api/organizations/<organization>/scim/v2
    scim_groups_endpoint=Groups
    scim_groups_parameters=excludeInactiveUsers=true
    scim_users_endpoint=Users
    scim_users_parameters=attributes=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser
    source=scim
    timestamp_dir=/var/lib/nsscache
    [group]
    scim_path_gid=sunkPosixGroupId
    scim_path_groupname=sunkPosixGroupName
    scim_path_username=members/sunkPosixUsername
    [passwd]
    scim_default_shell=/bin/bash
    scim_override_home_directory=/mnt/home/%%u
    scim_path_gid=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixGroupId
    scim_path_home_directory=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPreferredHomeDirectory
    scim_path_login_shell=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkLoginShell
    scim_path_uid=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUserId
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
    [shadow]
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
    [sshkey]
    scim_path_ssh_keys=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkSshKeys
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
  nsswitch.conf: |
    group: files cache
    passwd: files cache
What the key settings do:
- scim_base_url - your organization's SCIM endpoint, shown on the SCIM Provisioning page (https://<platform-host>/api/organizations/<organization>/scim/v2).
- scim_users_parameters=attributes=...CoreWeaveUser - requests the CoreWeave user extension. ACTIVATE omits that block by default, so without this parameter the POSIX UID/GID, shell, home directory, and SSH keys would be missing.
- scim_groups_parameters=excludeInactiveUsers=true - drops disabled ACTIVATE accounts from group membership, so deactivated users stop resolving on the cluster.
- scim_override_home_directory=/mnt/home/%%u - forces home directories under /mnt/home. This overrides the sunkPreferredHomeDirectory value from SCIM; set it to wherever home directories are mounted on your cluster.
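Before relying on the sync, it can help to query the SCIM endpoint by hand and check that the CoreWeave extension block comes back. The sketch below assumes nsscache joins the settings as <scim_base_url>/<scim_users_endpoint>?<scim_users_parameters>, which is an assumption rather than documented behavior. The helper name scim_users_url and the SCIM_BASE_URL and SCIM_TOKEN variables are placeholders introduced for this example, not part of nsscache or ACTIVATE.

```shell
#!/bin/sh
# Hypothetical helper: compose the Users request URL from the ConfigMap values.
# Assumption: the URL has the form <base>/<endpoint>?<parameters>.
scim_users_url() {
  base="${1%/}"   # drop a single trailing slash, if present
  printf '%s/%s?%s\n' "$base" "$2" "$3"
}

# Placeholder values; substitute your own endpoint from the SCIM Provisioning page.
SCIM_BASE_URL="${SCIM_BASE_URL:-https://<platform-host>/api/organizations/<organization>/scim/v2}"
EXT='urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser'

url="$(scim_users_url "$SCIM_BASE_URL" Users "attributes=$EXT")"
echo "$url"

# Only call the API when a token is supplied in SCIM_TOKEN (placeholder name).
if [ -n "${SCIM_TOKEN:-}" ]; then
  curl -fsS -H "Authorization: Bearer $SCIM_TOKEN" "$url"
fi
```

If the response lists users but none of the sunkPosix* attributes, the attributes parameter is the first thing to recheck.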
Provide the bearer token
The SCIM API requires a bearer token on every request. On a SUNK cluster, nsscache reads it from the nsscache-scim-secret Secret in the tenant-slurm namespace - not from the ConfigMap above. This Secret is provisioned with the cluster; update it with the token you minted in SCIM Provisioning:
kubectl edit secret nsscache-scim-secret -n tenant-slurm
Secret values are base64-encoded, so encode the token before pasting it into the Secret's data field:
printf '%s' '<your-scim-token>' | base64
Configure the authorized keys command
So that sshd can authorize logins using each user's ACTIVATE SSH keys, install an AuthorizedKeysCommand that fetches them through the pw CLI.
Save the following as slurm-nsscache-authorized-keys-command.yaml, setting PLATFORM_HOST to your platform URL:
apiVersion: v1
kind: ConfigMap
metadata:
  name: slurm-nsscache-authorized-keys-command
  namespace: tenant-slurm
data:
  # Filename kept as .py for drop-in compatibility with the existing
  # AuthorizedKeysCommand path in sshd_config. The shebang determines the
  # interpreter, so bash content here is fine.
  nsscache-authorized-keys-command.py: |
    #!/bin/bash
    # AuthorizedKeysCommand: fetch the user's SSH public keys via the pw CLI.
    # Installs pw on first invocation; subsequent calls reuse the cached binary.
    set -e
    PLATFORM_HOST="${PLATFORM_HOST:-https://<platform-host>}"
    PW_INSTALL_DIR="${PW_INSTALL_DIR:-/usr/local/bin}"
    PW_BIN="$PW_INSTALL_DIR/pw"
    # /tmp is always writable, even by `nobody`. The lock only needs to exist
    # during one install attempt, so ephemeral storage is fine.
    INSTALL_LOCK="${PW_INSTALL_LOCK:-/tmp/pw-install.lock}"
    locate_pw() {
      if [ -x "$PW_BIN" ]; then
        return
      fi
      local found
      found="$(command -v pw 2>/dev/null || true)"
      if [ -n "$found" ] && [ -x "$found" ]; then
        PW_BIN="$found"
      fi
    }
    locate_pw
    if [ ! -x "$PW_BIN" ]; then
      # flock prevents concurrent sshd invocations from racing the install.
      (
        flock -x 9
        # Re-check under the lock, honoring a custom PW_INSTALL_DIR.
        if [ ! -x "$PW_BIN" ] && ! command -v pw >/dev/null 2>&1; then
          # Send install output to stderr so it doesn't end up in the
          # keys stream sshd reads from stdout.
          curl -fsSL https://activate.parallel.works/cli/install.sh \
            | bash -s -- --to "$PW_INSTALL_DIR" 1>&2
        fi
      ) 9>"$INSTALL_LOCK"
      locate_pw
    fi
    if [ ! -x "$PW_BIN" ]; then
      echo "pw CLI not found and install failed" >&2
      exit 1
    fi
    # Validate username contains only safe characters.
    if [[ ! "$1" =~ ^[a-zA-Z0-9._-]+$ ]]; then
      exit 1
    fi
    exec "$PW_BIN" ssh-public-keys --platform-host "$PLATFORM_HOST" "$1"
Apply it:
kubectl apply -f slurm-nsscache-authorized-keys-command.yaml
A running login pod won't pick up the new ConfigMap until it restarts. Delete the Slurm login pod so it's recreated with the updated command mounted (find it with kubectl get pods -n tenant-slurm):
kubectl delete pod -n tenant-slurm <login-pod>
On the first SSH login after the pod comes back, the script installs the pw CLI if it isn't already present, then calls pw ssh-public-keys to return the user's keys for sshd to authorize.
Confirm it works by execing into the recreated login pod and running the command with a username:
kubectl exec -it -n tenant-slurm <login-pod> -- \
  /usr/local/share/nsscache-authorized-keys-command.py <username>
When everything is wired up correctly, it prints that user's authorized SSH public keys - for example, nsscache-authorized-keys-command.py mcquade returns mcquade's keys.
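If the command prints nothing for a user you expect to work, the username allow-list in the script is one thing to rule out. The sketch below restates that check as a standalone function so candidate usernames can be tried locally. It uses a portable case pattern intended to accept the same character set as the script's regular expression; the function name is_safe_username is introduced here for illustration only.

```shell
#!/bin/sh
# Hypothetical standalone version of the script's username allow-list.
# Accepts only letters, digits, dot, underscore, and hyphen; rejects empty input.
is_safe_username() {
  case "$1" in
    ''|*[!a-zA-Z0-9._-]*) return 1 ;;   # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}

# Try a few candidate names; the last two contain characters outside the set.
for candidate in 'mcquade' 'j.doe-2' 'bad;name' '../etc'; do
  if is_safe_username "$candidate"; then
    echo "accept: $candidate"
  else
    echo "reject: $candidate"
  fi
done
```

A username rejected here makes the script exit with status 1 before it ever calls pw, so sshd receives no keys for that login.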
Register the cluster in ACTIVATE
With identity resolving on the cluster, connect it like any other on-premises cluster:
- Follow Configuring Existing Clusters to create the cluster definition.
- Set the Scheduler Type to Slurm.
- Enter the Cluster Login Node (the cluster's login/jump host) and your Username.
You can use the __USER__ token in any field and ACTIVATE substitutes the logged-in user's username automatically.
Verify
First, confirm the identity cache is populating. nsscache writes each synced map into a slurm-nsscache-<map> Secret in tenant-slurm. Decode the passwd cache to check that your ACTIVATE users are landing on the cluster with the expected UID, GID, shell, and home directory:
kubectl get secret slurm-nsscache-passwd -n tenant-slurm -o yaml \
  | yq '.data."passwd.cache"' | base64 -d
The group, shadow, and sshkey maps populate the parallel slurm-nsscache-group, slurm-nsscache-shadow, and slurm-nsscache-sshkey Secrets (keyed group.cache, shadow.cache, and sshkey.cache). An empty or stale cache usually means the SCIM URL, the bearer token, or the attributes parameter is wrong.
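The decoded passwd cache is easier to scan when each entry is reduced to the fields you are checking. The sketch below assumes the cache uses the standard passwd(5) layout (name:password:UID:GID:GECOS:home:shell); the helper name summarize_passwd and the sample entry are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print selected fields of passwd-format lines read from stdin.
# passwd(5) layout: name:password:UID:GID:GECOS:home:shell
summarize_passwd() {
  awk -F: 'NF >= 7 { printf "%s uid=%s gid=%s home=%s shell=%s\n", $1, $3, $4, $6, $7 }'
}

# Made-up sample entry standing in for one decoded passwd.cache line.
printf '%s\n' 'alice:x:1001:1001::/mnt/home/alice:/bin/bash' | summarize_passwd
```

To use it against the cluster, pipe the decoded output of the kubectl command above into summarize_passwd instead of the sample line.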
Then confirm the end-to-end connection:
- From the Sessions tab, power on the cluster and confirm the connection succeeds.
- Confirm your account resolves on the cluster (id <username> should show the POSIX UID/GID synced from ACTIVATE).
- Submit a test job and confirm it runs. See Submitting Jobs via Slurm.