# Connecting a CoreWeave Slurm Cluster

> Source: https://parallelworks.com/docs/compute/connecting-coreweave-slurm

This guide connects a [CoreWeave](https://www.coreweave.com/) Slurm cluster (SUNK - Slurm on Kubernetes) to ACTIVATE so your users can submit batch jobs, open desktop sessions, and run workflows on it.

The setup has two parts:

1. **Identity** - point the cluster's identity cache (`nsscache`) at ACTIVATE's SCIM API and add an SSH authorized-keys command, so ACTIVATE users, groups, and SSH keys resolve as real Linux accounts on the cluster.
2. **Connection** - register the cluster in ACTIVATE as an [existing cluster](/docs/compute/configuring-existing-clusters).

:::info Connecting CoreWeave's Kubernetes API instead?
If you want to manage the cluster's Kubernetes workloads through ACTIVATE rather than submit Slurm jobs, see [Connecting CoreWeave (Kubernetes)](/docs/kubernetes/connecting-clusters/coreweave).
:::

## Prerequisites

- **Organization admin permissions** in ACTIVATE.
- **SCIM provisioning enabled** for your organization, plus a **bearer token**. Follow [SCIM Provisioning](/docs/organization-admin/scim) first and keep the token and endpoint URL handy.
- **`kubectl` access** to the cluster's `tenant-slurm` namespace (via the kubeconfig from CoreWeave).
- POSIX UIDs/GIDs and SSH public keys configured on your ACTIVATE users and groups - these are what get synchronized onto the cluster.

## Point nsscache at ACTIVATE's SCIM API

CoreWeave's SUNK clusters resolve Linux identity through `nsscache`, which periodically syncs `passwd`, `group`, `shadow`, and `sshkey` maps from a source. We configure that source to be ACTIVATE's SCIM API, reading POSIX identity from the [CoreWeave extension attributes](/docs/organization-admin/scim#coreweave-extension-attributes).

Edit the `nsscache-conf` ConfigMap in the `tenant-slurm` namespace:

```bash
kubectl edit cm/nsscache-conf -n tenant-slurm
```

Update it to match the following. The two values you must set for your organization are **`scim_base_url`** (your SCIM endpoint) and **`scim_users_parameters`** (which requests the CoreWeave user extension):

```yaml
apiVersion: v1
data:
  nsscache.conf: |
    [DEFAULT]
    cache=files
    files_cache_filename_suffix=cache
    files_dir=/etc/nsscache
    maps=passwd,shadow,group,sshkey
    scim_base_url=https://<platform-host>/api/organizations/<organization>/scim/v2
    scim_groups_endpoint=Groups
    scim_groups_parameters=excludeInactiveUsers=true
    scim_users_endpoint=Users
    scim_users_parameters=attributes=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser
    source=scim
    timestamp_dir=/var/lib/nsscache
    [group]
    scim_path_gid=sunkPosixGroupId
    scim_path_groupname=sunkPosixGroupName
    scim_path_username=members/sunkPosixUsername
    [passwd]
    scim_default_shell=/bin/bash
    scim_override_home_directory=/mnt/home/%%u
    scim_path_gid=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixGroupId
    scim_path_home_directory=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPreferredHomeDirectory
    scim_path_login_shell=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkLoginShell
    scim_path_uid=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUserId
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
    [shadow]
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
    [sshkey]
    scim_path_ssh_keys=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkSshKeys
    scim_path_username=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername
  nsswitch.conf: |
    group: files cache
    passwd: files cache
```

What the key settings do:

- **`scim_base_url`** - your organization's SCIM endpoint, shown on the [SCIM Provisioning](/docs/organization-admin/scim) page (`https://<platform-host>/api/organizations/<organization>/scim/v2`).
- **`scim_users_parameters=attributes=...CoreWeaveUser`** - requests the CoreWeave user extension. ACTIVATE omits that block by default, so without this parameter the POSIX UID/GID, shell, home directory, and SSH keys would be missing.
- **`scim_groups_parameters=excludeInactiveUsers=true`** - drops disabled ACTIVATE accounts from group membership, so deactivated users stop resolving on the cluster.
- **`scim_override_home_directory=/mnt/home/%%u`** - forces home directories under `/mnt/home`. This overrides the `sunkPreferredHomeDirectory` value from SCIM; set it to wherever home directories are mounted on your cluster.
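To see what the Users request looks like, the settings above can be combined by hand. This is a minimal sketch: it assumes nsscache joins the values as `<scim_base_url>/<scim_users_endpoint>?<scim_users_parameters>`, and both the `scim_users_url` helper and the `SCIM_TOKEN` variable are hypothetical names used only for illustration.

```bash
# Compose the Users request URL from the three ConfigMap settings.
# Assumption: nsscache joins them as <base>/<endpoint>?<parameters>.
scim_users_url() {
    base=${1%/}   # strip one trailing slash, if present
    printf '%s/%s?%s\n' "$base" "$2" "$3"
}

# Usage (replace the placeholders; SCIM_TOKEN is a hypothetical variable
# holding your bearer token):
#   url=$(scim_users_url \
#     "https://<platform-host>/api/organizations/<organization>/scim/v2" \
#     Users \
#     "attributes=urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser")
#   curl -fsS -H "Authorization: Bearer $SCIM_TOKEN" "$url"
```

If the response lacks the `CoreWeaveUser` block, the `attributes` parameter is the first thing to check.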

### Provide the bearer token

The SCIM API requires a bearer token on every request. On a SUNK cluster, nsscache reads it from the `nsscache-scim-secret` Secret in the `tenant-slurm` namespace - not from the ConfigMap above. This Secret is provisioned with the cluster; update it with the token you minted in [SCIM Provisioning](/docs/organization-admin/scim#bearer-tokens):

```bash
kubectl edit secret nsscache-scim-secret -n tenant-slurm
```

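Secret values are base64-encoded, so encode the token before pasting it into the Secret's data field. `printf` avoids encoding a trailing newline, and `tr -d '\n'` removes the line breaks that GNU `base64` inserts into long output:

```bash
printf '%s' '<your-scim-token>' | base64 | tr -d '\n'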
```
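Before saving the Secret, it can help to confirm that the encoded value decodes back to the exact token. The sketch below wraps the encoding in a helper and adds a round-trip check; `encode_token` and `token_roundtrips` are hypothetical names, not part of any CoreWeave or ACTIVATE tooling.

```bash
# Encode a token for a Secret's data field. printf adds no trailing newline,
# and tr removes the line breaks GNU base64 inserts every 76 characters.
encode_token() {
    printf '%s' "$1" | base64 | tr -d '\n'
    echo
}

# Decode the encoded value again and compare it with the original token.
token_roundtrips() {
    [ "$(encode_token "$1" | base64 -d)" = "$1" ]
}

# Usage (placeholder token):
#   encode_token '<your-scim-token>'
#   token_roundtrips '<your-scim-token>' && echo "encoding OK"
```

For comparison, `echo 'abc' | base64` encodes the trailing newline and yields `YWJjCg==`, while `encode_token abc` yields `YWJj`.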

## Configure the authorized keys command

So that `sshd` can authorize logins using each user's ACTIVATE SSH keys, install an `AuthorizedKeysCommand` that fetches them through the `pw` CLI.

Save the following as `slurm-nsscache-authorized-keys-command.yaml`, **setting `PLATFORM_HOST`** to your platform URL:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: slurm-nsscache-authorized-keys-command
  namespace: tenant-slurm
data:
  # Filename kept as .py for drop-in compatibility with the existing
  # AuthorizedKeysCommand path in sshd_config. The shebang determines the
  # interpreter, so bash content here is fine.
  nsscache-authorized-keys-command.py: |
    #!/bin/bash
    # AuthorizedKeysCommand: fetch the user's SSH public keys via the pw CLI.
    # Installs pw on first invocation; subsequent calls reuse the cached binary.
    set -e

    PLATFORM_HOST="${PLATFORM_HOST:-https://<platform-host>}"
    PW_INSTALL_DIR="${PW_INSTALL_DIR:-/usr/local/bin}"
    PW_BIN="$PW_INSTALL_DIR/pw"
    # /tmp is always writable, even by `nobody`. The lock only needs to exist
    # during one install attempt, so ephemeral storage is fine.
    INSTALL_LOCK="${PW_INSTALL_LOCK:-/tmp/pw-install.lock}"

    locate_pw() {
        if [ -x "$PW_BIN" ]; then
            return
        fi
        local found
        found="$(command -v pw 2>/dev/null || true)"
        if [ -n "$found" ] && [ -x "$found" ]; then
            PW_BIN="$found"
        fi
    }

    locate_pw
    if [ ! -x "$PW_BIN" ]; then
        # flock prevents concurrent sshd invocations from racing the install.
        (
            flock -x 9
            if [ ! -x "$PW_BIN" ] && ! command -v pw >/dev/null 2>&1; then
                # Send install output to stderr so it doesn't end up in the
                # keys stream sshd reads from stdout.
                curl -fsSL https://activate.parallel.works/cli/install.sh \
                    | bash -s -- --to "$PW_INSTALL_DIR" 1>&2
            fi
        ) 9>"$INSTALL_LOCK"
        locate_pw
    fi

    if [ ! -x "$PW_BIN" ]; then
        echo "pw CLI not found and install failed" >&2
        exit 1
    fi

    # Validate username contains only safe characters.
    if [[ ! "$1" =~ ^[a-zA-Z0-9._-]+$ ]]; then
        exit 1
    fi

    exec "$PW_BIN" ssh-public-keys --platform-host "$PLATFORM_HOST" "$1"
```

Apply it:

```bash
kubectl apply -f slurm-nsscache-authorized-keys-command.yaml
```

A running login pod won't pick up the new ConfigMap until it restarts. Delete the Slurm login pod so it's recreated with the updated command mounted (find it with `kubectl get pods -n tenant-slurm`):

```bash
kubectl delete pod -n tenant-slurm <login-pod>
```

On the first SSH login after the pod comes back, the script installs the `pw` CLI if it isn't already present, then calls `pw ssh-public-keys` to return the user's keys for `sshd` to authorize.
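Before it fetches keys, the script rejects any username that contains characters outside letters, digits, dot, underscore, and hyphen. The check can be tried on its own; the sketch below is a POSIX `case` equivalent of the script's bash regex, and `is_safe_username` is a hypothetical helper name.

```bash
# Accept only non-empty names made of letters, digits, '.', '_' and '-'.
# Equivalent in intent to the regex ^[a-zA-Z0-9._-]+$ used by the script.
is_safe_username() {
    case $1 in
        ''|*[!a-zA-Z0-9._-]*) return 1 ;;   # empty, or has a disallowed char
        *) return 0 ;;
    esac
}

# Usage:
#   is_safe_username 'mcquade'   && echo accepted
#   is_safe_username 'a;rm -rf'  || echo rejected
```

Rejecting the name early keeps shell metacharacters out of the argument passed to `pw ssh-public-keys`.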

Confirm it works by execing into the recreated login pod and running the command with a username:

```bash
kubectl exec -it -n tenant-slurm <login-pod> -- \
  /usr/local/share/nsscache-authorized-keys-command.py <username>
```

When everything is wired up correctly, it prints that user's authorized SSH public keys - for example, `nsscache-authorized-keys-command.py mcquade` returns mcquade's keys.
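Because `sshd` expects `authorized_keys`-format lines from an `AuthorizedKeysCommand`, a quick sanity check is to count how many output lines start with a known key type. This is a rough sketch: `count_key_lines` is a hypothetical helper, and its pattern ignores lines that begin with an options field.

```bash
# Count stdin lines that start with a common SSH public key type followed
# by base64 data. Prints 0 (and still exits 0) when nothing matches.
count_key_lines() {
    grep -c -E '^(ssh-(rsa|ed25519)|ecdsa-sha2-nistp(256|384|521)|sk-(ssh-ed25519|ecdsa-sha2-nistp256)@openssh\.com) [A-Za-z0-9+/=]+' || true
}

# Usage (no -it, so the output can be piped):
#   kubectl exec -n tenant-slurm <login-pod> -- \
#     /usr/local/share/nsscache-authorized-keys-command.py <username> \
#     | count_key_lines
```

A count of `0` for a user who has keys in ACTIVATE suggests the command is failing or printing something other than keys on stdout.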

## Register the cluster in ACTIVATE

With identity resolving on the cluster, connect it like any other existing cluster:

1. Follow [Configuring Existing Clusters](/docs/compute/configuring-existing-clusters) to create the cluster definition.
2. Set the **Scheduler Type** to **Slurm**.
3. Enter the **Cluster Login Node** (the cluster's login/jump host) and your **Username**.

You can use the `__USER__` token in any field; ACTIVATE replaces it with the logged-in user's username automatically.

## Verify

First, confirm the identity cache is populating. nsscache writes each synced map into a `slurm-nsscache-<map>` Secret in `tenant-slurm`. Decode the `passwd` cache to check that your ACTIVATE users are landing on the cluster with the expected UID, GID, shell, and home directory:

```bash
kubectl get secret slurm-nsscache-passwd -n tenant-slurm -o yaml \
  | yq -r '.data."passwd.cache"' | base64 -d
```

The `group`, `shadow`, and `sshkey` maps populate the parallel `slurm-nsscache-group`, `slurm-nsscache-shadow`, and `slurm-nsscache-sshkey` Secrets (keyed `group.cache`, `shadow.cache`, and `sshkey.cache`). An empty or stale cache usually means the SCIM URL, the bearer token, or the `attributes` parameter is wrong.
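To make the decoded `passwd` cache easier to scan, the fields can be labelled. The sketch below assumes the cache uses the standard `passwd(5)` colon-separated layout; `passwd_summary` is a hypothetical helper, and the sample UID and GID in the test are made up.

```bash
# Label the fields of passwd(5)-style lines read from stdin.
# Field order: name:password:UID:GID:GECOS:home:shell
passwd_summary() {
    awk -F: 'NF >= 7 {
        printf "user=%s uid=%s gid=%s home=%s shell=%s\n", $1, $3, $4, $6, $7
    }'
}

# Usage:
#   kubectl get secret slurm-nsscache-passwd -n tenant-slurm -o yaml \
#     | yq -r '.data."passwd.cache"' | base64 -d | passwd_summary
```

With the ConfigMap shown earlier, each `home` value should fall under `/mnt/home` and each `shell` should default to `/bin/bash` unless SCIM supplies another.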

Then confirm the end-to-end connection:

1. From the **Sessions** tab, power on the cluster and confirm the connection succeeds.
2. Confirm your account resolves on the cluster (`id <username>` should show the POSIX UID/GID synced from ACTIVATE).
3. Submit a test job and confirm it runs. See [Submitting Jobs via Slurm](/docs/compute/submitting-jobs).
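For the test job in step 3, a small batch script that prints the submitting identity is usually enough. The following is a sketch only: the job name, time limit, and file names are arbitrary choices, and `write_test_job` is a hypothetical helper that writes the script to a path you choose.

```bash
# Write a minimal Slurm batch script to the given path.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=activate-smoke-test
#SBATCH --ntasks=1
#SBATCH --time=00:02:00
#SBATCH --output=activate-smoke-test-%j.out
# Print who ran the job and where, to confirm identity and scheduling.
echo "user: $(id -un) uid: $(id -u) gid: $(id -g)"
echo "node: $(hostname)"
EOF
}

# Usage (on the login node):
#   write_test_job smoke-test.sbatch
#   sbatch smoke-test.sbatch
#   squeue -u "$USER"
```

The UID and GID in the job's output file should match the values synced from ACTIVATE.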
