
Monitoring Your Work

The PW platform features several data monitoring modules as well as a monitor dashboard to track your work.

The Home Page

The Home page displays important data at a glance.

Workflow Monitor

This module shows a snapshot of your most recently run workflows.

Screenshot of the Workflow Monitor module.

Data columns for your workflows include:

  • ID
  • Workflow
  • Status
  • Submitted
  • Runtime (Minutes)

If you click a workflow’s ID number or its name in the Workflow column, you’ll be taken to a more detailed view on the Workflows page.

A workflow’s Status can be Started, Running, Completed, Canceled, or Error.

The Workflow Monitor also includes three important action buttons:

  • Cancel Run
  • Run Workflow Again
  • View Active Workflow

If you click the Run Workflow Again icon, you’ll be taken to the workflow’s configuration form on the Workflows page.

Resource Monitor

This module shows a summary of your node usage across your resources.

Screenshot of the Resource Monitor module.

Different resources are shown as white, green, and blue lines. You can mouse over data points to see corresponding resource names.

Storage Resources

This module shows your favorited storage resources and their status (active, starting, or stopped).

Screenshot of the Storage Resources module.

If you click the information icon, you’ll be taken to the storage’s configuration form on the Storage page.

My Computing Resources

This module shows your favorited resources and their status. An active resource shows its active and requested nodes; an inactive resource shows as stopped.

Screenshot of the My Computing Resources module.

The navy bar reflects the maximum number of nodes a resource can have. In the screenshot above, the resource has a controller node and a partition configured for a maximum of 10 nodes, for a total of 11 possible nodes.

The green bar reflects the number of active nodes on a compute resource. In the screenshot above, the resource is active but not running any jobs, so there is 1 active node (the controller).
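The counts behind the two bars come down to simple arithmetic, sketched below. This is an illustration only, assuming a resource has exactly one controller node plus one or more partitions, each with a configured maximum node count; the function names are hypothetical and not part of the platform.

```python
def max_possible_nodes(partition_max_nodes: list[int]) -> int:
    """Navy bar: one controller node plus each partition's maximum node count."""
    return 1 + sum(partition_max_nodes)


def active_nodes(controller_running: bool, running_workers: int) -> int:
    """Green bar: the controller (while the resource is active) plus any running worker nodes."""
    return (1 if controller_running else 0) + running_workers


# The screenshot example: one partition capped at 10 nodes, and no jobs running yet.
print(max_possible_nodes([10]))
print(active_nodes(True, 0))
```

With a second partition capped at 4 nodes, `max_possible_nodes([10, 4])` would give 15.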

If you click the gear icon, you’ll be taken to the resource’s configuration form on the Resources page.

If you click the information icon, you’ll be taken to the resource’s Sessions tab on the Resources page.

The Compute Page

When you click a compute resource on the Compute page, you’ll be taken to the resource’s Sessions tab.

Screenshot of the Sessions tab on the Compute page after clicking a resource.

The monitor at the top of the page mirrors the My Computing Resources module from the Home page.

Active Nodes

This module shows details about nodes that are currently running on your resource, including:

  • Node ID
  • Public IP Address
  • Private IP Address
  • Node Runtime

Sessions

This module shows details about all your sessions with this resource, including:

  • Session
  • Status
  • Health Check
  • Creation Time
  • Deletion Time
  • Dashboards

If you click a number in the Session column, the Logs module will change to reflect information from that session.

The Dashboards column contains two icons: one takes you to the cost dashboard, and the other takes you to the monitor dashboard.

Logs

This module has five tabs of detailed information:

  • Provision
  • Storages
  • Deletion
  • Scheduler
  • Health Check

The Provision log shows your resource’s provisioning process. If a resource has been provisioned successfully, you’ll see a message like Tunnel established successfully, Controller IP is 12.345.678.90.

If a resource fails to provision, you’ll see an Error message. In that case, we suggest adjusting your resource’s configuration, then starting the resource again.

The Storages log shows the provisioning process for attached ephemeral storage resources, if any.

The Deletion log shows your resource’s deletion process after you turn it off. If a resource has been deleted successfully, you’ll see a message like 2023-07-10T17:15:34.721Z-INFO: delete() finished.
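If you download a Deletion log and want to process it with a script, the example message above suggests a line layout of an ISO 8601 UTC timestamp, a hyphen, a level, a colon, and the message. The sketch below assumes every line follows that layout; this is inferred from the single example on this page, not a documented format, and the pattern and function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed line layout, inferred from one example: <UTC timestamp>-<LEVEL>: <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)-(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def parse_log_line(line: str):
    """Split a log line into (timestamp, level, message), or return None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return ts, m["level"], m["msg"]


ts, level, msg = parse_log_line("2023-07-10T17:15:34.721Z-INFO: delete() finished.")
print(ts.isoformat(), level, msg)
```

Lines that don't fit the assumed layout return `None`, so a script can skip them instead of failing.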

The Scheduler log shows your resource’s scheduled and completed Slurm jobs, if any.

The Health Check log shows details for an automated script that checks for provisioning and connection errors.

You can save any of these logs by clicking the Download button.
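Once a log is downloaded, a short script can flag whether provisioning or deletion succeeded. This is a rough sketch, assuming the downloaded file is plain text and that the success phrases quoted on this page appear verbatim; the constant and function names and the file name in the usage note are hypothetical. The success phrase is checked first because a log may contain earlier errors from retries before eventually succeeding.

```python
from pathlib import Path

# Success phrases quoted on this page; the exact wording in your logs may differ.
PROVISION_OK = "Tunnel established successfully"
DELETION_OK = "delete() finished"


def log_outcome(text: str, success_phrase: str) -> str:
    """Classify log text as 'success', 'error', or 'unknown' using substring checks."""
    if success_phrase in text:
        return "success"
    if "Error" in text or "ERROR" in text:
        return "error"
    return "unknown"


def check_log_file(path: str, success_phrase: str) -> str:
    """Read a downloaded log file (assumed to be plain text) and classify it."""
    return log_outcome(Path(path).read_text(errors="replace"), success_phrase)


sample = "...\nTunnel established successfully, Controller IP is 12.345.678.90\n"
print(log_outcome(sample, PROVISION_OK))
```

For a saved file, you might call `check_log_file("provision.log", PROVISION_OK)`.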

The Workflows Page

When you click a workflow on the Workflows page, you’ll be taken to the workflow’s Jobs tab.

Screenshot of the Jobs tab on the Workflows page after clicking a workflow.

The Workflow Monitor here mirrors the Workflow Monitor from the Home page.

The Job logs module shows details about specific workflow sessions. When you navigate to this page, this module shows No log found until you click a job number in the ID column.

If you navigate to this page after clicking the eye icon on a running workflow, the Job logs module will show details for that active session.

You can save workflow logs by clicking the Download button.

The Storage Page

When you click a persistent storage resource on the Storage page, you’ll be taken to the storage’s Sessions tab.

Screenshot of the Sessions tab on the Storage page after clicking a storage resource.

The Sessions module shows details about all your sessions with this storage resource, including:

  • Session
  • Status
  • Creation Time
  • Deletion Time

The Logs module shows details about specific storage sessions. When you navigate to this page, the Provision and Deletion tabs here show Log not found until you click a number in the Session column.

You can save storage logs by clicking the Download button.

Ephemeral Storage

Please note that ephemeral storage resources don’t have this page because they’re created and destroyed with a resource.

You can see more details about ephemeral storage resources by navigating to their attached resource and clicking on the Storages tab of the Logs module.

The Monitor Page

When you navigate to the Monitor page, there are two tabs: Instances and Dashboard.

Instances

The Instances tab is the landing page of the Monitor page. It lists your clusters that have been active or deleted within the last hour.

Each instance listed here includes the following data:

  • Pool
  • Session
  • Region
  • Number of Running Instances
  • Started
  • Deleted
  • State

If a cluster is active, its State will show Running in yellow.

Screenshot of a running cluster in the Instances tab on the Monitor page.

If a cluster has recently been shut down, its State will show Deleted in blue.

Screenshot of a deleted cluster in the Instances tab on the Monitor page.

If you haven’t started or stopped a cluster within the last hour, the Instances tab will show the message No instances found.

Screenshot of a blank Instances tab on the Monitor page.

Dashboard

The monitor dashboard provides an overview of your work on the platform.

Screenshot of overview for the Dashboard tab on the Monitor page.

By default, the monitor dashboard displays the following data modules. You can change the view at any time; for more information, please see Filters and Layout below.

Note

Please note that you must use the Pool filter to select a resource before the monitor dashboard will display data.

Graphs

Average Lustre Filesystem GB

This graph shows how many gigabytes (GB) a resource has used for Lustre storage.

Please note that Lustre is optional, so this graph will not display any data if you haven’t used a Lustre storage resource (as seen in the screenshot above).

Load and Utilization

This graph shows a summary of a resource’s processing and memory usage.

You can mouse over the lines to see detailed information for:

  • Average CPU User
  • Average CPU System
  • Average Memory Used
  • Average Disk Used

This graph is most useful when looking at specific sessions for a resource.

Memory GB

This graph shows how many gigabytes (GB) a resource has used for memory. This type of memory is similar to RAM on a personal computer.

IO KB/s

This graph shows a resource’s data input and output in kilobytes per second (KB/s).

Nodes

This graph shows how many nodes a resource has used.

The Nodes graph is especially useful when paired with the other data modules. For example, you could adjust the number of nodes on your resource and check the IO KB/s graph to maximize your nodes’ efficiency for your workload.
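As a rough sketch of that comparison, you could divide the throughput you read off the IO KB/s graph by the node count from the Nodes graph for each configuration you try. The readings below are made-up placeholders, and treating per-node throughput as the efficiency measure is an assumption; your workload may call for a different metric.

```python
def io_per_node(io_kb_per_s: float, nodes: int) -> float:
    """Average IO throughput per node, in KB/s."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return io_kb_per_s / nodes


# Hypothetical readings for three runs of the same workload: (nodes, total IO in KB/s).
runs = [(2, 4000.0), (4, 7200.0), (8, 8800.0)]
for nodes, io in runs:
    print(f"{nodes} nodes: {io_per_node(io, nodes):.0f} KB/s per node")
```

In this made-up data, going from 4 to 8 nodes adds only 1,600 KB/s in total, so per-node throughput drops from 1,800 to 1,100 KB/s, a hint that the extra nodes may not be paying off.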

Root Filesystem GB

This graph shows how many gigabytes (GB) a resource has used on its root filesystem. This type of storage is similar to a hard drive on a personal computer.

Tables

Worker Table

This table shows a summary of a resource’s history. You can click any of the fields at the top of the table to sort the data. Fields include:

  • Hostname
  • Project
  • Username
  • Private IP Address
  • Session Number
  • Public IP Address
  • Pool Name
  • Last Active
  • CPU User
  • Disk Used (Percentage)
  • Memory Used (Percentage)
  • Created At

Slurm Jobs

This table shows a summary of a resource’s jobs that have been submitted via Slurm. You can click any of the fields at the top of the table to sort the data. Fields include:

  • State
  • Job ID
  • Project
  • Job Name
  • Username
  • Start Time
  • Cluster
  • End Time
  • Elapsed Seconds
  • Nodes
  • CPUs
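If you want to total up usage from the Slurm Jobs table, for example after copying a few rows by hand, CPU-hours and node-hours follow from the Elapsed Seconds, Nodes, and CPUs columns. This sketch assumes the CPUs column counts all CPUs allocated to a job across its nodes; that is an assumption about the dashboard, so confirm it against your own data. The class, function names, and job values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlurmJobRow:
    """A few columns from the Slurm Jobs table (hypothetical container, not a platform API)."""
    job_id: str
    elapsed_seconds: int
    nodes: int
    cpus: int  # assumed: total CPUs allocated to the job across all of its nodes


def node_hours(job: SlurmJobRow) -> float:
    return job.elapsed_seconds * job.nodes / 3600


def cpu_hours(job: SlurmJobRow) -> float:
    return job.elapsed_seconds * job.cpus / 3600


# Made-up rows: a 30-minute job on 2 nodes and a 2-hour job on 1 node.
jobs = [SlurmJobRow("101", 1800, 2, 64), SlurmJobRow("102", 7200, 1, 16)]
print(sum(cpu_hours(j) for j in jobs), sum(node_hours(j) for j in jobs))
```

Both made-up jobs use 32 CPU-hours, even though one ran four times as long as the other.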

Customizing the Monitor Dashboard

You can customize the monitor dashboard by using different filters or changing the layout of the page.

Filters

The monitor dashboard includes the following options for filtering data:

  • Time
  • User
  • Pool
  • Session

Two filters must have options selected: Pool and Time. These filters are pinned to the top of the Monitor page. You can click either of these filters to change them.

To add more filters, click Filter Options and select any filter from the list. Next, use the dropdown menu to select the filter parameter. All filter dropdown menus include a search bar for quickly finding parameters.

Please note that some filters are conditional. For example, you must select a Pool before you can select a Session.
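The filter rules described above can be modeled as a small validation check. This is a hypothetical sketch based only on this page: Time and Pool are always required, and Session can't be chosen without a Pool. Other conditional filters may exist, and the filter value "Last 24 hours" is a made-up label.

```python
# Hypothetical model of the dashboard's filter rules, based only on this page.
REQUIRED_FILTERS = ("Time", "Pool")
PREREQUISITES = {"Session": "Pool"}  # a Session can only be chosen once a Pool is selected


def filter_problems(selected: dict[str, str]) -> list[str]:
    """Return the rule violations for a set of selected filter values."""
    problems = []
    for name in REQUIRED_FILTERS:
        if not selected.get(name):
            problems.append(f"{name} must be selected")
    for name, prerequisite in PREREQUISITES.items():
        if selected.get(name) and not selected.get(prerequisite):
            problems.append(f"{name} requires {prerequisite}")
    return problems


print(filter_problems({"Time": "Last 24 hours", "Session": "12"}))
```

An empty list means the selection satisfies every rule modeled here.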

Layout

You can change the layout of the monitor dashboard at any time. Your changes will not affect other users in your organization.

Click Options, then Unlock Layout.

Screenshot of the user clicking Unlock Layout in the Options dropdown menu.

When the monitor dashboard is in editing mode, a bracket will appear in the bottom-right corner of each data module.

Screenshot of the monitoring Dashboard with the resizing brackets circled because they are cute and tiny.

Drag and drop modules to change their positions on the page.

To resize a module, click the bracket in the bottom-right corner and drag vertically or horizontally.

Click the delete icon to remove a module from the page.

When you’re done making changes, click Options > Save Layout, then Lock Layout. Your monitor dashboard’s layout will remain in this state until you make further changes.

Click Options > Reset Layout, then Save Layout to revert the page to its default state.

Printing Data

You can save the data from the monitor dashboard by downloading a copy.

Click Options, then Print.

Screenshot of the user selecting Print in the Options dropdown menu.

A Print window will appear. Select the option for Save as PDF. Click Save.

The monitor dashboard page will be downloaded as a PDF.