Creating Workflows

If you want a workflow that's not pre-built in the PW Marketplace, you can create a local or remote workflow. Both types allow for importing files from an external repository, and they can be added to the PW Marketplace so that other users in your organization can access them.

Local Workflows

This method imports files from GitHub and gives you control over if/when your files are updated, regardless of when changes are saved on GitHub. You can also use this approach if you want to edit workflow files on the PW platform independent of GitHub.

Create a New Workflow

On the PW platform, navigate to the Workflows page.

Screenshot of the user clicking Workflows in the navigation bar.

Click + Add Workflow.

Screenshot of the user clicking the Add Workflow button.

On the next page, configure your workflow by entering a Workflow Name and selecting any workflow type. Optionally, enter a Short description and Tags.

Click Add workflow.

Screenshot of the user selecting a GitHub workflow and clicking Add Workflow.

Change or Remove Default Files

When you create a new workflow, you'll find placeholder files within your PW account. These files can serve as examples for you to develop your own workflows. Feel free to modify them according to your needs, or if you prefer, you can delete them.

You can delete files by right-clicking on them in the IDE and selecting Delete, or you can use a terminal with the following steps.

Click the IDE icon to expand the IDE.

Screenshot of the user clicking the IDE icon.

In the IDE, click Terminal then New Terminal.

Screenshot of the user clicking New Terminal in the Terminal dropdown menu.

A new terminal will open on the bottom half of the screen.

Screenshot of a new terminal on the PW platform.

Enter the command cd to navigate to your new workflow’s folder (in this case, workflow2):

demo@pw-user-demo:~/pw$ cd workflows/workflow2
demo@pw-user-demo:~/pw/workflows/workflow2$

Enter the command ls to see the files in your workflow’s folder:

demo@pw-user-demo:~/pw/workflows/workflow2$ ls
github.json

Enter the command rm * to delete all the files in that directory, then enter ls to confirm the folder is now empty.
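Note that rm * leaves hidden files (names that start with a dot) and subfolders in place, and cloning a repository into the current folder with git clone only works when that folder is completely empty. The sketch below shows one way to empty a folder fully. It runs against a throwaway directory created with mktemp, so it is safe to try; the file and folder names in it are made up for illustration.

```shell
# Throwaway stand-in for ~/pw/workflows/workflow2, so nothing real is deleted.
workflow_dir="$(mktemp -d)"
touch "$workflow_dir/github.json" "$workflow_dir/.hidden_example"
mkdir "$workflow_dir/example_subfolder"

cd "$workflow_dir" || exit 1

# Delete everything below the current folder, including dotfiles and
# subfolders, but keep the folder itself.
find . -mindepth 1 -delete

# ls -A also lists hidden entries; empty output means the folder is empty.
remaining="$(ls -A)"
echo "entries left: ${remaining:-none}"
```

In a real workflow folder, double-check the output of pwd before running the find command, since it removes everything beneath the current directory.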

Add Files

While still in your workflow’s folder, you can create the files described in Important Workflow Files.

Alternatively, you can clone a repository with the command git clone <repoURL> . (including the trailing period, which clones into the current folder instead of a new subfolder). In this example, we used the repository for the PW workflow SSH Bash Demo and entered:

git clone https://github.com/parallelworks/workflow_tutorial.git .

This process will create a copy of the repository files in your containerized PW workspace (i.e. the IDE). Your terminal will display the following message:

Cloning into '.'...
remote: Enumerating objects: 123, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 123 (delta 24), reused 24 (delta 21), pack-reused 91
Receiving objects: 100% (123/123), 41.25 KiB | 1.65 MiB/s, done.
Resolving deltas: 100% (60/60), done.
demo@pw-user-demo:~/pw/workflows/workflow2$

main.sh can be anywhere inside a workflow's directory.

However, if you're creating a bash workflow, workflow.xml must be in the top level of the workflow's directory. In that case, use the command cp to copy it to that directory:

demo@pw-user-demo:~/pw/workflows/workflow2$ cp folder/workflow.xml ./
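If you are not sure where a cloned repository keeps its workflow.xml, find can locate it before you copy it. The sketch below builds a small stand-in directory rather than touching a real workflow; the folder name 001_example and the one-line XML file are hypothetical.

```shell
# Stand-in for a freshly cloned workflow folder; the layout is invented.
workflow_dir="$(mktemp -d)"
mkdir -p "$workflow_dir/001_example"
echo "<tool></tool>" > "$workflow_dir/001_example/workflow.xml"

cd "$workflow_dir" || exit 1

# List every workflow.xml below the current folder.
find . -name workflow.xml

# Copy the one you want to the top level of the workflow's directory.
cp 001_example/workflow.xml ./

# Confirm that workflow.xml now sits at the top level.
ls
```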

Now you can run your workflow.

About Local Workflows

By cloning a git repository, you’ll be able to push and pull files from this clone.

However, if you choose to share this workflow on the PW Marketplace, the git tracking information will be lost. Other users who get your workflow from the Marketplace will only see a copy of the files in the state that they were shared to the Marketplace.

You can easily update your workflow on the Marketplace by un-sharing and then re-sharing it after any changes; however, there is no guarantee that other users will update their own copies of your workflow.

Remote Workflows

This method imports files from a GitHub repository every time workflow files are accessed by the PW platform. With this approach, workflow files are updated automatically whenever changes are saved in the repository.

Create a New Workflow

On the PW platform, navigate to the Workflows page.

Screenshot of the user clicking Workflows in the navigation bar.

Click + Add Workflow.

Screenshot of the user clicking the Add Workflow button.

On the next page, configure your workflow by entering a Workflow Name and selecting GitHub as your workflow type. Optionally, enter a Short description and Tags.

Click Add workflow.

Screenshot of the user selecting a GitHub workflow and clicking Add Workflow.

Modify the `github.json` File

When you create a new workflow, there will be a placeholder github.json file in your PW account. This file must be edited so that it points to your specific GitHub repository.

In the IDE pane, expand the workflows folder, then expand your new workflow’s folder. Double-click the github.json file in your new workflow’s folder.

Screenshot of the user clicking the github.json file in the IDE pane.

You’ll be taken to the file’s UI configuration page, where you can move and edit elements in the github.json file.

Screenshot of the github.json file's contents in the user interface.

When you mouse over file elements, they’ll be highlighted in yellow. Click any element to edit its text. Click the drag icon to move elements. Click the option icon to change, insert, duplicate, or remove elements.

Alternatively, you can click the DATA tab at the top of the page, which will take you to a plain-text editor for the file. You can edit the file directly or copy and paste text from elsewhere.

Replace the text in the github.json file with the following parameters:

repo is the URL for the GitHub repository you’re cloning.
- On GitHub, you can copy a repository’s URL by clicking the green Code button, then the copy icon.
branch is the repository’s git branch that information will be pulled from.
xml is the file that creates the workflow’s input form, which will be displayed in the Run Workflow tab on the PW platform.
thumbnail is the file that will be the workflow’s thumbnail image on the PW platform (optional).
readme is the file that holds the Markdown text for your workflow. This text will be displayed in the Run Workflow tab on the PW platform beneath the workflow’s input form.

In this example, we used a file from the PW repository Workflow Tutorial for the new parameters:

repo: https://github.com/parallelworks/workflow_tutorial.git
branch: main
xml: 001_single_resource_command/workflow.xml
thumbnail: thumbnail.png
readme: README.md

Your changes will be saved automatically. Now you can run your workflow.

About Cloning and Editing

If the workflow does not need all the files and/or folders in the repository you're cloning, you can use sparsecheckout in the github.json file to clone only specific files and folders:

repo: https://github.com/parallelworks/workflow_tutorial.git
branch: main
xml: 001_single_resource_command/workflow.xml
thumbnail: thumbnail.png
readme: README.md
sparsecheckout [4]
    0 : folder1
    1 : folder2
    2 : file1.sh
    3 : file2.sh

Only the files and folders inside the sparsecheckout array will be cloned from the repository.

Additionally, if you edit your github.json file in a terminal interface, it will be formatted like this:

{
    "repo": "https://github.com/folder/file.git",
    "branch": "main",
    "xml": "workflow.xml",
    "thumbnail": "logo.png",
    "readme": "README.md",
    "sparsecheckout": [
        "folder1",
        "folder2",
        "file1.sh",
        "file2.sh"
    ]
}
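A hand-edited JSON file is easy to break with a missing or trailing comma, so it can help to check the syntax after editing in a terminal. The sketch below writes the example parameters from this page to a throwaway folder and, if python3 happens to be installed in your workspace (an assumption, not something this page guarantees), uses its built-in json.tool module as a syntax checker.

```shell
# Stand-in for ~/pw/workflows/<your workflow>; nothing real is modified.
workflow_dir="$(mktemp -d)"

# Write the file in one step. The quoted 'EOF' keeps the shell from
# expanding anything inside the JSON.
cat > "$workflow_dir/github.json" <<'EOF'
{
    "repo": "https://github.com/parallelworks/workflow_tutorial.git",
    "branch": "main",
    "xml": "001_single_resource_command/workflow.xml",
    "thumbnail": "thumbnail.png",
    "readme": "README.md"
}
EOF

# Optional syntax check, only attempted when python3 is available.
if command -v python3 >/dev/null 2>&1; then
    if python3 -m json.tool "$workflow_dir/github.json" >/dev/null; then
        echo "github.json is valid JSON"
    else
        echo "github.json has a syntax error" >&2
    fi
fi
```

The check only confirms that the file parses as JSON; it does not verify that the repository, branch, or file paths exist.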

Important Workflow Files

The files outlined in this section are essential for constructing workflows. You can create and edit these files either in a GitHub repository or directly in the IDE on the PW platform.

Please note that only form configuration and a main script are required to create a workflow. The other files listed here are optional.

Form Configuration

The file workflow.xml renders your workflow’s input form, defines its inputs, and tells the PW platform how to execute and cancel jobs. A PW workflow.xml has a nested structure: it’s made up of sections, which are made up of parameters, which are made up of parameter items.

Compare the XML data on the left from our Workflow Tutorial repository with the rendered workflow UI form on the right to see the relationship between workflow.xml and your workflow’s UI form:

A side-by-side view of the workflow.xml file and how it renders as a workflow's UI form.

Please note that this screenshot contains a simplified workflow.xml file; for the full list of XML parameters and items, see this example file in our GitHub repository Workflow Tutorial.

When creating a workflow.xml file, you must include these tags:

<tool> flags the file as containing scripts and/or commands (rather than only information because XML can be used to create a plain-text file, similar to Markdown or HTML).
<command> denotes what will run when you click Execute in the workflow form. You can combine this tag with interpreter to call a main script.
- Please note that this tag can only be used once (so only one script or file can be included).
<inputs> creates a section where you can configure input methods, which are denoted with the <param> tag. Please see Input Configuration below for more information.

Optionally, you can include these tags (not pictured in the example file above):

<cancel> denotes the command that stops a running workflow. You can combine this tag with interpreter to call a cancel script.
<section> creates a section break in your workflow form. You can customize a section heading with these options:
- name assigns a unique name to your section.
- type indicates the section type (this is almost always set as section).
- title creates a heading for your section.
- expanded indicates whether the section is expanded by default; pair this item with false if you want the section to be minimized or true if you want it to be expanded.

When configuring your workflow’s input options, you must include the following items in each <param> tag:

name assigns a unique name to your parameter.
type denotes the kind of parameter, which can be:
- computeResource for a dropdown menu that lets users select their workspace or any running resource to run the workflow. Please see About computeResource below for more information.
- integer for a field that accepts numeric input.
- hidden for a text parameter that you can configure and hide from the workflow's input form.
- select for a dropdown menu.
- text for a field that accepts alphanumeric input.
- textarea for an expanded field that accepts alphanumeric input. This item is useful if you want to include a code box for configuration. Use \n in this field to separate lines. For example, #!/bin/bash\nline 1\nline 2 would render as:

#!/bin/bash
line 1
line 2

You can also include these optional <param> items:

help creates a tooltip bubble that offers additional guidance when moused over (as seen on the right in the screenshot above).
label changes the visible label of a parameter. Without this item, a parameter defaults to its name value.
value creates a default value for your parameter. In the screenshot above, Command To Run has a value of hostname, rather than a blank text field.
depends_on and show_if establish a conditional relationship between two parameters: a parameter that carries these items depends on a second parameter and is only shown when that second parameter takes the value defined in show_if. For example, you could set up conditional fields for entering different scheduler directives, depending on whether the selected resource uses the SLURM or PBS scheduler.
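Putting the tags and items above together, a minimal workflow.xml might look like the sketch below, written from a terminal with a here-document. It is assembled only from the elements described on this page and has not been checked against the platform. The interpreter value bash, the script names main.sh and cancel.sh, and the section name general are assumptions for illustration; compare the result with the example file in the Workflow Tutorial repository before relying on it.

```shell
# Stand-in for your workflow's folder; nothing real is modified.
workflow_dir="$(mktemp -d)"

# Assumptions: interpreter='bash', the file names main.sh and cancel.sh,
# and the section name 'general' are illustrative, not taken from the platform.
cat > "$workflow_dir/workflow.xml" <<'EOF'
<tool>
    <command interpreter='bash'>main.sh</command>
    <cancel interpreter='bash'>cancel.sh</cancel>
    <inputs>
        <section name='general' type='section' title='General Settings' expanded='true'>
            <param
                name='command'
                type='text'
                label='Command To Run'
                value='hostname'
                help='Command that the main script will run'>
            </param>
        </section>
    </inputs>
</tool>
EOF

# Show the file that was written.
cat "$workflow_dir/workflow.xml"
```

The single text parameter mirrors the Command To Run field mentioned above, with hostname as its default value.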

About computeResource

Of the available type values, computeResource is the most important because it lets users select any active resource for the workflow. The platform pulls information about the selected resource, including the controller node's external IP address, username, and work directory; you can then use that information to run tasks in your workflow’s main script.

You can create a workflow without the computeResource parameter—in that case, the workflow will run on the user’s workspace rather than on a resource. In general, however, it’s better to include computeResource so that if a workflow happens to need more processing power, a user can choose an option that suits their needs.

Please note that the information collected by computeResource is only accurate if your active resource’s controller node is connected; for this reason, we recommend that new users wait until their resource's power button is green before running workflows.

For returning users, if your controller node is not connected, you are responsible for obtaining the information above directly by using the REST API or the resource tool wrapper demonstrated in this PW workflow.

You can further customize computeResource with hideUserWorkspace and hideDisconnectedResources, as seen in this example:

<param
    name='resource'
    type='computeResource'
    label='Service host'
    hideUserWorkspace='true'
    hideDisconnectedResources='true'
    help='Resource to host the service'>
</param>

If you include hideUserWorkspace='true', users will only be able to select a resource; user workspace will not be an available option.

If you include hideDisconnectedResources='true', users will only be able to select resources with a connected controller node.

If you exclude these parameter items from your workflow.xml file, they will automatically be set to false.

Input Configuration

For complex workflows, two additional files are needed to work with the workflow.xml file: inputs.json and inputs.sh. These files contain the parameter names that are defined in the workflow.xml file as well as the parameter values when the job is launched. When you submit a workflow job, the platform writes these files to the job directory.

For more information, please see this section of our blog post on developing workflows.
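As a rough illustration of how a script can pick up these values, the sketch below creates a hypothetical inputs.sh and loads it into the current shell. The exact contents of the real file are not shown on this page; the single export line and the parameter name command are assumptions, and the real file is written by the platform when the job is submitted.

```shell
# Stand-in for the job directory; nothing real is modified.
job_dir="$(mktemp -d)"

# Hypothetical inputs.sh for a form with a single 'command' parameter.
cat > "$job_dir/inputs.sh" <<'EOF'
export command="hostname"
EOF

cd "$job_dir" || exit 1

# Load the job's inputs into the current shell, then use them.
. ./inputs.sh
echo "Command selected in the form: $command"
```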

Main Script

Every workflow needs a main bash script. It is stored in the file main.sh and connects to the workflow through the <command> tag in the workflow.xml file. The main script can be any bash script and can call other scripts, such as Python scripts.

When a workflow job is launched, the platform starts the main bash script. This script is the entry point for the job: it coordinates the tasks defined in the workflow and is responsible for loading the job's inputs, which are defined in the inputs.sh and/or inputs.json files in the job directory.

main.sh runs in the user workspace, but it can also interact with external resources using SSH, enabling the submission of jobs to remote resources. You can see this process in our simplest main script here in our repository Workflow Tutorial.

If a job is canceled, the process that is currently executing the main bash script is forcefully terminated.
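The following is a minimal sketch of what such a main script could look like, not the script from the Workflow Tutorial repository. The variable names resource_user, resource_ip, and command_to_run are hypothetical stand-ins for whatever your inputs.sh actually defines, and the IP address is a documentation-only placeholder. By default the sketch only prints the SSH command it would run.

```shell
#!/bin/bash
# Sketch of a main.sh. All variable names below are hypothetical.

# Load the job's inputs if the file is present in the job directory.
if [ -f inputs.sh ]; then
    . ./inputs.sh
fi

# Fall back to placeholder values so the sketch runs on its own.
resource_user="${resource_user:-demo}"
resource_ip="${resource_ip:-203.0.113.10}"
command_to_run="${command_to_run:-hostname}"

# DRY_RUN=1 (the default here) prints the command instead of connecting.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "Would run: ssh ${resource_user}@${resource_ip} ${command_to_run}"
else
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    ssh -o BatchMode=yes "${resource_user}@${resource_ip}" "${command_to_run}"
fi
```

Setting DRY_RUN=0 in the environment makes the script attempt the real SSH connection, which only works if the resource is running and reachable.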

Cancel Script

The cancel script can be called when a workflow job is canceled. To enable it, include the <cancel> tag in the workflow's workflow.xml file; this tag tells the platform to run the cancel script as part of the cleanup process.

The cancel script itself can be placed in any location within the workflow's directory. Its purpose is to facilitate additional cleanup tasks that might be required upon cancellation of a workflow job.

When a workflow job cancellation is initiated:

The process responsible for executing the workflow's main script is terminated.
The cancel script, if present, is executed to carry out further cleanup tasks.

A common use case for a cancel script is to manage background processes that were initiated by the workflow job. Additionally, it can be utilized to cancel remote batch jobs that were previously submitted to your resource, ensuring a comprehensive cleanup process.

Your cancel script can be created at runtime, or it can be added beforehand. For an example of a runtime cancel script, please see this main.sh file in our Workflow Tutorial repository. For an example of a cancel script added beforehand, please see this simplified example in our Interactive Session repository.
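To make the runtime approach concrete, here is a small sketch of a main script that starts a background process and writes a cancel script that knows how to stop it. It is not taken from either repository mentioned above; the file name cancel.sh and the sleep command standing in for real work are assumptions, and the cancellation is simulated by running the generated script directly.

```shell
# Stand-in for the job directory; nothing real is modified.
job_dir="$(mktemp -d)"
cd "$job_dir" || exit 1

# Start a long-running background task, as a main script might.
sleep 300 &
bg_pid=$!

# Write cancel.sh at runtime so it records which process to stop.
# The unquoted EOF lets the shell substitute the PID into the file.
cat > cancel.sh <<EOF
#!/bin/bash
kill $bg_pid 2>/dev/null
EOF

# Simulate a cancellation by running the generated script.
bash cancel.sh
wait "$bg_pid" 2>/dev/null || true

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$bg_pid" 2>/dev/null; then
    echo "background task still running"
else
    echo "background task stopped"
fi
```

Generating the script at runtime is useful when the information it needs, such as a process ID or a batch job ID, is only known after the main script has started.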

Description

Exclusive to GitHub-synced workflows, this file stores a Markdown description of what your workflow does. Generally, this description is stored in your repository’s README.md file. The description is displayed at the bottom of the workflow’s input page on the PW platform.

`service.json`

This file is only essential for workflows that generate an interactive application or interface. For more information, please see Interactive Sessions.

Other Files

Be sure to include any additional files that are necessary for your main or cancel scripts.
