Work Smarter, Not Harder: CWL and the Seven Bridges platforms
Seven Bridges is committed to ensuring the reproducibility and portability of research analysis, and using the Common Workflow Language (CWL) for tool and workflow descriptions facilitates both. CWL is an open-source, community-driven specification and emerging standard for describing how to run computational analyses with command-line tools in short, human- and machine-readable text files. CWL is increasingly being adopted in genomics, astronomy, and other scientific disciplines, and its community has been growing steadily. Seven Bridges has been instrumental in developing and implementing the CWL specifications, and has used CWL to describe all tools and workflows on our platforms since 2016.
Interested in the many benefits of using CWL for your research, but daunted by the time investment or learning curve? Seven Bridges is here to help: our platforms offer a variety of intuitive tools and features designed to assist you in implementing CWL for your research and analysis needs. In this article, you will learn how to turn a small upfront time investment into a big analysis dividend later. Working with CWL on the Seven Bridges platforms offers several benefits that save you time, effort, and money:
- Reproducible and portable biomedical analysis
- Working with code made simple: Seven Bridges Graphical CWL Editor
- Streamlined and efficient collaboration
- Training and support for the user community
CWL and Seven Bridges platforms for reproducible and portable analysis
A growing number of tools are available from all over the research community, and they are constantly being developed, edited, and iterated upon. While this proliferation of tools is a useful service to the community, it also presents a challenge: as more tools are created, a set of standards needs to be adopted to ensure that they remain reproducible and portable. Seven Bridges advocates for CWL to be that standard. But why are reproducibility and portability important, and how does CWL facilitate each?
Reproducibility in data computation and analysis is vital in order to make scientific research as accurate, efficient, and cost-effective as possible. Reproducibility is also crucial for tracking and debugging errors, and for the validation of results. Because CWL tracks code version, inputs, outputs, and more, researchers can use it to pinpoint exactly where in an analysis pipeline a particular piece of data led to a new insight, such as the identification of a gene of interest.
Portability is important because a workflow designed for one type of computational environment, such as a high-performance computing (HPC) cluster, may not function identically in another, such as a cloud environment. When this happens, time and effort are lost to debugging the tool so that it works properly in the desired analysis environment, and the mismatch can also introduce errors or inconsistencies in the data produced. As such, portability is important for achieving reproducibility as well. Applications written in CWL are highly portable and can be executed in any CWL-compliant analysis environment. They can also be downloaded, modified, and executed on local infrastructure such as HPC clusters, or uploaded and run in the cloud.
CWL’s portability can also make finding and implementing tools and workflows easier: anything a user can find written in CWL on Dockstore or GitHub can be easily uploaded to the Seven Bridges platforms. For example, if you are working on the CGC and find a workflow in Dockstore, you can select the “Launch with CGC” button and have that workflow appear in your CGC workspace; the same applies to NHLBI BioData Catalyst powered by Seven Bridges. Tools and workflows provided in the Seven Bridges Public Apps Galleries are already cloud-optimized and can be run directly through the platforms. In addition, hundreds of CWL tools and workflows are readily accessible in popular repositories such as Dockstore and GitHub.
Graphical CWL Editors: easy-to-use tools, code made simple
Suppose you have found a tool in a repository, but aren’t sure how to get it running on your platform of choice. Or perhaps you have found an existing tool and want to modify and expand its capabilities. Or, better still, you’ve had great success with your modified tool and now want to share it with other researchers, but don’t know how to proceed.
We take much of the coding burden off the user’s shoulders with the Seven Bridges Graphical CWL Editor, the Rabix Web Composer. The Rabix Web Composer is an open-source integrated development environment for CWL that is now cloud-optimized and hosted on the Seven Bridges platforms, and it can handle both CWL v1.0 and CWL sbg:draft-2. It shines in its ease of use, featuring a graphical Visual Editor and a text-based Code Editor so that you can set up CWL descriptions with whichever method you prefer. It also features workflow sharing and version tracking, which enhance reproducibility and portability when working with collaborators. There are two parts to the Rabix Web Composer: a Tool Editor, which provides a user interface for describing individual command-line tools in CWL, and a Workflow Editor, which enables rapid assembly of tools into a workflow.
Tool Wrapping and the Tool Editor
Tool wrapping with CWL simplifies the process of editing, running, and sharing tools. Tool wrapping is the process of using CWL to describe command-line tools so that they can be run as standalone applications or as part of a larger workflow. Seven Bridges provides the Tool Editor to enable easy CWL wrapping of tools in an interactive, visual format. It features text fields for comprehensively describing a tool’s commands and parameters. As you complete these fields, the Seven Bridges Graphical CWL Editor dynamically generates the corresponding CWL code without you having to know CWL syntax. If you wish, you can see this raw CWL code in the Code Pane. Better still, this is done directly on the platform, so the tools can be executed immediately for testing and troubleshooting.
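To make the idea of a tool description concrete, here is a minimal, hand-written sketch of what wrapping a command-line tool can look like. It is not the output of the Tool Editor: it simply builds a small CWL v1.0 CommandLineTool description for the standard `echo` command as a Python dictionary and prints it as JSON (CWL descriptions can be written in JSON or YAML). The function names `make_echo_tool` and `to_cwl_json`, the input name `message`, the output name `echoed`, and the file name `echoed.txt` are all illustrative assumptions.

```python
import json


def make_echo_tool() -> dict:
    """Return a minimal CWL v1.0 description that wraps the `echo` command.

    All identifiers (message, echoed, echoed.txt) are hypothetical names
    chosen for this sketch.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        # The executable to run.
        "baseCommand": "echo",
        # One string input, placed as the first argument on the command line.
        "inputs": {
            "message": {
                "type": "string",
                "inputBinding": {"position": 1},
            }
        },
        # Capture whatever the command prints to standard output as a file.
        "stdout": "echoed.txt",
        "outputs": {
            "echoed": {"type": "stdout"},
        },
    }


def to_cwl_json(description: dict) -> str:
    """Serialize a description to JSON text that can be saved as a .cwl file."""
    return json.dumps(description, indent=2)


if __name__ == "__main__":
    print(to_cwl_json(make_echo_tool()))
```

Text like the printed result is the kind of description that the fields of a graphical editor stand in for: each input, its type, and its position on the command line are stated explicitly, which is what makes the wrapped tool portable between CWL-compliant environments.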
An additional benefit of wrapping tools with CWL is that it makes them portable, shareable, and searchable on repositories like Dockstore for collaborators to use. Getting a tool to run on the platform is as easy as copying and pasting the CWL text into the Tool Editor, which turns the tool into a graphical icon the user can then work with. The Tool Editor can also aid in fine-tuning tools for specialized use on the platforms, or in extending the capabilities of an existing tool for your own unique research needs. For more information, please see the Tool Editor documentation, the general tutorial, and the in-depth guide to wrapping tools in CWL.
The Workflow Editor
The Workflow Editor is an integrated development environment for CWL that is optimized for working with large and complex workflows. The Visual Editor shows a graphical representation of the workflow, which you can edit and rearrange intuitively by dragging and dropping nodes (tools and workflows) and their connections. You can also switch seamlessly between the Visual and Code Editors to develop applications rapidly: changes in one editor are reflected in the other in real time. Once the workflow is arranged, you can run it directly on the platform for benchmarking or debugging.
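The relationship between the graphical view and the underlying code can be illustrated with a small sketch. The example below is an assumption-laden illustration, not the editor's actual output or implementation: it builds a two-step CWL v1.0 Workflow as a Python dictionary, in which a hypothetical `say` step (wrapping `echo`) feeds a hypothetical `count` step (wrapping `wc -l`), and then lists the connections between nodes, which is roughly the information a visual editor draws as lines. The helper names `make_tool`, `make_workflow`, and `list_connections`, and the port names `value` and `result`, are invented for this sketch.

```python
import json


def make_tool(base_command, input_type: str) -> dict:
    """Hypothetical helper: a one-input tool whose stdout becomes its output."""
    return {
        "class": "CommandLineTool",
        "baseCommand": base_command,
        "inputs": {"value": {"type": input_type, "inputBinding": {"position": 1}}},
        "outputs": {"result": {"type": "stdout"}},
    }


def make_workflow() -> dict:
    """Return a CWL v1.0 Workflow that chains two inline tool descriptions."""
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"message": "string"},
        "outputs": {
            "line_count": {"type": "File", "outputSource": "count/result"},
        },
        "steps": {
            # Step 1: print the workflow's `message` input to a file.
            "say": {
                "run": make_tool("echo", "string"),
                "in": {"value": "message"},
                "out": ["result"],
            },
            # Step 2: count the lines in the file produced by step 1.
            "count": {
                "run": make_tool(["wc", "-l"], "File"),
                "in": {"value": "say/result"},
                "out": ["result"],
            },
        },
    }


def list_connections(workflow: dict) -> list:
    """List (source, destination) pairs: the edges of the workflow graph."""
    edges = []
    for step_name, step in workflow["steps"].items():
        for port, source in step["in"].items():
            edges.append((source, f"{step_name}/{port}"))
    for output_name, output in workflow["outputs"].items():
        edges.append((output["outputSource"], output_name))
    return edges


if __name__ == "__main__":
    wf = make_workflow()
    print(json.dumps(wf, indent=2))
    for source, destination in list_connections(wf):
        print(f"{source} -> {destination}")
```

Because every connection is stated in the text description, a graphical view and a code view can be two presentations of the same document, which is why an edit made in one can be shown in the other.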
To learn more, we have a tutorial for the Workflow Editor and also documentation for reference. For more information about Rabix and its features in general, see the Rabix Knowledge Center.
Dedication to user success
Daunted by the background information required to implement CWL effectively, or worried about learning the basics and features of the Rabix Web Composer or the Seven Bridges platforms? Don’t fret: the Seven Bridges team has leveraged its expertise to provide a bounty of documentation and tutorials to help users get started on the platforms quickly and easily. Each platform also has its own extensive documentation pages: the CGC, NHLBI BioData Catalyst, CAVATICA, and the Seven Bridges Platform.
At Seven Bridges, we also have Support and Outreach teams dedicated to user success. We offer small-group training, one-on-one sessions, and webinar-style sessions to get users up to speed. For users encountering technical difficulties on the platforms, our Support Team responds quickly to troubleshooting and debugging questions, so you can spend more time on data analysis and less time fighting with code.