Bioinformatics Workflow Portability Is Critical to Achieving Reproducibility
With the explosion of genomic data in recent years, the number of bioinformatics workflows has proliferated accordingly. Researchers and developers now have a wealth of analysis options, from building their own tools to taking advantage of those developed by others. However, a workflow developed in one environment may not behave the same way in another, whether it was developed locally or in the cloud. The impact is dramatic: at best, adapting a workflow to a new location is frustrating and time-consuming, and at worst, the same workflow run on the same data may produce different results in another environment. Workflow portability is therefore critical to improving overall reproducibility.
This is a major challenge for collaboration, sharing, and distributing tools. Whether you are a researcher collaborating across a hallway, a large consortium coordinating analysis of many datasets across institutions, or a pharmaceutical company with centers across the globe, the need for many users to run identical workflows is clear.
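One way collaborators can check whether two environments really produced the same results is to compare checksums of the output files. The following is a minimal sketch, not part of any platform's tooling: the function names and the `calls.vcf` file are hypothetical, and byte-for-byte comparison is a deliberately strict criterion, since some tools may embed timestamps or command lines in their outputs.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_outputs(output_dir):
    """Map each file under output_dir (by relative path) to its digest."""
    root = Path(output_dir)
    return {
        str(path.relative_to(root)): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_outputs(run_a, run_b):
    """List files that are missing from one run or differ between runs."""
    names = set(run_a) | set(run_b)
    return sorted(name for name in names if run_a.get(name) != run_b.get(name))


if __name__ == "__main__":
    # Simulate the outputs of the same workflow in two environments.
    with tempfile.TemporaryDirectory() as env_a, tempfile.TemporaryDirectory() as env_b:
        Path(env_a, "calls.vcf").write_text("chr1\t100\tA\tT\n")
        Path(env_b, "calls.vcf").write_text("chr1\t100\tA\tG\n")
        print(differing_outputs(digest_outputs(env_a), digest_outputs(env_b)))
        # prints ['calls.vcf']
```

An empty list means every output file matched byte for byte; any name in the list points to a result worth investigating.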
The portability challenge is addressed by combining software containers, such as Docker, with the Common Workflow Language (CWL), an open, community-driven, text-based standard for describing workflows. Together, CWL and Docker containers allow scientists to wrap and share tools so that they can be run in any environment. Docker images can be shared quickly and easily, while CWL ensures the workflow will operate consistently across different platforms, so workflows packaged in this manner are portable and reproducible. Dockstore is a popular tool repository that houses over 300 CWL tools and workflows, making it easy for users to find and manage collections of tools. CGC users can even copy any CWL workflow they find on Dockstore and run it on the CGC Platform.
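To make the pairing of CWL and Docker concrete, here is a minimal sketch of what a wrapped tool description might look like. It builds a CWL v1.0 `CommandLineTool` that runs `echo` inside a container and prints it as JSON. The tool itself, the image name `ubuntu:22.04`, and the helper names are illustrative assumptions rather than anything taken from Dockstore or the CGC. CWL descriptions are usually written in YAML; JSON is used here only so the sketch needs nothing beyond the Python standard library.

```python
import json


def make_echo_tool(docker_image):
    """Build a minimal CWL v1.0 tool description as a plain dict.

    The tool (a containerized `echo`) is a hypothetical example.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": "echo",
        # The container image the tool should run in.
        "hints": {"DockerRequirement": {"dockerPull": docker_image}},
        # One string input, passed as the first command-line argument.
        "inputs": {
            "message": {"type": "string", "inputBinding": {"position": 1}}
        },
        # Capture standard output to a file and expose it as an output.
        "stdout": "output.txt",
        "outputs": {"echoed": {"type": "stdout"}},
    }


def to_cwl_json(tool):
    """Serialize the description; JSON is also valid YAML."""
    return json.dumps(tool, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_cwl_json(make_echo_tool("ubuntu:22.04")))
```

The description names both the command and the container image, so in principle any CWL-compliant runner with access to Docker has what it needs to run the tool the same way.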
Every CWL workflow on Dockstore shows a “Launch with CGC” button. Clicking this button copies the workflow to a project of the user's choosing, where it can be run or edited just like any other CGC workflow. In most cases, workflows found on Dockstore will run as-is in a cloud environment like the CGC, but in some rare cases modifications may be needed to make them operate properly. In such a case, the CGC workflow editor can be used to quickly and easily make fixes, updates, and modifications.
With this integration, dozens of new workflows are now accessible to CGC users in a streamlined and intuitive manner. Users can confidently run workflows from Dockstore regardless of the environment of the developer, which is made possible through the combined power of CWL and Docker. These workflows can be shared with collaborators around the world while maintaining full reproducibility. Users can spend more time focusing on the science, not on the mechanics of workflow maintenance.