Tutorials

ADIOS: An adaptable and scalable I/O framework for storage, in situ data processing and wide-area data transfer
October, 2023

This tutorial provides a comprehensive introduction to the building blocks of complex scientific workflows, including I/O, remote data access, and visualization. The tutorial features live demonstrations involving a realistic simulation, data analysis and visualization, which illustrate how to use these tools in a plug-and-play manner. By the end of the day, attendees will have seen a workflow composed of simulation, data analysis and data visualization run in various file-based and in situ configurations. The tutorial will demonstrate how the presented tools can be used transparently for both streaming and file-based workflows. These tools can also be used over the WAN, allowing a client running at one location to access remote data and analyze and visualize it locally. The tutorial will introduce participants to three main US Department of Energy Exascale Computing Project tools and technologies: 1) ADIOS, which provides a publish/subscribe I/O abstraction unifying storage and staging I/O; 2) Fides, a visualization data model and schema that allows ADIOS data to be easily ingested into a variety of visualization tools; and 3) ParaView, a scientific visualization framework for both file-based post-processing and in situ visualization. These tools are integrated to allow for both file-based and in-memory streaming analysis. Finally, the tutorial demonstrates a remote data access service, WANDS, that uses ADIOS to transfer UK Fusion experimental data to compute clusters at the University of Cambridge.

Evidence-based methods of communicating science to the public through data visualization
October, 2023

For eScience professionals at the forefront of innovation in collaborative, computationally intensive, and data-intensive research, effective communication of complex scientific and technical concepts is crucial to engage diverse audiences and ensure the broader impact of this work. This tutorial on cinematic scientific visualization offers a powerful science communication approach relevant to eScience professionals’ needs. Drawing on decades of experience from the Advanced Visualization Lab at the National Center for Supercomputing Applications, participants will learn the intricacies of creating high-quality, immersive visualizations that contextualize cutting-edge research in an accessible and captivating manner. By leveraging tools from computational science communities and incorporating findings from recent empirical research, this tutorial provides valuable insights and guidance for eScience professionals to create visually compelling representations of data that resonate with a wide range of audiences.

Scientific Workflows at Scale using GNU Parallel
October, 2023

This tutorial offers theoretical foundations and hands-on experience with GNU Parallel, a shell tool that executes terminal commands in parallel on one or more computers. A versatile tool, Parallel offers numerous options and features that make it easy, concise and efficient to work on multicore and multinode architectures. The ability to run tasks in parallel over files or raw data, in many modes and load distributions, makes it a powerful tool suited to a wide variety of workflows. GNU Parallel fits nicely with HPC middleware and filesystems, making it a low-friction tool for performing not only computations but also data movement efficiently. Most HPC centers offer their resources in a shared manner via resource managers and schedulers, so it is important for any tool to work well with such schedulers. Parallel integrates nicely with HPC job schedulers such as SLURM, LSF, PBS / Torque and others. Workflows often require dependency chaining between multiple computing stages. Using simple shell techniques, we will see how to leverage Parallel to achieve fully asynchronous, parallel multi-stage scientific workflows. Parallel is invaluable from the scientific user’s point of view: its simplicity empowers users to rapidly extract the parallel profile of a complex workflow, experiment with it, and hone it for large-scale runs using Parallel or another specialized workflow tool.
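
The idea of dependency chaining without a global barrier between stages can be sketched as follows. This is only an illustrative analogy in Python using the standard `concurrent.futures` module, not GNU Parallel itself and not the tutorial's material; the stage functions `simulate` and `analyze` and the helper `run_workflow` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(sample_id: int) -> int:
    """Hypothetical stage 1: stand-in for a simulation step."""
    return sample_id * sample_id


def analyze(value: int) -> str:
    """Hypothetical stage 2: stand-in for an analysis step."""
    return f"result={value + 1}"


def run_item(sample_id: int) -> str:
    # Stage 2 starts as soon as stage 1 has finished for this item.
    # There is no barrier that waits for every item to finish stage 1.
    return analyze(simulate(sample_id))


def run_workflow(sample_ids, max_workers: int = 4) -> list:
    # Items are processed concurrently, up to max_workers at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though items run concurrently.
        return list(pool.map(run_item, sample_ids))


if __name__ == "__main__":
    print(run_workflow(range(5)))
```

Each item moves through both stages independently, which is the property the abstract refers to as a fully asynchronous multi-stage workflow.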

Get started with Quarto - Markdown with executable code
October, 2023

The demand for scientific staff to prepare reports for governmental, public, and scientific dissemination has been increasing. However, creating reports is often time-consuming and burdensome because it requires switching between different programs for writing, analysis, creating tables and figures, and other graphical and textual tasks. To address this challenge, it is increasingly important to learn efficient tools that combine programming and report generation, streamlining the workflow and reducing manual effort. Quarto, a second-generation reporting framework developed by Posit, is built upon the widely used markdown plain text format. It seamlessly combines markdown-based prose with the ability to run code for formatting, analysis, and data visualisation within the same environment, providing researchers with an integrated platform to produce reports. Quarto is designed to be multilingual, meaning it can be utilised with any programming language and does not require a specific language to be installed (only Quarto itself) for rendering the final report. This versatility makes Quarto an excellent tool to learn, regardless of one's preferred programming language. In this tutorial, the primary focus will be on learning how to render basic HTML and PDF formats, which serve as the foundation for many other formats. Additionally, the tutorial will cover the incorporation of cross-references and citations, crucial elements in any scientific workflow.

UnoAPI: Modern techniques for engineering high-performance software
October, 2023

One of the key goals in high-performance and distributed software engineering is to leverage the specific capabilities of the target hardware to the extent possible. Today’s systems are typically heterogeneous, with multiple architectures present within a single system, such as conventional CPU cores combined with accelerators such as GPUs and FPGAs. Although parallel computing itself has reached a high level of maturity, as we move toward exascale computing and beyond, challenges similar to those that plagued the earliest days of parallel and distributed computing are beginning to resurface: leveraging heterogeneity while balancing performance, software portability, and developer productivity (P3). This tutorial provides hands-on experience in developing high-performance and embedded software for heterogeneous architectures using Intel’s oneAPI reference implementation of the Khronos SYCL standard in conjunction with state-of-the-art software engineering methods. By raising the abstraction level via its unified application programming interface (API), oneAPI makes it easier to develop portable high-performance software for systems with embedded hardware accelerators, such as GPUs and FPGAs.