Tutorials

Accelerated Data Science with RAPIDS

Didier Barradas-Bautista (King Abdullah University of Science and Technology)

This tutorial presents a practical introduction to GPU-accelerated data science using the open-source RAPIDS ecosystem, with a short advanced module on distributed hyperparameter optimization. The design prioritizes immediately transferable workflows for data preparation, machine learning, and graph analytics while preserving a concise extension for model tuning at scale. The core content covers cuDF, cuML, cuGraph, and GPU-enabled XGBoost for end-to-end analytics on accelerated hardware. A final advanced segment introduces Ray Tune and selected Skorch/PyTorch patterns to illustrate how the same workflow can be extended to distributed search and deeper optimization settings. The tutorial is delivered through code-based notebooks in a reproducible cloud-ready environment and is structured to serve intermediate practitioners who want a compact, applied introduction to modern GPU data science workflows.

End-to-End Provenance for Scientific Workflows with the yProv Ecosystem

Gabriele Padovani, Nicola Marchioro (University of Trento)

Traceability and reproducibility are foundational challenges in modern scientific computing and machine learning. Yet most practitioners rely on ad-hoc logging, making it difficult to trace the origin of a result, reproduce an experiment, or audit a data pipeline. The yProv Framework is an open-source ecosystem built on the W3C PROV standard that addresses these challenges through structured, multi-level provenance collection, storage, and visualization. This tutorial introduces participants to the yProv lifecycle. Attendees will first instrument a PyTorch training script using yProv4ML, capturing metrics, hyperparameters, energy consumption, and output artifacts. Participants will then upload their generated provenance documents to a live yProvStore instance (hosted at the University of Trento), which provides API access backed by MinIO object storage and a PostgreSQL database. They will use the yProvExplorer web dashboard to visualize provenance as interactive directed graphs, and finally they will visualize the results of their ML runs and track progress with yProv4DV. The outputs of this final visualization phase will also be uploaded to yProvStore for additional traceability, so that other users can download the scripts and re-run them locally. Throughout the tutorial, participants will see how yProv adheres to FAIR principles: every document receives a unique identifier, standard APIs enable programmatic access, and native W3C PROV support ensures interoperability with other provenance tools. By the end of the session, attendees will be able to instrument their own workflows, query provenance programmatically, and gather provenance information from real scientific pipelines.
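To make the W3C PROV vocabulary mentioned above concrete, the following is a minimal sketch of a provenance document for one training run, written as a plain dictionary in the PROV-JSON layout. It does not use the yProv4ML API; the function name, the `ex` namespace, and all identifiers and values are hypothetical and chosen only for illustration.

```python
import json


def build_prov_document(run_id, hyperparameters, metrics):
    """Build a PROV-JSON-style dictionary for one training run.

    Hypothetical helper for illustration only; this is not the yProv4ML API.
    """
    ns = "ex"  # assumed example namespace prefix
    activity = f"{ns}:training_run_{run_id}"
    model = f"{ns}:model_{run_id}"
    dataset = f"{ns}:dataset"
    agent = f"{ns}:researcher"

    # Metrics are attached to the output entity, hyperparameters to the activity.
    model_attributes = {"prov:label": "trained model"}
    model_attributes.update({f"{ns}:{k}": v for k, v in metrics.items()})

    return {
        "prefix": {ns: "http://example.org/"},
        "entity": {
            dataset: {"prov:label": "training data"},
            model: model_attributes,
        },
        "activity": {
            activity: {f"{ns}:{k}": v for k, v in hyperparameters.items()},
        },
        "agent": {agent: {"prov:label": "researcher"}},
        # Relations: the run used the dataset, generated the model,
        # and was associated with the researcher.
        "used": {"_:u1": {"prov:activity": activity, "prov:entity": dataset}},
        "wasGeneratedBy": {"_:g1": {"prov:entity": model, "prov:activity": activity}},
        "wasAssociatedWith": {"_:a1": {"prov:activity": activity, "prov:agent": agent}},
    }


doc = build_prov_document(
    "001",
    hyperparameters={"learning_rate": 0.01, "epochs": 5},
    metrics={"accuracy": 0.93},
)
print(json.dumps(doc, indent=2))
```

The three relations (`used`, `wasGeneratedBy`, `wasAssociatedWith`) are what allow a provenance viewer to draw the run as a directed graph linking data, process, and person.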

Improving a workflow engine for HPC applications with ad-hoc file systems

Genaro Sánchez-Gallegos, Catherine A. Torres Charles, Javier Garcia-Blas, and Jesus Carretero (Universidad Carlos III de Madrid)

Workflow engines are indispensable for orchestrating large-scale scientific computations across domains such as meteorology, disaster management, food safety, and territorial management. However, implementing, managing, and executing real-world scientific applications as workflows across multiple infrastructures (e.g., servers, clusters, cloud) remains a challenge. DagOn* (Directed Acyclic Graph On anything) addresses this by providing a lightweight Python-based engine that manages parallel jobs represented as Directed Acyclic Graphs (DAGs). These DAGs are built from parallel patterns and can be executed on any combination of local machines, on-premise high-performance computing (HPC) clusters, containers, and cloud-based virtual infrastructures. Despite DagOn*’s ability to optimise task parallelism, overall performance is frequently hindered by I/O bottlenecks that occur when multiple applications concurrently access a shared parallel file system. One solution is to extend the storage hierarchy of traditional HPC systems with new tiers that absorb I/O bottlenecks and support data movement between tiers based on usage frequency. In this approach, ad-hoc file systems serve as intermediate storage layers that use new storage technologies, such as non-volatile random-access memory devices and flash-based solid-state drives, to provide temporary storage tailored to application behaviour in the HPC environment. This tutorial introduces Hercules as an ad-hoc file system for traditional HPC applications, AI, and data-intensive workflows such as those generated with DagOn*. This alternative presents a feasible solution for generic file accesses, MPI, and workflow acceleration.
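As a rough illustration of what executing parallel jobs represented as a DAG involves, the sketch below runs tasks in dependency order using only the Python standard library, launching independent tasks concurrently. It is not the DagOn* API; `run_dag`, the task names, and the thread-pool executor are assumptions made for this example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies, max_workers=4):
    """Run callables in dependency order, in parallel where the DAG allows.

    tasks: maps a task name to a callable taking a dict of predecessor results.
    dependencies: maps a task name to the set of task names it depends on.
    Hypothetical helper for illustration only; this is not the DagOn* API.
    """
    sorter = TopologicalSorter(
        {name: dependencies.get(name, ()) for name in tasks}
    )
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all finished.
            for name in sorter.get_ready():
                inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task completes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # unlocks the successors of this task
    return results


tasks = {
    "load": lambda inputs: [1, 2, 3, 4],
    "square": lambda inputs: [x * x for x in inputs["load"]],
    "double": lambda inputs: [2 * x for x in inputs["load"]],
    "combine": lambda inputs: sum(inputs["square"]) + sum(inputs["double"]),
}
dependencies = {
    "square": {"load"},
    "double": {"load"},
    "combine": {"square", "double"},
}

results = run_dag(tasks, dependencies)
print(results["combine"])  # 30 + 20 = 50
```

Here `square` and `double` depend only on `load`, so they can run at the same time, while `combine` waits for both. In a real engine each task would typically exchange files rather than in-memory values, which is where the shared file system becomes the bottleneck described above.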

Portable By Design: Deploying Notebook-based Scientific Workflows across HPC Clusters

Md Saiful Islam, Douglas Thain (University of Notre Dame); A S M Shahadat Hossain, Tanu Malik (University of Missouri)

Scientific workflows running on HPC systems are often difficult to reproduce and deploy across computing environments due to differences in software stacks, data dependencies, and execution models. This tutorial introduces a practical workflow ecosystem built around three complementary open-source tools: TaskVine, Sciunit, and Floability. Participants will construct distributed workflows using TaskVine, capture reproducible executions using Sciunit, and deploy portable notebook-based workflows across HPC systems using Floability backpacks. The tutorial emphasizes hands-on exercises that reflect real-world portability and reproducibility challenges faced by researchers working with HPC systems and national cyberinfrastructure.

Programming distributed computing with PyCOMPSs

Rosa M. Badia, Javier Conejero, Daniele Lezzi (Barcelona Supercomputing Center)

The tutorial aims to describe methodologies and tools that simplify the lifecycle of application workflows, in particular those that combine HPC software, data analytics, and artificial intelligence. We will present tools that help streamline the development and execution of complex workflows in HPC systems, as well as the sharing and exchange of their metadata. PyCOMPSs is a task-based parallel programming model in Python. Based on simple annotations, sequential Python programs can be executed in parallel on HPC clusters and other distributed infrastructures. The tutorial will present an overview of PyCOMPSs, as well as an introduction to dislib, a machine learning library developed on top of PyCOMPSs. The presentation will be illustrated with some applications developed with PyCOMPSs. The tutorial will include a hands-on session on the MareNostrum supercomputer with workflows combining HPC-based and AI applications.
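The annotation style described above can be sketched as follows. The example uses the PyCOMPSs `@task` decorator and `compss_wait_on` synchronization call; the function names and data are hypothetical. As an assumption made so the sketch also runs without PyCOMPSs installed, it falls back to no-op stand-ins, in which case the code simply executes sequentially.

```python
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Assumption for this sketch: without PyCOMPSs, use stand-ins
    # so the same code runs as an ordinary sequential program.
    def task(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        return obj


@task(returns=1)
def partial_sum(chunk):
    # Each call can become an independent task.
    return sum(chunk)


@task(returns=1)
def add(a, b):
    # Depends on the results of two earlier tasks.
    return a + b


def total(chunks):
    # Written like sequential code; the data dependencies between
    # calls are what a task-based runtime uses to find parallelism.
    partials = [partial_sum(chunk) for chunk in chunks]
    result = partials[0]
    for value in partials[1:]:
        result = add(result, value)
    # Synchronize: retrieve the actual value of the final result.
    return compss_wait_on(result)


chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
answer = total(chunks)
print(answer)  # 6 + 15 + 24 = 45
```

The body of `total` is unchanged sequential Python; only the decorators and the final synchronization call mark where tasks and data dependencies lie.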