Modern accelerated programming

Igor Sfiligoi, Grant Scott

Many science domains have become dependent on compute, either for simulation or analysis tasks. With compute now a critical step in those scientific discovery journeys, the affected scientists have had to move from lightweight compute tasks on personal devices to long-running processes on compute-optimized systems, many of which have transitioned to GPU resources, too. Writing high-performing code has thus become imperative, due to both time-to-solution needs and the high cost of such systems. Unfortunately, accelerated high-performance programming requires significantly more thought than simply converting mathematical formulas into a set of sequential compute commands. Modern systems are highly parallel, with a high ratio of compute throughput to memory bandwidth. The ability to split a problem into many independent sub-problems thus becomes critical, with memory locality playing an outsized role in reducing time-to-solution. Many scientific programmers do not understand the complexity of modern hardware, resulting in poorly performing code. The aim of this tutorial is to fill that knowledge gap by providing the necessary notions for any C/C++ programmer to develop effective and efficient applications for modern accelerated systems. Theoretical lectures are followed by hands-on exercises on CPU and GPU systems inside the National Research Platform, covering both traditional CPU plus discrete GPU nodes and more tightly coupled Grace Hopper nodes.

ADIOS: An adaptable and scalable I/O framework for storage, in situ data processing and wide-area data transfer

Scott Klasky, Norbert Podhorszki, Qian Gong, Stefanie Reuter

This half-day tutorial provides a comprehensive introduction to the building blocks of complex scientific workflows, including I/O, remote data access, and visualization. The tutorial features live demonstrations involving a realistic simulation, data analysis, and visualization, which illustrate how to use these tools in a plug-and-play manner. By the end of the tutorial, attendees will have seen a workflow composed of simulation, data analysis, and data visualization, run in various configurations of file-based and in situ setups. The tutorial will demonstrate how the presented tools can be used transparently for both streaming and file-based workflows. Furthermore, these tools can be used over the WAN, allowing a client running at one location to access remote data and to analyze and visualize it locally. The tutorial will introduce the participants to four main US Department of Energy Exascale Computing Project tools and technologies: 1) ADIOS, which provides a publish/subscribe I/O abstraction unifying storage and staging I/O; 2) MGARD, an error-controlled compression and refactoring software for floating-point scientific data; 3) Fides, a visualization data model and schema that allows ADIOS data to be easily ingested into a variety of visualization tools; and 4) ParaView, a scientific visualization framework for both file-based post-processing and in situ visualization. These tools are integrated to allow for both file-based and in-memory streaming analysis. Finally, the tutorial demonstrates a remote data access service, WANDS, that uses ADIOS to transfer UK Fusion experimental data to compute clusters at the University of Cambridge.

WfCommons - a framework for enabling scientific workflow research and development

Rafael Ferreira da Silva, Tainã Coleman, Fred Suter

WfCommons is an open-source framework for enabling scientific workflow research and development. It provides foundational tools for analyzing workflow execution instances and for generating synthetic, yet realistic, workflow instances. These can be used to develop new techniques, algorithms, and systems that overcome the challenges of efficient and robust execution of ever larger workflows on increasingly complex distributed infrastructures. In this tutorial, we will introduce the five major components of the WfCommons framework and present how they can be combined to cover the entire research and development life cycle of workflows and workflow management systems. First, we will describe the generic workflow description format (WfFormat) used by all WfCommons components and explain how to create workflow execution instances (WfInstances) from actual execution logs. Next, we will explain how to go beyond available logs by creating workflow recipes (WfChef) and generating synthetic, yet realistic, workflows from these recipes (WfGen). Finally, we will show how to use simulation to develop and evaluate scheduling and resource provisioning algorithms and to evaluate current and emerging computing platforms (WfSim).

Applications patterns for Urgent Science and the Edge-Cloud Computing Continuum

Daniel Balouek, Baptiste Jonglez, Sidi Mohammed Kaddour

The field of sensors and wireless communications has seen enormous technological advancements over the past 20 years, making it possible to transform nearly any physical "Thing," defined in the broadest sense of the word, into an Internet of Things (IoT) device that can process data sensed from its surroundings and communicate with other nearby or distant smart devices via wireless channels and the Internet. The successful adoption of IoT devices in various application domains, including industry, cities, and agriculture, has made it possible to realize pervasive systems and applications. These applications use the new, fine-grained information gathered from the target environment to enable autonomous decision-making and the implementation of reliable and robust management routines. In this context, choosing where to execute a function in a massively geographically distributed system depends on a large number of factors, ranging from the performance of a given implementation on a given resource to the variability of the input data size, network links, and computing resources. Motivated by Early Earthquake Warning (EEW) and fire science workflows, we propose to abstract the common operations that enable the fluid integration of data-driven analytics with computing resources across a large, dynamic infrastructure. In this tutorial, we present a software architecture and middleware services to adapt data-driven analytics across a federation of edge and cloud resources. Specifically, attendees will be introduced to Urgent Computing requirements and to application patterns common to most data-driven analytics applications (deployment, orchestration, adaptation, distributed sensing). Attendees will also be introduced to programming system and resource management approaches for determining what, where, and when to compute across a large set of resources.
This session is aimed at computer or data scientists with an interest in leveraging cyberinfrastructures for stream processing, either for systems or applications research in any domain.

Cloud DevOps for Scientific Workflows

Prasad Calyam, Roshan Neupane

Cloud-hosted services are increasingly used to host business and scientific applications due to their cost-effectiveness, scalability, and ease of deployment. To facilitate rapid experimentation, automation, and reproducibility of scientific workflows, the area of Development and Operations (DevOps) is evolving fast. It is necessary to train the next generation of scientific researchers and practitioners so that they are knowledgeable in DevOps-enabled scientific workflow operations that leverage technologies such as Kubeflow for Machine Learning Operations (MLOps), and Prometheus and Grafana for data security, monitoring, and visualization. In this tutorial, we will present two learning modules that use cutting-edge Cloud DevOps tools and technologies on open/public cloud infrastructures, hosted on our online "Mizzou Cloud DevOps" (MCD) platform. These modules feature research-inspired, real-world application use cases with a unique focus on DevOps technology and cloud platforms. Specifically, this tutorial will cover: a) Kubeflow with a Smart Scheduling app, for teaching MLOps-related skills to ML scientists and practitioners, and b) Prometheus/Grafana with a Video Transmission app, for learning cutting-edge monitoring and visualization of data. Our learning modules allow learners to gain skills in using the latest technologies to implement machine learning pipelines and to monitor and visualize data analytics.