Talks

Invited Talks

2026

PASC 2026

Panel on the Evolution of Workflow Management Software in Service of Scientific Discovery

Invited speaker · Bern, Switzerland

Slides

2026

Dal PNRR al Futuro: un'Infrastruttura di Ricerca e Innovazione per un Impatto Reale

Future-proof Workflow Orchestration with StreamFlow

Invited speaker · Bologna, Italy

Slides

2024

WSCC 2024

Scientific Workflows in the Continuum Era

Keynote speaker · Madrid, Spain

Abstract

Thanks to their generality, workflow models represent a powerful abstraction for designing complex applications and executing them on large-scale distributed architectures. However, several additional challenges appear when transitioning from cloud/HPC environments to the entire compute continuum. Continuum execution environments are fully distributed and modular, and modules can be heterogeneous and independent of each other. In addition, continuum workflows often rely on multiple intercommunicating agents that form complex micro-services architectures. Different agents deal with different communication and parallelization paradigms: network-based stream processing at the edge and file-based batch processing on HPC facilities. Finally, support for efficient interactive workflows in the continuum remains an open research problem. This talk explores these challenges and provides insights on how to deal with them. A ready-to-use software library accompanies each proposed solution to facilitate the reproducibility and reusability of the presented concepts.

Slides

2024

ELISE Wrap Up Conference & ELLIS Community Event

Cross-Facility Federated Learning - Part II

Invited speaker · Helsinki, Finland

Slides

2024

Workshop on workflow languages for HEP analysis

CWL in the HPC Ecosystem

Invited speaker · CERN, Meyrin, Switzerland

Slides

2023

ITADATA 2023

Workflow models for heterogeneous distributed systems

Invited speaker · Naples, Italy

Slides

2023

OSA2Micro

Workflows and the Common Workflow Language (CWL)

Invited speaker · Turin, Italy

Slides

2023

Human Brain Project Summit 2023

Standardised Workflows at EBRAINS

Invited speaker · Marseille, France

Abstract

A hands-on training session on Standardised Workflows in EBRAINS. A short presentation serves as an introduction, while the main hands-on session covers writing and executing Standardised Workflows. TC will give some guidelines so that attendees can experiment with writing CWL tools and workflows; they will then be given access to a VM to execute these workflows. The Workflows Dashboard will also be presented during the same session, giving attendees the opportunity to understand its different functionalities, use it with TC support, and provide useful comments.

Slides

2023

CWLCon 2023

CWL for HPC: are we there yet?

Invited speaker · EMBL, Heidelberg, Germany

Abstract

Modern HPC applications are becoming so heterogeneous and complex that a modular approach to their design, deployment and orchestration is now necessary. This talk explores the benefits of using a vendor-agnostic workflow language (CWL) coupled with a hybrid workflow management system (StreamFlow) in the HPC ecosystem. It also examines the requirements needed to model HPC applications effectively, CWL’s readiness to meet such requirements, and the proposals made to improve the language where needed. Four real use cases will drive the discussion: the ACROSS Project (G.A. n. 955648), where CWL is the primary interface to model three HPC workflows, and the EUPEX Project (G.A. n. 101033975), where StreamFlow will be used for the rapid prototyping of a seismic engineering HPC application for a Modular Supercomputing Architecture (MSA) system.

Slides

2022

NVIDIA HPC Roundtable

CINI HPC-KTT: HPC Key Technologies and Tools National Lab

Invited speaker · Casalecchio di Reno, Italy

Slides

2022

2nd HealthyCloud Workshop

Dossier: multi-tenant distributed Jupyter Notebooks

Invited speaker · Virtual event

Abstract

When providing data analysis as a service, one must tackle several problems. Data privacy and protection by design are crucial when working on sensitive data. Performance and scalability are fundamental for compute-intensive workloads, e.g. training Deep Neural Networks. User-friendly interfaces and fast prototyping tools are essential to allow domain experts to experiment with new techniques. Portability and reproducibility are necessary to assess the actual value of results. Kubernetes is the best platform to provide reliable, elastic, and maintainable services. However, Kubernetes alone is not enough to achieve large-scale multi-tenant reproducible data analysis. Out-of-the-box support for multi-tenancy is too coarse, with only two levels of segregation (i.e. the single namespace or the entire cluster). Offloading computation to off-cluster resources is non-trivial and requires manual configuration by the user. Also, Jupyter Notebooks per se provide neither much scalability (they execute locally and sequentially) nor reproducibility (users can run cells in any order and any number of times). The Dossier platform allows system administrators to manage multi-tenant distributed Jupyter Notebooks at the cluster level in the Kubernetes way, i.e. through CRDs. Namespaces are aggregated in Tenants, and all security and accountability aspects are managed at that level. Each Notebook spawns into a user-dedicated namespace, subject to all Tenant-level constraints. Users can rely on provisioned resources, either in-cluster worker nodes or external resources like HPC facilities. Plus, they can plug in their own computing nodes in a BYOD fashion. Notebooks are interpreted as distributed workflows, where each cell is a task that one can offload to a different location in charge of its execution.

Slides

2021

CWLCon 2021

The Universal Cloud-HPC Pipeline for the AI-Assisted Explainable Diagnosis of COVID-19 Pneumonia

Invited speaker · Virtual event

Abstract

We'll present a methodology to run DNN pipelines on hybrid cloud+HPC infrastructure. We'll also define a "universal pipeline" for medical images. The pipeline can reproduce all state-of-the-art DNNs for diagnosing COVID-19 pneumonia that appeared in the literature during the first Italian lockdown and the following months. We can run all of them (across cloud+HPC platforms) and compare their performance in terms of sensitivity and specificity to set a baseline for evaluating future progress in the automated diagnosis of COVID-19. Also, the pipeline makes existing DNNs explainable by way of adversarial training. The pipeline is easily portable and can run across different infrastructures, adapting the performance-urgency trade-off. The methodology builds on two novel software components: the StreamFlow workflow system and the AI-sandbox concept (parallel container with user-space encrypted file system). We reach over 92% accuracy in diagnosing COVID pneumonia.

Slides

Regular Talks

2025

eScience 2025

BookedSlurm: meeting user needs for advanced resource reservations in Slurm

Chicago, IL, USA

Abstract

Modern scientific discovery is frequently backed by large-scale scientific experiments, which cannot do without the unparalleled computational capabilities provided by high-performance computing systems. However, large data centers usually prioritize system efficiency over user accessibility, posing challenges for researchers without advanced computer science expertise. This work introduces BookedSlurm, a secure and user-focused extension of the Slurm workload manager, aiming to democratize HPC access across interdisciplinary research domains. BookedSlurm enables a partially decentralized regulation of fine-grained advanced resource reservations through a novel credit-based framework, ensuring fair and predictable access to computing resources. Its modular architecture leverages dedicated microservices to manage reservations, credit handling, and accounting. These components are exposed through a secure REST API and an intuitive web-based dashboard, enhancing system usability for novice and expert users. While the dashboard simplifies interactions for non-specialists, advanced users can directly access agent-level APIs for more complex and automated operations. The effectiveness of BookedSlurm is validated on a real-world bioinformatics use case from the SUS-MIRRI.IT project, showcasing its ability to enhance usability, optimize job scheduling, and streamline execution workflows.

Slides

2025

2025 ICSC Spoke 1 Event

Cross-Platform Full Waveform Inversion (CPFWI)

Milan, Italy

Slides

2024

EuroHPC User Day 2024

Towards a European AI Platform

Amsterdam, Netherlands

Abstract

The rapid advancements in AI and Machine Learning necessitate a robust computational infrastructure to support cutting-edge research and industrial applications. From the perspective of the academic and industrial AI community, voiced in the recent ELISE project, the European AI platform should center on the growing EuroHPC ecosystem. It should be user-driven, easily accessible, powerful, and compliant with European regulations. AI-optimized and dedicated supercomputers for the European AI community are also coming, in addition to the upgrade of partitions of existing EuroHPC systems to an 'AI enabled' stage. Related calls were initiated in September 2024. Further, the suggestion is to extend conventional EuroHPC systems with quantum computing, edge AI, and neuromorphic computing to cater to AI models deployed on network edge devices and to long-term sustainability. The challenges are presented in three case studies, ranging from training Transformers on HPC, to LLMs trained in a federated fashion across three different EuroHPC systems, to recent results on hybrid classical-quantum applications. The paper concludes with next steps, informed by the case-study results, that are believed to benefit AI practitioners and the broader AI community.

Slides

2024

CWLCon 2024

CWL Working Groups

Amsterdam, Netherlands

Abstract

This presentation introduces the new CWL Working Groups initiative, describing what a Working Group actually is, which Working Groups already exist in the CWL community, and how anybody can create a new officially recognized Working Group. Then, the presentation will explore the CWL4HPC Working Group, using it as an example of how a CWL Working Group can actually work.

Slides

2023

EuroHPC User Day 2023

Cross-Facility Federated Learning

Brussels, Belgium

Abstract

In a decade, AI frontier research transitioned from the researcher's workstation to thousands of high-end hardware-accelerated compute nodes. This rapid evolution shows no signs of slowing down in the foreseeable future. While top cloud providers may be able to keep pace with this growth rate, obtaining and efficiently exploiting computing resources at that scale is a daunting challenge for universities and SMEs. This work introduces the Cross-Facility Federated Learning (XFFL) framework to bridge this compute divide, extending the opportunity to efficiently exploit multiple independent data centres for extreme-scale deep learning tasks to data scientists and domain experts. XFFL relies on hybrid workflow abstractions to decouple tasks from environment-specific technicalities, reducing complexity and enhancing reusability. In addition, Federated Learning (FL) algorithms eliminate the need to move large amounts of data between different facilities, reducing time-to-solution and preserving data privacy. The XFFL approach is empirically evaluated by training a full LLaMAv2 7B instance on two facilities of the EuroHPC JU, showing how the increased computing power completely compensates for the additional overhead introduced by two data centres.

Slides

2023

2023 Workflows Community BoF: Modern Workflows for Continuum and Cross-Facility Computing

Orchestrating Multi-Domain Workflows: The ACROSS Approach

Denver, CO, USA

Slides

2023

WORKS 2023

A Systematic Mapping Study of Italian Research on Workflows

Denver, CO, USA

Abstract

An entire ecosystem of methodologies and tools revolves around scientific workflow management. They cover crucial non-functional requirements that standard workflow models fail to target, such as interactive execution, energy efficiency, performance portability, Big Data management, and intelligent orchestration in the Computing Continuum. Characterizing and monitoring this ecosystem is crucial to developing an informed view of current and future research directions. This work conducts a systematic mapping study of the Italian workflow research community, analyzing 25 tools and 10 applications from several scientific domains in the context of the “National Research Centre for HPC, Big Data, and Quantum Computing” (ICSC). The study aims to outline the main current research directions and determine how they address the critical needs of modern scientific applications. The findings highlight a variegated research ecosystem of tools, with a prominent interest in advanced workflow orchestration and still immature but promising efforts toward energy efficiency.

Slides

2023

2023 CINI HPC-KTT National Assembly

ACROSS: HPC Big Data Artificial Intelligence Cross Stack Platform Towards Exascale

Pisa, Italy

Slides

2022

EAGE 2022

Hybrid Workflows For Large-Scale Scientific Applications

Milan, Italy

Abstract

Large-scale scientific applications are facing an irreversible transition from monolithic, high-performance-oriented codes to modular and polyglot deployments of specialised (micro-)services. The reasons behind this transition are many: coupling of standard solvers with Deep Learning techniques, offloading of data analysis and visualisation to Cloud, and the advent of specialised hardware accelerators. Topology-aware Workflow Management Systems (WMSs) play a crucial role. In particular, topology-awareness allows an explicit mapping of workflow steps onto heterogeneous locations, allowing automated executions on top of hybrid architectures (e.g., cloud+HPC or classical+quantum). Plus, topology-aware WMSs can address non-functional requirements out of the box, e.g. components’ life-cycle orchestration, secure and efficient data transfers, fault tolerance, and cross-cluster execution of urgent workloads. Augmenting interactive Jupyter Notebooks with distributed workflow capabilities allows domain experts to prototype and scale applications using the same technological stack, while relying on a feature-rich and user-friendly web interface. This abstract will showcase how these general methodologies can be applied to a typical geoscience simulation pipeline based on the Full Waveform Inversion (FWI) technique. In particular, a prototypical Jupyter Notebook will be executed interactively on Cloud. Preliminary data analyses and post-processing will be executed locally, while the computationally demanding optimisation loop will be scheduled on a remote HPC cluster.

Slides

2022

ITWSHPC 2022

Hybrid workflows for heterogeneous distributed computing

Turin, Italy

Slides

2022

JOTB 2022

Distributed workflows with Jupyter

Malaga, Spain

Abstract

This workshop explores Jupyter Notebooks and their potential to express complex workflows and coordinate their distributed execution, powered by the Jupyter workflow kernel developed at the University of Torino. In particular, the workshop will be composed of two main units. The first part will cover a general introduction to literate computing and Jupyter workflows, exploring their features and limitations in terms of portability, reproducibility, and ease of use by domain experts. Then, the second part will explore their capability to express distributed applications and to automatically optimize their distributed execution. In both parts, a theoretical introduction will be followed by hands-on exercises.

Slides

2022

JOTB 2022

OpenDeepHealth: Crafting a Deep Learning Platform as a Service with Kubernetes

Malaga, Spain

Abstract

Have you ever seen a Distributed Deep-Learning Platform as a Service? Surely not, it's challenging! Join this session to discover OpenDeepHealth, a PaaS built on top of Kubernetes and designed from principles with a multi-tenancy-first approach! OpenDeepHealth (ODH) is a hybrid HPC/cloud infrastructure designed and developed by the University of Torino in the DeepHealth European project. The goal was to provide a self-service platform for Deep Learning, allowing domain experts to bring their own data and run training and inference workflows in a multi-tenant container-native environment. Kubernetes, the de-facto standard for container orchestration, is the perfect framework for building such a distributed system, optimising resource usage and allowing a horizontal scaling of the infrastructure. StreamFlow, the ODH workflow engine, can schedule and coordinate different workflow steps on top of a diverse set of execution environments, ranging from single Pods to entire HPC centres. As a result, each step of a complex Data Analysis pipeline can be scheduled on the most efficient infrastructure. At the same time, the underlying run-time layer automatically takes care of workers' lifecycle, data transfers, and fault-tolerance aspects. ODH implements a novel form of multi-tenancy called "HPC Secure Multi-Tenancy," specifically designed to support AI applications on critical data. Thanks to Capsule, the multi-tenant Kubernetes operator, ODH can enforce multi-tenancy at the cluster level, avoiding privilege escalations and exploits, minimising operational costs, and enforcing custom policies to access external HPC facilities. Finally, ODH provides multi-tenant distributed Jupyter Notebooks as a service through the Dossier platform. This feature gives domain experts a high-level, well-known programming model to write portable and reproducible Deep Learning pipelines, augmenting standard notebooks with resource segregation, data protection and computation offloading capabilities.

Slides

2020

Workshop GARR 2020

JupyterFlow: Jupyter Notebooks su larga scala

Rome, Italy

Abstract

Jupyter Notebooks are widely used in both industry and academia as a tool for teaching, prototyping, and exploratory analysis. Unfortunately, Jupyter's standard runtime system is not powerful enough to sustain real workloads, and often the only solution is to rewrite the code from scratch in a technology with HPC support. By integrating the Jupyter stack with StreamFlow (https://streamflow.di.unito.it/), it is possible to create Notebooks through a web interface on the cloud and run them transparently on a remote VM with a GPU or on HPC nodes.

Slides

2019

Workshop GARR 2019

Un approccio dichiarativo a workflow e pipeline di micro-servizi

Rome, Italy

Slides

2019

PDP 2019

Deep Learning at Scale

Pavia, Italy

Abstract

This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data parallel training. Established techniques for data parallel training are discussed from both a parallel computing and deep learning perspective, then a different approach is presented that is meant to allow DNN training to scale while retaining good convergence properties. Moreover, an experimental implementation is presented as well as some preliminary results.

Slides