Optimizing workflows to advance the use of AI for science


December 17, 2021 – Scientists come to the U.S. Department of Energy (DOE) national laboratories to solve big problems. Increasingly, these scientists are turning to artificial intelligence (AI) and machine learning (ML) to help them answer scientific questions. As AI and ML continue to evolve and progress, so does the complexity of running them on supercomputers and distributed computing networks.

PosEiDon aims to advance knowledge of how simulation and machine learning methodologies can be harnessed and amplified to improve DOE computing and data science applications. Image from PosEiDon.

Scientists from DOE's Argonne National Laboratory are tackling this challenge by modeling, simulating, predicting and optimizing the performance of workflows. These workflows orchestrate and manage large compute and data science applications running on supercomputers connected by large data transfer networks, such as those connecting the DOE national laboratories, user facilities and data storage centers across the country. A new DOE-funded project, PosEiDon: Platform for Explainable Distributed Infrastructure, looks to AI and ML to improve the performance of these workflows.

“By optimizing the scientific workflows that run on distributed computing and data infrastructure, we will be able to accelerate scientific discovery,” said Prasanna Balaprakash, a computer scientist at Argonne whose research focuses on efficient machine learning methods for scientific applications. The results could accelerate the discovery of new battery materials, aid exploration of the universe, advance the science of nuclear physics, and improve climate simulations.

Balaprakash and his team at Argonne will collaborate on PosEiDon with partners from the University of Southern California, DOE's Lawrence Berkeley National Laboratory and the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill. The interdisciplinary team brings together a unique combination of expertise in high performance computing, simulation and modeling, workflows, networking and anomaly detection, such as identifying problems in the system. Together, they will model scientific workflows, predict their performance, automatically identify performance anomalies, and optimize the workflows so they run as quickly and efficiently as possible.

DOE operates some of the fastest supercomputers in the world. As the science experiments and simulations taking place there become more complex, the process of understanding results has become more distributed. Often, experiments conducted in one location transfer data to another for processing by a supercomputer, which pulls in more data from other sources and then sends the results back to scientists. Workflows provide a way to manage the complexity of these large scientific enterprises. Essentially, a workflow is a series of highly interdependent tasks that must be performed in a certain order, in a certain location, with minimal human intervention.
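To make the idea of interdependent, ordered tasks concrete, here is a minimal Python sketch that represents a workflow as a dependency graph and derives one valid execution order. The task names are hypothetical and simply mirror the experiment-to-supercomputer example above; this is an illustration of the general concept, not how PosEiDon itself represents workflows.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "run_experiment": set(),
    "fetch_reference_data": set(),
    "transfer_data": {"run_experiment"},
    "process_on_supercomputer": {"transfer_data", "fetch_reference_data"},
    "return_results": {"process_on_supercomputer"},
}


def execution_order(tasks):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())


order = execution_order(workflow)
print(order)
```

Any order the sorter returns places each task after all of the tasks it depends on, which is the constraint a workflow manager has to enforce before it can think about where and how fast each task runs.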

First, the project will use traditional modeling and simulation approaches to simulate workflows executed on different compute and data infrastructures. However, this approach is computationally expensive: it can take several weeks to simulate even a few hundred workflow configurations. By using ML, the team expects to reduce this time drastically. Once PosEiDon is complete, it will be able to evaluate millions of workflow configurations within minutes to determine which one will perform best. To this end, PosEiDon will rely on DeepHyper, a scalable automated machine learning (AutoML) package.
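The general pattern here, fitting a cheap model to a handful of expensive simulations and then using it to rank many candidate configurations, can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the `simulate` function is a toy stand-in for a real workflow simulator, the runtime formula and node counts are made up, and the least-squares fit is a deliberately simple substitute for the ML models the project plans to build with DeepHyper.

```python
def simulate(nodes):
    """Toy stand-in for an expensive workflow simulation (made-up formula).

    Runtime falls as work is split across nodes but grows with
    coordination overhead.
    """
    return 3600.0 / nodes + 2.0 * nodes


def fit_surrogate(samples):
    """Least-squares fit of runtime ~ a / nodes + b * nodes.

    Solves the 2x2 normal equations directly, so no external library is needed.
    """
    s11 = sum((1.0 / n) ** 2 for n, _ in samples)
    s12 = sum((1.0 / n) * n for n, _ in samples)
    s22 = sum(float(n) ** 2 for n, _ in samples)
    t1 = sum(t / n for n, t in samples)
    t2 = sum(t * n for n, t in samples)
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Run the "expensive" simulation for only a few configurations.
training = [(n, simulate(n)) for n in (8, 16, 32, 64, 128)]
a, b = fit_surrogate(training)


def predict(nodes):
    """Cheap surrogate prediction of runtime for a given node count."""
    return a / nodes + b * nodes


# Rank many candidate configurations using the surrogate alone.
best = min(range(1, 1001), key=predict)
print(best, round(predict(best), 2))
```

The point of the sketch is the division of labor: the simulator is called five times, while the surrogate is evaluated a thousand times at negligible cost.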

“With DeepHyper, we will automate the design and development of the ML models required to predict workflow performance, detect anomalies and tune performance,” said Balaprakash. In addition, scientists will be able to use the predictive models to identify anomalies, that is, differences between the predicted duration of a workflow and its actual duration. PosEiDon will also be able to tell scientists where, when and why an anomaly is occurring, so they can identify and resolve any issues in the workflow or in the computing system.
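As a rough illustration of comparing predicted and actual durations, the following Python sketch flags task runs whose measured runtime deviates from the prediction by more than a chosen relative tolerance. The `TaskRun` fields, the sample values and the 50 percent tolerance are all hypothetical; the article does not describe how PosEiDon will actually define or detect anomalies.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    name: str           # which task ran
    site: str           # where it ran (hypothetical site label)
    predicted_s: float  # duration predicted by a model, in seconds
    actual_s: float     # duration actually measured, in seconds


def find_anomalies(runs, tolerance=0.5):
    """Return (run, relative_deviation) pairs exceeding the tolerance."""
    flagged = []
    for run in runs:
        deviation = (run.actual_s - run.predicted_s) / run.predicted_s
        if abs(deviation) > tolerance:
            flagged.append((run, deviation))
    return flagged


# Made-up example values, for illustration only.
runs = [
    TaskRun("transfer_data", "site_a", 120.0, 130.0),
    TaskRun("process_on_supercomputer", "site_b", 600.0, 1500.0),
    TaskRun("return_results", "site_a", 60.0, 55.0),
]

anomalies = find_anomalies(runs)
for run, deviation in anomalies:
    print(f"{run.name} at {run.site}: {deviation:+.0%} versus prediction")
```

Keeping the task name and site on each record is what lets a report say where the slowdown happened; explaining why it happened would require additional system data that this sketch does not model.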

Once PosEiDon is in place, researchers will test it on several real science problems at Argonne and other DOE computing facilities. These include nuclear physics and weather and climate simulations, which will run on various computing resources and distributed supercomputers, including the next-generation Polaris and Aurora systems at the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility.

Balaprakash hopes that this project will accelerate and expand the use of AI for scientific applications. “A breakthrough in simulation, modeling and optimization of workflows will not only improve DOE's artificial intelligence and machine learning applications, but will also radically change the way computing and data science can be used to pursue new scientific discoveries in a variety of fields,” he said.

PosEiDon is funded by the Department of Energy as part of the Integrated Computing and Data Infrastructure (ICDI) for Scientific Discovery program. To learn more about the project, visit the PosEiDon website.

DeepHyper is funded by Balaprakash's DOE Early Career Award from the Advanced Scientific Computing Research program within the DOE Office of Science.

The Argonne Leadership Computing Facility provides high-performance computing capabilities to the scientific and engineering community to advance fundamental discovery and understanding across a wide range of disciplines. Supported by the U.S. Department of Energy's (DOE's) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the country dedicated to open science.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts cutting-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance U.S. scientific leadership, and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.

Source: Liz Thompson, Argonne National Laboratory

