Data Systems
Challenges and Opportunities at LCLS
X-ray facilities have the potential to address scientific grand challenges that underpin the future of the energy sector. Cutting across these science challenges is a recognition of the importance of complexity, structural heterogeneity, spontaneous fluctuations, rare events, and quantum phenomena in determining the properties and functionality of advanced real-world materials and molecular assemblies. The timescales for such dynamics span many decades and are directly coupled with structural changes that span from the atomic scale to the mesoscale. Direct observation and characterization at these scales are essential for a deeper understanding of these systems in operation. Next-generation light sources are driving a qualitative advance on these challenges by using coherent X-rays to rapidly collect massive data sets. Advanced data science using AI/ML methods will be essential to fully exploit the rich information content within this torrent of new data and to provide rapid feedback that intelligently guides experiments in real time.
The newest light sources, X-ray free electron lasers (XFELs), are revolutionary new tools for energy science. In particular, the high repetition rate of the Linac Coherent Light Source upgrades, LCLS-II and LCLS-II-HE, will provide a transformational capability to collect 10^8 to 10^10 scattering patterns or spectra (each an independent measurement) per day. Such data rates will enable us to map the reaction pathways of operating man-made catalysts and natural enzymes for the first time. This insight will be essential to develop more efficient and selective catalysts for chemical processing, energy production, and related chemical needs that underpin modern industrial society. In functioning electrochemical cells, it will be possible to directly follow the flow of ions to reveal the atomic distortions, strain fields, and transient defects in matter that mediate the performance of energy storage materials, new insight that will be essential for major advances in performance. By coupling the unprecedented capabilities of high-rep-rate XFELs with advanced AI/ML algorithms, Edge model inference (e.g. for feature extraction), and use of HPC to analyze data or retrain models in near real time for the purpose of feeding new models to the Edge, it will become possible to gain a deeper understanding of the natural systems that will guide the design of new molecular complexes and materials with tailored functionality.
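To put the daily collection figures in context, the short sketch below converts a number of patterns per day into the average acquisition rate it implies, and compares it with the ceiling set by a given repetition rate. It uses only the 10^8 to 10^10 patterns per day quoted above and the 120 Hz and 1 MHz repetition rates quoted in this document. The function names and the duty_factor parameter are illustrative assumptions, and the calculation ignores downtime, sample hit rates, and detector limits.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def average_rate_hz(patterns_per_day: float) -> float:
    """Average acquisition rate needed to collect this many patterns in 24 hours."""
    return patterns_per_day / SECONDS_PER_DAY


def max_patterns_per_day(rep_rate_hz: float, duty_factor: float = 1.0) -> float:
    """Upper bound on patterns per day if every pulse yields one pattern.

    duty_factor is a hypothetical fraction of the day spent collecting data
    (1.0 means continuous running); it is an assumption, not a facility figure.
    """
    return rep_rate_hz * SECONDS_PER_DAY * duty_factor


if __name__ == "__main__":
    for n in (1e8, 1e10):
        print(f"{n:.0e} patterns/day needs {average_rate_hz(n):,.0f} Hz on average")
    for rate in (120, 1e6):
        print(f"{rate:,.0f} Hz allows at most {max_patterns_per_day(rate):.2e} patterns/day")
```

Under these assumptions, 10^8 and 10^10 patterns per day correspond to average rates of roughly 1.2 kHz and 116 kHz. A 120 Hz source tops out near 10^7 patterns per day even with continuous running, while 1 MHz allows up to about 8.6 x 10^10.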
LCLS Computing Status, Drivers, and Challenges
Each of the seven LCLS instruments offers distinct capabilities to study many different areas of science using the unique XFEL beam properties. Advanced computing systems are playing an increasingly important role in facility operation, data interpretation, and overall scientific productivity. The LCLS-II upgrade will increase the repetition rate from 120 Hz to 1 MHz. Coupled with the adoption of ultra-high repetition rate imaging detectors, there will be a significant increase in the data throughput from today’s 1-5 GB/s to 200 GB/s in 2023. Future planned upgrades are expected to increase the throughput to multiple TB/s by 2028. Corresponding advances in data handling and computing are required to manage the quality and quantity of the data. Computing demands at LCLS are driven by the repetition rate of the source, advances in detector technology, advances in data analysis algorithms, and the requirement to provide flexible and easy-to-use fast feedback to users in real time, a challenge given the weekly turnaround of experiments and the number of new user groups. Filtering and feature extraction layers are required to remove non-essential information from the data, enhancing the quality of the data that are taken while simultaneously reducing the data volume. The LCLS-II data system provides core hardware and software infrastructure for scalable data acquisition, online data reduction, real-time monitoring, fast feedback for data quality monitoring, data storage and management, data archiving, and access to local and remote computing for offline analysis, enabling scientists to efficiently go from measurement to scientific insight.
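The filtering and feature extraction idea described above can be illustrated with a minimal sketch. This is not the LCLS data reduction pipeline; it is a toy example, assuming each event is a small 2D frame of pixel values, that the feature of interest is the set of pixels above a threshold, and that events with too few such pixels can be vetoed. The names Frame, Features, extract_features, and reduce_stream, and the threshold and min_hits parameters, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

Frame = List[List[int]]  # hypothetical detector frame: rows of pixel values


@dataclass
class Features:
    event_id: int
    n_hits: int        # number of pixels above threshold
    total_signal: int  # summed value of those pixels


def extract_features(event_id: int, frame: Frame, threshold: int) -> Features:
    """Feature extraction: replace a full frame with two summary numbers."""
    hits = [px for row in frame for px in row if px > threshold]
    return Features(event_id, len(hits), sum(hits))


def reduce_stream(frames: Iterable[Frame], threshold: int, min_hits: int) -> Iterator[Features]:
    """Filtering: yield features only for events with at least min_hits bright pixels."""
    for event_id, frame in enumerate(frames):
        feats = extract_features(event_id, frame, threshold)
        if feats.n_hits >= min_hits:
            yield feats


if __name__ == "__main__":
    frames = [
        [[0, 1, 0], [2, 0, 1], [0, 0, 0]],     # no bright pixels: vetoed
        [[0, 50, 0], [70, 0, 1], [0, 0, 90]],  # three bright pixels: kept
        [[0, 1, 60], [2, 0, 1], [0, 0, 0]],    # one bright pixel: vetoed
    ]
    kept = list(reduce_stream(frames, threshold=10, min_hits=2))
    print(f"kept {len(kept)} of {len(frames)} events: {kept}")
```

The veto reduces the number of events stored, and feature extraction reduces the size of each stored event. Real reduction layers would apply detector-specific calibration and more sophisticated algorithms before either step.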
The same data system is used at all instruments. The LCLS data system handles the transparent movement of data through several layers of computing in the pipeline, from the detector Edge through the data reduction compute layer to the data cache, where it is accessible to users for fast feedback analysis. Analysis is performed by user teams using available computing resources at the SLAC Shared Science Data Facility (S3DF), DOE HPC facilities, or users’ home institutions. Analysis is typically begun while the experiment is running but can be refined and repeated with different parameters many times after execution. Real-time feedback is critical for autonomous experiment steering and for experiment success, so low-latency turnaround of analysis is vital to LCLS. Depending on the scientific workflow, there are a number of data paths that may be exercised: 1) LCLS facility to S3DF, 2) LCLS facility to ASCR compute facility, 3) LCLS facility to university/home institution (~10% of experiments, expected to shrink as datasets expand), and 4) LCLS facility to other light source facility. The LCLS facility to S3DF path is used by 80% of experiments and is expected to require processing resources on the order of PFLOPS and storage capacity of about 100 PB by 2028. Approximately 20% of experiments have computational requirements that cannot be met by the local facility compute resources and must be offloaded to ASCR facilities, usually NERSC. These will heavily leverage ESnet network resources to facilitate data mobility. LCLS to S3DF/ASCR workflows are typically used to analyze raw data for data quality monitoring and experiment feedback or to do post-experiment analysis, archive LCLS data sets, train/retrain AI/ML models on specialized DCAI, transmit simulation data that can be used during the experiment, and do multi-modal analysis.
Key elements of a future data management strategy for LCLS include a common API for accessing network and computing resources, parallel data transfer tools, high-fidelity data transfer, network performance monitoring, reservations, and dynamic network provisioning. Experiment data remain on disk for 4 months following the experiment. On average, users rerun their analysis over their entire data set up to 10 times. Experiment data and metadata collected at LCLS may be stored on and retrieved from tape for a period of 10 years. Two copies of the data are made, one at SLAC and one at a remote site such as NERSC. LCLS provides space for all experimental data at no additional cost to the user. Current datasets are between 1 and 10 TB in size, but LCLS-II datasets are expected to be about 1 PB per shift, or several PBs per experiment.
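The dataset sizes and reprocessing counts above translate into I/O and network loads that can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: the 1 PB dataset size and the 10 analysis passes come from the text, while the 100 Gb/s link speed and the efficiency parameter are hypothetical values chosen for illustration and are not statements about ESnet or any actual LCLS link.

```python
PB = 1e15  # bytes in a decimal petabyte


def reprocessing_volume_bytes(dataset_bytes: float, n_passes: int) -> float:
    """Total bytes read if the full dataset is analyzed n_passes times."""
    return dataset_bytes * n_passes


def transfer_time_hours(dataset_bytes: float, link_gbit_per_s: float,
                        efficiency: float = 1.0) -> float:
    """Hours needed to move a dataset over a link of the given nominal speed.

    efficiency is a hypothetical fraction of the nominal bandwidth actually
    achieved end to end (1.0 means the link is fully used).
    """
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return dataset_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    dataset = 1 * PB
    read_pb = reprocessing_volume_bytes(dataset, n_passes=10) / PB
    print(f"10 passes over 1 PB reads {read_pb:.0f} PB in total")
    # Assumed 100 Gb/s link; the real available bandwidth is not given in the text.
    print(f"1 PB over 100 Gb/s takes {transfer_time_hours(dataset, 100):.1f} hours")
```

Under these assumptions, moving a single 1 PB shift to a remote site over a fully used 100 Gb/s link takes about 22 hours, which is one way to see why parallel transfer tools and dynamic network provisioning appear in the list of key elements.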
The development and application of autonomous experiment steering requires that a number of gaps in capabilities and infrastructure be addressed: sufficient and reliable bandwidth, sufficient and sustainable storage resources, on-demand access to large-scale computing systems, transparent access to facilities and systems, software tools and infrastructure to facilitate the development of scientific data workflows, and the development of specialized algorithms to provide actionable information for the purpose of DAQ control and algorithms to intelligently guide experiment decisions.
Atomic, molecular, and optical science: By inducing energy modulations in the LCLS electron beam, high-current spikes could be produced in a magnetic chicane, which upon injection into an undulator led to the generation of widely tunable attosecond pulses with an intensity sufficient to drive nonlinear processes. The attosecond pulses generated at LCLS enable attosecond-pump/attosecond-probe experiments, as well as experiments where core-shell electrons are used to observe electron dynamics from the viewpoint of a particular atom. These attosecond capabilities recently led to a first major scientific result, namely the observation of coherent electron motion during Auger-Meitner decay, and should lead to a deeper understanding of the role of electron correlation in quantum materials.
Biological science: LCLS experiments have revealed exquisite detail of ligand binding to several challenging membrane proteins by the use of the serial femtosecond crystallography method, which was developed to capture diffraction of protein crystals from single sample-destroying LCLS pulses. The LCLS has engendered a major advance in the understanding of important enzyme-catalyzed reactions. Using LCLS, short-lived intermediates and transient states of enzyme reactions are simultaneously characterized spectroscopically and “seen” in crystal structures. Further, the femtosecond time scale of LCLS pulses is highly relevant to chemical reactions. Data on the spectroscopic and structural properties of intermediates substantially advance reaction theory and our ability to simulate reaction trajectories in other enzyme systems.
Condensed-phase chemistry and catalysis: Experiments at LCLS have changed the way the scientific community thinks about molecules. Seminal and paradigm-shifting discoveries were enabled by the unique capabilities to probe the electronic and geometric properties of molecules and chemical systems at a more fundamental level than was ever possible before. Completely new observables were revealed, and the field was pushed beyond the boundaries of what is currently known and/or presumed to be true. This created new paradigms for chemical transformations. The fundamental new knowledge and the new approaches provide templates for probing a broad array of reactions spanning chemical, biological, and materials chemistries for new ways of translating the fundamental understanding of chemical systems to solutions of key challenges for a sustainable future.
Gas-phase chemistry: LCLS experiments provide the long sought-after separation of structural and electronic effects, allowing the extraction of molecular dynamics directly from the experimental results. The pioneering experimental capabilities at LCLS include: 1) the ability to record time-resolved X-ray diffraction using the free-electron laser and time-resolved electron diffraction through the Ultrafast Electron Diffraction (UED) instrument; and 2) the ability to implement spectroscopic techniques based on site-specific core-level photoabsorption. The UED instrument enabled the study of ring opening of cyclohexadiene with unprecedented sub-ångström spatial and femtosecond temporal resolution. The X-ray wavelength of the LCLS allows the development of measurement schemes that are based on diffraction, which are ideally suited for measuring time-dependent structural changes, as well as the development of novel techniques that are uniquely sensitive to time-dependent electronic changes in a molecule, and moreover, from the viewpoint of a particular atom within the molecule.
Materials science and condensed matter physics: LCLS, with complementary advances in theory, simulation, and materials synthesis, has vastly expanded the range of materials phenomena that the materials community can study and ultimately control. Discoveries at LCLS have initiated the coherent control of excitations in a previously inaccessible regime in which it is possible to use light and other experimental tools to interact directly with the relevant physical phenomena. Atomic-scale information derived from X-ray scattering studies at LCLS has provided an understanding of the transformations of materials between phases, specifically in highly disordered systems for which the dynamics at short time scales have been previously unknown. LCLS has also provided new insights into energetically fragile many-electron dynamics of quantum materials via the separation of coupled dynamics in the time domain, based on its unique combination of selective near-equilibrium perturbation of coupled modes with precision snapshots of the electronic and atomic structure at fundamental length and time scales. Novel techniques and methods pioneered at LCLS will continue to offer new insight into the structure and excited states of solids and nanoscale materials.
Materials in extreme conditions: Experiments at LCLS have provided unique insight into strength and plasticity at the lattice level, enabling links to macroscopic models that bridge micro- to macro-length scales in an area of importance to national security. The Matter in Extreme Conditions (MEC) platform at LCLS has provided completely new insights into pressure-induced phase transitions at ultrahigh rates, demonstrating, for example, that the most complex phase a single element can form (an incommensurate host-guest structure) develops in less than a nanosecond. Complex physics relevant to planetary science, for example the properties of hydrocarbons in large planets, is being explored with LCLS. Finally, the ability of LCLS to create uniform hot (multi-million-kelvin) plasmas at exactly solid density is providing a completely new understanding of dense quantum plasmas.