LLAna
The LBNL/LCLS Pilot for Data Analytics Project (LLAna) was a 1-year initiative (2019-2020) to enhance LCLS data analysis capabilities by improving HDF5 interoperability, optimizing HPC workflows for I/O-intensive tasks, and scaling Jupyter for high-performance interactive use at NERSC.
Key Objectives
- Develop an HDF5 interface for LCLS-II data to enable cross-platform access, addressing variable data structures and read-while-write functionality.
- Optimize I/O management and resource scheduling for LCLS workflows on HPC systems, including Burst Buffer staging and data redundancy reduction.
- Scale Jupyter for large-core data processing and integrate HPC workflows directly within interactive notebooks.
- Automate data movement and analysis across HPC storage layers to improve end-to-end workflow performance and user experience.
Partners & Collaborators
Jana Thayer (PI, SLAC)
Deborah Bard (PI, LBNL/NERSC)
Shreyas Cholia, Jupyter (LBNL)
Murali Shankar, Jupyter (SLAC)
Matthew Henderson, Jupyter (LBNL)
Muammar El Khatib Rodriguez, Jupyter (LBNL)
Suren Byna, HDF5 (LBNL)
Chris O’Grady, HDF5 (SLAC)
Quincey Koziol, HDF5 (LBNL)
Tony Li, HDF5 (LBNL)
Lavanya Ramakrishnan, Workflows (LBNL)
Wilko Kroeger, Workflows (LBNL)
Devarshi Ghoshal, Workflows (LBNL)
Anna Giannakou, Workflows (LBNL)
Results
- Created a common software environment and Jupyter kernel for running PSANA and Dask from Jupyter notebooks: https://github.com/llanaproject/psana_jupyter_kernel
- Published intermediate results from a running PSANA application in real time through Dask Pub/Sub and Queues: https://github.com/llanaproject/psana_jupyter_demo
- Ran Jupyter on NERSC Cori with real-time processing of results: https://github.com/llanaproject/psana_jupyter_demo/blob/master/demo/demo.ipynb
- HDF5-xtc VOL connector documentation: https://drive.google.com/open?id=1C-YMYgStIA5_fo51z3RWp_ma3cz5FT2l
- HDF5-xtc VOL connector source code: https://github.com/slac-lcls/lcls2/tree/master/xtc_vol
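The real-time publishing result listed above follows a producer/consumer pattern: the analysis keeps running while periodic snapshots are pushed to a queue that a notebook can read from. The sketch below illustrates that pattern only. It uses Python's standard-library `queue.Queue` and a thread as a stand-in for Dask Queues and a PSANA job, and the names `analyze_events`, `collect`, `run_demo`, and `publish_every` are hypothetical and do not come from the project code; see the linked demo repository for the actual implementation.

```python
import queue
import threading

# Sentinel object that tells the consumer the stream has ended.
_DONE = object()


def analyze_events(events, results, publish_every=2):
    """Hypothetical analysis loop (stand-in for a PSANA job).

    Keeps a running mean over the events and publishes a snapshot to
    `results` every `publish_every` events, plus a final snapshot.
    """
    total = 0.0
    count = 0
    for count, value in enumerate(events, start=1):
        total += value
        if count % publish_every == 0:
            results.put({"events": count, "mean": total / count})
    # Publish the final state if the last event did not trigger a snapshot.
    if count and count % publish_every != 0:
        results.put({"events": count, "mean": total / count})
    results.put(_DONE)


def collect(results):
    """Consumer side (stand-in for a notebook cell): read snapshots
    as they arrive until the producer signals completion."""
    snapshots = []
    while True:
        item = results.get()  # blocks until the producer publishes
        if item is _DONE:
            break
        snapshots.append(item)
    return snapshots


def run_demo():
    """Run the producer in a background thread and collect its snapshots."""
    results = queue.Queue()
    events = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up per-event values
    worker = threading.Thread(target=analyze_events, args=(events, results))
    worker.start()
    snapshots = collect(results)
    worker.join()
    return snapshots


if __name__ == "__main__":
    for snapshot in run_demo():
        print(snapshot)
```

In a Dask-based setup the in-process queue would presumably be replaced by a distributed queue or a Pub/Sub topic shared between the workers and the notebook client, so that the consumer can run in a different process or on a different node from the analysis.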