Faragó T., Mikulík P., Ershov A., Vogelgesang M., Hanschke D., Baumbach T.

in Journal of Synchrotron Radiation, 24 (2017) 1283-1295. DOI:10.1107/S1600577517012255

Abstract

© International Union of Crystallography, 2017. An open-source framework for conducting a broad range of virtual X-ray imaging experiments, syris, is presented. The simulated wavefield created by a source propagates through an arbitrary number of objects until it reaches a detector. The objects in the light path and the source are time-dependent, which enables simulations of dynamic experiments, e.g. four-dimensional time-resolved tomography and laminography. The high-level interface of syris is written in Python and its modularity makes the framework very flexible. The computationally demanding parts behind this interface are implemented in OpenCL, which enables fast calculations on modern graphics processing units. The combination of flexibility and speed opens new possibilities for studying novel imaging methods and for the systematic search for optimal combinations of measurement conditions and data processing parameters. This can help to increase the success rate and efficiency of valuable synchrotron beam time. To demonstrate the capabilities of the framework, various experiments have been simulated and compared with real data. To show the use case of measurement and data processing parameter optimization based on simulation, a virtual counterpart of a high-speed radiography experiment was created and the simulated data were used to select a suitable motion estimation algorithm; one of its parameters was optimized in order to achieve the best motion estimation accuracy when applied to the real data. syris was also used to simulate tomographic data sets under various imaging conditions which impact the tomographic reconstruction accuracy, and it is shown how the accuracy may guide the selection of imaging conditions for particular use cases.
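
The core operation the abstract describes, propagating a wavefield from source through objects to a detector, generally rests on a free-space propagator. Below is a minimal numpy sketch of Fresnel propagation via the Fourier-domain transfer function; it illustrates the principle only and does not use syris's actual API, and all names and values in it are illustrative.

    import numpy as np

    def fresnel_propagate(wavefield, pixel_size, wavelength, distance):
        """Propagate a complex wavefield over `distance` using the
        Fresnel transfer function evaluated in Fourier space."""
        fy = np.fft.fftfreq(wavefield.shape[0], d=pixel_size)
        fx = np.fft.fftfreq(wavefield.shape[1], d=pixel_size)
        f2 = fx[np.newaxis, :] ** 2 + fy[:, np.newaxis] ** 2
        # Transfer function; the constant phase factor exp(ikz) is omitted
        transfer = np.exp(-1j * np.pi * wavelength * distance * f2)
        return np.fft.ifft2(np.fft.fft2(wavefield) * transfer)

    # Plane wave behind a weakly absorbing disc: 1 um pixels, ~12 keV
    n, ps, lam = 512, 1e-6, 1e-10
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    wavefield = np.where(x ** 2 + y ** 2 < 50 ** 2, 0.1, 1.0).astype(complex)
    intensity = np.abs(fresnel_propagate(wavefield, ps, lam, 0.5)) ** 2

The resulting intensity shows the characteristic Fresnel fringes around the disc edge, i.e. the kind of detector image such a simulation framework produces before detector effects are applied.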

Mohr H., Dritschler T., Ardila L.E., Balzer M., Caselle M., Chilingaryan S., Kopmann A., Rota L., Schuh T., Vogelgesang M., Weber M.

in Journal of Instrumentation, 12 (2017), C04019. DOI:10.1088/1748-0221/12/04/C04019

Abstract

© 2017 IOP Publishing Ltd and Sissa Medialab srl. In this work, we investigate the use of GPUs as a way of realizing a low-latency, high-throughput track trigger, using CMS as a showcase example. The CMS detector at the Large Hadron Collider (LHC) will undergo a major upgrade after the long shutdown from 2024 to 2026, when it will enter the high-luminosity era. During this upgrade, the silicon tracker will have to be completely replaced. In the high-luminosity operation mode, luminosities of 5-7 × 10³⁴ cm⁻² s⁻¹ and pileup averaging 140 events, with a maximum of up to 200 events, will be reached. These changes will require a major update of the triggering system. Systems demonstrated so far rely on dedicated hardware such as associative-memory ASICs and FPGAs. We investigate the use of GPUs as an alternative way of realizing the requirements of the L1 track trigger. To this end we implemented a Hough transformation track finding step on GPUs and established a low-latency RDMA connection using the PCIe bus. To showcase the benefits of floating-point operations, made possible by the use of GPUs, we present a modified algorithm. It uses hexagonal bins for the parameter space and leads to a more truthful representation of the possible track parameters of the individual hits in Hough space. This leads to fewer duplicate candidates and fewer fake track candidates compared with the regular approach. With data-transfer latencies of 2 μs and processing times for the Hough transformation as low as 3.6 μs, we can show that latencies are not as critical as expected. However, computing throughput proves to be challenging due to hardware limitations.
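
The Hough transformation step maps each tracker hit onto a line of compatible track parameters and accumulates votes in a binned parameter space; peaks identify track candidates. A minimal numpy sketch with regular (not hexagonal) bins, using the small-angle relation phi0 ≈ phi + curvature · r; the hit values, layer radii and binning are assumptions for illustration.

    import numpy as np

    def hough_vote(hits, n_phi=64, n_curv=32, curv_max=1e-3):
        """Fill a (phi0, curvature) accumulator from (r, phi) hits."""
        acc = np.zeros((n_phi, n_curv), dtype=np.int32)
        curvatures = np.linspace(-curv_max, curv_max, n_curv)
        for r, phi in hits:
            # Each hit votes once per curvature bin along its parameter line
            phi0 = phi + curvatures * r
            bins = ((phi0 % (2 * np.pi)) / (2 * np.pi) * n_phi).astype(int)
            acc[bins, np.arange(n_curv)] += 1
        return acc

    # Hits of one synthetic track (phi0 = 1.0, curvature = 5e-4) on six layers
    rng = np.random.default_rng(0)
    hits = [(r, 1.0 - 5e-4 * r + rng.normal(0.0, 1e-5))
            for r in np.linspace(0.2, 1.1, 6)]
    acc = hough_vote(hits)
    print(np.unravel_index(acc.argmax(), acc.shape))  # peak = track candidate

Note that neighbouring curvature bins can collect the same full vote count, yielding exactly the duplicate candidates that the hexagonal binning described above is designed to reduce.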

Kaever P., Balzer M., Kopmann A., Zimmer M., Rongen H.

in Journal of Instrumentation, 12 (2017), C04004. DOI:10.1088/1748-0221/12/04/C04004

Abstract

© 2017 IOP Publishing Ltd and Sissa Medialab srl. Various centres of the German Helmholtz Association (HGF) started in 2012 to develop a modular data acquisition (DAQ) platform, covering the entire range from detector readout to data transfer into parallel computing environments. This platform integrates generic hardware components like the multi-purpose HGF Advanced Mezzanine Card or a smart scientific camera framework, adding user value with Linux drivers and board support packages. Technically, the scope comprises the DAQ chain from FPGA modules to computing servers, notably front-end electronics interfaces, microcontrollers and GPUs with their software, plus high-performance data transmission links. The core idea is a generic and component-based approach, enabling the implementation of specific experiment requirements with low effort. This so-called DTS platform will support standards like MTCA.4 in hardware and software to ensure compatibility with commercial components. Its capability to deploy on other crate standards or FPGA boards with PCI Express or Ethernet interfaces remains an essential feature. Competences of the participating centres are coordinated in order to provide a solid technological basis for both research topics in the Helmholtz Programme “Matter and Technology”: “Detector Technology and Systems” and “Accelerator Research and Development”. The DTS platform aims at reducing costs and development time and will ensure access to the latest technologies for the collaboration. Due to its flexible approach, it has the potential to be applied in other scientific programs.

Caselle M., Perez L.E.A., Balzer M., Dritschler T., Kopmann A., Mohr H., Rota L., Vogelgesang M., Weber M.

in Journal of Instrumentation, 12 (2017), C03015. DOI:10.1088/1748-0221/12/03/C03015

Abstract

© 2017 IOP Publishing Ltd and Sissa Medialab srl. Modern data acquisition and trigger systems require a throughput of several GB/s and latencies of the order of microseconds. To satisfy such requirements, a heterogeneous readout system based on FPGA readout cards and GPU-based computing nodes coupled by InfiniBand has been developed. The incoming data from the back-end electronics are delivered directly into the internal memory of the GPUs through a dedicated peer-to-peer PCIe communication. High-performance DMA engines have been developed for direct communication between FPGAs and GPUs using “DirectGMA” (AMD) and “GPUDirect” (NVIDIA) technologies. The proposed infrastructure is a candidate for future generations of event-building clusters, high-level trigger filter farms and low-level trigger systems. In this paper the heterogeneous FPGA-GPU architecture is presented and its performance discussed.
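
The peer-to-peer path itself relies on vendor-specific extensions (DirectGMA, GPUDirect) and FPGA DMA engines that cannot be reproduced in a few lines, but the baseline it is compared against, an ordinary staged host-to-GPU copy, can be timed with standard OpenCL event profiling. A pyopencl sketch, with the buffer size chosen arbitrarily:

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

    size = 256 * 1024 ** 2                   # 256 MiB test block
    host = np.zeros(size, dtype=np.uint8)
    dev = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=size)

    evt = cl.enqueue_copy(queue, dev, host)  # staged host -> GPU transfer
    evt.wait()
    seconds = (evt.profile.end - evt.profile.start) * 1e-9
    print(f"{size / seconds / 1e9:.2f} GB/s host-to-device")

Comparing such a baseline against the direct FPGA-to-GPU path is what motivates the peer-to-peer design: the staged copy crosses system memory once more than necessary.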

Caselle M., Perez L.E.A., Balzer M., Kopmann A., Rota L., Weber M., Brosi M., Steinmann J., Bründermann E., Müller A.-S.

in Journal of Instrumentation, 12 (2017), C01040. DOI:10.1088/1748-0221/12/01/C01040

Abstract

© 2017 IOP Publishing Ltd and Sissa Medialab srl. This paper presents a novel data acquisition system for continuous sampling of ultra-short pulses generated by terahertz (THz) detectors. Karlsruhe Pulse Taking Ultra-fast Readout Electronics (KAPTURE) is able to digitize pulse shapes with a sampling time down to 3 ps and pulse repetition rates up to 500 MHz. KAPTURE has been integrated as a permanent diagnostic device at ANKA and is used for investigating the emitted coherent synchrotron radiation in the THz range. A second version of KAPTURE has been developed to improve the performance and flexibility. The new version offers better sampling accuracy for pulse repetition rates up to 2 GHz. The higher data rate produced by the sampling system is processed in real time by a heterogeneous FPGA-GPU architecture operating at up to 6.5 GB/s continuously. Results in accelerator physics will be reported and the new design of KAPTURE will be discussed.
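
With only a handful of samples per pulse, pulse parameters such as amplitude and arrival time are recovered by fitting a model to the sampled points. The sketch below fits a Gaussian pulse to four samples spaced like KAPTURE's picosecond sampling taps; the Gaussian model and all numbers are assumptions for illustration, not KAPTURE's actual reconstruction scheme.

    import numpy as np
    from scipy.optimize import curve_fit

    def pulse(t, amplitude, t0, sigma):
        """Gaussian stand-in for a THz detector pulse shape."""
        return amplitude * np.exp(-0.5 * ((t - t0) / sigma) ** 2)

    delays = np.array([0.0, 3.0, 6.0, 9.0])  # four taps, 3 ps apart
    true = (1.0, 4.2, 2.5)                   # amplitude, arrival (ps), width (ps)
    samples = pulse(delays, *true) + np.random.default_rng(1).normal(0, 0.01, 4)

    popt, _ = curve_fit(pulse, delays, samples, p0=(1.0, 4.5, 3.0))
    print("amplitude %.2f, arrival %.2f ps" % tuple(popt[:2]))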

Bergmann T., Balzer M., Bormann D., Chilingaryan S.A., Eitel K., Kleifges M., Kopmann A., Kozlov V., Menshikov A., Siebenborn B., Tcherniakhovski D., Vogelgesang M., Weber M.

in 2015 IEEE Nuclear Science Symposium and Medical Imaging Conference, NSS/MIC 2015 (2016), 7581841. DOI:10.1109/NSSMIC.2015.7581841

Abstract

© 2015 IEEE. The EDELWEISS experiment, located in the underground laboratory LSM (France), is one of the leading experiments using cryogenic germanium (Ge) detectors for a direct search for dark matter. For the EDELWEISS-III phase, a new scalable data acquisition (DAQ) system was designed and built, based on the ‘IPE4 DAQ system’, which has already been used for several experiments in astroparticle physics.

Rota L., Vogelgesang M., Perez L.E.A., Caselle M., Chilingaryan S., Dritschler T., Zilio N., Kopmann A., Balzer M., Weber M.

in Journal of Instrumentation, 11 (2016), P02007. DOI:10.1088/1748-0221/11/02/P02007

Abstract

© 2016 IOP Publishing Ltd and Sissa Medialab srl. Modern physics experiments produce multi-GB/s data rates. Fast data links and high-performance computing stages are required for continuous data acquisition and processing. Because of their intrinsic parallelism and computational power, GPUs emerged as an ideal solution to process this data in high-performance computing applications. In this paper we present a high-throughput platform based on direct FPGA-GPU communication. The architecture consists of a Direct Memory Access (DMA) engine compatible with the Xilinx PCI-Express core, a Linux driver for register access, and high-level software to manage direct memory transfers using AMD’s DirectGMA technology. Measurements with a Gen3 x8 link show a throughput of 6.4 GB/s for transfers to GPU memory and 6.6 GB/s to system memory. We also assess the possibility of using the architecture in low-latency systems: preliminary measurements show a round-trip latency as low as 1 μs for data transfers to system memory, while the additional latency introduced by OpenCL scheduling is the current limitation for GPU-based systems. Our implementation is suitable for real-time DAQ applications ranging from photon science and medical imaging to High Energy Physics (HEP) systems.
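
The reported 6.4 GB/s can be put in relation to the raw capacity of a PCIe Gen3 x8 link (8 GT/s per lane with 128b/130b encoding) to gauge how much of the link the DMA engine saturates; a back-of-the-envelope check:

    # PCIe Gen3 x8: 8 GT/s per lane, 128b/130b encoding
    lanes, measured = 8, 6.4                     # measured GB/s to GPU memory
    theoretical = 8.0 * lanes * (128 / 130) / 8  # usable GB/s before protocol overhead
    print(f"theoretical {theoretical:.2f} GB/s, "
          f"efficiency {measured / theoretical:.0%}")

This yields roughly 7.88 GB/s of usable bandwidth, so the measured transfers to GPU memory use about 81% of the link before accounting for TLP header overhead.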

Vogelgesang M., Rota L., Perez L.E.A., Caselle M., Chilingaryan S., Kopmann A.

in Proceedings of SPIE – The International Society for Optical Engineering, 9967 (2016), 996715. DOI:10.1117/12.2237611

Abstract

© Copyright 2016 SPIE. With ever-increasing data rates due to stronger light sources and better detectors, X-ray imaging experiments conducted at synchrotron beamlines face bandwidth and processing limitations that inhibit efficient workflows and prevent real-time operation. We propose an experiment platform comprised of programmable hardware and optimized software to lift these limitations and make beamline setups future-proof. The hardware consists of an FPGA-based data acquisition system with custom logic for data pre-processing and a PCIe data connection for transmission of currently up to 6.6 GB/s. Moreover, the accompanying firmware supports pushing data directly into GPU memory using AMD’s DirectGMA technology without crossing system memory first. The GPUs are used to pre-process projection data and reconstruct final volumetric data with OpenCL faster than is possible with CPUs alone. Besides more efficient use of resources, this enables a real-time preview of a reconstruction for early quality assessment of both the experiment setup and the investigated sample. The entire system is designed in a modular way and allows swapping all components, e.g. replacing our custom FPGA camera with a commercial system while still reconstructing data with GPUs. Moreover, every component is accessible through a low-level C library or a high-level Python interface, so that the components can be integrated into any legacy environment.
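
The real-time preview rests on GPU-accelerated tomographic reconstruction, which the platform implements in OpenCL; the underlying algorithm, filtered backprojection, fits in a few lines of numpy. A minimal single-slice sketch with a synthetic sinogram, not the platform's optimized kernels:

    import numpy as np

    def fbp_slice(sinogram, angles):
        """Reconstruct one slice from a (n_angles, width) sinogram."""
        n_angles, width = sinogram.shape
        # Ram-Lak (ramp) filter applied per projection in Fourier space
        ramp = np.abs(np.fft.fftfreq(width))
        filtered = np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1).real
        # Smear each filtered projection back across the slice
        x = np.arange(width) - width / 2
        xx, yy = np.meshgrid(x, x)
        recon = np.zeros((width, width))
        for proj, theta in zip(filtered, angles):
            t = xx * np.cos(theta) + yy * np.sin(theta) + width / 2
            recon += np.interp(t.ravel(), np.arange(width), proj).reshape(xx.shape)
        return recon * np.pi / (2 * n_angles)

    # Synthetic sinogram of a centred disc (projection independent of angle)
    width, R = 128, 30
    s = np.arange(width) - width / 2
    proj = 2 * np.sqrt(np.clip(R ** 2 - s ** 2, 0, None))
    angles = np.linspace(0, np.pi, 180, endpoint=False)
    slice_ = fbp_slice(np.tile(proj, (180, 1)), angles)

On a GPU, the per-projection filtering and the backprojection loop parallelize naturally, which is why such previews can keep pace with acquisition.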

Vogelgesang M., Faragó T., Morgeneyer T.F., Helfen L., Dos Santos Rolo T., Myagotin A., Baumbach T.

in Journal of Synchrotron Radiation, 23 (2016) 1254-1263. DOI:10.1107/S1600577516010195

Abstract

© 2016 International Union of Crystallography. Real-time processing of X-ray image data acquired at synchrotron radiation facilities allows for smart high-speed experiments. This includes workflows covering parameterized and image-based feedback-driven control up to the final storage of raw and processed data. Nevertheless, there is presently no system that supports the efficient construction of such experiment workflows in a scalable way. Thus, an architecture based on a high-level control system that manages low-level data acquisition, data processing and device changes is described here. This system is suitable for routine as well as prototypical experiments, and provides specialized building blocks to conduct four-dimensional in situ, in vivo and operando tomography and laminography.
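
An image-based feedback-driven building block of the kind described reads frames, derives a scalar metric and drives a device until the metric is optimal. A toy sketch in Python; acquire and move are hypothetical stand-ins for the control system's camera and motor abstractions, and the sharpness metric is one arbitrary choice.

    import numpy as np

    def sharpness(frame):
        """Image-based feedback metric: mean squared gradient."""
        gy, gx = np.gradient(frame.astype(float))
        return float(np.mean(gx ** 2 + gy ** 2))

    def scan_and_settle(acquire, move, positions):
        """Scan a motor, score each frame, move back to the best position."""
        scores = []
        for p in positions:
            move(p)
            scores.append(sharpness(acquire()))
        best = positions[int(np.argmax(scores))]
        move(best)
        return best

    # Mock hardware for a self-contained run: frames lose detail away from 0.3
    state = {"p": 0.0}
    def move(p): state["p"] = p
    def acquire():
        detail = max(0.0, 1.0 - abs(state["p"] - 0.3))
        return np.random.default_rng(0).normal(0.0, 1.0, (64, 64)) * detail

    print(scan_and_settle(acquire, move, np.linspace(0.0, 1.0, 11)))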