Chilingaryan S., Ametova E., Kopmann A., Mirone A.

in Proceedings – 2018 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018 (2019) 158-166, 8645862. DOI:10.1109/CAHPC.2018.8645862


© 2018 IEEE. Synchrotron X-ray imaging is a powerful method to investigate internal structures down to the micro- and nanoscopic scale. Fast cameras recording thousands of frames per second allow time-resolved studies with a high temporal resolution. Fast image reconstruction is essential to provide the synchrotron instrumentation with the imaging information required to track and control the process under study. Traditionally, the Filtered Back Projection algorithm is used for tomographic reconstruction. In this article, we discuss how to implement the algorithm efficiently on current GPGPU architectures. The key is to achieve balanced utilization of the available GPU subsystems. We present two highly optimized algorithms to perform back projection on parallel hardware. One relies on the texture engine to perform reconstruction, while the other utilizes the core computational units of the GPU. Both methods significantly outperform current state-of-the-art techniques found in standard reconstruction codes. Finally, we propose a hybrid approach combining both algorithms to better balance the load between GPU subsystems. It further boosts the performance by about 30% on the NVIDIA Pascal micro-architecture.

Harbaum T., Balzer M., Weber M., Becker J.

in International System on Chip Conference, 2018-September (2019) 118-123, 8618493. DOI:10.1109/SOCC.2018.8618493


© 2018 IEEE. Modern high-energy physics experiments such as the Compact Muon Solenoid (CMS) experiment at CERN produce an extraordinary amount of data every 25 ns. To handle a data rate of more than 50 Tbit/s, a multi-level trigger system is required to reduce the data rate. Due to the increased luminosity after the Phase-II Upgrade of the LHC, the CMS tracking system has to be redesigned. The current trigger system is unable to handle the resulting amount of data after this upgrade. Because of the required latency of a few microseconds, the Level 1 Track Trigger has to be implemented in hardware. State-of-the-art pattern recognition filters the incoming data by template matching on ASICs with a content-addressable memory architecture. A first implementation on an FPGA, which replaces the content-addressable memory of the ASIC, has been developed. This design combines the advantages of a content-addressable memory with an efficient utilization of the logic elements of an FPGA. This paper presents an extension of this FPGA design that is based on the idea of data compression: it assembles the stored data into appropriate packages and drastically reduces the required number of write and read cycles. Furthermore, the extended design meets the strict timing constraints, possesses the required properties of the content-addressable memory, and enables compressed storage of an increased amount of data.

Otte F., Farago T., Moosmann J., Hipp A.C., Hammel J.U., Beckmann F.

in AIP Conference Proceedings, 2054 (2019), 060084. DOI:10.1063/1.5084715


© 2019 Author(s). The Helmholtz-Zentrum Geesthacht, Germany, is operating the user experiments for microtomography at the beamlines P05 and P07 using synchrotron radiation produced in the storage ring PETRA III at DESY, Hamburg, Germany. In recent years, the software pipeline and sample-changing hardware for performing high-throughput experiments were developed. To test and optimize the different measurement techniques and to quantify the quality of different reconstruction algorithms, a software framework to simulate experiments was implemented. Results from simulated microtomography experiments using the photon source characteristics of P05 will be shown.

Chilingaryan S., Ametova E., Kopmann A., Mirone A.

in Journal of Real-Time Image Processing (2019). DOI:10.1007/s11554-019-00883-w


© 2019, The Author(s). Back-Projection is the major algorithm in Computed Tomography to reconstruct images from a set of recorded projections. It is used for both fast analytical methods and high-quality iterative techniques. X-ray imaging facilities rely on Back-Projection to reconstruct internal structures in material samples and living organisms with high spatial and temporal resolution. Fast image reconstruction is also essential to track and control processes under study in real time. In this article, we present efficient implementations of the Back-Projection algorithm for parallel hardware. We survey a range of parallel architectures presented by the major hardware vendors during the last 10 years. Similarities and differences between these architectures are analyzed, and we highlight how specific features can be used to enhance the reconstruction performance. In particular, we build a performance model to find hardware hotspots and propose several optimizations to balance the load between the texture engine, computational and special function units, and different types of memory, maximizing the utilization of all GPU subsystems in parallel. We further show that targeting architecture-specific features allows one to boost the performance 2–7 times compared to the current state-of-the-art algorithms used in standard reconstruction codes. The suggested load-balancing approach is not limited to back-projection but can be used as a general optimization strategy for implementing parallel algorithms.

Caselle M., Rota L., Kopmann A., Chilingaryan S.A., Mahaveer Patil M., Wang W., Brundermann E., Funkner S., Nasse M., Niehues G., Norbert Balzer M., Weber M., Muller A.S., Bielawski S.

in Proceedings of SPIE – The International Society for Optical Engineering, 10937 (2019), 1093704. DOI:10.1117/12.2508451


© 2019 SPIE. KALYPSO is a novel detector operating at line rates above 10 Mfps. It consists of a detector board connected to an FPGA-based readout card for real-time data processing. The detector board holds a Si or InGaAs linear array sensor, with spectral sensitivity ranging from 400 nm to 2600 nm, which is connected to a custom-made front-end ASIC. An FPGA readout framework performs the real-time data processing. In this contribution, we present the detector system, the readout electronics and the heterogeneous infrastructure for machine learning processing. The detector is currently in use at several synchrotron facilities for beam diagnostics as well as for single-pulse laser characterizations. Thanks to the shot-to-shot capability over long time scales, attractive new applications are opening up for imaging in biological and medical research.

Caselle M., Brundermann E., Dusterer S., Funkner S., Gerth C., Haack D., Kopmann A., Patil M.M., Makowski D., Mielczarek A., Nasse M., Niehues G., Rota L., Steffen B., Wang W., Balzer M.N., Weber M., Muller A.S., Bielawski S.

in Proceedings of SPIE – The International Society for Optical Engineering, 10903 (2019), 1090306. DOI:10.1117/12.2511341


© COPYRIGHT SPIE. KALYPSO is a novel detector operating at line rates above 10 Mfps. The detector board holds a silicon or InGaAs linear array sensor with spectral sensitivity ranging from 400 nm to 2600 nm. The sensor is connected to a cutting-edge, custom-designed ASIC readout chip, which is responsible for the remarkable frame rate. The FPGA readout architecture enables continuous data acquisition and processing in real time. This detector is currently employed in many synchrotron facilities for beam diagnostics and for the characterization of a self-built ytterbium-doped fiber laser emitting around 1050 nm with a bandwidth of 40 nm.

van de Kamp T., Schwermann A.H., dos Santos Rolo T., Losel P.D., Engler T., Etter W., Farago T., Gottlicher J., Heuveline V., Kopmann A., Mahler B., Mors T., Odar J., Rust J., Tan Jerome N., Vogelgesang M., Baumbach T., Krogmann L.

in Nature Communications, 9 (2018), 3325. DOI:10.1038/s41467-018-05654-y


© 2018, The Author(s). About 50% of all animal species are considered parasites. The linkage of species diversity to a parasitic lifestyle is especially evident in the insect order Hymenoptera. However, fossil evidence for host–parasitoid interactions is extremely rare, rendering hypotheses on the evolution of parasitism assumptive. Here, using high-throughput synchrotron X-ray microtomography, we examine 1510 phosphatized fly pupae from the Paleogene of France and identify 55 parasitation events by four wasp species, providing morphological and ecological data. All species developed as solitary endoparasitoids inside their hosts and exhibit different morphological adaptations for exploiting the same hosts in one habitat. Our results allow systematic and ecological placement of four distinct endoparasitoids in the Paleogene and highlight the need to investigate ecological data preserved in the fossil record.

Steinmann J.L., Boltz T., Brosi M., Brundermann E., Caselle M., Kehrer B., Rota L., Schonfeldt P., Schuh M., Siegel M., Weber M., Muller A.-S.

in Physical Review Accelerators and Beams, 21 (2018), 110705. DOI:10.1103/PhysRevAccelBeams.21.110705


© 2018 authors. Published by the American Physical Society. Electron accelerators and synchrotrons can be operated to provide short emission pulses due to longitudinally compressed or substructured electron bunches. Above a threshold current, the high charge density leads to the microbunching instability and the formation of substructures on the bunch shape. These time-varying substructures on bunches of picoseconds-long duration lead to bursts of coherent synchrotron radiation in the terahertz frequency range. Therefore, the spectrum in this range contains valuable information about the bunch length, shape and substructures. Based on the KAPTURE readout system, a 4-channel single-shot THz spectrometer capable of recording 500 million spectra per second with streaming readout is presented. First measurements of time-resolved spectra are compared to simulation results of the Inovesa Vlasov-Fokker-Planck solver. The presented results lead to a better understanding of the bursting dynamics, especially above the microbunching instability threshold.

Buzmakov A.V., Asadchikov V.E., Zolotov D.A., Roshchin B.S., Dymshits Y.M., Shishkov V.A., Chukalina M.V., Ingacheva A.S., Ichalova D.E., Krivonosov Y.S., Dyachkova I.G., Balzer M., Castele M., Chilingaryan S., Kopmann A.

in Crystallography Reports, 63 (2018) 1057-1061. DOI:10.1134/S106377451806007X


© 2018, Pleiades Publishing, Inc. The design of a new automatic X-ray microtomograph is described. The parameters of optical schemes and X-ray detectors in use are presented. Methods for automating experiments, processing tomographic data, and getting access to them are reported.

Funkner S., Brosi M., Brundermann E., Caselle M., Nasse M.J., Niehues G., Rota L., Schonfeldt P., Weber M., Muller A.-S.

in International Conference on Infrared, Millimeter, and Terahertz Waves, IRMMW-THz, 2018-September (2018), 8510080. DOI:10.1109/IRMMW-THz.2018.8510080


© 2018 IEEE. At the KArlsruhe Research Accelerator (KARA), we use electro-optical sampling to measure the profiles of compressed electron bunches during the microbunching instability. The observation of the complex dynamics of this instability is of special interest because it leads to intense THz radiation bursts. As the revolution frequency of the storage ring is 2.72 MHz, high detection rates are required to record the bunch profiles for every revolution with single-shot measurements. To achieve fast detection rates, we implemented a KIT-developed ultra-fast line array and recorded the electron bunch charge density for every revolution for 3.6 s with a data throughput of 1.4 GBytes/s.