Publications of the IPE expert group for embedded parallel systems
Rota L., Caselle M., Brundermann E., Funkner S., Gerth C., Kehrer B., Mielczarek A., Makowski D., Mozzanica A., Muller A.-S., Nasse M.J., Niehues G., Patil M., Schmitt B., Schonfeldt P., Steffen B., Weber M.
in Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment (2018). DOI:10.1016/j.nima.2018.10.093
© 2018 Synchrotrons and modern FEL light sources operate with bunch repetition rates in the MHz range. The profile of the electron beam inside the accelerator can be characterized with indirect experimental techniques where linear array detectors are employed to measure the emitted synchrotron radiation or the spectrum of a near-IR laser. To improve the performance of modern beam diagnostics we have developed KALYPSO, a detector system operating with a continuous frame rate of up to 2.7 MHz. To facilitate the integration in different experiments, a modular architecture has been adopted. Different semiconductor micro-strip sensors can be connected to front-end ASICs to optimize the quantum efficiency at different photon energies, ranging from visible light up to near-IR. The front-end electronics are integrated within an heterogeneous DAQ consisting of FPGAs and GPUs, which allows scientists to implement real-time data processing algorithms. The current version of the detector is in operation at the KARA synchrotron light source and at the European XFEL. In this contribution we present the detector architecture, the performance results and the on-going technical developments.
Aggleton R. et al.
in Journal of Instrumentation, 12 (2017), P12019. DOI:10.1088/1748-0221/12/12/P12019
© 2017 CERN. A new tracking detector is under development for use by the CMS experiment at the High-Luminosity LHC (HL-LHC). A crucial requirement of this upgrade is to provide the ability to reconstruct all charged particle tracks with transverse momentum above 2-3 GeV within 4 μs so they can be used in the Level-1 trigger decision. A concept for an FPGA-based track finder using a fully time-multiplexed architecture is presented, where track candidates are reconstructed using a projective binning algorithm based on the Hough Transform, followed by a combinatorial Kalman Filter. A hardware demonstrator using MP7 processing boards has been assembled to prove the entire system functionality, from the output of the tracker readout boards to the reconstruction of tracks with fitted helix parameters. It successfully operates on one eighth of the tracker solid angle acceptance at a time, processing events taken at 40 MHz, each with up to an average of 200 superimposed proton-proton interactions, whilst satisfying the latency requirement. The demonstrated track-reconstruction system, the chosen architecture, the achievements to date and future options for such a system will be discussed.
Kopmann A., Chilingaryan S., Vogelgesang M., Dritschler T., Shkarin A., Shkarin R., Dos Santos Rolo T., Farago T., Van De Kamp T., Balzer M., Caselle M., Weber M., Baumbach T.
in 2016 IEEE Nuclear Science Symposium, Medical Imaging Conference and Room-Temperature Semiconductor Detector Workshop, NSS/MIC/RTSD 2016, 2017-January (2017), 8069895. DOI:10.1109/NSSMIC.2016.8069895
© 2016 IEEE. New imaging stations aim for high spatial and temporal resolution and are characterized by ever increasing sampling rates and demanding data processing workflows. Key to successful imaging experiments is to open up high-performance computing resources. This includes carefully selected components for computing hardware and development of advanced imaging algorithms optimized for efficient use of parallel processor architectures. We present the novel UFO computing platform for online data processing for imaging experiments and image-based feedback. The platform handles the full data life cycle from the X-ray detector to long-term data archives. Core components of this system are an FPGA platform for ultra-fast data acquisition, the GPU-based UFO image processing framework, and the fast control system “Concert”. Reconstruction algorithms implemented in the UFO framework are optimized for the latest GPU architectures and provide a reconstruction throughput in the GB/s-range. The control system “Concert” integrates high-speed computing nodes and fast beamline devices and thus enables image-based control loops and advanced workflow automation for efficient beam time usage. Low latencies are ensured by direct communication between FPGA and GPUs using AMDs DirectGMA technology. Time resolved tomography is supported by cutting edge regularization methods for high quality reconstructions with a reduced number of projections. The new infrastructure at ANKA has dramatically accelerated tomography from hours to second and resulted in new application fields, like high-throughput tomography, pump-probe radiography and stroboscopic tomography. Ultra-fast X-ray cine-tomography for the first time allows one to observe internal dynamics of moving millimeter-sized objects in real-time.
Aggleton R. et al.
in 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017 (2017), 8056825. DOI:10.23919/FPL.2017.8056825
© 2017 Ghent University. The Compact Muon Solenoid (CMS) experiment at CERN is scheduled for a major upgrade in the next decade in order to meet the demands of the new High Luminosity Large Hadron Collider. Amongst others, a new tracking system is under development including an outer tracker capable of rejecting low transverse momentum particles by looking at the coincidences of hits (stubs) in two closely spaced sensor layers in the same tracker module. Accepted stubs are transmitted off-detector for further processing at 40 MHz. In order to maintain under the increased luminosity the Level-1 trigger rate at 750 kHz, tracker data need to be included in the decision making process. For this purpose, a system architecture has to be developed that will be able to identify particles with transverse momentum above 3 GeV/c by building tracks out of stubs, while achieving an overall processing latency of maximum 4us. Targeting these requirements the current paper presents an FPGA-based track finding architecture that identifies track candidates in real-time and bases its functionality on a fully time-multiplexed approach. As a proof of concept, a hardware system has been assembled targeting the MP7 MicroTCA processing card that features a Xilinx Virtex-7 FPGA, demonstrating a realistic slice of the track finder. The paper discusses the algorithms’ implementation and the efficient utilisation of the available FPGA resources, it outlines the system architecture, and presents some of the hardware demonstrator results.
Adam W. et al.
in Journal of Instrumentation, 12 (2017), P06018. DOI:10.1088/1748-0221/12/06/P06018
© 2017 CERN for the benefit of the CMS collaboration.The upgrade of the LHC to the High-Luminosity LHC (HL-LHC) is expected to increase the LHC design luminosity by an order of magnitude. This will require silicon tracking detectors with a significantly higher radiation hardness. The CMS Tracker Collaboration has conducted an irradiation and measurement campaign to identify suitable silicon sensor materials and strip designs for the future outer tracker at the CMS experiment. Based on these results, the collaboration has chosen to use n-in-p type silicon sensors and focus further investigations on the optimization of that sensor type. This paper describes the main measurement results and conclusions that motivated this decision.
Gentsos C., Fedi G., Magazzu G., Magalotti D., Modak A., Storchi L., Palla F., Bilei G.M., Biesuz N., Chowdhury S.R., Crescioli F., Checcucci B., Tcherniakhovski D., Galbit G.C., Baulieu G., Balzer M.N., Sander O., Viret S., Servoli L., Nikolaidis S.
in 2017 6th International Conference on Modern Circuits and Systems Technologies, MOCAST 2017 (2017), 7937676. DOI:10.1109/MOCAST.2017.7937676
© 2017 IEEE. The increase of the luminosity in the High Luminosity upgrade of the CERN Large Hadron Collider (HL-LHC) will require the use of Tracker information in the evaluation of the Level-1 trigger in order to keep the trigger rate acceptable (i.e.: <1MHz). In order to extract the track information within the latency constraints (<5μs), a custom real-time system is necessary. We developed a prototype of the main building block of this system, the Pattern Recognition Mezzanine (PRM) that combines custom Associative Memory ASICs with modern FPGA devices. The architecture, functionality and test results of the PRM are described in the present work.
Mohr H., Dritschler T., Ardila L.E., Balzer M., Caselle M., Chilingaryan S., Kopmann A., Rota L., Schuh T., Vogelgesang M., Weber M.
in Journal of Instrumentation, 12 (2017), C04019. DOI:10.1088/1748-0221/12/04/C04019
© 2017 IOP Publishing Ltd and Sissa Medialab srl. In this work, we investigate the use of GPUs as a way of realizing a low-latency, high-throughput track trigger, using CMS as a showcase example. The CMS detector at the Large Hadron Collider (LHC) will undergo a major upgrade after the long shutdown from 2024 to 2026 when it will enter the high luminosity era. During this upgrade, the silicon tracker will have to be completely replaced. In the High Luminosity operation mode, luminosities of 5-7 × 1034 cm-2s-1 and pileups averaging at 140 events, with a maximum of up to 200 events, will be reached. These changes will require a major update of the triggering system. The demonstrated systems rely on dedicated hardware such as associative memory ASICs and FPGAs. We investigate the use of GPUs as an alternative way of realizing the requirements of the L1 track trigger. To this end we implemeted a Hough transformation track finding step on GPUs and established a low-latency RDMA connection using the PCIe bus. To showcase the benefits of floating point operations, made possible by the use of GPUs, we present a modified algorithm. It uses hexagonal bins for the parameter space and leads to a more truthful representation of the possible track parameters of the individual hits in Hough space. This leads to fewer duplicate candidates and reduces fake track candidates compared to the regular approach. With data-transfer latencies of 2 μs and processing times for the Hough transformation as low as 3.6 μs, we can show that latencies are not as critical as expected. However, computing throughput proves to be challenging due to hardware limitations.
Kaever P., Balzer M., Kopmann A., Zimmer M., Rongen H.
in Journal of Instrumentation, 12 (2017), C04004. DOI:10.1088/1748-0221/12/04/C04004
© 2017 IOP Publishing Ltd and Sissa Medialab srl. Various centres of the German Helmholtz Association (HGF) started in 2012 to develop a modular data acquisition (DAQ) platform, covering the entire range from detector readout to data transfer into parallel computing environments. This platform integrates generic hardware components like the multi-purpose HGF-Advanced Mezzanine Card or a smart scientific camera framework, adding user value with Linux drivers and board support packages. Technically the scope comprises the DAQ-chain from FPGA-modules to computing servers, notably frontend-electronics-interfaces, microcontrollers and GPUs with their software plus high-performance data transmission links. The core idea is a generic and component-based approach, enabling the implementation of specific experiment requirements with low effort. This so called DTS-platform will support standards like MTCA.4 in hard- and software to ensure compatibility with commercial components. Its capability to deploy on other crate standards or FPGA-boards with PCI express or Ethernet interfaces remains an essential feature. Competences of the participating centres are coordinated in order to provide a solid technological basis for both research topics in the Helmholtz Programme “Matter and Technology”: “Detector Technology and Systems” and “Accelerator Research and Development”. The DTS-platform aims at reducing costs and development time and will ensure access to latest technologies for the collaboration. Due to its flexible approach, it has the potential to be applied in other scientific programs.
Caselle M., Perez L.E.A., Balzer M., Dritschler T., Kopmann A., Mohr H., Rota L., Vogelgesang M., Weber M.
in Journal of Instrumentation, 12 (2017), C03015. DOI:10.1088/1748-0221/12/03/C03015
© 2017 IOP Publishing Ltd and Sissa Medialab srl. Modern data acquisition and trigger systems require a throughput of several GB/s and latencies of the order of microseconds. To satisfy such requirements, a heterogeneous readout system based on FPGA readout cards and GPU-based computing nodes coupled by InfiniBand has been developed. The incoming data from the back-end electronics is delivered directly into the internal memory of GPUs through a dedicated peer-to-peer PCIe communication. High performance DMA engines have been developed for direct communication between FPGAs and GPUs using “DirectGMA (AMD)” and “GPUDirect (NVIDIA)” technologies. The proposed infrastructure is a candidate for future generations of event building clusters, high-level trigger filter farms and low-level trigger system. In this paper the heterogeneous FPGA-GPU architecture will be presented and its performance be discussed.
Caselle M., Perez L.E.A., Balzer M., Kopmann A., Rota L., Weber M., Brosi M., Steinmann J., Brundermann E., Muller A.-S.
in Journal of Instrumentation, 12 (2017), C01040. DOI:10.1088/1748-0221/12/01/C01040
© 2017 IOP Publishing Ltd and Sissa Medialab srl. This paper presents a novel data acquisition system for continuous sampling of ultra-short pulses generated by terahertz (THz) detectors. Karlsruhe Pulse Taking Ultra-fast Readout Electronics (KAPTURE) is able to digitize pulse shapes with a sampling time down to 3 ps and pulse repetition rates up to 500 MHz. KAPTURE has been integrated as a permanent diagnostic device at ANKA and is used for investigating the emitted coherent synchrotron radiation in the THz range. A second version of KAPTURE has been developed to improve the performance and flexibility. The new version offers a better sampling accuracy for a pulse repetition rate up to 2 GHz. The higher data rate produced by the sampling system is processed in real-time by a heterogeneous FPGA and GPU architecture operating up to 6.5 GB/s continuously. Results in accelerator physics will be reported and the new design of KAPTURE be discussed.