Publications of the IPE expert group for embedded parallel systems
Category: Electronics
An FPGA-based track finder for the L1 trigger of the CMS experiment at the high luminosity LHC
Amstutz C. et al.
in 2016 IEEE-NPSS Real Time Conference, RT 2016 (2016), 7543102. DOI:10.1109/RTC.2016.7543102
Abstract
© 2016 IEEE. A new tracking system is under development for operation in the CMS experiment at the High Luminosity LHC. It includes an outer tracker which will construct stubs, built by correlating clusters in two closely spaced sensor layers for the rejection of hits from low transverse momentum tracks, and transmit them off-detector at 40 MHz. If tracker data is to contribute to keeping the Level-1 trigger rate at around 750 kHz under increased luminosity, a crucial component of the upgrade will be the ability to identify tracks with transverse momentum above 3 GeV/c by building tracks out of stubs. A concept for an FPGA-based track finder using a fully time-multiplexed architecture is presented, where track candidates are identified using a projective binning algorithm based on the Hough Transform. A hardware system based on the MP7 MicroTCA processing card has been assembled, demonstrating a realistic slice of the track finder in order to help gauge the performance and requirements for a full system. This paper outlines the system architecture and algorithms employed, highlights some of the first results from the hardware demonstrator, and discusses the prospects and performance of the completed track finder.
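The projective binning idea mentioned in the abstract can be illustrated with a generic Hough Transform in the transverse plane. The sketch below is not the firmware described in the paper: it assumes the small-angle relation phi0 = phi + k * r * (q/pT), and the constant `k`, the bin layout, the function name `hough_candidates` and the toy stubs are all hypothetical choices made for illustration.

```python
import numpy as np

def hough_candidates(stubs, q_bins, phi0_edges, k, min_layers):
    """Return (q/pT, phi0) cells that collect stubs from at least min_layers layers.

    stubs: iterable of (r, phi, layer) tuples.
    k: hypothetical constant linking the track bend to r * q/pT
       (small-angle approximation, an assumption of this sketch).
    """
    n_phi = len(phi0_edges) - 1
    # layers_hit[i][j] holds the distinct layers that voted for cell (i, j)
    layers_hit = [[set() for _ in range(n_phi)] for _ in q_bins]
    for r, phi, layer in stubs:
        # Each stub traces a straight line through (q/pT, phi0) space
        for i, q in enumerate(q_bins):
            phi0 = phi + k * r * q
            j = int(np.searchsorted(phi0_edges, phi0, side="right")) - 1
            if 0 <= j < n_phi:
                layers_hit[i][j].add(layer)
    return [
        (float(q_bins[i]), float(0.5 * (phi0_edges[j] + phi0_edges[j + 1])))
        for i in range(len(q_bins))
        for j in range(n_phi)
        if len(layers_hit[i][j]) >= min_layers
    ]

# Toy input: five stubs generated from one track (all values hypothetical)
K = 0.005
radii = [20.0, 40.0, 60.0, 80.0, 100.0]
true_q, true_phi0 = 0.2, 0.105
stubs = [(r, true_phi0 - K * r * true_q, layer) for layer, r in enumerate(radii)]

q_bins = np.linspace(-0.3, 0.3, 13)      # candidate q/pT values
phi0_edges = np.linspace(-0.2, 0.2, 41)  # phi0 bin edges
candidates = hough_candidates(stubs, q_bins, phi0_edges, K, min_layers=5)
```

Counting distinct layers per cell, rather than raw stub votes, keeps several stubs in the same layer from faking a track; whether the demonstrator applies the same criterion is not stated in the abstract.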
A high-throughput readout architecture based on PCI-Express Gen3 and DirectGMA technology
Rota L., Vogelgesang M., Perez L.E.A., Caselle M., Chilingaryan S., Dritschler T., Zilio N., Kopmann A., Balzer M., Weber M.
in Journal of Instrumentation, 11 (2016), P02007. DOI:10.1088/1748-0221/11/02/P02007
Abstract
© 2016 IOP Publishing Ltd and Sissa Medialab srl. Modern physics experiments produce multi-GB/s data rates. Fast data links and high performance computing stages are required for continuous data acquisition and processing. Because of their intrinsic parallelism and computational power, GPUs emerged as an ideal solution to process this data in high performance computing applications. In this paper we present a high-throughput platform based on direct FPGA-GPU communication. The architecture consists of a Direct Memory Access (DMA) engine compatible with the Xilinx PCI-Express core, a Linux driver for register access, and high-level software to manage direct memory transfers using AMD’s DirectGMA technology. Measurements with a Gen3 x8 link show a throughput of 6.4 GB/s for transfers to GPU memory and 6.6 GB/s to system memory. We also assess the possibility of using the architecture in low latency systems: preliminary measurements show a round-trip latency as low as 1 μs for data transfers to system memory, while the additional latency introduced by OpenCL scheduling is the current limitation for GPU-based systems. Our implementation is suitable for real-time DAQ system applications ranging from photon science and medical imaging to High Energy Physics (HEP) systems.
Low-cost bump-bonding processes for high energy physics pixel detectors
Caselle M., Blank T., Colombo F., Dierlamm A., Husemann U., Kudella S., Weber M.
in Journal of Instrumentation, 11 (2016), C01050. DOI:10.1088/1748-0221/11/01/C01050
Abstract
© 2016 IOP Publishing Ltd and Sissa Medialab srl. In the next generation of collider experiments, detectors will be challenged by unprecedented particle fluxes. Thus large arrays of highly pixelated detectors with minimal dead area will be required at reasonable costs. Bump-bonding of pixel detectors has been shown to be a major cost-driver. KIT is one of five production centers of the CMS barrel pixel detector for the Phase I Upgrade. In this contribution the SnPb bump-bonding process and the production yield are reported. In parallel to the production of the new CMS pixel detector, several alternatives to the expensive photolithography electroplating/electroless metal deposition technologies are being developed. Recent progress and challenges faced in the development of bump-bonding technology based on gold-stud bonding by thin (15 μm) gold wire are presented. This technique allows producing metal bumps with diameters down to 30 μm without using photolithography processes, which are typically required to provide suitable under bump metallization. The short setup time for the bumping process makes gold-stud bump-bonding highly attractive (and affordable) for the flip-chipping of single prototype ICs, which is the main limitation of the current photolithography processes.
Influence of filling pattern structure on synchrotron radiation spectrum at ANKA
Steinmann J.L., Blomley E., Brosi M., Brundermann E., Caselle M., Hiller N., Kehrer B., Muller A.-S., Schedler M., Schonfeldt P., Schuh M., Schwarz M., Siegel M.
in IPAC 2016 – Proceedings of the 7th International Particle Accelerator Conference (2016) 2855-2857.
Abstract
Copyright © 2016 CC-BY-3.0 and by the respective authors. We present the effects of the filling pattern structure in multi-bunch mode on the beam spectrum. These effects can be seen by all detectors whose resolution is better than the RF frequency, ranging from stripline and Schottky measurements to high resolution synchrotron radiation measurements. Our heterodyne measurements of the emitted coherent synchrotron radiation at 270 GHz reveal discrete frequency harmonics around the 100 000th revolution harmonic of ANKA, the synchrotron radiation facility in Karlsruhe, Germany. Significant effects of bunch spacing, gaps between bunch trains and variations in individual bunch currents on the emitted CSR spectrum are described by theory and supported by observations.
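The dependence of the spectrum on the filling pattern can be illustrated with a toy model; it is not the theory used in the paper. The sketch assumes point-like bunches, one sample per RF bucket, and a hypothetical harmonic number of 8 buckets per turn. Under those assumptions, the discrete Fourier transform of the per-bucket currents gives the relative strength of the revolution harmonics lying between two RF harmonics. For scale, the numbers quoted in the abstract (270 GHz near the 100 000th harmonic) imply a revolution frequency of roughly 2.7 MHz.

```python
import numpy as np

def revolution_harmonic_amplitudes(fill):
    """Relative amplitude of revolution harmonics for a filling pattern.

    fill: bunch current per RF bucket over one turn (point-like bunches assumed).
    Index 0 stands for the RF harmonics themselves; index m > 0 stands for
    the lines m revolution harmonics away from an RF harmonic.
    """
    fill = np.asarray(fill, dtype=float)
    return np.abs(np.fft.rfft(fill)) / fill.sum()

# Hypothetical machine with 8 buckets per turn
uniform_fill = np.ones(8)                                # every bucket equally filled
train_with_gap = np.array([1, 1, 1, 1, 0, 0, 0, 0.0])    # one train followed by a gap

uniform_lines = revolution_harmonic_amplitudes(uniform_fill)
gap_lines = revolution_harmonic_amplitudes(train_with_gap)
```

In this toy model a uniform fill leaves only the RF harmonics, while a gap in the bunch train makes the intermediate revolution harmonics appear.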
Fast mapping of terahertz bursting thresholds and characteristics at synchrotron light sources
Brosi M., Steinmann J.L., Blomley E., Brundermann E., Caselle M., Hiller N., Kehrer B., Mathis Y.-L., Nasse M.J., Rota L., Schedler M., Schonfeldt P., Schuh M., Schwarz M., Weber M., Muller A.-S.
in Physical Review Special Topics – Accelerators and Beams, 19 (2016), 110701. DOI:10.1103/PhysRevAccelBeams.19.110701
Abstract
© 2016, American Physical Society. All rights reserved. Dedicated optics with extremely short electron bunches enable synchrotron light sources to generate intense coherent THz radiation. The high degree of spatial compression in this so-called low-αc optics entails complex longitudinal dynamics of the electron bunches, which can be probed by studying the fluctuations in the emitted terahertz radiation caused by the microbunching instability (“bursting”). This article presents a “quasi-instantaneous” method for measuring the bursting characteristics by simultaneously collecting and evaluating the information from all bunches in a multibunch fill, reducing the measurement time from hours to seconds. This speed-up allows systematic studies of the bursting characteristics for various accelerator settings within a single fill of the machine, enabling a comprehensive comparison of the measured bursting thresholds with theoretical predictions by the bunched-beam theory. This paper introduces the method and presents first results obtained at the ANKA synchrotron radiation facility.
High-throughput data acquisition and processing for real-time X-ray imaging
Vogelgesang M., Rota L., Perez L.E.A., Caselle M., Chilingaryan S., Kopmann A.
in Proceedings of SPIE – The International Society for Optical Engineering, 9967 (2016), 996715. DOI:10.1117/12.2237611
Abstract
© Copyright 2016 SPIE. With ever-increasing data rates due to stronger light sources and better detectors, X-ray imaging experiments conducted at synchrotron beamlines face bandwidth and processing limitations that inhibit efficient workflows and prevent real-time operations. We propose an experiment platform comprised of programmable hardware and optimized software to lift these limitations and make beamline setups future-proof. The hardware consists of an FPGA-based data acquisition system with custom logic for data pre-processing and a PCIe data connection for transmission of currently up to 6.6 GB/s. Moreover, the accompanying firmware supports pushing data directly into GPU memory using AMD’s DirectGMA technology without crossing system memory first. The GPUs are used to pre-process projection data and reconstruct final volumetric data with OpenCL faster than possible with CPUs alone. Besides a more efficient use of resources, this enables a real-time preview of a reconstruction for early quality assessment of both the experiment setup and the investigated sample. The entire system is designed in a modular way and allows swapping all components, e.g. replacing our custom FPGA camera with a commercial system while still reconstructing data with GPUs. Moreover, every component is accessible using a low-level C library or a high-level Python interface in order to integrate these components in any legacy environment.
A PCIe DMA Architecture for Multi-Gigabyte per Second Data Transmission
Rota L., Caselle M., Chilingaryan S., Kopmann A., Weber M.
in IEEE Transactions on Nuclear Science, 62 (2015) 972-976, 7111377. DOI:10.1109/TNS.2015.2426877
Abstract
© 2014 IEEE. We developed a direct memory access (DMA) engine compatible with the Xilinx PCI Express (PCIe) core to provide a high-performance and low-occupancy alternative to commercial solutions. In order to maximize the PCIe throughput while minimizing the FPGA resource utilization, the DMA engine adopts a novel strategy where the DMA address list is stored inside the FPGA and not in the central memory of the host CPU. The FPGA design package is complemented with simple register access to control the DMA engine by a Linux driver. The design is compatible with Xilinx FPGA Families 6 and 7, and operates with the Xilinx PCIe endpoint Generation 1 and 2 with all lane configurations (x1, x2, x4, x8). A multi-engine architecture is also presented, where two x8-lane cores are used in parallel together with a PCIe bridge, to fully exploit the capabilities of a PCIe Gen2 x16-lane link. A data throughput of 3461 MBytes/s has been achieved with a single PCIe Gen2 x8-lane endpoint. If the dual-engine architecture is used, the throughput is increased up to 6920 MBytes/s. The presented DMA is currently used in several experiments at the ANKA synchrotron light source.
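To put the quoted throughput figures in context, the short calculation below compares them with the line-rate ceiling of a PCIe Gen2 link (5 GT/s per lane with 8b/10b encoding). It is a back-of-the-envelope sketch: it assumes that "MBytes" means 10^6 bytes and ignores packet and protocol overhead, and the function name `pcie_line_rate_mb_s` is hypothetical.

```python
def pcie_line_rate_mb_s(gigatransfers_per_s, lanes, encoding_efficiency):
    """Per-direction link ceiling in MB/s (10^6 bytes), before packet overhead."""
    bits_per_s = gigatransfers_per_s * 1e9 * encoding_efficiency * lanes
    return bits_per_s / 8 / 1e6

# PCIe Gen2: 5 GT/s per lane, 8b/10b encoding (8 payload bits per 10 line bits)
GEN2_X8_CEILING = pcie_line_rate_mb_s(5.0, 8, 8 / 10)

# Throughput values quoted in the abstract, as a fraction of the ceiling
single_engine_fraction = 3461 / GEN2_X8_CEILING
dual_engine_fraction = 6920 / (2 * GEN2_X8_CEILING)

print(f"Gen2 x8 ceiling: {GEN2_X8_CEILING:.0f} MB/s")
print(f"single engine:   {single_engine_fraction:.1%} of ceiling")
print(f"dual engine:     {dual_engine_fraction:.1%} of ceiling")
```

Under these assumptions both configurations reach a similar fraction of the link ceiling, which suggests that the dual-engine setup scales almost linearly.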
A control system and streaming DAQ platform with image-based trigger for X-ray imaging
Stevanovic U., Caselle M., Cecilia A., Chilingaryan S., Farago T., Gasilov S., Herth A., Kopmann A., Vogelgesang M., Balzer M., Baumbach T., Weber M.
in IEEE Transactions on Nuclear Science, 62 (2015) 911-918, 7111386. DOI:10.1109/TNS.2015.2425911
Abstract
© 1963-2012 IEEE. High-speed X-ray imaging applications play a crucial role for non-destructive investigations of the dynamics in material science and biology. On-line data analysis is necessary for quality assurance and data-driven feedback, leading to a more efficient use of beam time and increased data quality. In this article we present a smart camera platform with embedded Field Programmable Gate Array (FPGA) processing that is able to stream and process data continuously in real-time. The setup consists of a Complementary Metal-Oxide-Semiconductor (CMOS) sensor, an FPGA readout card, and a readout computer. It is seamlessly integrated in a new custom experiment control system called Concert that provides a more efficient way of operating a beamline by integrating device control, experiment process control, and data analysis. The potential of the embedded processing is demonstrated by implementing an image-based trigger. It records the temporal evolution of physical events with increased speed while maintaining the full field of view. The complete data acquisition system, with Concert and the smart camera platform, was successfully integrated and used for fast X-ray imaging experiments at KIT’s synchrotron radiation facility ANKA.