Publications of the IPE expert group for embedded parallel systems
Rota L., Caselle M., Chilingaryan S., Kopmann A., Weber M.
in IEEE Transactions on Nuclear Science (2015). DOI:10.1109/TNS.2015.2426877
We developed a direct memory access (DMA) engine compatible with the Xilinx PCI Express (PCIe) core to provide a high-performance and low-occupancy alternative to commercial solutions. To maximize PCIe throughput while minimizing FPGA resource utilization, the DMA engine adopts a novel strategy in which the DMA address list is stored inside the FPGA rather than in the central memory of the host CPU. The FPGA design package is complemented with simple register access that allows a Linux driver to control the DMA engine. The design is compatible with Xilinx FPGA Families 6 and 7, and operates with the Xilinx PCIe endpoint Generation 1 and 2 with all lane configurations (x1, x2, x4, x8). A multi-engine architecture is also presented, in which two x8-lane cores are used in parallel together with a PCIe bridge to fully exploit the capabilities of a PCIe Gen2 x16-lane link. A data throughput of 3461 MBytes/s has been achieved with a single PCIe Gen2 x8-lane endpoint. With the dual-engine architecture, the throughput increases to 6920 MBytes/s. The presented DMA is currently used in several experiments at the ANKA synchrotron light source.
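To put the throughput figures in this abstract in context, the following minimal sketch compares them with the coded bandwidth of a PCIe Gen2 link. It assumes the standard Gen2 parameters (5 GT/s per lane, 8b/10b line coding) and that "MBytes" means 10^6 bytes; neither is stated in the abstract, and the function names are illustrative only.

```python
# Back-of-envelope PCIe Gen2 link efficiency for the quoted throughput figures.
# Assumptions (not from the abstract): 5 GT/s per lane, 8b/10b line coding,
# and "MBytes" meaning 10**6 bytes.

GEN2_TRANSFERS_PER_S = 5e9     # raw line rate per lane, per direction
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b coding used by PCIe Gen1 and Gen2


def raw_bandwidth_mbytes(lanes: int) -> float:
    """Link bandwidth in MBytes/s after line coding, before protocol overhead."""
    bits_per_s = GEN2_TRANSFERS_PER_S * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e6


def link_efficiency(measured_mbytes: float, lanes: int) -> float:
    """Fraction of the coded link bandwidth delivered as DMA payload."""
    return measured_mbytes / raw_bandwidth_mbytes(lanes)


single = link_efficiency(3461, lanes=8)   # single x8 endpoint
dual = link_efficiency(6920, lanes=16)    # two x8 engines behind a PCIe bridge

print(f"x8:  {single:.1%} of {raw_bandwidth_mbytes(8):.0f} MBytes/s")
print(f"x16: {dual:.1%} of {raw_bandwidth_mbytes(16):.0f} MBytes/s")
```

Under these assumptions both configurations reach roughly 86.5% of the coded link bandwidth; the remainder would be taken up by packet headers and other protocol overhead.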
Dyroff C., Sanati S., Christner E., Zahn A., Balzer M., Bouquet H., McManus J.B., Gonzalez-Ramos Y., Schneider M.
in Atmospheric Measurement Techniques, 8 (2015) 2037-2049. DOI:10.5194/amt-8-2037-2015
© Author(s) 2015. Vertical profiles of water vapor (H2O) and its isotope ratio D/H expressed as δD(H
Caselle M., Brosi M., Chilingaryan S., Dritschler T., Judin V., Kopmann A., Mueller A.-S., Raasch J., Smale N.J., Steinmann J., Vogelgesang M., Wuensch S., Siegel M., Weber M.
in 2014 19th IEEE-NPSS Real Time Conference, RT 2014 – Conference Records (2015), 7097535. DOI:10.1109/RTC.2014.7097535
© 2014 IEEE. For several years, Coherent Synchrotron Radiation (CSR) generated by short electron bunches has been provided at the ANKA synchrotron light source. To study the THz emission characteristics over multiple revolutions, superconducting YBa2Cu3O7-δ (YBCO) thin-film detectors can be used. The intrinsic response time of YBCO thin films is on the order of only a few picoseconds. For fast and continuous sampling of these individual ultra-short terahertz pulses, a novel digitizer system has been developed with programmable sampling times in the range of 3 to 100 ps. The real-time system is based on a heterogeneous FPGA and GPU architecture for on-line pulse reconstruction and evaluation of the peak amplitudes and the time between consecutive bunches. The data is transmitted to a GPU computing node by a fast data transfer link based on a bus master DMA engine connected to PCI Express endpoint logic. This new DMA architecture ensures a continuous high data throughput of up to 4 GByte/s. The presented DAQ system is able to resolve the bursting behavior of single bunches even in a multi-bunch environment and to study bunch-bunch interactions.
Rota L., Caselle M., Chilingaryan S., Kopmann A., Weber M.
in 2014 19th IEEE-NPSS Real Time Conference, RT 2014 – Conference Records (2015), 7097561. DOI:10.1109/RTC.2014.7097561
© 2014 IEEE. PCI Express (PCIe) is a high-speed serial point-to-point interconnect that delivers high-performance data throughput. KIT has developed a Direct Memory Access (DMA) engine compatible with the Xilinx PCIe core to provide a smart and low-occupancy alternative to expensive commercial solutions. To maximize PCIe throughput, the DMA engine adopts a new strategy in which the DMA descriptor list is stored inside the FPGA rather than in the central memory system. The FPGA design package is complemented with simple register access that allows a Linux driver to control the DMA engine. A handshaking sequence between the DMA engine and the Linux driver ensures that no errors occur, even in data transfers of several hundred gigabytes. The design has been tested with Xilinx FPGA Families 6 and 7, and operates with the Xilinx PCIe endpoint Generation 1 and 2 with all lane configurations (x1, x2, x4, x8, x16). A data throughput of more than 3.4 GB/s has been achieved with a PCIe Gen2 x8-lane endpoint. The proposed DMA is currently used in several experiments at the ANKA synchrotron light source.
Amsbaugh, J.F. et al.
in Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment Volume 778, 1 April 2015, Pages 40-60
The focal-plane detector system for the KArlsruhe TRItium Neutrino (KATRIN) experiment consists of a multi-pixel silicon p-i-n-diode array, custom readout electronics, two superconducting solenoid magnets, an ultra high-vacuum system, a high-vacuum system, calibration and monitoring devices, a scintillating veto, and a custom data-acquisition system. It is designed to detect the low-energy electrons selected by the KATRIN main spectrometer. We describe the system and summarize its performance after its final installation. © 2015 Elsevier B.V. All rights reserved.
Steinmann J.L., Brosi M., Brundermann E., Caselle M., Hertle E., Hiller N., Kehrer B., Muller A.-S., Schonfeldt P., Schuh M., Schutze P., Schwarz M., Hesler J.
in 6th International Particle Accelerator Conference, IPAC 2015 (2015) 1509-1511.
Copyright © 2015 CC-BY-3.0 and by the respective authors. Interferometry is the quasi-standard for spectral measurements in the THz and IR ranges. The frequency resolution, however, is limited by the travel range of the interferometer mirrors. A resolution in the low megahertz range would therefore require interferometer arms of about 100 m. As an alternative, heterodyne measurements provide a resolution in the hertz range, an improvement of 6 orders of magnitude. Here we present measurements made at ANKA with a VDI WR3.4SAX, a mixer that can be tuned to frequencies from 220 GHz to 330 GHz, and we show how the bunch filling pattern influences the amplitude of specific frequencies.
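The 100 m figure in this abstract can be checked with a common rule of thumb for two-beam interferometers. The sketch below assumes a resolution of roughly c / (2 L) for a mirror travel L, meaning a maximum optical path difference of 2 L. This relation and the function names are assumptions made for illustration; the abstract does not state which formula the authors used.

```python
# Rule-of-thumb frequency resolution of a two-beam (Michelson-type) interferometer.
# Assumption (not from the abstract): delta_f ~ c / (2 * L), where L is the
# mirror travel, so the maximum optical path difference is 2 * L.
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def resolution_hz(mirror_travel_m: float) -> float:
    """Approximate frequency resolution for a given mirror travel."""
    return SPEED_OF_LIGHT / (2.0 * mirror_travel_m)


def travel_for_resolution_m(resolution: float) -> float:
    """Mirror travel needed to reach a given frequency resolution."""
    return SPEED_OF_LIGHT / (2.0 * resolution)


interferometer_hz = resolution_hz(100.0)   # 100 m arm, as quoted in the abstract
heterodyne_hz = 1.0                        # hertz-range resolution of the mixer
orders = math.log10(interferometer_hz / heterodyne_hz)

print(f"100 m travel: {interferometer_hz / 1e6:.2f} MHz resolution")
print(f"gain from heterodyne detection: about {orders:.1f} orders of magnitude")
```

With this approximation a 100 m travel gives a resolution of about 1.5 MHz, consistent with the "low megahertz range" and the six orders of magnitude quoted above.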
Brosi M., Caselle M., Hertle E., Hiller N., Kopmann A., Muller A.-S., Schonfeldt P., Schwarz M., Steinmann J.L., Weber M.
in 6th International Particle Accelerator Conference, IPAC 2015 (2015) 882-884.
Copyright © 2015 CC-BY-3.0 and by the respective authors. The ANKA storage ring of the Karlsruhe Institute of Technology (KIT) operates in the energy range from 0.5 to 2.5 GeV and generates brilliant coherent synchrotron radiation in the THz range with dedicated bunch-length-reducing optics. Producing radiation in the so-called THz gap is challenging, but this intense THz radiation is very attractive for certain user experiments. The high degree of compression in these so-called low-alpha optics leads to complex longitudinal dynamics of the electron bunches. The resulting micro-bunching instability leads to time-dependent fluctuations and strong bursts in the radiated THz power. The study of these fluctuations in the emitted THz radiation provides insight into the longitudinal beam dynamics. Fast THz detectors combined with KAPTURE, the dedicated KArlsruhe Pulse Taking and Ultrafast Readout Electronics system developed at KIT, allow the simultaneous measurement of the radiated THz intensity for each bunch individually in a multi-bunch environment. This contribution gives an overview of the first experience gained using this setup as an online diagnostics tool.
Menshikov A., Balzer M., Kleifges M.
in SEI 2015 – 106. Tagung der Studiengruppe Elektronische Instrumentierung im Fruhjahr 2015 (2015) 127-138.
Birk M., Balzer M., Ruiter N.V., Becker J.
in Computers and Electrical Engineering, 40 (2014) 1171-1185. DOI:10.1016/j.compeleceng.2013.11.033
In heterogeneous computing, application developers have to identify the best-suited target platform from a variety of alternatives. In this work, we compare the performance and architectural efficiency of Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) for two algorithms taken from a novel medical imaging method named 3D ultrasound computer tomography. From the 40 nm and 28 nm generations, we use top-notch devices and those with similar power consumption values. For our two benchmark algorithms from the signal processing and imaging domain, the results show that if power consumption is not considered, the GPU and FPGA from the 40 nm generation deliver similar performance and efficiency per transistor. In the 28 nm process, in contrast, the FPGA is superior to its GPU counterpart by 86% and 39%, depending on the algorithm. If power is limited, FPGAs outperform GPUs in each investigated case by at least a factor of four. © 2013 Elsevier Ltd. All rights reserved.
Brogna A.S., Balzer M., Smale S., Hartmann J., Bormann D., Hamann E., Cecilia A., Zuber M., Koenig T., Zwerger A., Weber M., Fiederle M., Baumbach T.
in Journal of Instrumentation, 9 (2014), C05047. DOI:10.1088/1748-0221/9/05/C05047
In this work we present novel readout electronics for an X-ray sensor based on a Si crystal bump-bonded to an array of 3 × 2 Medipix ASICs. The pixel size is 55 μm × 55 μm with a total number of ∼ 400k pixels and a sensitive area of 42 mm × 28 mm. The readout electronics operate Medipix-2 MXR or Timepix ASICs with a clock speed of 125 MHz. The data acquisition system is centered around an FPGA and each of the six ASICs has a dedicated I/O port for simultaneous data acquisition. The settings of the auxiliary devices (ADCs and DACs) are also processed in the FPGA. Moreover, a high-resolution timer operates the electronic shutter to select the exposure time from 8 ns to several milliseconds. A sophisticated trigger is available in hardware and software to synchronize the acquisition with external electro-mechanical motors. The system includes a diagnostic subsystem to check the sensor temperature and to control the cooling Peltier cells, and a programmable high-voltage generator to bias the crystal. A network cable transfers the data, which is encapsulated in UDP and streamed at 1 Gb/s. Most notebooks or personal computers are therefore able to process the data and to program the system without a dedicated interface. The data readout software is compatible with the well-known Pixelman 2.x running on both Windows and GNU/Linux. Furthermore, the open architecture encourages users to write their own applications. With a low-level interface library that implements all the basic features, a MATLAB or Python script can be written for special manipulations of the raw data. In this paper we present selected images taken with a microfocus X-ray tube to demonstrate the capability to collect data at rates up to 120 fps, corresponding to 0.76 Gb/s. © 2014 IOP Publishing Ltd and Sissa Medialab srl.
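The geometry and data-rate figures in this abstract can be cross-checked with a short calculation. The sketch below assumes that each Medipix ASIC has 256 × 256 pixels, which is the usual format of the Medipix-2 family but is not stated in the abstract; the variable names are illustrative only.

```python
# Consistency check of the sensor geometry and data rate quoted in the abstract.
# Assumption (not from the abstract): each Medipix ASIC has 256 x 256 pixels.

ASIC_COLUMNS, ASIC_ROWS = 3, 2   # 3 x 2 ASIC array, from the abstract
PIXELS_PER_SIDE = 256            # assumed pixels per ASIC edge
PIXEL_PITCH_UM = 55              # pixel size, from the abstract
FRAME_RATE_FPS = 120             # maximum frame rate, from the abstract
DATA_RATE_BPS = 0.76e9           # corresponding data rate, from the abstract

total_pixels = ASIC_COLUMNS * ASIC_ROWS * PIXELS_PER_SIDE ** 2
width_mm = ASIC_COLUMNS * PIXELS_PER_SIDE * PIXEL_PITCH_UM / 1000
height_mm = ASIC_ROWS * PIXELS_PER_SIDE * PIXEL_PITCH_UM / 1000

# Bits transferred per pixel and frame, implied by the quoted rate.
implied_bits_per_pixel = DATA_RATE_BPS / (FRAME_RATE_FPS * total_pixels)

print(f"pixels: {total_pixels}")
print(f"sensitive area: {width_mm:.2f} mm x {height_mm:.2f} mm")
print(f"implied bits per pixel: {implied_bits_per_pixel:.1f}")
```

Under this assumption the array has 393,216 pixels over 42.24 mm × 28.16 mm, matching the rounded values in the abstract, and the quoted rate corresponds to roughly 16 bits per pixel per frame, which fits within the 1 Gb/s link.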