Rota L., Caselle M., Chilingaryan S., Kopmann A., Weber M.

in IEEE Transactions on Nuclear Science (2015). DOI:10.1109/TNS.2015.2426877

Abstract

We developed a direct memory access (DMA) engine compatible with the Xilinx PCI Express (PCIe) core to provide a high-performance and low-occupancy alternative to commercial solutions. To maximize PCIe throughput while minimizing FPGA resource utilization, the DMA engine adopts a novel strategy in which the DMA address list is stored inside the FPGA rather than in the central memory of the host CPU. The FPGA design package is complemented by a simple register interface through which a Linux driver controls the DMA engine. The design is compatible with Xilinx FPGA families 6 and 7 and operates with the Xilinx PCIe endpoint Generation 1 and 2 in all lane configurations (x1, x2, x4, x8). A multi-engine architecture is also presented, in which two x8 cores are used in parallel together with a PCIe bridge to fully exploit the capabilities of a PCIe Gen2 x16 link. A data throughput of 3461 MBytes/s has been achieved with a single PCIe Gen2 x8 endpoint. With the dual-engine architecture, the throughput increases to 6920 MBytes/s. The presented DMA engine is currently used in several experiments at the ANKA synchrotron light source.
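As a rough sanity check of the quoted figures (not part of the original work), the measured throughput can be compared with the raw PCIe Gen2 link rate. This sketch assumes 5 GT/s per lane with 8b/10b encoding, ignores TLP/DLLP protocol overhead, and takes one MByte as 10^6 bytes; the function names are illustrative only.

```python
# Back-of-the-envelope PCIe Gen2 throughput check (sketch; assumes 8b/10b encoding
# and counts only the raw per-lane data rate, ignoring TLP/DLLP protocol overhead).

GEN2_GT_PER_S = 5.0e9          # PCIe Gen2 line rate per lane, transfers/s
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b line code used by Gen1 and Gen2


def raw_bandwidth_mbytes(lanes: int) -> float:
    """Upper bound on one-direction bandwidth in MBytes/s (1 MByte = 1e6 bytes)."""
    bits_per_s = GEN2_GT_PER_S * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e6


def link_efficiency(measured_mbytes: float, lanes: int) -> float:
    """Fraction of the raw link bandwidth achieved by a measured throughput."""
    return measured_mbytes / raw_bandwidth_mbytes(lanes)


if __name__ == "__main__":
    # Throughput figures quoted in the abstract above.
    print(f"x8  raw: {raw_bandwidth_mbytes(8):.0f} MB/s, "
          f"efficiency: {link_efficiency(3461, 8):.1%}")
    print(f"x16 raw: {raw_bandwidth_mbytes(16):.0f} MB/s, "
          f"efficiency: {link_efficiency(6920, 16):.1%}")
```

Under these assumptions both the single x8 endpoint and the dual-engine x16 configuration reach about 86.5% of the raw link rate; the remaining margin is plausibly taken up by packet headers and flow control, whose exact cost depends on the negotiated maximum payload size, which the abstract does not state.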

Rota L., Caselle M., Chilingaryan S., Kopmann A., Weber M.

in 2014 19th IEEE-NPSS Real Time Conference, RT 2014 – Conference Records (2015), 7097561. DOI:10.1109/RTC.2014.7097561

Abstract

© 2014 IEEE. PCI Express (PCIe) is a high-speed serial point-to-point interconnect that delivers high data throughput. KIT has developed a Direct Memory Access (DMA) engine compatible with the Xilinx PCIe core to provide a smart, low-occupancy alternative to expensive commercial solutions. To maximize PCIe throughput, the DMA engine adopts a new strategy in which the DMA descriptor list is stored inside the FPGA rather than in the central memory system. The FPGA design package is complemented by a simple register interface through which a Linux driver controls the DMA engine. A handshaking sequence between the DMA engine and the Linux driver ensures that no errors occur, even in data transfers of several hundred gigabytes. The design has been tested with Xilinx FPGA families 6 and 7 and operates with the Xilinx PCIe endpoint Generation 1 and 2 in all lane configurations (x1, x2, x4, x8, x16). A data throughput of more than 3.4 GB/s has been achieved with a PCIe Gen2 x8 endpoint. The proposed DMA engine is currently used in several experiments at the ANKA synchrotron light source.
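The abstract does not specify the register map or the handshake itself. The following is a hypothetical software model, written only to illustrate the idea of keeping the buffer-address list on the FPGA side: the driver hands buffer addresses to the engine, the engine fills them in order, and it stalls rather than overwrite a buffer the driver has not yet acknowledged. The class, its methods, and the FIFO depth are invented for this sketch.

```python
from collections import deque
from typing import Deque, Optional


class DescriptorFifoModel:
    """Hypothetical model of a DMA engine whose buffer-address list lives on the FPGA.

    The driver pushes host buffer addresses (standing in for register writes),
    the engine fills buffers in FIFO order, and the driver acknowledges each
    consumed buffer, which recycles its address. The engine stalls instead of
    overwriting a buffer that has not been acknowledged.
    """

    def __init__(self, fifo_depth: int) -> None:
        self.fifo_depth = fifo_depth          # on-chip address slots (assumed)
        self.free: Deque[int] = deque()       # addresses the engine may write to
        self.filled: Deque[int] = deque()     # written, awaiting driver acknowledge

    def driver_push_address(self, addr: int) -> bool:
        """Register one host buffer; refused when all on-chip slots are in use."""
        if len(self.free) + len(self.filled) >= self.fifo_depth:
            return False
        self.free.append(addr)
        return True

    def engine_write(self) -> Optional[int]:
        """Fill the next free buffer; return None (stall) if none is available."""
        if not self.free:
            return None
        addr = self.free.popleft()
        self.filled.append(addr)
        return addr

    def driver_acknowledge(self) -> Optional[int]:
        """Mark the oldest filled buffer as read and hand it back to the engine."""
        if not self.filled:
            return None
        addr = self.filled.popleft()
        self.free.append(addr)
        return addr


if __name__ == "__main__":
    dma = DescriptorFifoModel(fifo_depth=2)
    dma.driver_push_address(0x1000)
    dma.driver_push_address(0x2000)
    print(dma.engine_write(), dma.engine_write(), dma.engine_write())  # third call stalls
    dma.driver_acknowledge()   # host has consumed buffer 0x1000
    print(dma.engine_write())  # engine reuses 0x1000
```

In this toy model `engine_write` touches only local state, which mirrors the presumed benefit of storing the list in the FPGA: the engine never has to fetch descriptors from host memory over PCIe.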

Caselle M., Brosi M., Chilingaryan S., Dritschler T., Judin V., Kopmann A., Mueller A.-S., Raasch J., Smale N.J., Steinmann J., Vogelgesang M., Wuensch S., Siegel M., Weber M.

in 2014 19th IEEE-NPSS Real Time Conference, RT 2014 – Conference Records (2015), 7097535. DOI:10.1109/RTC.2014.7097535

Abstract

© 2014 IEEE. For a few years, coherent synchrotron radiation (CSR) generated by short electron bunches has been provided at the ANKA synchrotron light source. Superconducting YBa2Cu3O7-δ (YBCO) thin-film detectors can be used to study the THz emission characteristics over multiple revolutions. The intrinsic response time of YBCO thin films is on the order of only a few picoseconds. For fast and continuous sampling of these individual ultra-short terahertz pulses, a novel digitizer system has been developed with programmable sampling times in the range of 3 to 100 ps. The real-time system is based on a heterogeneous FPGA and GPU architecture for on-line pulse reconstruction and evaluation of the peak amplitudes and the time between consecutive bunches. The data is transmitted to a GPU computing node by a fast data transfer link based on a bus-master DMA engine connected to PCI Express endpoint logic. This new DMA architecture ensures a continuous data throughput of up to 4 GByte/s. The presented DAQ system is able to resolve the bursting behavior of single bunches even in a multi-bunch environment and to study bunch-bunch interactions.
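The abstract does not describe how pulses are reconstructed on-line. As one simple, swapped-in technique, a peak amplitude and arrival time can be estimated by fitting a parabola through three samples taken around the pulse maximum at a programmable spacing `dt`; the time between consecutive bunches then follows from the nominal bucket spacing plus the difference of the fitted offsets. The function names and the choice of three-point parabolic interpolation are assumptions of this sketch, not the published algorithm.

```python
from typing import List, Sequence, Tuple


def parabolic_peak(y_minus: float, y0: float, y_plus: float, dt: float) -> Tuple[float, float]:
    """Fit a parabola through samples at -dt, 0, +dt.

    Returns (time offset of the vertex relative to the centre sample, peak amplitude).
    """
    curvature = y_minus - 2.0 * y0 + y_plus   # proportional to the 2nd derivative
    if curvature >= 0.0:
        raise ValueError("samples do not bracket a maximum")
    offset = dt * (y_minus - y_plus) / (2.0 * curvature)
    peak = y0 - (y_plus - y_minus) ** 2 / (8.0 * curvature)
    return offset, peak


def bunch_intervals(triplets: Sequence[Tuple[float, float, float]], dt: float,
                    nominal_spacing: float) -> List[float]:
    """Time between consecutive bunch maxima, each bunch sampled as one triplet."""
    offsets = [parabolic_peak(*t, dt)[0] for t in triplets]
    return [nominal_spacing + (later - earlier)
            for earlier, later in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    # Two synthetic pulses sampled with dt = 3 ps (times in ps); values are made up,
    # and 2000 ps is only an example bucket spacing.
    pulses = [(3.4375, 4.9375, 4.4375), (2.75, 2.75, 0.75)]
    print([parabolic_peak(*p, 3.0) for p in pulses])
    print(bunch_intervals(pulses, dt=3.0, nominal_spacing=2000.0))
```

A three-point fit is cheap enough to run per bunch in a streaming pipeline; a real system might instead fit a detector-specific pulse shape.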

Steinmann J.L., Brosi M., Brundermann E., Caselle M., Hertle E., Hiller N., Kehrer B., Muller A.-S., Schonfeldt P., Schuh M., Schutze P., Schwarz M., Hesler J.

in 6th International Particle Accelerator Conference, IPAC 2015 (2015) 1509-1511.

Abstract

Copyright © 2015 CC-BY-3.0 and by the respective authors. Interferometry is the quasi-standard for spectral measurements in the THz and IR range. Its frequency resolution, however, is limited by the travel range of the interferometer mirrors: a resolution in the low megahertz range, for example, would require interferometer arms of about 100 m. As an alternative, heterodyne measurements provide a resolution in the hertz range, an improvement of six orders of magnitude. Here we present measurements performed at ANKA with a VDI WR3.4SAX, a mixer that can be tuned to frequencies from 220 GHz to 330 GHz, and we show how the bunch filling pattern influences the amplitude of specific frequencies.
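The 100 m figure can be reproduced with the usual rule of thumb for Fourier-transform interferometers, Δf ≈ c / OPD_max, where the maximum optical path difference OPD_max is twice the mirror travel. This sketch assumes that unapodized estimate and ignores instrument-specific factors of order one; the function names are illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def mirror_travel_for_resolution(delta_f_hz: float) -> float:
    """Mirror travel (m) needed to reach a frequency resolution delta_f_hz.

    Rule of thumb: delta_f ~ c / OPD_max, with OPD_max = 2 * mirror travel.
    """
    max_opd = SPEED_OF_LIGHT / delta_f_hz
    return max_opd / 2.0


def resolution_for_mirror_travel(travel_m: float) -> float:
    """Inverse relation: resolution (Hz) reached with a given mirror travel (m)."""
    return SPEED_OF_LIGHT / (2.0 * travel_m)


if __name__ == "__main__":
    for delta_f in (1e6, 1.5e6, 3e6):
        print(f"{delta_f / 1e6:.1f} MHz -> {mirror_travel_for_resolution(delta_f):.0f} m")
    print(f"10 cm travel -> {resolution_for_mirror_travel(0.10) / 1e9:.2f} GHz")
```

With this estimate a resolution of 1.5 MHz needs roughly 100 m of mirror travel, while a typical 10 cm travel yields only about 1.5 GHz.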

Brosi M., Caselle M., Hertle E., Hiller N., Kopmann A., Muller A.-S., Schonfeldt P., Schwarz M., Steinmann J.L., Weber M.

in 6th International Particle Accelerator Conference, IPAC 2015 (2015) 882-884.

Abstract

Copyright © 2015 CC-BY-3.0 and by the respective authors. The ANKA storage ring of the Karlsruhe Institute of Technology (KIT) operates in the energy range from 0.5 to 2.5 GeV and generates brilliant coherent synchrotron radiation in the THz range with a dedicated bunch-length-reducing optics. Producing radiation in the so-called THz gap is challenging, but this intense THz radiation is very attractive for certain user experiments. The high degree of compression in this low-alpha optics leads to complex longitudinal dynamics of the electron bunches. The resulting micro-bunching instability causes time-dependent fluctuations and strong bursts in the radiated THz power. Studying these fluctuations in the emitted THz radiation provides insight into the longitudinal beam dynamics. Fast THz detectors combined with KAPTURE, the dedicated KArlsruhe Pulse Taking and Ultrafast Readout Electronics system developed at KIT, allow simultaneous measurement of the radiated THz intensity of each bunch individually in a multi-bunch environment. This contribution gives an overview of the first experience gained using this setup as an online diagnostics tool.
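Per-bunch monitoring of bursting can be illustrated with a small sketch that separates an interleaved, turn-by-turn stream of per-bucket intensities and reports the relative fluctuation (standard deviation over mean) of each bucket. The data layout, the function name, and the choice of metric are assumptions for illustration; the contribution does not specify KAPTURE's output format or the diagnostics actually computed.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def per_bunch_fluctuation(stream: Sequence[float], n_buckets: int) -> List[float]:
    """Relative fluctuation (std / mean) of the THz signal for each bucket.

    Assumes `stream` holds one peak-intensity value per bucket, turn after turn
    (buckets 0..n_buckets-1 of turn 0, then turn 1, ...). An incomplete
    trailing turn is dropped.
    """
    n_turns = len(stream) // n_buckets
    if n_turns == 0:
        raise ValueError("need at least one full turn")
    result = []
    for b in range(n_buckets):
        series = [stream[t * n_buckets + b] for t in range(n_turns)]
        m = mean(series)
        result.append(pstdev(series) / m if m != 0 else float("nan"))
    return result


if __name__ == "__main__":
    # Hypothetical stream: 2 buckets x 3 turns; bucket 0 bursts, bucket 1 is stable.
    stream = [1.0, 5.0, 3.0, 5.0, 2.0, 5.0]
    print(per_bunch_fluctuation(stream, n_buckets=2))
```

A bursting bunch shows a non-zero value while a stable one stays near zero, which is the kind of per-bunch distinction that a single-detector, all-bunch average would wash out.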

Birk M., Balzer M., Ruiter N.V., Becker J.

in Computers and Electrical Engineering, 40 (2014) 1171-1185. DOI:10.1016/j.compeleceng.2013.11.033

Abstract

In heterogeneous computing, application developers have to identify the best-suited target platform from a variety of alternatives. In this work, we compare the performance and architectural efficiency of Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) for two algorithms taken from a novel medical imaging method named 3D ultrasound computer tomography. From the 40 nm and 28 nm generations, we use high-end devices as well as devices with similar power consumption. For our two benchmark algorithms from the signal processing and imaging domains, the results show that, if power consumption is not considered, the GPU and FPGA of the 40 nm generation deliver similar performance and similar efficiency per transistor. In the 28 nm process, in contrast, the FPGA is superior to its GPU counterpart by 86% and 39%, depending on the algorithm. If power is limited, FPGAs outperform GPUs in each investigated case by at least a factor of four. © 2013 Elsevier Ltd. All rights reserved.

Brogna A.S., Balzer M., Smale S., Hartmann J., Bormann D., Hamann E., Cecilia A., Zuber M., Koenig T., Zwerger A., Weber M., Fiederle M., Baumbach T.

in Journal of Instrumentation, 9 (2014), C05047. DOI:10.1088/1748-0221/9/05/C05047

Abstract

In this work we present novel readout electronics for an X-ray sensor based on a Si crystal bump-bonded to an array of 3 × 2 Medipix ASICs. The pixel size is 55 μm × 55 μm, with a total of ∼400k pixels and a sensitive area of 42 mm × 28 mm. The readout electronics operate Medipix-2 MXR or Timepix ASICs with a clock speed of 125 MHz. The data acquisition system is centered around an FPGA, and each of the six ASICs has a dedicated I/O port for simultaneous data acquisition. The settings of the auxiliary devices (ADCs and DACs) are also processed in the FPGA. Moreover, a high-resolution timer operates the electronic shutter to select the exposure time from 8 ns to several milliseconds. A sophisticated trigger is available in hardware and software to synchronize the acquisition with external electro-mechanical motors. The system includes a diagnostic subsystem to check the sensor temperature and control the Peltier cooling cells, as well as a programmable high-voltage generator to bias the crystal. A network cable transfers the data, encapsulated in UDP and streamed at 1 Gb/s. Therefore, most notebooks or personal computers are able to process the data and program the system without a dedicated interface. The data readout software is compatible with the well-known Pixelman 2.x, running on both Windows and GNU/Linux. Furthermore, the open architecture encourages users to write their own applications. With a low-level interface library that implements all basic features, MATLAB or Python scripts can be written for special manipulation of the raw data. In this paper we present selected images taken with a microfocus X-ray tube to demonstrate the capability to collect data at rates up to 120 fps, corresponding to 0.76 Gb/s. © 2014 IOP Publishing Ltd and Sissa Medialab srl.
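The quoted sensor figures can be cross-checked from the Medipix geometry of 256 × 256 pixels per ASIC at 55 μm pitch. The data-rate check assumes that each pixel value is transmitted as a 16-bit word, which is an assumption of this sketch (the pixel counters themselves are narrower); inter-chip gaps and protocol overhead are ignored, and the function names are illustrative.

```python
MEDIPIX_PIXELS_PER_SIDE = 256  # Medipix2 / Timepix pixel matrix is 256 x 256
PIXEL_PITCH_MM = 0.055         # 55 um pitch, as stated in the abstract


def detector_geometry(cols: int = 3, rows: int = 2):
    """Pixel count and sensitive area (mm) of a cols x rows ASIC tile, ignoring gaps."""
    n_pixels = cols * rows * MEDIPIX_PIXELS_PER_SIDE ** 2
    width_mm = cols * MEDIPIX_PIXELS_PER_SIDE * PIXEL_PITCH_MM
    height_mm = rows * MEDIPIX_PIXELS_PER_SIDE * PIXEL_PITCH_MM
    return n_pixels, width_mm, height_mm


def raw_rate_gbit_s(n_pixels: int, fps: float, bits_per_pixel: int = 16) -> float:
    """Uncompressed frame data rate in Gb/s (assumed 16 bits per pixel, no overhead)."""
    return n_pixels * bits_per_pixel * fps / 1e9


if __name__ == "__main__":
    n, w, h = detector_geometry()
    print(f"{n} pixels, {w:.2f} mm x {h:.2f} mm")
    print(f"120 fps -> {raw_rate_gbit_s(n, 120):.3f} Gb/s")
```

Under these assumptions the 3 × 2 array has 393,216 pixels (the "∼400k" above) and a 42.24 mm × 28.16 mm area, and 120 fps gives about 0.755 Gb/s, in line with the quoted 0.76 Gb/s and just within the 1 Gb/s link.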

Caselle M., Brosi M., Chilingaryan S., Dritschler T., Hertle E., Judin V., Kopmann A., Muller A.-S., Raasch J., Schleicher M., Smale N.J., Steinmann J., Vogelgesang M., Wuensch S., Siegel M., Weber M.

in IPAC 2014: Proceedings of the 5th International Particle Accelerator Conference (2014) 3497-3499.

Abstract

Copyright © 2014 CC-BY-3.0 and by the respective authors. The commissioning of a new real-time, high-accuracy data acquisition system suitable for recording individual ultra-short coherent pulses detected by fast terahertz detectors will be presented. The Karlsruhe Pulse Taking Ultra-fast Readout Electronics (KAPTURE) is able to monitor all buckets turn-by-turn in streaming mode. KAPTURE is based on direct sampling of the pulse, with a minimum sampling time of 3 ps and a total time jitter of less than 1.7 ps. A very low-noise layout combined with the wide dynamic range and bandwidth of the analog front-end enables the sampling of signals generated by different GHz/THz detectors. The system has already been used with NbN and YBCO superconducting film detectors as well as zero-biased Schottky diode detectors. The digitized data is transmitted to a DAQ system by a high-throughput FPGA board with data transfer rates of 4 GByte/s. The setup is completed by a real-time data processing unit based on high-end graphics processing units (GPUs) for on-line analysis of the frequency behaviour of the coherent synchrotron emission. The system has been successfully used to study the beam properties of the ANKA synchrotron radiation source located at the Karlsruhe Institute of Technology.
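The abstract gives only the headline 4 GByte/s figure. Assuming, for illustration, that every bucket is sampled once per RF period at 500 MHz by four ADC channels whose sampling instants are offset by the programmable delay, and that each sample travels as a 16-bit word, the numbers line up as shown below. The channel count, clock frequency, word size, and function names are assumptions of this sketch, not values taken from the text.

```python
from typing import List


def sample_times_ps(t0_ps: float, step_ps: float, n_channels: int = 4) -> List[float]:
    """Sampling instants across one pulse when channel k is delayed by k * step_ps."""
    return [t0_ps + k * step_ps for k in range(n_channels)]


def stream_rate_bytes_per_s(n_channels: int = 4, bunch_clock_hz: float = 500e6,
                            bytes_per_sample: int = 2) -> float:
    """Continuous output rate if every channel delivers one sample per bucket."""
    return n_channels * bunch_clock_hz * bytes_per_sample


if __name__ == "__main__":
    print(sample_times_ps(0.0, 3.0))                # minimum 3 ps step
    print(stream_rate_bytes_per_s() / 1e9, "GB/s")  # all buckets, every turn
```

With these assumed parameters the stream amounts to 4 × 10^9 bytes/s, matching the quoted 4 GByte/s if a GByte is taken as 10^9 bytes, and at the minimum 3 ps step the four samples span 9 ps of each pulse.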

Rota L., Caselle M., Hiller N., Muller A.-S., Weber M.

in International Beam Instrumentation Conference, IBIC 2014 (2014).

Abstract

A new spectrometer system has been developed at ANKA for near-field single-shot Electro-Optical (EO) bunch profile measurements with a frame rate of 5 Mfps. The frame rate of commercial line detectors is limited to several tens of kHz, which is too slow to capture fast dynamic changes of the bunch conditions. The new system aims at continuous data acquisition over long observation periods without dead time. InGaAs or Si linear-array pixel sensors are used to detect near-IR and visible radiation. The detector signals are fed via wire-bond connections to the GOTTHARD ASIC, a charge-sensitive amplifier with analog outputs. The front-end board is also equipped with an array of fast ADCs. The digital samples are then acquired by an FPGA-based readout card and transmitted to an external DAQ system via a high-speed PCI Express data link. The DAQ system uses high-end Graphics Processing Units (GPUs) to perform a real-time analysis of the beam conditions. In this paper we present the concept, the first prototype and the low-noise layout techniques used for fast linear detectors.
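To see why a high-speed PCI Express link is needed, the raw data rate of a line detector read out at 5 Mfps can be estimated. The pixel count (256), the 2-byte sample size, and the 50 kHz comparison rate below are hypothetical values chosen for illustration; the actual sensor format is not given in the abstract.

```python
def frame_period_ns(frame_rate_hz: float) -> float:
    """Time available to read out one frame, in nanoseconds."""
    return 1e9 / frame_rate_hz


def raw_data_rate_bytes_per_s(frame_rate_hz: float, n_pixels: int,
                              bytes_per_pixel: int = 2) -> float:
    """Continuous, uncompressed output rate of a line detector."""
    return frame_rate_hz * n_pixels * bytes_per_pixel


if __name__ == "__main__":
    N_PIXELS = 256  # hypothetical line-sensor length
    for rate_hz in (50e3, 5e6):  # 'several tens of kHz' vs. the 5 Mfps target
        print(f"{rate_hz:>9.0f} fps: {frame_period_ns(rate_hz):8.0f} ns/frame, "
              f"{raw_data_rate_bytes_per_s(rate_hz, N_PIXELS) / 1e6:8.1f} MB/s")
```

Under these assumptions each frame must be read out within 200 ns and the continuous stream reaches about 2.56 GB/s, a hundred times the 25.6 MB/s of the same sensor at 50 kHz.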

Caselle M., Brosi M., Chilingaryan S., Dritschler T., Hiller N., Judin V., Kopmann A., Muller A.-S., Raasch J., Rota L., Petzold L., Smale N.J., Steinmann J.L., Vogelgesang M., Wuensch S., Siegel M., Weber M.

in International Beam Instrumentation Conference, IBIC 2014 (2014).

Abstract

The ANKA storage ring generates brilliant coherent synchrotron radiation (CSR) in the THz range due to a dedicated low-α optics with reduced bunch length. At higher electron currents the radiation is no longer stable but is emitted in powerful bursts caused by micro-bunching instabilities. This intense THz radiation is very attractive for users. However, the experimental conditions cannot easily be reproduced because of these power fluctuations. To study bursting CSR in multi-bunch operation, an ultra-fast and high-accuracy data acquisition system for recording individual ultra-short coherent pulses has been developed. The Karlsruhe Pulse Taking Ultra-fast Readout Electronics (KAPTURE) is able to monitor all buckets turn-by-turn in streaming mode. KAPTURE provides real-time sampling of the pulse with a minimum sampling time of 3 ps and a total time jitter of less than 1.7 ps. The KAPTURE system, the synchrotron operation modes and beam test results are presented in this paper.