Master's thesis, Faculty of Physics, Karlsruhe Institute of Technology, 2016.
In this work we present an evaluation of GPUs as a possible L1 Track Trigger for the High-Luminosity LHC, which will become operational after Long Shutdown 3 around 2025.
The novelty lies in presenting an implementation based on calculations done entirely in software, in contrast to currently discussed solutions relying on specialized hardware, such as FPGAs and ASICs.
Our solution instead performs the calculations on GPUs, which offer floating-point arithmetic as well as flexibility and adaptability. Normally, the involved data transfer latencies make GPUs infeasible for use in low-latency environments. To mitigate these overheads, we use a data transfer scheme based on RDMA technology.
We based our efforts on previous work by the collaboration between KIT and the English track trigger group [An FPGA-based track finder for the L1 trigger of the CMS experiment at the high luminosity LHC], whose algorithm was implemented on FPGAs.
In addition to the commonly used Hough transformation, we present our own variant of the algorithm based on a hexagonal layout of the binned parameter space. At comparable computational latency and workload, this approach produces significantly fewer fake track candidates than the traditional method, at a cost of around 1 percent in efficiency.
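The voting step of a conventional rectangular-bin Hough transform for track finding can be sketched as follows. The hexagonal binning itself is specific to this thesis and not reproduced here; all parameter names, bin counts, and the toy event below are illustrative, not the thesis code.

```python
import numpy as np

def hough_track_finding(hits, n_phi=32, n_qpt=32, qpt_max=0.5):
    """Fill a rectangular Hough accumulator for track candidates.

    Each hit (r, phi) votes along the line phi0 = phi - r * (q/pT)
    for every binned q/pT value; a bin that collects votes from many
    detector layers becomes a track candidate. Illustrative toy only.
    """
    acc = np.zeros((n_qpt, n_phi), dtype=int)
    qpt_bins = np.linspace(-qpt_max, qpt_max, n_qpt)
    for r, phi in hits:
        # Track phi at the origin for each candidate q/pT bin
        phi0 = (phi - r * qpt_bins) % (2 * np.pi)
        cols = (phi0 / (2 * np.pi) * n_phi).astype(int) % n_phi
        acc[np.arange(n_qpt), cols] += 1
    return acc

# Toy event: five hits from one track with q/pT = 0.2, phi0 = 1.0
hits = [(r, 1.0 + 0.2 * r) for r in (0.3, 0.5, 0.7, 0.9, 1.1)]
acc = hough_track_finding(hits)
# With such coarse binning, several adjacent q/pT bins collect all
# five votes; the peak's column is the track's phi0 bin.
peak = np.unravel_index(acc.argmax(), acc.shape)
```

The fake candidates mentioned above arise when unrelated hits happen to populate the same bin; the hexagonal layout changes which (q/pT, phi0) combinations share a bin, which is what reduces the fake rate.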
This work focuses on the track-finding part of the proposed L1 Track Trigger and only uses the result of a least-squares fit to estimate the performance of this seeding step. We furthermore present our results in terms of the overall latency of this novel approach.
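The kind of least-squares fit used to judge the seeding step can be illustrated with a minimal straight-line fit in the r-phi plane. The data, noise level, and parametrization below are toy assumptions, not the thesis setup:

```python
import numpy as np

# Toy hits along phi = phi0 + (q/pT) * r with small measurement noise
rng = np.random.default_rng(7)
r = np.array([0.3, 0.5, 0.7, 0.9, 1.1])
phi = 1.0 + 0.2 * r + rng.normal(0.0, 1e-3, r.size)

# Linear least squares with design matrix [1, r] for (phi0, q/pT)
A = np.column_stack([np.ones_like(r), r])
(phi0_fit, qpt_fit), *_ = np.linalg.lstsq(A, phi, rcond=None)
```

The fitted parameters of the candidates found by the Hough stage can then be compared with the generated tracks to quantify efficiency and fake rate.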
While not yet competitive, our implementation has surpassed initial expectations and achieves latencies on the same order of magnitude as the FPGA approach, although some caveats still apply. Ultimately, more recent technology, not yet available to us at the time of writing, will have to be tested and benchmarked to arrive at a more complete assessment of the feasibility of GPUs as a means of track triggering at the High-Luminosity LHC's CMS experiment.
First assessor: Prof. Dr. Marc Weber
Second assessor: Prof. Dr. Ulrich Husemann
Supervised by Dipl.-Inform. Timo Dritschler
Diploma thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2015.
NVIDIA's recently presented GPUDirect RDMA technology allows direct communication over the PCIe bus between NVIDIA GPUs and other devices. The ability to bypass main memory and read from and write to GPU memory directly is expected to decrease the latency of data transfer operations. KIRO (KIT's InfiniBand remote communication library) is used to provide high-performance communication for control systems at the imaging beamline of the ANKA synchrotron. To improve the reaction time of these control systems and to be ready for cameras with throughputs of several gigabytes per second, we have modified KIRO to use the GPUDirect RDMA technology. Using this approach we were able to reach throughput rates of 40 GBit/s and could nearly halve the latency. The GPUDirect technology and the updated architecture of KIRO will be presented in this work. The achieved performance and the feasibility of integration into the current workflow will be discussed.
First assessor: Prof. Dr. Wolfgang Karl
Second assessor: Prof. Dr. Marc Weber
Supervised by Dipl.-Inform. Timo Dritschler, Dr. Ing. Mario Kicherer
Master thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2015.
An ever-increasing number of large tomographic images is recorded at synchrotron facilities worldwide. Due to the drastic increase in data volumes, there is a recent trend to provide data analysis services at the facilities as well. The ASTOR project aims to realize a cloud-based infrastructure for remote data analysis and visualization of tomographic data. A key component is a web-based data browser to select data sets and request a virtual machine for the analysis of this data. One of the challenges is to provide a fast preview not only of 3D volumes but also of 3D sequences. Since standard data sets exceed 10 gigabytes, standard visualization techniques are not feasible and new data reduction techniques have to be developed.
First assessor: Prof. Dr.-Ing. Carsten Dachsbacher
Second assessor: Dr. Suren Chilingaryan
Supervised by Dr. Andreas Kopmann
Internship report, Institute for Data Processing and Electronics, Karlsruhe Institute of Technology, 2014.
High-speed tracking of fluorescent nanoparticles enables scientists to study the drying process of fluids. A better understanding of this drying process will help develop new techniques to obtain homogeneous surfaces. Images are recorded with CMOS cameras to observe the particle flow. The challenge is to determine a particle's third coordinate from a 2D image. Depending on the distance to the objective lens of the microscope, rings of different radii appear in the images. By detecting the rings' radii and coordinates, both velocities and 3D trajectories can be established for each particle. To achieve almost real-time particle tracking, highly parallel systems such as GPUs are used.
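A minimal sketch of the radius estimation, assuming the particle center is already known and the ring shows up as a peak in the radial intensity profile (image size, center, and radius below are illustrative, not the actual setup):

```python
import numpy as np

def ring_radius(image, cx, cy, r_max):
    """Estimate a ring's radius from the radial intensity profile
    around a known particle center (cx, cy). Illustrative sketch:
    mean intensity per 1-pixel annulus, peak annulus = radius."""
    y, x = np.indices(image.shape)
    r = np.hypot(x - cx, y - cy).astype(int).ravel()  # annulus index
    mask = r < r_max
    total = np.bincount(r[mask], weights=image.ravel()[mask], minlength=r_max)
    count = np.bincount(r[mask], minlength=r_max)
    return int(np.argmax(total / np.maximum(count, 1)))

# Synthetic 64x64 image with a ring of radius 15 around (32, 32)
img = np.zeros((64, 64))
yy, xx = np.indices(img.shape)
img[np.abs(np.hypot(xx - 32, yy - 32) - 15) < 1.0] = 1.0
```

Because each annulus histogram is independent, this profile computation parallelizes naturally, which is what makes GPUs attractive for the near-real-time tracking described above.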
Supervised by Dr. Suren Chilingaryan
PhD thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2014.
Moore's law remains the driving force behind higher chip integration density and an ever-increasing number of transistors. However, the adoption of massively parallel hardware architectures widens the gap between the potentially available microprocessor performance and the performance a developer can actually make use of. This thesis tries to close this gap by solving the problems that arise from the challenges of achieving optimal performance on parallel compute systems, allowing developers and end-users to use this compute performance in a transparent manner and using the compute performance to enable data-driven processes.
A general solution cannot realistically achieve optimal operation which is why we will focus on streamed data processing in this thesis. Data streams lend themselves to describe high-throughput data processing tasks such as audio and video processing. With this specific data stream use case, we can systematically improve the existing designs and optimize the execution from the instruction-level parallelism up to node-level task parallelism. In particular, we want to focus on X-ray imaging applications used at synchrotron light sources. These large-scale facilities provide an X-ray beam that enables scanning samples at much higher spatial and temporal resolution compared to conventional X-ray sources. The increased data rate inevitably requires highly parallel processing systems as well as an optimized data acquisition and control environment.
To solve the problem of high-throughput streamed data processing, we developed, modeled and evaluated system architectures to acquire and process data streams on parallel and heterogeneous compute systems. We developed a method to map general task descriptions onto heterogeneous compute systems and execute them with optimizations for local machines and clusters of multi-user compute nodes. We also proposed a source-to-source translation system to simplify the development of task descriptions.
We have shown that it is possible to acquire and compute tomographic reconstructions on a heterogeneous compute system consisting of CPUs and GPUs in soft real-time. The end-user’s only responsibility is to describe the problem correctly. With the proposed system architectures, we paved the way for novel in-situ and in-vivo experiments and a much smarter experiment setup in general. Where existing experiments depend on a static environment and process sequence, we established the possibility to control the experiment setup in a closed feedback loop.
First assessor: Prof. Dr. Achim Streit
Second assessor: Prof. Dr. Marc Weber