Rota, Lorenzo

PhD thesis, Faculty of Electrical Engineering and Information Technology, Karlsruhe Institute of Technology, 2017.

Abstract

In modern particle accelerators, precise control of the particle beam is essential for the correct operation of the facility. The experimental observation of the beam behavior relies on dedicated techniques, often described by the term “beam diagnostics”. Cutting-edge beam diagnostics systems, in particular several experimental setups currently installed at KIT’s synchrotron light source ANKA, employ line scan detectors to characterize and monitor the beam parameters precisely. Up to now, the experimental resolution of these setups has been constrained by the line rate of existing detectors, which is limited to a few hundred kHz.

This thesis addresses this limitation with the development of a novel line scan detector system named KALYPSO (KArlsruhe Linear arraY detector for MHz rePetition-rate SpectrOscopy). The goal is to provide scientists at ANKA with a complete detector system that enables real-time measurements at MHz repetition rates. The design of both front-end and back-end electronics suitable for beam diagnostic experiments is a challenging task, because the detector must achieve low-noise performance at high repetition rates and with a large number of channels. Moreover, the detector system must sustain continuous data taking and introduce low latency. To meet these stringent requirements, several novel components have been developed by the author of this thesis, such as a novel readout ASIC and a high-performance DAQ system.

The front-end ASIC has been designed to read out different types of microstrip sensors for the detection of visible and near-infrared light. The ASIC is composed of 128 analog channels which are operated in parallel, plus additional mixed-signal stages which interface with external devices. Each channel consists of a Charge Sensitive Amplifier (CSA), a Correlated Double Sampling (CDS) stage and a channel buffer. Moreover, a high-speed output driver has been implemented to interface directly with an off-chip ADC. The first version of the ASIC, with a reduced number of channels, has been produced in a 110 nm CMOS technology. The chip is fully functional and achieves a line rate of 12 MHz with an equivalent noise charge of 417 electrons when connected to a detector capacitance of 1.3 pF.
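The principle behind the CDS stage can be illustrated numerically. The sketch below is only a software analogy: the ASIC performs this operation in the analog domain, and the function name and sample values are hypothetical.

```python
def correlated_double_sample(reset_levels, signal_levels):
    """Subtract each channel's reset (baseline) sample from its signal sample.

    Any offset that is common to both samples of a channel cancels out,
    which is the basic idea of correlated double sampling.
    """
    if len(reset_levels) != len(signal_levels):
        raise ValueError("both samples must cover the same channels")
    return [s - r for r, s in zip(reset_levels, signal_levels)]


# Hypothetical values: three channels with different offsets.
offsets = [10, -5, 3]
charge = [100, 0, 50]
reset = offsets
signal = [o + q for o, q in zip(offsets, charge)]
recovered = correlated_double_sample(reset, signal)
```

In this toy case the per-channel offsets drop out completely, so `recovered` equals `charge`.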

Moreover, a dedicated DAQ system has been developed to directly connect FPGA readout cards and GPU computing nodes. The data transfer is handled by a novel DMA engine implemented on FPGA. The performance of the DMA engine compares favorably with the current state of the art, achieving a throughput of more than 7 GB/s and latencies as low as 2 µs. The high-throughput and low-latency performance of the DAQ system enables real-time data processing on GPUs, as has been demonstrated with extensive measurements. The DAQ system is currently integrated with KALYPSO and with other detector systems developed at the Institute for Data Processing and Electronics (IPE).

In parallel with the development of the ASIC, a first version of the KALYPSO detector system has been produced. This version is based on a Si or InGaAs microstrip sensor with 256 channels and on the GOTTHARD chip. A line rate of 2.7 MHz has been achieved, and experimental measurements have established KALYPSO as a powerful line scan detector operating at high line rates. The final version of the KALYPSO detector system, which will achieve a line rate of 10 MHz, is anticipated for early 2018.
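A rough estimate of the raw data rate of this first version helps to relate the detector to the DMA throughput quoted above. The sketch assumes that each sample is stored in a 16-bit word; the abstract does not state the sample width, so this value is an assumption.

```python
def line_detector_data_rate(channels, line_rate_hz, bytes_per_sample):
    """Return the raw data rate of a line scan detector in bytes per second."""
    return channels * line_rate_hz * bytes_per_sample


# 256 channels at 2.7 MHz; 2 bytes per sample is an assumed value.
rate = line_detector_data_rate(channels=256,
                               line_rate_hz=2_700_000,
                               bytes_per_sample=2)
print(f"{rate / 1e9:.2f} GB/s")
```

Under this assumption the detector produces roughly 1.4 GB/s, which is below the DMA throughput of more than 7 GB/s.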

Finally, KALYPSO has been installed at two different experimental setups at ANKA during several commissioning campaigns. The KALYPSO detector system allowed scientists to observe the beam behavior with unprecedented experimental resolution. First exciting and widely recognized scientific results were obtained at ANKA and at the European XFEL, demonstrating the benefits brought by the KALYPSO detector system in modern beam diagnostics.

 

First assessor: Prof. Dr. M. Weber
Second assessor: Prof. Dr.-Ing. Dr. h.c. J. Becker

Farago, Tomas

PhD thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2017.

Abstract

X-ray imaging experiments shed light on internal material structures. The success of an experiment depends on the properly selected experimental conditions, mechanics and the behavior of the sample or process under study. Up to now, there is no autonomous data acquisition scheme which would enable us to conduct a broad range of X-ray imaging experiments driven by image-based feedback. This thesis aims to close this gap by solving problems related to the selection of experimental parameters, fast data processing and automatic feedback to the experiment based on image metrics applied to the processed data.

In order to determine the best initial experimental conditions, we study the X-ray image formation principles and develop a framework for their simulation. It enables us to conduct a broad range of X-ray imaging experiments by taking into account many physical principles of the full light path from the X-ray source to the detector. Moreover, we focus on various sample geometry models and motion, which allows simulations of experiments such as 4D time-resolved tomography.

We further develop an autonomous data acquisition scheme which is able to fine-tune the initial conditions and control the experiment based on fast image analysis. We focus on high-speed experiments which require significant data processing speed, especially when the control is based on compute-intensive algorithms. We employ a highly parallelized framework to implement an efficient 3D reconstruction algorithm whose output is plugged into various image metrics which provide information about the acquired data. Such metrics are connected to a decision-making scheme which controls the data acquisition hardware in a closed loop.

We demonstrate the accuracy of the simulation framework by comparing virtual and real grating interferometry experiments. We also examine the impact of imaging conditions on the accuracy of the filtered back projection algorithm and how this can guide the optimization of experimental conditions. Finally, we show how simulation together with ground truth can help to choose data processing parameters for motion estimation in a high-speed experiment.

We demonstrate the autonomous data acquisition system on an in-situ tomographic experiment, where it optimizes the camera frame rate based on tomographic reconstruction. We also use our system to conduct a high-throughput tomography experiment, where it scans many similar biological samples, finds the tomographic rotation axis for every sample and reconstructs a full 3D volume on-the-fly for quality assurance. Furthermore, we conduct an in-situ laminography experiment studying crack formation in a material. Our system performs the data acquisition and reconstructs a central slice of the sample to check its alignment and data quality.
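The abstract does not say how the rotation axis is found. One common approach, shown here as a minimal sketch that assumes parallel-beam geometry, compares a projection row at 0° with the mirrored row at 180° and searches for the shift that aligns them best. The function name and the integer-shift search are assumptions made for illustration.

```python
def find_rotation_axis(proj_0, proj_180, max_shift=None):
    """Estimate the rotation axis position (in pixels) from two projection rows.

    In parallel-beam geometry the 180-degree projection is the mirror image
    of the 0-degree projection about the rotation axis. Flipping it and
    finding the shift s that best aligns it with the 0-degree row gives the
    axis position c = (n - 1 + s) / 2.
    """
    n = len(proj_0)
    if len(proj_180) != n:
        raise ValueError("both rows must have the same length")
    if max_shift is None:
        max_shift = n // 2
    flipped = proj_180[::-1]
    best_shift, best_err = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        lo = max(0, -shift)
        hi = min(n, n - shift)
        if hi <= lo:
            continue
        # Mean squared difference over the overlapping pixels only.
        err = sum((flipped[j] - proj_0[j + shift]) ** 2
                  for j in range(lo, hi)) / (hi - lo)
        if err < best_err:
            best_shift, best_err = shift, err
    return (n - 1 + best_shift) / 2


# Hypothetical 16-pixel row whose true axis lies at pixel 9.
row_0 = [0] * 16
row_0[6:13] = [1, 4, 9, 7, 3, 2, 5]
row_180 = [row_0[18 - j] if 0 <= 18 - j < 16 else 0 for j in range(16)]
axis = find_rotation_axis(row_0, row_180)
```

A production system would refine this to sub-pixel accuracy and use full 2D projections; the sketch only shows the underlying symmetry argument.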

Our work enables selection of the optimal initial experimental conditions based on high-fidelity simulations, their fine-tuning during a real experiment and its automatic control based on fast data analysis. Such a data acquisition scheme enables novel high-speed and in-situ experiments which cannot be controlled by a human operator due to high data rates.

First assessor: Prof. Dr.-Ing. R. Dillmann
Second assessor: Prof. Dr. Tilo Baumbach

M. Heethoff, V. Heuveline, H. Hartenstein, W. Mexner, T. van de Kamp, A. Kopmann

Final report, BMBF Programme: “Erforschung kondensierter Materie”, 2016.

Executive summary

Synchrotron X-ray tomography is a unique imaging method for studying internal structures, particularly in opaque samples. In recent years, the spatial and temporal resolution of the method has improved considerably. Evaluating the datasets nevertheless remains challenging because of their size and the complexity of the imaged structures. With the ASTOR project, the consortium for functional morphology and systematics set itself the goal of easing access to X-ray tomography for biological users through an integrated analysis environment.
The interdisciplinary collaboration of biologists, computer scientists, mathematicians and engineers made it possible to consider the entire data processing chain. Largely automated methods for data processing and data transfer have been created. The tomographic recordings are reconstructed online and transferred to the ASTOR analysis environment. The data are then available to users via virtual machines, both at ANKA and off-site. An authorization scheme for access has been developed. The analysis infrastructure consists of a temporary data store, the virtualization server, and the connection to the beamlines and the long-term archive. Alongside costly commercial programs, the analysis environment offers newly developed tools. Particularly noteworthy are the ASTOR segmentation functions, which speed up this previously very time- and labor-intensive step many times over. The automatic segmentation can be steered transparently via regions marked in only a few slices and achieves a previously unattained automatic segmentation result.
The analysis environment has proven very efficient for data evaluation and method development. Besides the applicants, the system is now used successfully by other users. Over the course of the project, an extensive set of example recordings covering a broad range of organisms was acquired during several beam times. Selected samples were segmented and classified as templates for method development. The number of recordings within one measurement week was increased drastically during the project, first to 400 and finally to as many as 1000.
ASTOR has succeeded in establishing an end-to-end analysis environment and thereby in showing the next step in the expansion of such experimental facilities. For the chosen application, functional morphology, it is now possible for the first time to carry out quantitative serial studies on small organisms. The evaluation approach is not restricted to this application; rather, it is a general example for data-intensive experiments. The NOVA project, likewise funded by the BMBF collaborative research programme (Verbundforschung), continues these activities in the same spirit and intends, through synergistic collaboration, to create an open data catalog for an entire community.

Mohr, Hannes

Master Thesis, Faculty for Physics, Karlsruhe Institute of Technology, 2016.

Abstract

In this work we present an evaluation of GPUs as a possible L1 Track Trigger for the High Luminosity LHC, effective after Long Shutdown 3 around 2025.

The novelty lies in presenting an implementation based on calculations done entirely in software, in contrast to currently discussed solutions relying on specialized hardware, such as FPGAs and ASICs.
Our solution instead relies on GPUs for the calculation, offering floating point calculations as well as flexibility and adaptability. Normally, the data transfer latencies involved make GPUs infeasible for use in low-latency environments. To address this, we use a data transfer scheme based on RDMA technology, which mitigates the overheads normally involved.
We based our efforts on previous work by a collaboration of KIT and the English track trigger group [An FPGA-based track finder for the L1 trigger of the CMS experiment at the high luminosity LHC], whose algorithm was implemented on FPGAs.
In addition to the regularly used Hough transformation, we present our own version of the algorithm based on a hexagonal layout of the binned parameter space. With comparable computational latency and workload, the approach produces significantly fewer fake track candidates than the traditionally used method. This comes at a cost in efficiency of around 1 percent.
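The voting idea behind a Hough-transform track finder can be sketched in a few lines. The example below uses conventional rectangular bins and a linearized track model, phi0 = phi + m * r, where m stands in for a curvature-like parameter. Both the model and all names are assumptions made for illustration; the hexagonal binning proposed in the thesis is not reproduced here.

```python
from collections import Counter


def hough_track_candidates(hits, slopes, phi0_min, phi0_max,
                           n_phi0_bins, threshold):
    """Return (slope index, phi0 bin) cells that collect enough hit votes.

    hits   : iterable of (r, phi) pairs
    slopes : candidate values of the curvature-like parameter m
    Each hit votes once per slope, in the phi0 bin it would imply.
    """
    width = (phi0_max - phi0_min) / n_phi0_bins
    votes = Counter()
    for r, phi in hits:
        for i, m in enumerate(slopes):
            phi0 = phi + m * r
            k = int((phi0 - phi0_min) // width)
            if 0 <= k < n_phi0_bins:
                votes[(i, k)] += 1
    return sorted(cell for cell, n in votes.items() if n >= threshold)


# Hypothetical hits generated from one track with m = 0.5 and phi0 = 0.25.
slopes = [-1.0, -0.5, 0.0, 0.5, 1.0]
hits = [(r, 0.25 - 0.5 * r) for r in (1.0, 2.0, 3.0, 4.0)]
candidates = hough_track_candidates(hits, slopes, -2.0, 2.0, 8, threshold=4)
```

Only the cell that matches the generating parameters receives a vote from every hit, so it is the only candidate returned.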

This work focuses on the track finding part of the proposed L1 Track Trigger and only looks at the result of a least squares fit to make an estimate of the performance of said seeding step. We furthermore present our results in terms of overall latency of this novel approach.

While not yet competitive, our implementation has surpassed initial expectations and is on the same order of magnitude as the FPGA approach in terms of latency. Some caveats apply at the moment. Ultimately, more recent technology that is not yet available to us will have to be tested and benchmarked to reach a more complete assessment of the feasibility of GPUs as a means of track triggering at the CMS experiment of the High-Luminosity LHC.

 

First assessor: Prof. Dr. Marc Weber
Second assessor: Prof. Dr. Ulrich Husemann

Supervised by Dipl.-Inform. Timo Dritschler

T. Baumbach, V. Altapova, D. Hänschke, T. dos Santos Rolo, A. Ershov, L. Helfen, T. van de Kamp, J.-T. Reszat, M. Weber, M. Caselle, M. Balzer, S. Chilingaryan, A. Kopmann, I. Dalinger, A. Myagotin, V. Asadchikov, A. Buzmakov, S. Tsapko, I. Tsapko, V. Vichugov, M. Sukhodoev, UFO collaboration

Final report, BMBF Programme: “Development and Use of Accelerator-Based Photon Sources”, 2016

Executive summary

Recent progress in X-ray optics, detector technology, and the tremendous increase of processing speed of commodity computational architectures gave rise to a paradigm shift in synchrotron X-ray imaging. In order to explore these technologies, the UFO experimental station for ultra-fast X-ray imaging has been developed within the two UFO projects. Its key components (an intelligent detector system, vast computational power, and sophisticated algorithms) have been designed, optimized and integrated for best overall performance. New methods like 4D cine-tomography for in-vivo measurements have been established. This online assessment of sample dynamics not only made active image-based control possible, but also resulted in unprecedented image quality and largely increased throughput. Typically 400-500 high-quality datasets with 3D images and image sequences are recorded with the UFO experimental station during a beam time of about 3-4 days.

A flexible and fully automated sample environment and a detector system for a set of up to three complementary cameras have been realized. The detector system can be equipped with commercially available scientific visible-light cameras or a custom UFO camera. To support academic sensor development, a novel platform for scientific cameras, the UFO camera framework, has been developed. It is a unique rapid-prototyping environment to turn scientific image sensors into intelligent smart camera systems. All beamline components, the sample environment, the detector station and the computing infrastructure are seamlessly integrated into the high-level control system “Concert”, designed for online data evaluation and feedback control.

As a new element, computing nodes for online data assessment have been introduced in UFO. A powerful computing infrastructure based on GPUs and real-time storage has been developed. Optimized reconstruction algorithms reach a throughput of several GB/s with a single GPU server. For scalability, clusters are supported as well. Highly optimized reconstruction and image processing algorithms are key for real-time monitoring and efficient data analysis. In order to manage these algorithms, the UFO parallel computing framework has been designed. It supports the implementation of efficient algorithms as well as the development of data processing workflows based on these. The library of optimized algorithms supports all modalities of operation at the UFO experimental station: tomography, laminography and diffraction imaging, as well as numerous pre- and post-processing steps.

The results of the UFO project have been reported at several national and international workshops and conferences. The UFO project contributes with developments like the UFO camera framework or its GPU computing environment to other hardware and software projects in the synchrotron community (e.g. Tango Control System, High Data Rate Processing and Analysis Initiative, Nexus data format, Helmholtz Detector Technology and Systems Initiative DTS). Further follow-up projects build on the UFO results and improve imaging methods (like STROBOS-CODE) or add sophisticated analysis environments (like ASTOR).

The UFO project has successfully developed key components for ultra-fast X-ray imaging and serves as an example for future data-intensive applications. It demonstrates KIT’s role as a technology center for novel synchrotron instrumentation.

Riechelmann, Max

Diploma Thesis, Faculty for Computer Science, Karlsruhe Institute of Technology, 2015.

Abstract

NVIDIA’s recently presented GPUDirect RDMA technology allows direct communication on the PCIe bus between NVIDIA GPUs and other devices. The ability to bypass main memory and to read from and write to GPU memory directly is expected to decrease the latency of data transfers. KIRO (KIT’s InfiniBand remote communication library) is used to provide high-performance communication for control systems at the IMAGE beamline of the ANKA synchrotron. To improve the reaction time of control systems and to be ready for cameras with a throughput of several gigabytes per second, we have modified KIRO to use the GPUDirect RDMA technology. Using this approach, we were able to reach throughput rates of 40 Gbit/s and could nearly halve the latency. This work presents the GPUDirect technology and the updated architecture of KIRO, and discusses the achieved performance and the feasibility of integrating it into the current workflow.

 

First assessor: Prof. Dr. Wolfgang Karl
Second assessor: Prof. Dr. Marc Weber

Supervised by Dipl.-Inform. Timo Dritschler, Dr.-Ing. Mario Kicherer

Schultze, Felix

Master thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2015.

Abstract

An ever-increasing number of large tomographic images is recorded at synchrotron facilities worldwide. Due to the drastic increase of data volumes, there is a recent trend to provide data analysis services at the facilities as well. The ASTOR project aims to realize a cloud-based infrastructure for remote data analysis and visualization of tomographic data. A key component is a web-based data browser to select data sets and request a virtual machine for the analysis of this data. One of the challenges is to provide a fast preview not only of 3D volumes but also of 3D sequences. Since standard data sets exceed 10 gigabytes, standard visualization techniques are not feasible and new data reduction techniques have to be developed.

 

First assessor: Prof. Dr.-Ing. Carsten Dachsbacher
Second assessor: Dr. Suren Chilingaryan

Supervised by Dr. Andreas Kopmann

Lewkowicz, Alexander

Internship report, Institute for Data Processing and Electronics, Karlsruhe Institute of Technology, 2014.

Abstract

High-speed tracking of fluorescent nanoparticles enables scientists to study the drying process of fluids. A better understanding of this drying process will help develop new techniques to obtain homogeneous surfaces. Images are recorded via CMOS cameras to observe the particle flow. The challenge is to recover each particle’s third coordinate from a 2D image. Depending on the particle’s distance to the objective lens of the microscope, rings of different radii appear in the images. By detecting the rings’ radii and coordinates, both the velocity and the 3D trajectory can be established for each particle. To achieve near-real-time particle tracking, highly parallel systems, such as GPUs, are used.
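The report does not describe the ring detection algorithm itself. As a minimal sketch of the final measurement step, the code below estimates a ring's center and radius from edge points that are assumed to be already detected and spread evenly around the ring. The function name and the centroid-based estimate are illustrative assumptions, not the method used in the report.

```python
import math


def estimate_ring(points):
    """Estimate (center_x, center_y, radius) from points lying on a ring.

    The center is taken as the centroid of the points and the radius as
    their mean distance to it. This is only reliable when the points
    cover the ring evenly.
    """
    if not points:
        raise ValueError("at least one point is required")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    return cx, cy, radius


# Hypothetical edge points on a ring centered at (3, 4) with radius 5.
ring_points = [(8, 4), (-2, 4), (3, 9), (3, -1)]
ring = estimate_ring(ring_points)
```

Mapping the estimated radius to a depth coordinate requires a calibration of the optical setup, which is outside the scope of this sketch.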

Supervised by Dr. Suren Chilingaryan

Vogelgesang, Matthias

PhD thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2014.

Abstract

Moore’s law remains the driving force behind higher chip integration density and an ever-increasing number of transistors. However, the adoption of massively parallel hardware architectures widens the gap between the potentially available microprocessor performance and the performance a developer can make use of. This thesis tries to close this gap by solving the problems that arise from the challenges of achieving optimal performance on parallel compute systems, allowing developers and end-users to use this compute performance in a transparent manner, and using the compute performance to enable data-driven processes.

A general solution cannot realistically achieve optimal operation which is why we will focus on streamed data processing in this thesis. Data streams lend themselves to describe high-throughput data processing tasks such as audio and video processing. With this specific data stream use case, we can systematically improve the existing designs and optimize the execution from the instruction-level parallelism up to node-level task parallelism. In particular, we want to focus on X-ray imaging applications used at synchrotron light sources. These large-scale facilities provide an X-ray beam that enables scanning samples at much higher spatial and temporal resolution compared to conventional X-ray sources. The increased data rate inevitably requires highly parallel processing systems as well as an optimized data acquisition and control environment.

To solve the problem of high-throughput streamed data processing, we developed, modeled and evaluated system architectures to acquire and process data streams on parallel and heterogeneous compute systems. We developed a method to map general task descriptions onto heterogeneous compute systems and execute them with optimizations for local multi-machines and clusters of multi-user compute nodes. We also proposed a source-to-source translation system to simplify the development of task descriptions.
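The notion of describing a computation as a chain of tasks over a data stream can be illustrated with plain Python generators. This is a single-threaded sketch with hypothetical task names; it does not reflect the API of the framework developed in the thesis and performs no mapping onto GPUs or clusters.

```python
from functools import reduce


def make_pipeline(*tasks):
    """Compose generator-based tasks into one stream transformer."""
    def run(stream):
        # Feed the output stream of each task into the next one.
        return reduce(lambda s, task: task(s), tasks, stream)
    return run


def subtract_offset(offset):
    """Task factory: subtract a constant offset from every pixel."""
    def task(stream):
        for frame in stream:
            yield [pixel - offset for pixel in frame]
    return task


def frame_sum(stream):
    """Task: reduce each frame to the sum of its pixels."""
    for frame in stream:
        yield sum(frame)


# Hypothetical two-frame stream processed lazily, one frame at a time.
pipeline = make_pipeline(subtract_offset(1), frame_sum)
results = list(pipeline(iter([[1, 2, 3], [4, 5, 6]])))
```

Because each task only consumes and yields frames, the same description could in principle be scheduled on different hardware without changing the tasks themselves, which is the kind of separation the thesis aims for.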

We have shown that it is possible to acquire and compute tomographic reconstructions on a heterogeneous compute system consisting of CPUs and GPUs in soft real-time. The end-user’s only responsibility is to describe the problem correctly. With the proposed system architectures, we paved the way for novel in-situ and in-vivo experiments and a much smarter experiment setup in general. Where existing experiments depend on a static environment and process sequence, we established the possibility to control the experiment setup in a closed feedback loop.

First assessor: Prof. Dr. Achim Streit
Second assessor: Prof. Dr. Marc Weber

T. Baumbach, V. Altapova, D. Hänschke, T. dos Santos Rolo, A. Ershov, L. Helfen, T. van de Kamp, M. Weber, A. Kopmann, S. Chilingaryan, I. Dalinger, A. Myagotin, V. Asadchikov, A. Buzmakov, S. Tsapko, UFO collaboration

Final report, BMBF Programme: “Development and Use of Accelerator-Based Photon Sources”, 2014

Executive summary

Recent progress in X-ray optics, detector technology, and the tremendous increase of processing speed of commodity computational architectures gives rise to a paradigm shift in synchrotron X-ray imaging. The UFO project aims to enable a novel class of experiments combining intelligent detector systems, vast computational power, and sophisticated algorithms. The on-line assessment of sample dynamics will make active image-based control possible, give rise to unprecedented image quality, and will provide new insights into so far inaccessible scientific phenomena.

A demonstrator for high-speed tomography has been developed and extensively used. The system includes critical components like computation infrastructure, reconstruction algorithms and detector system and proved that time-resolved tomography is feasible. Based on these results the final design of the UFO experimental station has been revised and several upgrades have been included to enable further imaging techniques.

A flexible and fully automated detector system for a set of up to three complementary cameras has been designed, constructed and commissioned. A new platform for smart scientific cameras, the UFO-DAQ framework, has been realized. It is a unique rapid-prototyping environment to turn scientific image sensors into intelligent smart camera systems. Central features are the modular sensor interface, an open embedded processing framework and high-speed PCI Express links to the readout server. The UFO-DAQ framework seamlessly integrates in the UFO parallel computing framework.

The UFO project demonstrated that high-end graphics processor units (GPUs) are an ideal platform for a new generation of online monitoring systems for synchrotron applications with high data rates. A powerful computing infrastructure based on GPUs and real-time storage has been developed. Optimized reconstruction algorithms reach a throughput of 1 GB/s with a single GPU server. Generalized reconstruction algorithms include also laminography with tilted rotation axis.

Highly optimized reconstruction and image processing algorithms are key for real-time monitoring and efficient data analysis. In order to manage these algorithms the UFO parallel computing framework has been developed. It supports the implementation of efficient algorithms as well as the development of data processing workflows based on these. It automatically selects the best code depending on the available computing resources. With its clear modular structure the framework is ideally suited as an exchange platform for optimized algorithms for parallel computing architectures. The code published under open source license is well-recognized by the synchrotron community.

The UFO project has been performed in close collaboration with three Russian partners. Various collaboration meetings have been organized and a number of scientists visited the partner institutions. The focus of the Russian contribution has been the smart camera platform and algorithm development. The results of the UFO project have been reported at several national and international workshops and conferences. The UFO project contributes with developments like the UFO-DAQ framework or its GPU computing environment to other hardware and software projects in the synchrotron community (e.g. Tango Control System, High Data Rate Processing and Analysis Initiative, Nexus data format, Helmholtz Detector Technology and Systems Initiative DTS).

In summary, within the UFO project it was possible to develop key components for future data-intensive applications. Most important are the X-ray detector system, a smart camera platform, GPU-based computing infrastructure and the parallel computing framework including various optimized algorithms. The potential and feasibility of high-speed X-ray tomography has been demonstrated by prototypes of experimental stations at the ANKA beamlines TOPO-TOMO and IMAGE.