Farago, Tomas

PhD thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2017.

Abstract

X-ray imaging experiments shed light on internal material structures. The success of an experiment depends on the properly selected experimental conditions, mechanics and the behavior of the sample or process under study. Up to now, there is no autonomous data acquisition scheme which would enable us to conduct a broad range of X-ray imaging experiments driven by image-based feedback. This thesis aims to close this gap by solving problems related to the selection of experimental parameters, fast data processing and automatic feedback to the experiment based on image metrics applied to the processed data.

In order to determine the best initial experimental conditions, we study the X-ray image formation principles and develop a framework for their simulation. It enables us to conduct a broad range of X-ray imaging experiments by taking into account many physical principles of the full light path from the X-ray source to the detector. Moreover, we focus on various sample geometry models and motion, which allows simulations of experiments such as 4D time-resolved tomography.

We further develop an autonomous data acquisition scheme which is able to fine-tune the initial conditions and control the experiment based on fast image analysis. We focus on high-speed experiments which require significant data processing speed, especially when the control is based on compute-intensive algorithms. We employ a highly parallelized framework to implement an efficient 3D reconstruction algorithm whose output is plugged into various image metrics which provide information about the acquired data. Such metrics are connected to a decision-making scheme which controls the data acquisition hardware in a closed loop.
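The closed loop described above can be pictured with a minimal sketch. It is not the thesis's control system: the quality model, the doubling rule, the target value and all names (`run_closed_loop`, `simulated_quality`) are hypothetical and chosen only for illustration.

```python
def run_closed_loop(acquire_and_measure, start_fps, target, max_fps):
    """Raise the frame rate until the image metric reaches the target.

    acquire_and_measure(fps) stands in for the whole chain
    "acquire -> reconstruct -> evaluate image metric" (an assumption).
    """
    fps = start_fps
    history = [fps]
    # Decision rule (hypothetical): double the rate while quality is too low.
    while acquire_and_measure(fps) < target and fps * 2 <= max_fps:
        fps *= 2
        history.append(fps)
    return fps, history


def simulated_quality(fps):
    # Hypothetical stand-in for a real metric: less motion blur at higher rates.
    return fps / (fps + 200.0)


final_fps, history = run_closed_loop(
    simulated_quality, start_fps=50, target=0.75, max_fps=2000
)
print(final_fps, history)
```

In a real experiment the simulated metric would be replaced by a measurement on reconstructed data, and the decision rule by whatever scheme suits the process under study.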

We demonstrate the accuracy of the simulation framework by comparing virtual and real grating interferometry experiments. We also look into the impact of imaging conditions on the accuracy of the filtered back projection algorithm and how it can guide the optimization of experimental conditions. Finally, we show how simulation together with ground truth can help to choose data processing parameters for motion estimation in a high-speed experiment.

We demonstrate the autonomous data acquisition system on an in-situ tomographic experiment, where it optimizes the camera frame rate based on tomographic reconstruction. We also use our system to conduct a high-throughput tomography experiment, where it scans many similar biological samples, finds the tomographic rotation axis for every sample and reconstructs a full 3D volume on-the-fly for quality assurance. Furthermore, we conduct an in-situ laminography experiment studying crack formation in a material. Our system performs the data acquisition and reconstructs a central slice of the sample to check its alignment and data quality.

Our work enables selection of the optimal initial experimental conditions based on high-fidelity simulations, their fine-tuning during a real experiment and its automatic control based on fast data analysis. Such a data acquisition scheme enables novel high-speed and in-situ experiments which cannot be controlled by a human operator due to high data rates.

First assessor: Prof. Dr.-Ing. R. Dillmann
Second assessor: Prof. Dr. Tilo Baumbach

T. Baumbach, V. Altapova, D. Hänschke, T. dos Santos Rolo, A. Ershov, L. Helfen, T. van de Kamp, J.-T. Reszat, M. Weber, M. Caselle, M. Balzer, S. Chilingaryan, A. Kopmann, I. Dalinger, A. Myagotin, V. Asadchikov, A. Buzmakov, S. Tsapko, I. Tsapko, V. Vichugov, M. Sukhodoev, UFO collaboration

Final report, BMBF Programme: “Development and Use of Accelerator-Based Photon Sources”, 2016

Executive summary

Recent progress in X-ray optics, detector technology, and the tremendous increase of processing speed of commodity computational architectures gave rise to a paradigm shift in synchrotron X-ray imaging. To explore these technologies, the UFO experimental station for ultra-fast X-ray imaging has been developed within the two UFO projects. Its key components (an intelligent detector system, vast computational power, and sophisticated algorithms) have been designed, optimized and integrated for best overall performance. New methods like 4D cine-tomography for in-vivo measurements have been established. The online assessment of sample dynamics not only made active image-based control possible, but also resulted in unprecedented image quality and largely increased throughput. Typically 400-500 high-quality datasets with 3D images and image sequences are recorded with the UFO experimental station during a beam time of about 3-4 days.

A flexible and fully automated sample environment and a detector system for a set of up to three complementary cameras have been realized. The detector system can be equipped with commercially available scientific visible-light cameras or a custom UFO camera. To support academic sensor development, a novel platform for scientific cameras, the UFO camera framework, has been developed. It is a unique rapid-prototyping environment to turn scientific image sensors into intelligent smart camera systems. All beamline components, sample environment, detector station and the computing infrastructure are seamlessly integrated into the high-level control system “Concert” designed for online data evaluation and feedback control.

As a new element, computing nodes for online data assessment have been introduced in UFO. A powerful computing infrastructure based on GPUs and real-time storage has been developed. Optimized reconstruction algorithms reach a throughput of several GB/s with a single GPU server. Clusters are also supported for scalability. Highly optimized reconstruction and image processing algorithms are key for real-time monitoring and efficient data analysis. In order to manage these algorithms, the UFO parallel computing framework has been designed. It supports the implementation of efficient algorithms as well as the development of data processing workflows based on these. The library of optimized algorithms supports all modalities of operation at the UFO experimental station: tomography, laminography and diffraction imaging as well as numerous pre- and post-processing steps.

The results of the UFO project have been reported at several national and international workshops and conferences. The UFO project contributes with developments like the UFO camera framework or its GPU computing environment to other hard- and software projects in the synchrotron community (e.g. Tango Control System, High Data Rate Processing and Analysis Initiative, Nexus data format, Helmholtz Detector Technology and Systems Initiative DTS). Further follow-up projects build on the UFO results and improve imaging methods (like STROBOS-CODE) or add sophisticated analysis environments (like ASTOR).

The UFO project has successfully developed key components for ultra-fast X-ray imaging and serves as an example for future data-intensive applications. It demonstrates KIT’s role as a technology center for novel synchrotron instrumentation.

Vogelgesang M., Farago T., Morgeneyer T.F., Helfen L., Dos Santos Rolo T., Myagotin A., Baumbach T.

in Journal of Synchrotron Radiation, 23 (2016) 1254-1263. DOI:10.1107/S1600577516010195

Abstract

© 2016 International Union of Crystallography. Real-time processing of X-ray image data acquired at synchrotron radiation facilities allows for smart high-speed experiments. This includes workflows covering parameterized and image-based feedback-driven control up to the final storage of raw and processed data. Nevertheless, there is presently no system that supports an efficient construction of such experiment workflows in a scalable way. Thus, here an architecture based on a high-level control system that manages low-level data acquisition, data processing and device changes is described. This system is suitable for routine as well as prototypical experiments, and provides specialized building blocks to conduct four-dimensional in situ, in vivo and operando tomography and laminography.

Shkarin A., Ametova E., Chilingaryan S., Dritschler T., Kopmann A., Vogelgesang M., Shkarin R., Tsapko S.

in Fundamenta Informaticae, 141 (2015) 259-274. DOI:10.3233/FI-2015-1275

Abstract

© 2015 Fundamenta Informaticae 141. Recent developments in detector technology have made 4D (3D + time) X-ray microtomography with high spatial and time resolutions possible. The resolution and duration of such experiments are currently limited by destructive X-ray radiation. The algebraic reconstruction technique (ART) can incorporate a priori knowledge into the reconstruction model, which makes it possible to reduce the imaging dose while keeping a sufficiently good reconstruction quality. However, these techniques are computationally very demanding. In this paper we present a framework for ART reconstruction based on OpenCL technology. Our approach treats an algebraic method as a composition of interacting blocks which perform different tasks, such as projection selection, minimization, projecting and regularization. These tasks are realised using multiple algorithms differing in performance, the quality of reconstruction, and the area of applicability. Our framework allows algorithms to be freely combined to build the reconstruction chain. All algorithms are implemented with OpenCL and are able to run on a wide range of parallel hardware. The framework also scales easily to clustered environments with MPI. We describe the architecture of the ART framework and evaluate the quality and performance on the latest generation of GPU hardware from NVIDIA and AMD.
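As a minimal sketch of the row-action idea behind ART (the classical Kaczmarz iteration), the following pure-Python example reconstructs a toy 2x2 image from five ray sums. It is not the OpenCL framework described in the paper; the ray layout, the function name `kaczmarz` and the parameters `sweeps` and `relaxation` are assumptions made for illustration.

```python
def kaczmarz(rows, measurements, sweeps=200, relaxation=1.0):
    """Project the estimate onto one ray equation at a time (row-action ART)."""
    x = [0.0] * len(rows[0])
    for _ in range(sweeps):
        for a, b in zip(rows, measurements):
            norm_sq = sum(v * v for v in a)
            if norm_sq == 0.0:
                continue  # a ray that hits no pixel carries no information
            residual = b - sum(ai * xi for ai, xi in zip(a, x))
            step = relaxation * residual / norm_sq
            x = [xi + step * ai for xi, ai in zip(x, a)]
    return x


# Toy 2x2 image flattened to [p0, p1, p2, p3]; each row marks the pixels on a ray.
system = [
    [1, 1, 0, 0],  # top row
    [0, 0, 1, 1],  # bottom row
    [1, 0, 1, 0],  # left column
    [0, 1, 0, 1],  # right column
    [1, 0, 0, 1],  # main diagonal
]
truth = [1.0, 2.0, 3.0, 4.0]
data = [sum(a * t for a, t in zip(ray, truth)) for ray in system]

estimate = kaczmarz(system, data)
print([round(v, 6) for v in estimate])
```

A priori knowledge would enter such a scheme as an extra step between sweeps, for example clipping negative values or applying a regularizer; that step is omitted here.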

Shkarin R., Ametova E., Chilingaryan S., Dritschler T., Kopmann A., Mirone A., Shkarin A., Vogelgesang M., Tsapko S.

in Fundamenta Informaticae, 141 (2015) 245-258. DOI:10.3233/FI-2015-1274

Abstract

© 2015 Fundamenta Informaticae 141. On-line monitoring of synchrotron 3D-imaging experiments requires very fast tomographic reconstruction. Direct Fourier methods (DFM) have the potential to be faster than standard Filtered Backprojection. We have evaluated multiple DFMs using various interpolation techniques. We compared reconstruction quality and studied the parallelization potential. A method using Direct Fourier Inversion (DFI) and a sinc-based interpolation was selected and parallelized for execution on GPUs. Several optimization steps were considered to boost the performance. Finally we evaluated the achieved performance for the latest generation of GPUs from NVIDIA and AMD. The results show that tomographic reconstruction with a throughput of more than 1.5 GB/sec on a single GPU is possible.
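Direct Fourier methods rest on the Fourier slice theorem: the 1D Fourier transform of a parallel projection equals a central line of the 2D Fourier transform of the object. The sketch below checks this numerically for the simplest case, a projection at angle zero on a small test image. It is an illustration only, not the paper's GPU implementation; the test image and function names are made up, and the interpolation step needed for other angles is not shown.

```python
import cmath
import math


def dft(signal):
    """Plain O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def dft2_row(image, ky):
    """Row ky of the 2D DFT of image[y][x]."""
    rows, cols = len(image), len(image[0])
    return [
        sum(
            image[y][x] * cmath.exp(-2j * math.pi * (kx * x / cols + ky * y / rows))
            for y in range(rows)
            for x in range(cols)
        )
        for kx in range(cols)
    ]


# Arbitrary 8x8 test image (hypothetical data).
size = 8
image = [[(3 * x + 5 * y * y + x * y) % 7 for x in range(size)] for y in range(size)]

# Parallel projection at angle zero: integrate along y.
projection = [sum(image[y][x] for y in range(size)) for x in range(size)]

slice_from_projection = dft(projection)
central_slice = dft2_row(image, 0)
max_error = max(abs(a - b) for a, b in zip(slice_from_projection, central_slice))
print(max_error)
```

For other projection angles the slices fall on a polar grid in Fourier space, which is why the choice of interpolation matters for reconstruction quality.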

Vogelgesang, Matthias

PhD thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2014.

Abstract

Moore’s law remains the driving force behind higher chip integration density and an ever-increasing number of transistors. However, the adoption of massively parallel hardware architectures widens the gap between the potentially available microprocessor performance and the performance a developer can make use of. This thesis tries to close this gap by solving the problems that arise from the challenges of achieving optimal performance on parallel compute systems, allowing developers and end-users to use this compute performance in a transparent manner and using the compute performance to enable data-driven processes.

A general solution cannot realistically achieve optimal operation which is why we will focus on streamed data processing in this thesis. Data streams lend themselves to describe high-throughput data processing tasks such as audio and video processing. With this specific data stream use case, we can systematically improve the existing designs and optimize the execution from the instruction-level parallelism up to node-level task parallelism. In particular, we want to focus on X-ray imaging applications used at synchrotron light sources. These large-scale facilities provide an X-ray beam that enables scanning samples at much higher spatial and temporal resolution compared to conventional X-ray sources. The increased data rate inevitably requires highly parallel processing systems as well as an optimized data acquisition and control environment.

To solve the problem of high-throughput streamed data processing, we developed, modeled and evaluated system architectures to acquire and process data streams on parallel and heterogeneous compute systems. We developed a method to map general task descriptions onto heterogeneous compute systems and execute them with optimizations for local multi-machines and clusters of multi-user compute nodes. We also proposed a source-to-source translation system to simplify the development of task descriptions.

We have shown that it is possible to acquire and compute tomographic reconstructions on a heterogeneous compute system consisting of CPUs and GPUs in soft real-time. The end-user’s only responsibility is to describe the problem correctly. With the proposed system architectures, we paved the way for novel in-situ and in-vivo experiments and a much smarter experiment setup in general. Where existing experiments depend on a static environment and process sequence, we established the possibility to control the experiment setup in a closed feedback loop.
First assessor: Prof. Dr. Achim Streit
Second assessor: Prof. Dr. Marc Weber

Van De Kamp T., Dos Santos Rolo T., Vagovic P., Baumbach T., Riedel A.

in PLoS ONE, 9 (2014), e102355. DOI:10.1371/journal.pone.0102355

Abstract

Digital surface mesh models based on segmented datasets have become an integral part of studies on animal anatomy and functional morphology; usually, they are published as static images, movies or as interactive PDF files. We demonstrate the use of animated 3D models embedded in PDF documents, which combine the advantages of both movie and interactivity, based on the example of preserved Trigonopterus weevils. The method is particularly suitable to simulate joints with largely deterministic movements due to precise form closure. We illustrate the function of an individual screw-and-nut type hip joint and proceed to the complex movements of the entire insect attaining a defence position. This posture is achieved by a specific cascade of movements: Head and legs interlock mutually and with specific features of thorax and the first abdominal ventrite, presumably to increase the mechanical stability of the beetle and to maintain the defence position with minimal muscle activity. The deterministic interaction of accurately fitting body parts follows a defined sequence, which resembles a piece of engineering. © 2014 van de Kamp et al.

T. Baumbach, V. Altapova, D. Hänschke, T. dos Santos Rolo, A. Ershov, L. Helfen, T. van de Kamp, M. Weber, A. Kopmann, S. Chilingaryan, I. Dalinger, A. Myagotin, V. Asadchikov, A. Buzmakov, S. Tsapko, UFO collaboration

Final report, BMBF Programme: “Development and Use of Accelerator-Based Photon Sources”, 2014

Executive summary

Recent progress in X-ray optics, detector technology, and the tremendous increase of processing speed of commodity computational architectures gives rise to a paradigm shift in synchrotron X-ray imaging. The UFO project aims to enable a novel class of experiments combining intelligent detector systems, vast computational power, and sophisticated algorithms. The on-line assessment of sample dynamics will make active image-based control possible, give rise to unprecedented image quality, and will provide new insights into so far inaccessible scientific phenomena.

A demonstrator for high-speed tomography has been developed and extensively used. The system includes critical components like computation infrastructure, reconstruction algorithms and detector system and proved that time-resolved tomography is feasible. Based on these results the final design of the UFO experimental station has been revised and several upgrades have been included to enable further imaging techniques.

A flexible and fully automated detector system for a set of up to three complementary cameras has been designed, constructed and commissioned. A new platform for smart scientific cameras, the UFO-DAQ framework, has been realized. It is a unique rapid-prototyping environment to turn scientific image sensors into intelligent smart camera systems. Central features are the modular sensor interface, an open embedded processing framework and high-speed PCI Express links to the readout server. The UFO-DAQ framework seamlessly integrates into the UFO parallel computing framework.

The UFO project demonstrated that high-end graphics processor units (GPUs) are an ideal platform for a new generation of online monitoring systems for synchrotron applications with high data rates. A powerful computing infrastructure based on GPUs and real-time storage has been developed. Optimized reconstruction algorithms reach a throughput of 1 GB/s with a single GPU server. Generalized reconstruction algorithms also include laminography with tilted rotation axis.

Highly optimized reconstruction and image processing algorithms are key for real-time monitoring and efficient data analysis. In order to manage these algorithms, the UFO parallel computing framework has been developed. It supports the implementation of efficient algorithms as well as the development of data processing workflows based on these. It automatically selects the best code depending on the available computing resources. With its clear modular structure, the framework is ideally suited as an exchange platform for optimized algorithms for parallel computing architectures. The code, published under an open source license, is well-recognized by the synchrotron community.

The UFO project has been performed in close collaboration with three Russian partners. Various collaboration meetings have been organized and a number of scientists visited the partner institutions. The focus of the Russian contribution has been the smart camera platform and algorithm development. The results of the UFO project have been reported at several national and international workshops and conferences. The UFO project contributes with developments like the UFO-DAQ framework or its GPU computing environment to other hard- and software projects in the synchrotron community (e.g. Tango Control System, High Data Rate Processing and Analysis Initiative, Nexus data format, Helmholtz Detector Technology and Systems Initiative DTS).

In summary, within the UFO project it was possible to develop key components for future data-intensive applications. Most important are the X-ray detector system, a smart camera platform, GPU-based computing infrastructure and the parallel computing framework including various optimized algorithms. The potential and feasibility of high-speed X-ray tomography has been demonstrated by prototypes of experimental stations at the ANKA beamlines TOPO-TOMO and IMAGE.

Rolo T.D.S., Ershov A., Van De Kamp T., Baumbach T.

in Proceedings of the National Academy of Sciences of the United States of America, 111 (2014) 3921-3926. DOI:10.1073/pnas.1308650111

Abstract

Scientific cinematography using ultrafast optical imaging is a common tool to study motion. In opaque organisms or structures, X-ray radiography captures sequences of 2D projections to visualize morphological dynamics, but for many applications full four-dimensional (4D) spatiotemporal information is highly desirable. We introduce in vivo X-ray cine-tomography as a 4D imaging technique developed to study real-time dynamics in small living organisms with micrometer spatial resolution and subsecond time resolution. The method enables insights into the physiology of small animals by tracking the 4D morphological dynamics of minute anatomical features as demonstrated in this work by the analysis of fast-moving screw-and-nut-type weevil hip joints. The presented method can be applied to a broad range of biological specimens and biotechnological processes.

Vogelgesang M., Chilingaryan S., Rolo T.D.S., Kopmann A.

in Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 – 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012 (2012) 824-829, 6332254. DOI:10.1109/HPCC.2012.116

Abstract

Current synchrotron experiments require state-of-the-art scientific cameras with sensors that provide several million pixels, each at a dynamic range of up to 16 bits and the ability to acquire hundreds of frames per second. The resulting data bandwidth of such a data stream reaches several Gigabits per second. These streams have to be processed in real-time to achieve a fast process response. In this paper we present a computation framework and middleware library that provides re-usable building blocks to implement high-performance image processing algorithms without requiring profound hardware knowledge. It is based on a graph structure of computation nodes that process image transformation kernels on either CPU or GPU using the OpenCL sub-system. This system architecture allows deployment of the framework on a large range of computational hardware, from netbooks to hybrid compute clusters. We evaluated the library with standard image processing algorithms required for high quality tomographic reconstructions. The results show that speed-ups from 7x to 37x compared to traditional CPU-based solutions can be achieved with our approach, hence providing an opportunity for real-time on-line monitoring at synchrotron beam lines. © 2012 IEEE.
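The data rate quoted above follows from simple arithmetic, which the short sketch below works through. The sensor size (2048 x 2048 pixels) and the frame rate (100 frames per second) are assumed example values, not figures from the paper; only the 16-bit depth is taken from the abstract.

```python
def stream_bandwidth_gbit_per_s(width, height, bits_per_pixel, frames_per_second):
    """Raw (uncompressed) camera data rate in gigabits per second."""
    return width * height * bits_per_pixel * frames_per_second / 1e9


# Assumed example: about 4.2 million pixels, 16 bits each, 100 frames per second.
example_rate = stream_bandwidth_gbit_per_s(2048, 2048, 16, 100)
print(f"{example_rate:.2f} Gbit/s")
```

With these assumed values the stream already amounts to several gigabits per second, and it grows linearly with the frame rate.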