Lewkowicz, Alexander

Internship report, Institute for Data Processing and Electronics, Karlsruhe Institute of Technology, 2014.


High-speed tracking of fluorescent nanoparticles enables scientists to study the drying process of fluids. A better understanding of this process will help develop new techniques for obtaining homogeneous surfaces. CMOS cameras record images of the particle flow. The challenge is to recover each particle's third coordinate from a 2D image. Depending on a particle's distance to the objective lens of the microscope, it appears in the image as a ring of a different radius. By detecting the radius and image coordinates of each ring, both the velocity and the 3D trajectory can be established for each particle. To achieve near real-time particle tracking, highly parallel systems such as GPUs are used.
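
The idea of recovering depth from ring radius can be illustrated with a minimal CPU sketch. It is not the method used in the report: it assumes edge points of one ring have already been extracted, fits a circle with an algebraic least-squares (Kasa) fit, and converts the radius to a z-coordinate with a hypothetical linear calibration. The function names and the calibration constants `slope_um_per_px` and `offset_um` are assumptions made for illustration only.

```python
import math


def fit_circle(points):
    """Algebraic least-squares (Kasa) circle fit.

    points: iterable of (x, y) edge coordinates of one ring.
    Returns (cx, cy, r) in the same units as the input.
    """
    pts = list(points)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    # Centre the data to improve numerical conditioning.
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    suu = svv = suv = suuu = svvv = suvv = svuu = 0.0
    for x, y in pts:
        u, v = x - mx, y - my
        suu += u * u
        svv += v * v
        suv += u * v
        suuu += u ** 3
        svvv += v ** 3
        suvv += u * v * v
        svuu += v * u * u
    # Solve the 2x2 normal equations for the centre offset (uc, vc).
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("points are (nearly) collinear")
    b1 = 0.5 * (suuu + suvv)
    b2 = 0.5 * (svvv + svuu)
    uc = (b1 * svv - b2 * suv) / det
    vc = (suu * b2 - suv * b1) / det
    r = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mx + uc, my + vc, r


def radius_to_z(r_px, slope_um_per_px=0.5, offset_um=0.0):
    """Hypothetical linear calibration from ring radius to depth.

    A real setup would measure this relation; the defaults are made up.
    """
    return offset_um + slope_um_per_px * r_px


def velocity(p0, p1, dt):
    """Finite-difference velocity between two 3D positions."""
    return tuple((b - a) / dt for a, b in zip(p0, p1))
```

Fitting each ring in consecutive frames yields (x, y, z) positions, and `velocity` applied to two such positions gives a per-particle velocity estimate.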

Supervised by Dr. Suren Chilingaryan

Vogelgesang, Matthias

PhD thesis, Faculty of Computer Science, Karlsruhe Institute of Technology, 2014.


Moore’s law remains the driving force behind higher chip integration density and an ever-increasing number of transistors. However, the adoption of massively parallel hardware architectures widens the gap between the potentially available microprocessor performance and the performance a developer can make use of. This thesis tries to close this gap by solving the problems that arise from the challenges of achieving optimal performance on parallel compute systems, allowing developers and end-users to use this compute performance in a transparent manner, and using the compute performance to enable data-driven processes.

A general solution cannot realistically achieve optimal operation, which is why we focus on streamed data processing in this thesis. Data streams lend themselves to describing high-throughput data processing tasks such as audio and video processing. With this specific data stream use case, we can systematically improve the existing designs and optimize the execution from instruction-level parallelism up to node-level task parallelism. In particular, we focus on X-ray imaging applications used at synchrotron light sources. These large-scale facilities provide an X-ray beam that enables scanning samples at much higher spatial and temporal resolution than conventional X-ray sources. The increased data rate inevitably requires highly parallel processing systems as well as an optimized data acquisition and control environment.

To solve the problem of high-throughput streamed data processing, we developed, modeled and evaluated system architectures to acquire and process data streams on parallel and heterogeneous compute systems. We developed a method to map general task descriptions onto heterogeneous compute systems and execute them with optimizations for local multi-machines and clusters of multi-user compute nodes. We also proposed a source-to-source translation system to simplify the development of task descriptions.

We have shown that it is possible to acquire and compute tomographic reconstructions on a heterogeneous compute system consisting of CPUs and GPUs in soft real-time. The end-user’s only responsibility is to describe the problem correctly. With the proposed system architectures, we paved the way for novel in-situ and in-vivo experiments and a much smarter experiment setup in general. Where existing experiments depend on a static environment and process sequence, we established the possibility to control the experiment setup in a closed feedback loop.


First assessor: Prof. Dr. Achim Streit
Second assessor: Prof. Dr. Marc Weber

Van De Kamp T., Dos Santos Rolo T., Vagovic P., Baumbach T., Riedel A.

in PLoS ONE, 9 (2014), e102355. DOI:10.1371/journal.pone.0102355


Digital surface mesh models based on segmented datasets have become an integral part of studies on animal anatomy and functional morphology; usually, they are published as static images, movies or as interactive PDF files. We demonstrate the use of animated 3D models embedded in PDF documents, which combine the advantages of both movie and interactivity, based on the example of preserved Trigonopterus weevils. The method is particularly suitable to simulate joints with largely deterministic movements due to precise form closure. We illustrate the function of an individual screw-and-nut type hip joint and proceed to the complex movements of the entire insect attaining a defence position. This posture is achieved by a specific cascade of movements: Head and legs interlock mutually and with specific features of thorax and the first abdominal ventrite, presumably to increase the mechanical stability of the beetle and to maintain the defence position with minimal muscle activity. The deterministic interaction of accurately fitting body parts follows a defined sequence, which resembles a piece of engineering. © 2014 van de Kamp et al.

T. Baumbach, V. Altapova, D. Hänschke, T. dos Santos Rolo, A. Ershov, L. Helfen, T. van de Kamp, M. Weber, A. Kopmann, S. Chilingaryan, I. Dalinger, A. Myagotin, V. Asadchikov, A. Buzmakov, S. Tsapko, UFO collaboration

Final report, BMBF Programme: “Development and Use of Accelerator-Based Photon Sources”, 2014

Executive summary

Recent progress in X-ray optics and detector technology, together with the tremendous increase in processing speed of commodity computational architectures, gives rise to a paradigm shift in synchrotron X-ray imaging. The UFO project aims to enable a novel class of experiments combining intelligent detector systems, vast computational power, and sophisticated algorithms. The on-line assessment of sample dynamics will make active image-based control possible, give rise to unprecedented image quality, and provide new insights into so far inaccessible scientific phenomena.

A demonstrator for high-speed tomography has been developed and extensively used. The system includes critical components like computation infrastructure, reconstruction algorithms and detector system and proved that time-resolved tomography is feasible. Based on these results the final design of the UFO experimental station has been revised and several upgrades have been included to enable further imaging techniques.

A flexible and fully automated detector system for a set of up to three complementary cameras has been designed, constructed and commissioned. A new platform for smart scientific cameras, the UFO-DAQ framework, has been realized. It is a unique rapid-prototyping environment to turn scientific image sensors into intelligent smart camera systems. Central features are the modular sensor interface, an open embedded processing framework and high-speed PCI Express links to the readout server. The UFO-DAQ framework integrates seamlessly into the UFO parallel computing framework.

The UFO project demonstrated that high-end graphics processing units (GPUs) are an ideal platform for a new generation of online monitoring systems for synchrotron applications with high data rates. A powerful computing infrastructure based on GPUs and real-time storage has been developed. Optimized reconstruction algorithms reach a throughput of 1 GB/s with a single GPU server. Generalized reconstruction algorithms also include laminography with a tilted rotation axis.

Highly optimized reconstruction and image processing algorithms are key for real-time monitoring and efficient data analysis. In order to manage these algorithms, the UFO parallel computing framework has been developed. It supports the implementation of efficient algorithms as well as the development of data processing workflows based on these. It automatically selects the best code depending on the available computing resources. With its clear modular structure, the framework is ideally suited as an exchange platform for optimized algorithms for parallel computing architectures. The code, published under an open-source license, is well recognized by the synchrotron community.

The UFO project has been performed in close collaboration with three Russian partners. Various collaboration meetings have been organized and a number of scientists visited the partner institutions. The focus of the Russian contribution has been the smart camera platform and algorithm development. The results of the UFO project have been reported at several national and international workshops and conferences. With developments like the UFO-DAQ framework and its GPU computing environment, the UFO project contributes to other hardware and software projects in the synchrotron community (e.g. Tango Control System, High Data Rate Processing and Analysis Initiative, Nexus data format, Helmholtz Detector Technology and Systems Initiative DTS).

In summary, within the UFO project it was possible to develop key components for future data-intensive applications. Most important are the X-ray detector system, a smart camera platform, GPU-based computing infrastructure and the parallel computing framework including various optimized algorithms. The potential and feasibility of high-speed X-ray tomography have been demonstrated by prototypes of experimental stations at the ANKA beamlines TOPO-TOMO and IMAGE.

Brogna A.S., Balzer M., Smale S., Hartmann J., Bormann D., Hamann E., Cecilia A., Zuber M., Koenig T., Zwerger A., Weber M., Fiederle M., Baumbach T.

in Journal of Instrumentation, 9 (2014), C05047. DOI:10.1088/1748-0221/9/05/C05047


In this work we present novel readout electronics for an X-ray sensor based on a Si crystal bump-bonded to an array of 3 × 2 Medipix ASICs. The pixel size is 55 μm × 55 μm with a total number of ∼ 400k pixels and a sensitive area of 42 mm × 28 mm. The readout electronics operate Medipix-2 MXR or Timepix ASICs with a clock speed of 125 MHz. The data acquisition system is centered around an FPGA and each of the six ASICs has a dedicated I/O port for simultaneous data acquisition. The settings of the auxiliary devices (ADCs and DACs) are also processed in the FPGA. Moreover, a high-resolution timer operates the electronic shutter to select the exposure time from 8 ns to several milliseconds. A sophisticated trigger is available in hardware and software to synchronize the acquisition with external electro-mechanical motors. The system includes a diagnostic subsystem to check the sensor temperature and to control the cooling Peltier cells and a programmable high-voltage generator to bias the crystal. A network cable transfers the data, encapsulated into the UDP protocol and streamed at 1 Gb/s. Therefore, most notebooks or personal computers are able to process the data and to program the system without a dedicated interface. The data readout software is compatible with the well-known Pixelman 2.x running on both Windows and GNU/Linux. Furthermore, the open architecture encourages users to write their own applications. With a low-level interface library which implements all the basic features, a MATLAB or Python script can be implemented for special manipulations of the raw data. In this paper we present selected images taken with a microfocus X-ray tube to demonstrate the capability to collect the data at rates up to 120 fps, corresponding to 0.76 Gb/s. © 2014 IOP Publishing Ltd and Sissa Medialab srl.
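
The figures quoted in this abstract can be cross-checked with a short calculation. The sketch below is only a plausibility check under two assumptions that the abstract does not state: each Medipix2/Timepix ASIC has a 256 × 256 pixel matrix, and each pixel value is transferred as a 16-bit word. Protocol overhead is not modeled.

```python
# Values taken from the abstract.
ASICS_X, ASICS_Y = 3, 2        # 3 x 2 array of Medipix ASICs
PIXEL_PITCH_UM = 55            # 55 um x 55 um pixels
FRAME_RATE_FPS = 120           # highest frame rate reported

# Assumptions (not stated in the abstract).
PIXELS_PER_ASIC_SIDE = 256     # assumed 256 x 256 matrix per ASIC
BITS_PER_PIXEL = 16            # hypothetical transfer word size


def pixel_count():
    """Total number of pixels across all ASICs."""
    return ASICS_X * ASICS_Y * PIXELS_PER_ASIC_SIDE ** 2


def sensitive_area_mm():
    """Width and height of the sensitive area in millimetres."""
    width = ASICS_X * PIXELS_PER_ASIC_SIDE * PIXEL_PITCH_UM / 1000
    height = ASICS_Y * PIXELS_PER_ASIC_SIDE * PIXEL_PITCH_UM / 1000
    return width, height


def data_rate_gbit_per_s(fps=FRAME_RATE_FPS, bits=BITS_PER_PIXEL):
    """Raw payload rate in Gb/s, ignoring any protocol overhead."""
    return pixel_count() * bits * fps / 1e9


if __name__ == "__main__":
    print(pixel_count())            # 393216, i.e. roughly 400k pixels
    print(sensitive_area_mm())      # about 42.24 mm x 28.16 mm
    print(data_rate_gbit_per_s())   # about 0.755 Gb/s
```

Under these assumptions the pixel count and sensitive area agree with the stated ∼ 400k pixels and 42 mm × 28 mm, and the raw payload rate of about 0.755 Gb/s is close to the quoted 0.76 Gb/s and fits within the 1 Gb/s link.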

Birk M., Balzer M., Ruiter N.V., Becker J.

in Computers and Electrical Engineering, 40 (2014) 1171-1185. DOI:10.1016/j.compeleceng.2013.11.033


In heterogeneous computing, application developers have to identify the best-suited target platform from a variety of alternatives. In this work, we compare performance and architectural efficiency of Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) for two algorithms taken from a novel medical imaging method named 3D ultrasound computer tomography. From the 40 nm and 28 nm generations, we use top-notch devices and those with similar power consumption values. For our two benchmark algorithms from the signal processing and imaging domain, the results show that if power consumption is not considered, the GPU and FPGA from the 40 nm generation deliver both similar performance and similar efficiency per transistor. In the 28 nm process, in contrast, the FPGA is superior to its GPU counterpart by 86% and 39%, depending on the algorithm. If power is limited, FPGAs outperform GPUs in each investigated case by at least a factor of four. © 2013 Elsevier Ltd. All rights reserved.

Rolo T.D.S., Ershov A., Van De Kamp T., Baumbach T.

in Proceedings of the National Academy of Sciences of the United States of America, 111 (2014) 3921-3926. DOI:10.1073/pnas.1308650111


Scientific cinematography using ultrafast optical imaging is a common tool to study motion. In opaque organisms or structures, X-ray radiography captures sequences of 2D projections to visualize morphological dynamics, but for many applications full four-dimensional (4D) spatiotemporal information is highly desirable. We introduce in vivo X-ray cine-tomography as a 4D imaging technique developed to study real-time dynamics in small living organisms with micrometer spatial resolution and subsecond time resolution. The method enables insights into the physiology of small animals by tracking the 4D morphological dynamics of minute anatomical features as demonstrated in this work by the analysis of fast-moving screw-and-nut-type weevil hip joints. The presented method can be applied to a broad range of biological specimens and biotechnological processes.

Cecilia A., Jary V., Nikl M., Mihokova E., Hanschke D., Hamann E., Douissard P.-A., Rack A., Martin T., Krause B., Grigorievc D., Baumbach T., Fiederle M.

in Radiation Measurements, 62 (2014) 28-34. DOI:10.1016/j.radmeas.2013.12.005


In this work, a group of Lu2SiO5:Tb (LSO:Tb) scintillating layers with a Tb concentration between 8% and 19% was investigated by means of synchrotron and laboratory techniques. The scintillation efficiency measurements proved that the highest light yield is obtained for a Tb concentration equal to 15%. At higher concentrations, quenching processes occur which lower the light emission. The analysis of the reciprocal space maps of the (082), (280) and (040) Bragg reflections showed that LSO:Tb epilayers are well adapted on YbSO substrates for all the investigated concentrations. The spatial resolution tests demonstrated the possibility to achieve a resolution of 1 μm with a 6 μm thick scintillating layer. © 2014 Elsevier Inc. All rights reserved.

Moosmann J., Ershov A., Weinhardt V., Baumbach T., Prasad M.S., Labonne C., Xiao X., Kashef J., Hofmann R.

in Nature Protocols, 9 (2014) 294-304. DOI:10.1038/nprot.2014.033


X-ray phase-contrast microtomography (XPCμT) is a label-free, high-resolution imaging modality for analyzing early development of vertebrate embryos in vivo by using time-lapse sequences of 3D volumes. Here we provide a detailed protocol for applying this technique to study gastrulation in Xenopus laevis (African clawed frog) embryos. In contrast to μMRI, XPCμT images optically opaque embryos with subminute temporal and micrometer-range spatial resolution. We describe sample preparation, culture and suspension of embryos, tomographic imaging with a typical duration of 2 h (gastrulation and neurulation stages), intricacies of image pre-processing, phase retrieval, tomographic reconstruction, segmentation and motion analysis. Moreover, we briefly discuss our present understanding of X-ray dose effects (heat load and radiolysis), and we outline how to optimize the experimental configuration with respect to X-ray energy, photon flux density, sample-detector distance, exposure time per tomographic projection, numbers of projections and time-lapse intervals. The protocol requires an interdisciplinary effort of developmental biologists for sample preparation and data interpretation, X-ray physicists for planning and performing the experiment and applied mathematicians/computer scientists/physicists for data processing and analysis. Sample preparation requires 9-48 h, depending on the stage of development to be studied. Data acquisition takes 2-3 h per tomographic time-lapse sequence. Data processing and analysis requires a further 2 weeks, depending on the availability of computing power and the amount of detail required to address a given scientific problem. © 2014 Nature America, Inc. All rights reserved.

Krause B., Miljevic B., Aschenbrenner T., Piskorska-Hommel E., Tessarek C., Barchuk M., Buth G., Donfeu Tchana R., Figge S., Gutowski J., Hanschke D., Kalden J., Laurus T., Lazarev S., Magalhaes-Paniago R., Sebald K., Wolska A., Hommel D., Falta J., Holy V., Baumbach T.

in Journal of Alloys and Compounds, 585 (2014) 572-579. DOI:10.1016/j.jallcom.2013.09.005


The structure and morphology of uncapped and capped InGaN quantum dots formed by spinodal decomposition were studied by AFM, SEM, XRD, and EXAFS. As a result of the spinodal decomposition, the uncapped samples show a meander structure with low indium content which is strained to the GaN template, and large, relaxed indium-rich islands. The thin meander structure is responsible for the quantum dot emission. A subsequently deposited low-temperature GaN cap layer forms small and nearly unstrained islands on top of the meander structure which is a sharp interface between the GaN template and the cap layer. For an InGaN cap layer deposited with similar growth parameters, a similar morphology but lower crystalline quality was observed. After deposition of a second GaN cap at a slightly higher temperature, the surface of the quantum dot structure is smooth. The large In-rich islands observed for the uncapped samples are relaxed, have a relatively low crystalline quality and a broad size distribution. They are still visible after capping with a low-temperature InGaN or GaN cap at 700 °C but dissolve after deposition of the second cap layer. The low crystalline quality of the large islands does not influence the quantum dot emission but is expected to increase the number of defects in the cap layer. This might reduce the performance of complex devices based on the stacking of several functional units. © 2013 Elsevier B.V. All rights reserved.