T. Baumbach, V. Altapova, D. Hänschke, T. dos Santos Rolo, A. Ershov, L. Helfen, T. van de Kamp, M. Weber, A. Kopmann, S. Chilingaryan, I. Dalinger, A. Myagotin, V. Asadchikov, A. Buzmakov, S. Tsapko, UFO collaboration

Final report, BMBF Programme: “Development and Use of Accelerator-Based Photon Sources”, 2014

Executive summary

Recent progress in X-ray optics and detector technology, together with the tremendous increase in the processing speed of commodity computational architectures, gives rise to a paradigm shift in synchrotron X-ray imaging. The UFO project aims to enable a novel class of experiments combining intelligent detector systems, vast computational power, and sophisticated algorithms. The on-line assessment of sample dynamics will make active image-based control possible, give rise to unprecedented image quality, and provide new insights into so far inaccessible scientific phenomena.

A demonstrator for high-speed tomography has been developed and extensively used. The system includes critical components such as the computing infrastructure, the reconstruction algorithms and the detector system, and it proved that time-resolved tomography is feasible. Based on these results, the final design of the UFO experimental station has been revised and several upgrades have been included to enable further imaging techniques.

A flexible and fully automated detector system for a set of up to three complementary cameras has been designed, constructed and commissioned. A new platform for smart scientific cameras, the UFO-DAQ framework, has been realized. It is a unique rapid-prototyping environment to turn scientific image sensors into intelligent smart camera systems. Central features are the modular sensor interface, an open embedded processing framework and high-speed PCI Express links to the readout server. The UFO-DAQ framework integrates seamlessly into the UFO parallel computing framework.

The UFO project demonstrated that high-end graphics processing units (GPUs) are an ideal platform for a new generation of online monitoring systems for synchrotron applications with high data rates. A powerful computing infrastructure based on GPUs and real-time storage has been developed. Optimized reconstruction algorithms reach a throughput of 1 GB/s with a single GPU server. Generalized reconstruction algorithms also cover laminography with a tilted rotation axis.

Highly optimized reconstruction and image processing algorithms are key for real-time monitoring and efficient data analysis. To manage these algorithms, the UFO parallel computing framework has been developed. It supports the implementation of efficient algorithms as well as the development of data processing workflows based on them. It automatically selects the best code depending on the available computing resources. With its clear modular structure, the framework is ideally suited as an exchange platform for optimized algorithms for parallel computing architectures. The code, published under an open-source license, is well recognized by the synchrotron community.

The UFO project has been performed in close collaboration with three Russian partners. Several collaboration meetings have been organized and a number of scientists visited the partner institutions. The focus of the Russian contribution has been the smart camera platform and algorithm development. The results of the UFO project have been reported at several national and international workshops and conferences. Through developments such as the UFO-DAQ framework and its GPU computing environment, the UFO project contributes to other hardware and software projects in the synchrotron community (e.g. Tango Control System, High Data Rate Processing and Analysis Initiative, Nexus data format, Helmholtz Detector Technology and Systems Initiative DTS).

In summary, the UFO project made it possible to develop key components for future data-intensive applications. The most important are the X-ray detector system, a smart camera platform, a GPU-based computing infrastructure and the parallel computing framework including various optimized algorithms. The potential and feasibility of high-speed X-ray tomography have been demonstrated by prototypes of experimental stations at the ANKA beamlines TOPO-TOMO and IMAGE.

Rolo T.D.S., Ershov A., Van De Kamp T., Baumbach T.

in Proceedings of the National Academy of Sciences of the United States of America, 111 (2014) 3921-3926. DOI:10.1073/pnas.1308650111

Abstract

Scientific cinematography using ultrafast optical imaging is a common tool to study motion. In opaque organisms or structures, X-ray radiography captures sequences of 2D projections to visualize morphological dynamics, but for many applications full four-dimensional (4D) spatiotemporal information is highly desirable. We introduce in vivo X-ray cine-tomography as a 4D imaging technique developed to study real-time dynamics in small living organisms with micrometer spatial resolution and subsecond time resolution. The method enables insights into the physiology of small animals by tracking the 4D morphological dynamics of minute anatomical features as demonstrated in this work by the analysis of fast-moving screw-and-nut-type weevil hip joints. The presented method can be applied to a broad range of biological specimens and biotechnological processes.

Cecilia A., Jary V., Nikl M., Mihokova E., Hanschke D., Hamann E., Douissard P.-A., Rack A., Martin T., Krause B., Grigoriev D., Baumbach T., Fiederle M.

in Radiation Measurements, 62 (2014) 28-34. DOI:10.1016/j.radmeas.2013.12.005

Abstract

In this work, a group of Lu2SiO5:Tb (LSO:Tb) scintillating layers with a Tb concentration between 8% and 19% were investigated by means of synchrotron and laboratory techniques. The scintillation efficiency measurements proved that the highest light yield is obtained for a Tb concentration equal to 15%. At higher concentrations, quenching processes occur which lower the light emission. The analysis of the reciprocal space maps of the (082), (280) and (040) Bragg reflections showed that LSO:Tb epilayers are well adapted on YbSO substrates for all the investigated concentrations. The spatial resolution tests demonstrated the possibility to achieve a resolution of 1 μm with a 6 μm thick scintillating layer. © 2014 Elsevier Inc. All rights reserved.

Krause B., Miljevic B., Aschenbrenner T., Piskorska-Hommel E., Tessarek C., Barchuk M., Buth G., Donfeu Tchana R., Figge S., Gutowski J., Hanschke D., Kalden J., Laurus T., Lazarev S., Magalhaes-Paniago R., Sebald K., Wolska A., Hommel D., Falta J., Holy V., Baumbach T.

in Journal of Alloys and Compounds, 585 (2014) 572-579. DOI:10.1016/j.jallcom.2013.09.005

Abstract

The structure and morphology of uncapped and capped InGaN quantum dots formed by spinodal decomposition were studied by AFM, SEM, XRD, and EXAFS. As a result of the spinodal decomposition, the uncapped samples show a meander structure with low indium content which is strained to the GaN template, and large, relaxed indium-rich islands. The thin meander structure is responsible for the quantum dot emission. A subsequently deposited low-temperature GaN cap layer forms small and nearly unstrained islands on top of the meander structure which is a sharp interface between the GaN template and the cap layer. For an InGaN cap layer deposited with similar growth parameters, a similar morphology but lower crystalline quality was observed. After deposition of a second GaN cap at a slightly higher temperature, the surface of the quantum dot structure is smooth. The large In-rich islands observed for the uncapped samples are relaxed, have a relatively low crystalline quality and a broad size distribution. They are still visible after capping with a low-temperature InGaN or GaN cap at 700 °C but dissolve after deposition of the second cap layer. The low crystalline quality of the large islands does not influence the quantum dot emission but is expected to increase the number of defects in the cap layer. This might reduce the performance of complex devices based on the stacking of several functional units. © 2013 Elsevier B.V. All rights reserved.

Vogelgesang M., Chilingaryan S., Rolo T.D.S., Kopmann A.

in Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 – 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012 (2012) 824-829, 6332254. DOI:10.1109/HPCC.2012.116

Abstract

Current synchrotron experiments require state-of-the-art scientific cameras with sensors that provide several million pixels, each at a dynamic range of up to 16 bits and the ability to acquire hundreds of frames per second. The resulting data bandwidth of such a data stream reaches several Gigabits per second. These streams have to be processed in real-time to achieve a fast process response. In this paper we present a computation framework and middleware library that provides re-usable building blocks to implement high-performance image processing algorithms without requiring profound hardware knowledge. It is based on a graph structure of computation nodes that process image transformation kernels on either CPU or GPU using the OpenCL sub-system. This system architecture allows deployment of the framework on a large range of computational hardware, from netbooks to hybrid compute clusters. We evaluated the library with standard image processing algorithms required for high quality tomographic reconstructions. The results show that speed-ups from 7x to 37x compared to traditional CPU-based solutions can be achieved with our approach, hence providing an opportunity for real-time on-line monitoring at synchrotron beam lines. © 2012 IEEE.
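
The graph-of-nodes idea in this abstract can be illustrated with a minimal sketch. It is not the framework's actual API: the `Node` class, the `run_graph` function, the node names and the `DARK` and `FLAT` constants are all hypothetical, and every node runs as plain Python on the CPU rather than as an OpenCL kernel. The example chains three steps of a flat-field correction, a typical preprocessing stage before tomographic reconstruction.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Node:
    """One processing step: a function applied to the outputs of its input nodes."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = tuple(inputs)


def run_graph(nodes, frame):
    """Evaluate every node once, in dependency order, for a single frame."""
    by_name = {node.name: node for node in nodes}
    # Map each node to its predecessors; static_order() yields predecessors first.
    order = TopologicalSorter({n.name: n.inputs for n in nodes}).static_order()
    results = {}
    for name in order:
        node = by_name[name]
        # Nodes without inputs are sources and receive the raw frame.
        args = [results[i] for i in node.inputs] or [frame]
        results[name] = node.func(*args)
    return results


DARK = 10.0   # hypothetical dark-current offset per pixel
FLAT = 110.0  # hypothetical flat-field intensity per pixel

nodes = [
    Node("dark_corrected", lambda f: [p - DARK for p in f]),
    Node("normalized", lambda f: [p / (FLAT - DARK) for p in f],
         inputs=["dark_corrected"]),
    Node("absorption", lambda f: [-math.log(p) for p in f],
         inputs=["normalized"]),
]

out = run_graph(nodes, [110.0, 60.0, 20.0])
```

Because each node only declares its inputs, the scheduling logic stays separate from the kernels, which is what would allow a real framework to place individual nodes on a CPU or a GPU.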

Caselle M., Chilingaryan S., Herth A., Kopmann A., Stevanovic U., Vogelgesang M., Balzer M., Weber M.

in 2012 18th IEEE-NPSS Real Time Conference, RT 2012 (2012), 6418369. DOI:10.1109/RTC.2012.6418369

Abstract

X-ray computed tomography (CT) is a method for non-destructive investigation. Three-dimensional images of the internal structure can be reconstructed using a two-dimensional detector. The polychromatic high-density photon flux of modern synchrotron light sources offers hard X-ray imaging with a spatio-temporal resolution up to the μm-μs range. Existing indirect X-ray image detectors can be adapted for fast image acquisition by employing a CMOS-based digital high-speed camera. In this paper, we propose a high-speed visible light camera based on a commercial CMOS sensor with embedded processing implemented in an FPGA. This platform has been used to develop a novel architecture for a self-event trigger. This feature is able to increase the original frame rate of the CMOS sensor and reduce the amount of received data. Thanks to a low-noise design, a high frame rate (kilohertz range) and high-speed data transfer, this camera can be employed in modern synchrotron ultra-fast X-ray radiography and computed tomography. The camera setup is completed by high-throughput Linux drivers and a seamless integration into our GPU computing framework. Selected applications from life sciences and materials research underline the high potential of this high-speed camera in a hard X-ray micro-imaging approach. © 2012 IEEE.

Chilingaryan S., Mirone A., Hammersley A., Ferrero C., Helfen L., Kopmann A., Dos Santos Rolo T., Vagovic P.

in IEEE Transactions on Nuclear Science, 58 (2011) 1447-1455, 5766797. DOI:10.1109/TNS.2011.2141686

Abstract

Advances in digital detector technology currently lead to rapidly increasing data rates in imaging experiments. Using fast two-dimensional detectors in computed tomography, the data acquisition can be much faster than the reconstruction if no adequate measures are taken, especially when a high photon flux at synchrotron sources is used. We have optimized the reconstruction software employed at the micro-tomography beamlines of our synchrotron facilities to use the computational power of modern graphics cards. The main paradigm of our approach is the full utilization of all system resources. We use a pipelined architecture, where the GPUs are used as compute coprocessors to reconstruct slices, while the CPUs are preparing the next ones. Special attention is devoted to minimizing data transfers between the host and GPU memory and to executing memory transfers in parallel with the computations. We were able to reduce the reconstruction time by a factor of 30 and process a typical data set of 20 GB in 40 seconds. The time needed for the first evaluation of the reconstructed sample is reduced significantly and quasi real-time visualization is now possible. © 2006 IEEE.
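
The pipelined scheme described in this abstract, in which one stage prepares the next slice while another reconstructs the current one, can be sketched with a bounded queue between two threads. This is only an illustration under stated assumptions: `prepare`, `reconstruct` and `run_pipeline` are hypothetical stand-ins, no GPU is involved, and the real software is not claimed to be structured this way.

```python
import queue
import threading


def prepare(slice_id):
    """Stand-in for CPU-side preprocessing of one slice (hypothetical)."""
    return [slice_id] * 4


def reconstruct(sinogram):
    """Stand-in for the GPU reconstruction of one slice (hypothetical)."""
    return sum(sinogram)


def run_pipeline(n_slices, depth=2):
    """Overlap preparation and reconstruction through a bounded queue."""
    # The bound keeps the producer at most `depth` slices ahead of the consumer.
    ready = queue.Queue(maxsize=depth)
    results = []

    def producer():
        for i in range(n_slices):
            ready.put(prepare(i))
        ready.put(None)  # sentinel: no more slices

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        item = ready.get()
        if item is None:
            break
        results.append(reconstruct(item))
    worker.join()
    return results


results = run_pipeline(5)
```

For scale, the figures quoted in the abstract (20 GB in 40 seconds) correspond to an average throughput of 20 GB / 40 s = 0.5 GB/s.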