Recent progress in detector technology enables greatly increased temporal and spatial resolution in scientific experiments. This innovation has led to large amounts of data that need to be transferred, processed and analyzed. The UFO platform was founded in 2010 to apply the latest technologies to the challenges of data-intensive sciences.
UFO aims to enable new types of smart experiments using the vast computational power of massively parallel computation units, fast interconnects and scalable data processing algorithms. We are convinced that the collaboration of interdisciplinary groups is required throughout the whole data processing workflow, from the detector and readout through analysis to the intelligent archival of the data. Thus UFO focuses on generic core technologies that address the challenges arising from datasets in the range of gigabytes, terabytes and beyond.
1. High-throughput DAQ Electronics
Recent detector developments enable scientific experiments with unprecedented temporal and spatial resolution. To realize these instruments, the DAQ electronics must manage vast amounts of data. We use the latest commercially available technologies and products so that our custom developments can concentrate on the specific requirements of our applications.
UFO high-throughput DAQ platform
We have developed a modular FPGA-based high-throughput DAQ platform that is intended for rapid development of scientific data acquisition systems. The main design goals include the following:
- full data streaming architecture: continuous data acquisition at full sensor speed with low latency;
- online pre-processing and automatic adaptation to varying experimental conditions;
- easy extension to experiment-specific sensors and analog filters.
The readout architecture can be divided into three main parts: the daughter card containing application-specific sensors, filters and digitizers; the motherboard based on powerful FPGAs; and the PC used for data acquisition (DAQ) and further data analysis. Daughter card and motherboard are connected by a high-speed, high-density FMC-Samtec connector. To read out the data as fast as possible, a standard PCI Express (PCIe) cable connection transfers the data directly from the DAQ electronics to the main computer memory. Both passive copper cables and active optical links are available for this interface. To benefit fully from the high bandwidth of the PCIe link, we use direct memory access (DMA) to transfer the data from the DAQ electronics to the main computer memory and vice versa. The infrastructure includes a library of IP blocks that are often used in DAQ applications: DMA engines that also support AMD DirectGMA and NVIDIA GPUDirect, fast serializer/deserializer blocks for interfacing image sensors or ADCs, and a DDR memory module for intermediate storage in the on-board DDR memory.
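A continuous DMA stream into main memory is typically organized as a ring of pre-allocated buffers: the DMA engine fills pages and advances a write index, while the host consumes pages and advances a read index. The Python model below is a minimal sketch of that pattern under simplifying assumptions (fixed page size, a single producer and consumer, overflow reported instead of handled). The class and method names `DmaRing`, `dma_write` and `consume` are hypothetical and do not describe the UFO DMA engine or its driver interface.

```python
class DmaRing:
    """Hypothetical model of a DMA page ring shared by the FPGA and the host."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        self.write_idx = 0  # next page the (simulated) DMA engine fills
        self.read_idx = 0   # next page the host consumes
        self.filled = 0     # pages written but not yet consumed

    def dma_write(self, data: bytes) -> bool:
        """Simulate the DMA engine filling one page; returns False if the ring is full."""
        if len(data) > self.page_size:
            raise ValueError("data larger than one DMA page")
        if self.filled == len(self.pages):
            return False  # overflow: real hardware would stall or flag lost data
        self.pages[self.write_idx][: len(data)] = data
        self.write_idx = (self.write_idx + 1) % len(self.pages)
        self.filled += 1
        return True

    def consume(self):
        """Host side: return the oldest filled page (as a copy) or None if empty."""
        if self.filled == 0:
            return None
        page = bytes(self.pages[self.read_idx])
        self.read_idx = (self.read_idx + 1) % len(self.pages)
        self.filled -= 1
        return page


ring = DmaRing(num_pages=2, page_size=4)
ring.dma_write(b"ab")
print(ring.consume())
```

The ring size sets how long the host may fall behind the detector before data is lost, which is why streaming DAQ systems size it generously.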
Ultra-fast digitizer with picosecond sampling
Based on the high-throughput DAQ platform, a fast digitizer has been developed. The system is intended for any scientific application that requires the measurement of fast pulses with a FWHM down to 100 ps. The fast data streaming of the UFO platform makes it possible to sample and store individual pulses at repetition rates ranging from the kHz range up to 500 MHz. In other words, the system can record individual pulses in continuous mode with millivolt amplitude resolution and resolve the relative time jitter between two consecutive pulses with picosecond resolution.
Different detectors such as YBCO, NbN and zero-biased Schottky diodes are supported. A low-noise amplifier (LNA) and a wideband power divider have been developed to amplify the analog pulse and then divide it into four identical analog signals with minimum insertion loss. The fast digitizer board receives the four analog signals from the power divider and samples them with four individual sampling channels. The timing between adjacent samples is controlled by picosecond delay chips and is configurable by the FPGA. Owing to the very low-noise layout, the total time jitter distribution has a Gaussian shape with no systematic components and a very low standard deviation, measured to be less than 1.7 ps. The high-throughput readout board receives the four digital samples every 2 ns with 12-bit ADC resolution and transfers the acquired data directly to high-end graphics processing units (GPUs) for real-time data analysis. For this purpose, the readout board is based on a bus-master DMA architecture connected to PCI Express core logic, designed to ensure a data throughput of up to 4 GByte/s.
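Two quantities in the paragraph above can be made concrete. First, the stream rate follows from four samples every 2 ns: assuming each 12-bit sample is stored in a 16-bit word, this gives 4 × 2 bytes / 2 ns = 4 GByte/s, consistent with the quoted readout throughput (tightly packed 12-bit samples would give 3 GByte/s). Second, the four time-shifted samples of one pulse can be used to estimate its amplitude and arrival time. The sketch below does this with a least-squares Gaussian fit to the logarithm of the samples; this is one standard technique chosen for illustration, not necessarily the method used in the UFO digitizer, and the delay values are hypothetical.

```python
import numpy as np

SAMPLE_PERIOD_S = 2e-9   # four samples every 2 ns (from the text)
SAMPLES_PER_PERIOD = 4
BYTES_PER_SAMPLE = 2     # assumption: each 12-bit sample occupies a 16-bit word


def stream_rate_bytes_per_s() -> float:
    """Raw data rate produced by the four sampling channels."""
    return SAMPLES_PER_PERIOD * BYTES_PER_SAMPLE / SAMPLE_PERIOD_S


def fit_gaussian_pulse(times_ps, amplitudes):
    """Estimate (amplitude, arrival time, sigma) of a Gaussian pulse.

    Fits ln(y) = c0 + c1*t + c2*t**2 by least squares, so all samples must be
    positive and at least three are needed.
    """
    t = np.asarray(times_ps, dtype=float)
    y = np.asarray(amplitudes, dtype=float)
    c2, c1, c0 = np.polyfit(t, np.log(y), 2)
    if c2 >= 0:
        raise ValueError("samples do not describe a peak")
    mu = -c1 / (2 * c2)                      # arrival time of the pulse maximum
    sigma = np.sqrt(-1.0 / (2 * c2))         # FWHM = 2*sqrt(2*ln 2)*sigma
    amp = np.exp(c0 - c1**2 / (4 * c2))      # peak amplitude
    return amp, mu, sigma


def consecutive_jitter_ps(arrival_times_ps):
    """Standard deviation of the time differences between consecutive pulses."""
    return float(np.std(np.diff(np.asarray(arrival_times_ps, dtype=float))))


# Example: one pulse sampled at hypothetical delay-chip settings.
delays_ps = np.array([0.0, 30.0, 60.0, 90.0])
true_amp, true_mu, true_sigma = 100.0, 50.0, 42.5   # about 100 ps FWHM
samples = true_amp * np.exp(-((delays_ps - true_mu) ** 2) / (2 * true_sigma**2))
print(fit_gaussian_pulse(delays_ps, samples))
```

Applying the fit to every pulse yields a list of arrival times; the spread of their consecutive differences is the pulse-to-pulse jitter.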
Modular scientific streaming camera platform
High-speed image-based process control requires continuous recording and analysis of image streams. For this purpose the UFO scientific camera platform provides fast interconnects, customizable image-based triggers and embedded process control logic, features that are not available in commercial cameras. The platform is based on various CMOS active pixel sensors mounted on mezzanine daughter cards, which are connected to the main readout board. For further data analysis, the camera platform is seamlessly integrated into the UFO parallel processing framework.
The main benefits of the high-throughput camera platform are:
- Continuous data acquisition at full speed
- On-line image-based self-event trigger architecture (e.g. for fast rejection of useless images)
- Automatic region-of-interest readout strategies
- Easily extendable to any available CMOS image sensor
- Fully programmable and controllable
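The image-based self-event trigger can be illustrated with a simple software model. The sketch below assumes a hypothetical criterion, the mean absolute difference to the last accepted frame compared against a threshold, and keeps only frames that show significant change. In the camera the trigger is implemented in the readout hardware and is customizable, so both the score and the function name `frame_filter` are illustrative only.

```python
import numpy as np


def frame_filter(frames, threshold):
    """Yield only frames that differ noticeably from the last accepted frame.

    Score: mean absolute pixel difference to the last accepted frame.
    Frames scoring below `threshold` are rejected as "useless".
    """
    reference = None
    for frame in frames:
        frame = np.asarray(frame, dtype=np.float32)
        if reference is None or np.mean(np.abs(frame - reference)) >= threshold:
            reference = frame  # accepted frames become the new reference
            yield frame


# Example: a static scene with two changes; only 3 of 5 frames pass.
stream = [np.zeros((8, 8)), np.zeros((8, 8)), np.ones((8, 8)),
          np.ones((8, 8)) + 0.01, np.zeros((8, 8))]
accepted = list(frame_filter(stream, threshold=0.5))
```

Because rejected frames never leave the camera, such a trigger reduces the data rate before it reaches the PCIe link.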
Several image sensor boards exist for the UFO camera platform, with spatial resolutions from 1 to 20 megapixels and frame rates of up to 5000 frames per second at full sensor resolution. Even higher rates are possible at reduced or interpolated resolution by using the fast reject logic. Frame rate and resolution are limited only by the current sensor data throughput of about 6.5 GByte/s. The readout architecture itself is open to other sensors in order to match the specific requirements of scientific experiments. Owing to its modular structure, the camera electronics is easily extendable, e.g. with optical data transmission links.
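Because the sensor data throughput is fixed, the achievable frame rate scales inversely with the number of bits per frame, which is why reduced resolution allows higher rates. The short helper below makes this bound explicit, assuming tightly packed pixels, 1 GByte = 10^9 bytes and no protocol overhead; the sensor formats in the example are hypothetical and are not specifications of the UFO sensor boards.

```python
def max_frame_rate(throughput_bytes_per_s, width, height, bits_per_pixel):
    """Upper bound on frames per second imposed by the data throughput.

    Assumes tightly packed pixels and ignores protocol and header overhead.
    """
    bytes_per_frame = width * height * bits_per_pixel / 8
    return throughput_bytes_per_s / bytes_per_frame


THROUGHPUT = 6.5e9  # about 6.5 GByte/s (from the text)
print(max_frame_rate(THROUGHPUT, 1000, 1000, 10))   # hypothetical 1 MP, 10-bit sensor
print(max_frame_rate(THROUGHPUT, 5000, 4000, 12))   # hypothetical 20 MP, 12-bit sensor
```

Under these assumptions the 1-megapixel format is bounded at 5200 frames per second and the 20-megapixel format at about 217 frames per second.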
2. Online processing of fast data streams
Data-intensive sciences produce ever larger data streams. Only massively parallel computing architectures provide a solution for the demanding data processing and analysis. The UFO group adopts technologies from high-performance computing and uses them to build advanced data acquisition systems.
High-performance computing infrastructure for distributed data acquisition and control
To scale up data stream processing with increasing data rates, the processing system should be flexibly extendable. At high data rates, efficient data transfer is challenging. We aim to build a high-speed data acquisition network connecting detectors, processing nodes and distributed storage systems, and we integrate various high-speed protocols such as InfiniBand into a common data exchange layer.
Tools for efficient parallel programming
Processing of large data streams requires a software infrastructure that supports various groups with different interests, such as software developers, algorithm developers, administrators and users.
The primary design goals for an efficient parallel programming environment are:
- Build a fast image processor for arbitrary n-dimensional floating-point data, although 2-, 3- and 4-dimensional data make up most of the encountered datasets.
- Using all CPU cores, GPUs and architecture-specific optimizations, the framework should process data fast enough that near real-time monitoring and subsequent process control become feasible.
- Because the users operating the experiments are often not trained programmers, the end-user API should be as simple as possible and should not require any low-level knowledge of GPU hardware access or multi-threading.
- Interoperability is necessary to interface with legacy code written in C, C++ and Python.
- Due to the heterogeneous experimental environment, platform independence is a must and Linux support is mandatory.
From a software engineering standpoint, the following goals are also desirable:
- Extending the library with new algorithmic building blocks should be easy and require little integration work.
- The overall system should be designed so that it can distribute the workload across several machines in a compute cluster.
- The code must be open source to serve as a common platform for shared algorithm development.
- It should support an open interface for plugin schedulers, so that run-time behavior can be tuned to different performance requirements.
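A plugin scheduler interface can be as small as a single method that maps work items to workers such as GPUs or cluster nodes. The following sketch is hypothetical: the class names, the registry and the per-item cost model are assumptions for illustration and do not reproduce the UFO framework's scheduler API. It contrasts a round-robin strategy with a greedy least-loaded strategy (longest processing time first).

```python
import heapq
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical plugin interface: assign work items to workers."""

    @abstractmethod
    def assign(self, costs, num_workers):
        """Return a list with one worker index per work item."""


class RoundRobin(Scheduler):
    def assign(self, costs, num_workers):
        return [i % num_workers for i in range(len(costs))]


class LeastLoaded(Scheduler):
    """Greedy: give each item (largest first) to the currently least-loaded worker."""

    def assign(self, costs, num_workers):
        heap = [(0.0, w) for w in range(num_workers)]  # (accumulated load, worker)
        result = [0] * len(costs)
        for item in sorted(range(len(costs)), key=lambda i: -costs[i]):
            load, worker = heapq.heappop(heap)
            result[item] = worker
            heapq.heappush(heap, (load + costs[item], worker))
        return result


# Registry through which new schedulers could be plugged in by name.
SCHEDULERS = {"round-robin": RoundRobin, "least-loaded": LeastLoaded}


def schedule(name, costs, num_workers):
    return SCHEDULERS[name]().assign(costs, num_workers)
```

For uneven workloads such as costs `[4, 1, 1, 1, 1]` on two workers, round-robin yields loads of 6 and 2, while the least-loaded strategy balances them at 4 and 4.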
The UFO parallel computing framework organizes data stream processing tasks as a graph that describes the data flow from detector to storage. It uses OpenCL to exploit the parallel processing capabilities of current GPUs. The framework contains a broad library of tomographic reconstruction algorithms: all filter nodes necessary for analytical and algebraic reconstruction methods, as well as for generalized laminography, are available. The most time-critical operations have been optimized for different NVIDIA and ATI architectures.
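The graph model can be illustrated in plain Python. In the sketch below, each node is a function plus the list of nodes whose outputs it consumes, and the nodes run in dependency order using `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later). This is a stand-in under stated assumptions, not the UFO API, which executes OpenCL kernels on GPUs; the node names and operations are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

import numpy as np


def run_graph(nodes):
    """Execute a DAG of filters in dependency order and return all node outputs.

    `nodes` maps a node name to (function, list of input node names).
    """
    sorter = TopologicalSorter({name: set(deps) for name, (_, deps) in nodes.items()})
    results = {}
    for name in sorter.static_order():
        func, deps = nodes[name]
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical graph: "subtract" merges two branches that share the reader output.
graph = {
    "reader": (lambda: np.arange(16, dtype=np.float32).reshape(4, 4), []),
    "background": (lambda img: float(img.mean()), ["reader"]),
    "subtract": (lambda img, bg: img - bg, ["reader", "background"]),
    "writer": (lambda img: int((img > 0).sum()), ["subtract"]),
}
outputs = run_graph(graph)
print(outputs["writer"])
```

A streaming framework pipelines many frames through such a graph concurrently; this sketch only shows how the dependency structure determines the execution order.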
Low-latency image-based process control and trigger
Efficient data processing with massively parallel computing architectures saves scientists valuable time. It also allows more complex algorithms to be considered that increase the quality of the results. But the development of efficient algorithms becomes groundbreaking when the algorithms are fast enough to run in real time. Online data stream processing enables immediate data quality control, online adjustment of experimental parameters and even automatic control loops. Intelligent trigger systems can be used to reduce data volumes.
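An automatic image-based control loop turns each processed frame into an actuator command. The sketch below assumes a hypothetical setup: a simulated stage moves a Gaussian spot on the sensor, and a proportional controller drives the spot's intensity-weighted centroid to a target position. The functions `render_spot` and `control_loop` and the gain value are illustrative assumptions and do not represent any UFO or control-system API.

```python
import numpy as np


def centroid(image):
    """Intensity-weighted centroid (row, col) of a 2-D image."""
    img = np.asarray(image, dtype=float)
    rows, cols = np.indices(img.shape)
    total = img.sum()
    return (rows * img).sum() / total, (cols * img).sum() / total


def render_spot(shape, center, sigma=2.0):
    """Simulated camera frame: a Gaussian spot at `center` (hypothetical)."""
    rows, cols = np.indices(shape)
    return np.exp(-((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / (2 * sigma**2))


def control_loop(target, start, gain=0.5, steps=20, shape=(64, 64)):
    """Proportional feedback: move the simulated stage until the spot sits at target."""
    position = np.array(start, dtype=float)
    for _ in range(steps):
        measured = np.array(centroid(render_spot(shape, position)))  # image analysis
        position += gain * (np.array(target, dtype=float) - measured)  # actuator step
    return position


final = control_loop(target=(32.0, 40.0), start=(20.0, 20.0))
print(final)
```

With a gain of 0.5 the remaining error halves in every iteration, so the number of iterations per second, and hence the end-to-end latency, determines how quickly the loop settles.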
To realize image-based control loops, the UFO framework is seamlessly integrated into the fast control system “Concert”.
For control loops and trigger systems, not only the computational throughput but also the latency of data transfer and processing matters. We investigate remote DMA technologies to transfer data directly from the DAQ electronics to the processing devices or network adapters, and we try to eliminate unnecessary copies of data in the main memory of the computers. Our results show that data transfer times down to the microsecond level are possible.
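The difference between copying data and referencing it in place can be illustrated at the host-software level. The following Python sketch is an analogy, not an RDMA implementation: it contrasts copying a slice of a receive buffer with a `memoryview` and a NumPy array (`np.frombuffer`) that reference the same memory, so new data written into the buffer becomes visible without any copy. The buffer and its contents are hypothetical.

```python
import numpy as np

# A receive buffer as it might be filled by DMA (here: filled by hand).
buffer = bytearray(16)

# Copying: bytes() creates an independent copy in main memory.
copied = bytes(buffer[4:8])

# Zero-copy: a memoryview slice references the same memory as `buffer`.
view = memoryview(buffer)[4:8]

# A NumPy array can also wrap the same memory without copying.
frame = np.frombuffer(buffer, dtype=np.uint8)

buffer[4] = 42  # simulate new data arriving in the buffer

print(copied[0], view[0], frame[4])  # prints: 0 42 42
```

Each avoided copy saves memory bandwidth and time per frame, which adds up at the data rates discussed above.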
3. Analysis services for data intensive sciences
Data-intensive applications such as synchrotron tomography have reached data volumes that often exceed what users can handle. The datasets recorded at imaging beamlines are becoming too large to copy to single disks. Within the ASTOR project we have started to develop an analysis environment that provides users with access to their data and offers tailored analysis functions. Moreover, standard analysis functions are often no longer suitable and need to be replaced.
Visualization of complex scientific data
International collaborations require open web-based access to their data, but the display of complex data structures in web environments is often limited. We aim to develop tools to visualize complex datasets in a web browser, and we aim for intelligent data catalogs that are designed according to the inherent meaning of the data. The data browser should enable scientists to select only the data needed for a certain question and thus avoid downloading unused copies of large data chunks.
This effort includes the analysis and generalization of typical data structures and the development of libraries for visualizing typical data objects.
High-throughput algorithms for parallel computing architectures
The key to efficient programming of parallel architectures is the interplay of mathematical algorithms and hardware-aware programming. By considering both, algorithms can often be sped up by orders of magnitude. Examples are image correlation techniques for optical inspection systems and tomographic image reconstruction; both applications have been turned from time-consuming offline processing into online execution.
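Image correlation illustrates how algorithm choice and hardware fit together: moving the correlation of two N × N images from a direct spatial-domain search over all shifts, which costs O(N⁴), to the FFT domain reduces the cost to O(N² log N), and FFTs map well onto GPUs. The sketch below uses phase correlation with NumPy's FFT to estimate the displacement between two images; it is one standard technique chosen for illustration, not necessarily the method used in the UFO optical inspection systems, and it assumes integer, circular shifts.

```python
import numpy as np


def estimate_shift(reference, moved):
    """Estimate the integer (row, col) shift of `moved` relative to `reference`.

    Phase correlation: the normalized cross-power spectrum of two circularly
    shifted images has an inverse FFT that peaks at the shift.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12  # keep only the phase
    correlation = np.abs(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peak positions beyond half the image size correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, correlation.shape))


rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moved = np.roll(reference, shift=(5, -3), axis=(0, 1))
print(estimate_shift(reference, moved))
```

Sub-pixel accuracy, windowing against edge effects and non-circular shifts would need extensions beyond this sketch.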
Collaborative tools for international research communities
Large research infrastructures offer unique opportunities for scientists. Yet the available data is often analyzed only with respect to a single question, and large parts of the datasets (e.g. from synchrotron tomography) are often not analyzed at all. So far, the enormous potential of collaborative work on the available datasets is mostly untapped. With a new collaborative approach, this project aims to create new possibilities for a more efficient use of valuable experimental time at facilities such as tomographic synchrotron beamlines, by coordinating the research of complementary groups and regulating data usage through a common data policy.
Technically, data catalogs need to be extended to support collaborative analysis. Results from different sources need to be evaluated and merged. The intended cooperative analysis infrastructure makes research data available to a broad scientific audience, allows recent research to be embedded in teaching, and even provides a platform for ‘citizen science’ projects.