Digitisers plus Nvidia GPUs enable fast signal capture & processing

Spectrum Instrumentation (Grosshansdorf, Germany) has developed a software package that bridges its signal-capture digitisers with graphical processors using Nvidia’s CUDA architecture. The combination implements GPGPU – general purpose computing on graphical processing units – to provide high-speed, highly parallel signal processing without the need for either extremely powerful conventional host processors or custom programming of FPGAs.
By eeNews Europe


The software maps the inherent parallelism of DSP functions – that is, similar operations applied to many samples – to the many-core structure of the GPU, originally designed for pixel processing. Spectrum’s SCAPP option – Spectrum CUDA Access for Parallel Processing – allows, the company says, fast and easy signal processing: “Currently digitizers have a bottleneck caused by having to use either the host PC’s central processor with 8 or 16 cores or an FPGA that is complex to program.” The SCAPP software option offers a powerful way to digitize, process and analyze signals. SCAPP allows a CUDA-based Graphical Processing Unit (GPU) to be used directly between any Spectrum digitizer and the PC. The advantage is that data is passed directly from the digitizer to the GPU, where high-speed parallel processing is possible using the GPU board’s multiple (up to 5000) processing cores rather than a typical PC processor’s eight or 16 cores. A representative benefit of matching the GPU’s parallel resources to the signal processing task is shown by real-time FFTs, which can run with a block size of 512k rather than 4k or 8k. This becomes even more important when signals are being digitized at high speeds such as 50 Msamples/sec, 500 Msamples/sec or even 5 Gsamples/sec.
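
To put the block-size comparison in concrete terms, the short Python calculation below works out the frequency spacing between FFT bins (sample rate divided by block size) for the block sizes mentioned. It is only an illustrative sketch: it assumes "4k", "8k" and "512k" mean 4 x 1024, 8 x 1024 and 512 x 1024 samples, and it pairs them with a 500 Msample/sec rate purely as an example. The function name bin_width_hz is hypothetical and not part of any Spectrum software.

```python
# Frequency-bin width of an FFT: sample rate divided by block size.
# Assumptions for illustration only: "k" means 1024 samples, and the
# 500 Msample/sec rate is simply one of the rates quoted in the article.

def bin_width_hz(sample_rate_hz: float, block_size: int) -> float:
    """Return the spacing between adjacent FFT bins in hertz."""
    return sample_rate_hz / block_size

SAMPLE_RATE_HZ = 500e6
BLOCK_SIZES = {"4k": 4 * 1024, "8k": 8 * 1024, "512k": 512 * 1024}

for label, n in BLOCK_SIZES.items():
    width = bin_width_hz(SAMPLE_RATE_HZ, n)
    print(f"{label:>4} block: {width:12.1f} Hz per bin")
```

Under these assumptions a 512k block resolves frequencies 128 times more finely than a 4k block at the same sample rate, which is why the larger block size matters for spectral analysis.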


The Spectrum approach uses a standard off-the-shelf GPU based on Nvidia’s CUDA standard. The GPU connects directly with the Spectrum digitizer card, with no further CPU interaction, and the parallel core architecture of the CUDA card is used for signal processing. The structure of a CUDA graphics card fits the task because it is designed for parallel data processing, which most signal processing jobs require. For example, the processing tasks of data conversion, filtering, averaging, baseline suppression, FFT window functions or even FFTs themselves can all be readily parallelized.
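
The chain of tasks listed above can be sketched in a few lines. The following is a CPU-only Python illustration, not SCAPP or CUDA code: it converts raw ADC codes to volts, applies a Hann window, runs a small radix-2 FFT and averages the magnitude spectra of several blocks. All function names, the voltage scaling and the synthetic test tone are assumptions made for the example. The point to notice is that almost every loop treats each sample or bin independently, which is the property that lets such work be spread across many GPU cores.

```python
import cmath
import math

def to_volts(codes, full_scale_v=1.0, bits=14):
    """Data conversion: signed ADC codes -> volts (assumed scaling)."""
    scale = full_scale_v / (1 << (bits - 1))
    return [c * scale for c in codes]

def hann(n):
    """Hann window coefficients for a block of n samples."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def averaged_spectrum(blocks, bits=14):
    """Convert, window and transform each block, then average magnitudes."""
    n = len(blocks[0])
    window = hann(n)
    acc = [0.0] * (n // 2)
    for codes in blocks:
        volts = to_volts(codes, bits=bits)
        windowed = [v * w for v, w in zip(volts, window)]
        spectrum = fft(windowed)
        for k in range(n // 2):
            acc[k] += abs(spectrum[k])
    return [a / len(blocks) for a in acc]

# Synthetic data (assumption): a sine tone centred on bin 5 of a 64-sample block.
N, TONE_BIN, AMPLITUDE = 64, 5, 4000
block = [round(AMPLITUDE * math.sin(2 * math.pi * TONE_BIN * i / N)) for i in range(N)]
spectrum = averaged_spectrum([block] * 4)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print("peak bin:", peak_bin)
```

A real GPU implementation would replace each of these loops with a kernel or a library call; the sketch only shows the order of the stages and the independence of the per-sample work.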


Spectrum adds that, until today, there have basically been two different approaches to processing data from high-speed digitizers. The first and most common method simply uses the CPU for calculations. This approach offers a straightforward way to create processing programs using a variety of different programming languages, at almost no extra cost. However, the performance is often limited by the CPU’s resources, as it must share its processing power with the rest of the PC system, the operating system and the GUI components.


The second approach is to use Field Programmable Gate Array (FPGA) technology, either with fixed processing packages from the vendor (Spectrum also offers this, for example with its Block Average package) or using an open FPGA with a Firmware Development Kit (FDK). This is a powerful solution but it comes with much higher cost and complexity. Large FPGAs are expensive, and using them requires an FDK from the digitizer vendor along with other implementation tools from the FPGA vendor. Also, implementing signal processing in an FPGA using VHDL requires specialist knowledge that not every developer has, which soon results in very long development cycles. Even worse, it is easy to run into the limits of the FPGA that is soldered onto the card. For example, if the block RAM is fully used, there is no scope for further improvement.


A suitable CUDA graphics card costs from around €150 to €3000, and the necessary software development kits (SDKs) are free of charge. However, the largest cost saving is in development time. There is no need to become familiar with an FDK, the structure of the FPGA firmware, the FPGA design suite or the simulation tools; the user can immediately apply a C-code approach and common design tools.


The SCAPP driver package consists of a driver extension for Remote Direct Memory Access (RDMA) that allows direct data transfer from the digitizer to the GPU. It includes a set of examples for interaction with the digitizer and the CUDA card, and another set of CUDA parallel processing examples with easy building blocks for basic functions such as filtering, averaging, data de-multiplexing, data conversion or FFT. All the software is based on C/C++ and can be implemented and improved with normal programming skills. Starting from tested and optimized parallel processing examples, users can get first results within minutes.


The interconnection between digitizer and GPU is based on PCI Express. Depending on the selected Spectrum digitizer card, a continuous throughput of more than 3.0 GByte/sec between the digitizer and GPU can be achieved. That is enough to support continuous acquisition from a single-channel 8-bit digitizer sampling at 2.5 Gsample/sec or a two-channel 14-bit unit running at 500 Msample/sec. With one of Spectrum’s bandwidth-saving data acquisition modes, such as Multiple Recording, even higher sampling speeds are possible.
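
The throughput claim can be checked with simple arithmetic. The Python sketch below computes the continuous data rate for the two configurations mentioned and compares it with the 3.0 GByte/sec figure. It assumes that each sample is stored in a whole number of bytes (so a 14-bit sample occupies two bytes) and that 1 GByte means 10^9 bytes; neither assumption is stated in the article, and the function name is hypothetical.

```python
def stream_rate_bytes_per_sec(channels, sample_rate_hz, bits_per_sample):
    """Continuous data rate, assuming samples are padded to whole bytes."""
    bytes_per_sample = (bits_per_sample + 7) // 8
    return channels * sample_rate_hz * bytes_per_sample

# "More than 3.0 GByte/sec" from the article, taken here as 3.0e9 bytes/sec.
LINK_BYTES_PER_SEC = 3.0e9

configs = {
    "1 ch,  8 bit, 2.5 Gsample/sec": (1, 2.5e9, 8),
    "2 ch, 14 bit, 500 Msample/sec": (2, 500e6, 14),
}
for label, (channels, rate, bits) in configs.items():
    need = stream_rate_bytes_per_sec(channels, rate, bits)
    fits = need <= LINK_BYTES_PER_SEC
    print(f"{label}: {need / 1e9:.1f} GByte/sec, fits: {fits}")
```

Under these assumptions the two cases need 2.5 and 2.0 GByte/sec respectively, so both stay within the quoted link throughput.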


CUDA cards are scalable, with between 256 and 5000 processing cores (in comparison, a dual quad-core Xeon CPU with Hyperthreading will only give 16 cores), several GByte of memory and up to 12.0 TFLOP (tera, or 10^12, floating point operations per second). A small card with 1k cores and 3.0 TFLOP is already capable of continuous data conversion, multiplexing, windowing, FFT and averaging on two channels at 500 Msample/sec with an FFT block size of 512k – and that can run for hours. In contrast, an FFT package from other digitizer vendors will typically limit the FFT block size to a maximum of 4k or 8k, as this is the limitation of the FPGA.
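
A rough estimate helps to see why a 3.0 TFLOP card has headroom for this workload. The Python sketch below uses the commonly quoted approximation of 5 x N x log2(N) floating point operations for one radix-2 complex FFT of length N, and assumes back-to-back, non-overlapping 512k-sample blocks (taken as 512 x 1024 samples). This is a textbook estimate rather than a measured figure, it ignores windowing, conversion and averaging, and real GPU performance usually depends more on memory bandwidth than on peak arithmetic rate.

```python
import math

def fft_flops_per_sec(sample_rate_hz, channels, block_size):
    """Rough sustained FLOP rate for back-to-back, non-overlapping FFTs.

    Uses the common 5 * N * log2(N) operation-count estimate for one
    radix-2 complex FFT of length N (an approximation, not a measurement).
    """
    flops_per_fft = 5 * block_size * math.log2(block_size)
    ffts_per_sec = sample_rate_hz / block_size
    return channels * ffts_per_sec * flops_per_fft

# Two channels at 500 Msample/sec with 512k-sample blocks, as in the article.
load = fft_flops_per_sec(500e6, 2, 512 * 1024)
print(f"estimated FFT load: {load / 1e9:.0f} GFLOP/s")
print(f"share of a 3.0 TFLOP peak: {load / 3.0e12:.1%}")
```

On this estimate the FFT stage alone needs roughly 95 GFLOP/s, a few percent of the card's quoted peak.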


The SCAPP package is a driver extension for all Spectrum cards. It can be used with the ultra-fast digitizers of the M4i platform (250 Msample/sec 16 bit, 500 Msample/sec 14 bit or 5 Gsample/sec 8 bit) as well as the latest medium-performance M2p platform (20 to 80 Msample/sec multi-channel 16 bit). The basic RDMA functionality is available under a Linux operating system.


A video clip is at www.youtube.com/watch?v=HK5eZb65nlY


Spectrum Instrumentation; www.spectrum-instrumentation.com


