Leading CNN performance per watt in a midrange FPGA

Omnitek has launched a new Convolutional Neural Network (CNN) accelerator, which it claims offers world-leading performance per watt at full FP32 accuracy in a midrange SoC FPGA.
By eeNews Europe


The Omnitek Deep Learning Processing Unit (DPU) is optimised for Intel’s Arria 10 GX architecture and delivers 135 GOPS/W at full 32-bit floating-point accuracy when running the VGG-16 CNN in an Arria 10 GX 1150. The DPU design uses a new mathematical framework combining low-precision fixed-point maths with floating-point maths to achieve very high compute density with no accuracy loss.
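Omnitek has not published the details of its mixed fixed-point/floating-point framework, but the basic trade-off it exploits can be illustrated generically. The sketch below (plain NumPy, not Omnitek code; the bit widths are illustrative assumptions) quantizes FP32 weights to 16-bit signed fixed point and shows that the round-trip error stays within half a quantization step:

```python
import numpy as np

def quantize_fixed_point(x, frac_bits=8, total_bits=16):
    """Map FP32 values to signed fixed point with `frac_bits` fractional bits.
    Illustrative only -- the DPU's actual number formats are not public."""
    scale = 2 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    return np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)

def dequantize(q, frac_bits=8):
    """Convert fixed-point integers back to FP32."""
    return q.astype(np.float32) / (2 ** frac_bits)

weights = np.array([0.731, -0.052, 1.204], dtype=np.float32)
restored = dequantize(quantize_fixed_point(weights))
# Rounding error is bounded by half a step: 2**-(frac_bits + 1)
assert np.max(np.abs(restored - weights)) <= 2 ** -9
```

Low-precision integer multiply-accumulates like these map directly onto FPGA DSP blocks, which is where the compute-density gain over pure FP32 arithmetic comes from.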

Scalable across a wide range of Arria 10 GX and Stratix 10 GX devices, the DPU can be tuned for low cost or high performance. The DPU is fully software programmable in C/C++ or Python using standard frameworks such as TensorFlow, enabling it to be configured for a range of standard CNN models including GoogLeNet, ResNet-50 and VGG-16 as well as custom models. No FPGA design expertise is required to do this.

FPGAs are ideally suited to machine learning applications due to their massively parallel DSP architecture, distributed memory and ability to reconfigure the logic and connectivity for different algorithms. Omnitek’s DPU can be configured to provide optimal compute performance for CNNs, RNNs, MLPs and other neural network topologies existing today, or ones yet to be developed.
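The parallelism the article refers to comes from the structure of the convolution itself: every output pixel is an independent multiply-accumulate (MAC) dot product, so an FPGA can dedicate a DSP block to each and compute them concurrently. A minimal NumPy sketch of that pattern (generic, not the DPU's implementation):

```python
import numpy as np

def conv2d(image, kernel):
    """Direct 2D convolution with valid padding: the MAC pattern
    that FPGA DSP blocks can execute in parallel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            # Each output element is an independent dot product,
            # so all (oh * ow) of them can be computed concurrently.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=np.float32).reshape(4, 4)
edge = np.array([[1, -1]], dtype=np.float32)  # horizontal difference kernel
result = conv2d(img, edge)  # shape (4, 3)
```

A software-programmable overlay like the DPU keeps this hardware dataflow fixed while loading different weights and layer configurations from a high-level model description, which is why no FPGA design expertise is needed.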
