Push the UVM start button then hit the accelerator, Part 2

So, you’re STILL not using UVM? This article won’t help you learn any of the features of UVM, but it will point you to how you might speed up your UVM learning, your UVM adoption and even your UVM execution throughput.
By eeNews Europe


Editor’s note: This is part 2 of the article; part 1 appears on the EDN-Europe website, here. For convenience, this link accesses a single pdf file of the complete article.

Faster interfaces using transactors

In many cases, the top-level ports of a typical DUT are standard peripheral or bus interfaces (such as USB, SATA or APB), whose behaviour is well understood. We can use this known behaviour to agree shortcuts in the communication between simulator and hardware. For example, instead of relying on the simulator to drive every signal change needed to write a data value over a standard port, we place some extra hardware alongside the DUT in the FPGA(s), which makes those changes for us locally. This is shown in Figure 3 as a BFM, or Bus Functional Model.

A single command or function call on the simulator side could then initiate all the necessary changes on the hardware side. The mechanism by which this all happens is called a transactor, and in Figure 3 this comprises the BFM interface on the simulator side, a transaction layer and the BFM in the FPGA hardware.

Figure 3. Partitioning using a transactor

Not only does the transactor simplify the communication, it also allows the hardware to run faster because it is not reliant on slavishly following signal-by-signal events in the simulator. If all the interfaces between the simulator and the DUT employ the relevant transactor, then we can achieve greater acceleration overall.
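The idea of one call fanning out into many pin-level changes can be sketched in software. The following is a minimal C model, not the article’s BFM: the type apb_pins_t, the function bfm_apb_write and the simplified APB-style write sequence (no wait states) are all assumptions made for illustration. In a real flow the equivalent logic would be hardware sitting next to the DUT in the FPGA.

```c
#include <stdint.h>

/* Hypothetical pin-level view of an APB-style port (names are illustrative). */
typedef struct {
    uint32_t paddr;
    uint32_t pwdata;
    int psel;
    int penable;
    int pwrite;
    unsigned long signal_changes; /* counts every pin update made by the model */
} apb_pins_t;

/* Update a 32-bit pin group; count it only if the value actually changes. */
static void drive_u32(apb_pins_t *p, uint32_t *pin, uint32_t value) {
    if (*pin != value) { *pin = value; p->signal_changes++; }
}

/* Update a single-bit pin; count it only if the value actually changes. */
static void drive_bit(apb_pins_t *p, int *pin, int value) {
    if (*pin != value) { *pin = value; p->signal_changes++; }
}

/* One transaction-level call. The model performs the whole pin sequence
 * locally: setup phase, access phase, then return to idle.
 * Simplification (assumption): the target is always ready, so there are
 * no wait states. */
void bfm_apb_write(apb_pins_t *p, uint32_t addr, uint32_t data) {
    /* Setup phase: present address, data and direction, select the target. */
    drive_u32(p, &p->paddr, addr);
    drive_u32(p, &p->pwdata, data);
    drive_bit(p, &p->pwrite, 1);
    drive_bit(p, &p->psel, 1);
    /* Access phase: assert enable to complete the transfer. */
    drive_bit(p, &p->penable, 1);
    /* Return the control pins to idle. */
    drive_bit(p, &p->psel, 0);
    drive_bit(p, &p->penable, 0);
    drive_bit(p, &p->pwrite, 0);
}
```

Starting from all-zero pins, a single call with a non-zero address and data value produces eight pin updates in this model. A real protocol with wider handshakes or burst transfers would make the ratio much larger.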

UVM already employs transaction-level communication, but how do we convert our simulator-only UVM test environment to use transactors and FPGA-based hardware?



Linking UVM and FPGA

Preferably, we should have written our UVM/SystemVerilog testbench in a style that allows easy inclusion of transactors. As it happens, the style recommended by Easier UVM is exactly such a style, and with a few simple substitutions, the verification team can re-compile and run the design using FPGA-based acceleration. In Aldec’s case, this adaptation of the UVM is performed mostly automatically in its Design Verification Manager (DVM) tool.

At the heart of transactor-based acceleration is the representation of interface transactions as function calls within UVM agents. The UVM agent’s driver makes a single function call that can result in hundreds of signal changes in the hardware, at a much higher clock rate than the simulator event rate. The same effect applies to outputs from the DUT into the simulator via the UVM agent’s monitors. It is this high ratio of signal changes to function calls that increases throughput.

We might also think of the boundary as dividing the timed and untimed domains, separating clocks from events; the clock is never used to synchronise communication across the boundary.

We see this simplified in Figure 4, which also differentiates the two domains by the languages used to describe them, i.e. a Hardware Verification Language (HVL), in this case SystemVerilog, on the one side, and transactors written in a Hardware Description Language (HDL) such as Verilog or VHDL on the other.

Figure 4. HVL and HDL communication via function calls

Of course, SystemVerilog can be employed as both an HDL and an HVL, but any SystemVerilog code appearing on the right-hand side of Figure 4 must be synthesisable, or somehow interpreted as such during set-up.



We wouldn’t get far without SCE-MI (and DPI-C)

The approach of HVL-HDL partitioning has been captured in Accellera’s Standard Co-Emulation Modeling Interface (SCE-MI), which that body describes as “allowing a model developed for simulation to run in an emulation environment and vice versa.” Aldec follows the SCE-MI standard, which recommends that the cross-boundary function calls are made via SystemVerilog’s Direct Programming Interface (DPI), which allows SystemVerilog to communicate with other programming languages. In the case of the C language, we term this DPI-C for short.

DPI-C allows us to make calls to externally defined (and imported) C functions from within a SystemVerilog testbench, and to export SystemVerilog items, allowing them to be accessed from C. This is very helpful for accelerating UVM, as we can use DPI-C as the wrapper between the testbench calls and the transactors which will be synthesised along with the DUT into the FPGA. We can see this in Figure 5 (note the naming convention used by Aldec).
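As a rough illustration of the C side of such an import, the sketch below queues write transactions handed over by a testbench. The function names tx_send_write and tx_fetch_write, the FIFO depth and the SystemVerilog declaration in the comment are hypothetical; they are not taken from Aldec’s tools or from the SCE-MI standard. The only DPI fact relied on is that a SystemVerilog int unsigned argument corresponds to a C unsigned int.

```c
#include <stdint.h>

/* SystemVerilog side, shown for context only (hypothetical declaration):
 *   import "DPI-C" function int tx_send_write(input int unsigned addr,
 *                                             input int unsigned data);
 */

#define TX_FIFO_DEPTH 16

/* One queued write transaction. */
typedef struct {
    uint32_t addr;
    uint32_t data;
} write_tx_t;

static write_tx_t tx_fifo[TX_FIFO_DEPTH];
static unsigned tx_head = 0;  /* index of the oldest queued entry */
static unsigned tx_count = 0; /* number of entries currently queued */

/* Called from the testbench: queue one write transaction.
 * Returns 0 on success, -1 if the queue is full. */
int tx_send_write(unsigned int addr, unsigned int data) {
    if (tx_count == TX_FIFO_DEPTH) {
        return -1;
    }
    unsigned tail = (tx_head + tx_count) % TX_FIFO_DEPTH;
    tx_fifo[tail].addr = addr;
    tx_fifo[tail].data = data;
    tx_count++;
    return 0;
}

/* Called by the hardware-facing side: remove the oldest transaction.
 * Returns 0 on success, -1 if the queue is empty. */
int tx_fetch_write(write_tx_t *out) {
    if (tx_count == 0) {
        return -1;
    }
    *out = tx_fifo[tx_head];
    tx_head = (tx_head + 1) % TX_FIFO_DEPTH;
    tx_count--;
    return 0;
}
```

The point of the sketch is the shape of the boundary: the testbench passes whole transactions through plain function arguments, and nothing on this side refers to a clock.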

Figure 5. DPI-C acts as the boundary between simulator and hardware

Although SystemVerilog code on the HDL side must be synthesisable, the definition of what is and isn’t synthesisable varies from tool to tool, depending on how much effort each tool vendor has put into implementing each HVL and HDL standard. Aldec has taken the view that, to allow easier conversion of UVM to use transactors, some traditionally non-synthesisable SystemVerilog needs to be handled. For example, Figure 6 shows part of the code for the driver in the BFM for the trivial example outlined above.

Figure 6. BFM Code partitioned and made synthesisable

This code was previously in the UVM agent but must be moved to the HDL side of the boundary when using hardware acceleration. Notice the use of “while” statements and an implicit state machine, both of which are typically non-synthesisable in SystemVerilog. UVM code also contains other non-synthesisable constructs, which might be interpreted as registers driven by multiple sources and hence be illegal.

Aldec’s DVM handles this by employing a specific SCE-MI2 Compiler, which interprets the code, converts non-synthesisable items into equivalent HDL, and extracts the DPI-C elements into so-called Emulation Bridge code. This is done automatically as part of the DVM flow, which also does all those other jobs involved in making FPGA-hostile SoC HDL ready for implementation in an FPGA-based emulator, including gated-clock conversion, memory modelling, partitioning and synchronising to a single clock in order to allow clock stopping and single-stepping during emulator runtime.
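To give a feel for the kind of rewriting involved, the sketch below models in C how a procedural wait loop (an implicit state machine) can be expressed as an explicit state machine evaluated once per clock edge. It is an illustrative model only: the names drv_t and drv_clock, the three states and the APB-style handshake are assumptions, and it does not reproduce the code in Figure 6 or the output of any compiler.

```c
/* Explicit states replacing the positions in the procedural code where
 * the implicit version would be waiting for a clock edge. */
typedef enum { DRV_IDLE, DRV_SETUP, DRV_ACCESS } drv_state_t;

typedef struct {
    drv_state_t state;
    int psel;    /* select output */
    int penable; /* enable output */
} drv_t;

/* One call models one rising clock edge.
 * In the implicit style, DRV_ACCESS would be a loop of the form
 * "while (!pready) wait for the next clock edge"; here the loop becomes
 * a state that is simply re-entered until pready is seen. */
void drv_clock(drv_t *d, int start, int pready) {
    switch (d->state) {
    case DRV_IDLE:
        if (start) {
            d->psel = 1;      /* begin the setup phase */
            d->penable = 0;
            d->state = DRV_SETUP;
        }
        break;
    case DRV_SETUP:
        d->penable = 1;       /* move to the access phase */
        d->state = DRV_ACCESS;
        break;
    case DRV_ACCESS:
        if (pready) {
            d->psel = 0;      /* transfer complete, return to idle */
            d->penable = 0;
            d->state = DRV_IDLE;
        }
        break;
    }
}
```

Because each call does a bounded amount of work and all memory of where the transfer has got to lives in the state field, this form maps naturally onto a register and combinational logic, which is what a synthesis tool needs.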


Time to step on the accelerator

This article has aimed to show that UVM need not be hard to learn and need not take a long time to run. The use of Easier UVM and easy access to FPGA-based hardware acceleration can bring UVM to a wider user base, including teams which are creating today’s most complex FPGA and SoC designs.

Further detail can be found in Doulos’s Easier UVM introduction webinars here and Aldec’s UVM Acceleration information here.

About the author

Doug Amos is an FPGA Consultant who works closely with Aldec around its FPGA-based prototyping solutions; he is also the (UK’s) National Microelectronics Institute’s FPGA Network Manager.
