Wall Street & Technology is part of the Informa Tech Division of Informa PLC




Living With Hardware Acceleration

A quick review of what software developers ask about moving C code to FPGA.

David Buechner
"Early adopter" trading firms have deployed FPGA-based systems over the last few years as a means of reducing latency. In 2011, the number of teams considering this strategy doubled, according to Impulse Accelerated Technologies, one of the solution providers in this industry. This article addresses common questions and misunderstandings about what is really possible and how much effort it entails.

How do FPGAs compare to Standard Microprocessors (CPUs)?

Single-core CPUs are hitting the limits of physics in terms of geometry; hence the current trend toward multi-core processors and parallelism in applications. Compared with CPUs, FPGAs emphasize flexibility in parallel programming, at the expense of more challenging development. Like CPUs, FPGAs are devices manufactured in high volumes for many different applications, so they benefit from the same process improvements and density increases that Moore's law predicts. Since FPGAs are architecturally more like memory devices than like CPUs, they have exceeded CPUs in the growth of some usable on-chip resources.

FPGAs have some power efficiency advantages over CPUs because they operate at lower clock frequencies while supporting more processing streams. CPUs are bottlenecked by their fixed instruction pipelines, with performance gains coming from complex cache architectures and other power-demanding technologies. FPGAs adapt the computing architecture to the application, reducing the need for supporting features that are not used by all applications. For financial applications, FPGAs use significantly less power than CPUs and have the performance to handle large amounts of data without significant jitter. By eliminating the operating system and CPU instruction pipeline, an FPGA-based feed handling system can process data in line with a network feed, with low latency and predictable performance.
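The pipelining advantage is easiest to see in code. A minimal sketch, purely illustrative: each iteration of the loop below touches only a small sliding window, so a C-to-FPGA compiler can pipeline it and retire one result per clock, whereas a CPU steps through iterations via its fixed instruction pipeline.

```c
#include <stdint.h>

/* Illustrative only: a fixed-window moving average over a price stream.
 * The short loop-carried state (a 4-element window and a running sum)
 * is exactly the kind of structure that pipelines well in an FPGA. */
#define WINDOW 4

void moving_average(const uint32_t *in, uint32_t *out, int n)
{
    uint32_t window[WINDOW] = {0};
    uint32_t sum = 0;
    for (int i = 0; i < n; i++) {
        sum -= window[i % WINDOW];   /* drop the oldest sample    */
        window[i % WINDOW] = in[i];  /* insert the newest sample  */
        sum += in[i];
        out[i] = sum / WINDOW;       /* one result per iteration  */
    }
}
```

The same C compiles and runs on a CPU for validation, which is the workflow described later in this article.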

What type of logic fits where?

We have worked with designs that bring GigE and 10GigE feeds into a UDP parser in the FPGA, then use PCIe kernel bypass to deliver data directly to user space. Outbound orders continue to be executed through the host system, but they are the target of the next step in applying FPGAs, specifically lower-latency order packaging. Order books, which sit between the inbound and outbound paths, are also targeted as part of a complete solution. Integration of analytics is envisioned but has yet to come. Currently the quickest path in the FPGA is GigE or 10GigE into an HDL-based TCP/IP stack feeding a FIX engine application.

The considerations when implementing an FPGA design include available space in the selected FPGA, the amount of available memory (inside and outside the FPGA), and support from design tools. A good-quality design tool can enable fast prototyping and allow easier migration between different-sized FPGAs. FPGAs are growing dramatically in capacity, with more on-chip and board-level memory. Some boards carry up to 12 gigabytes of board-level memory (Convey systems make 128 GB of memory available to each FPGA). This makes it possible to architect a system that uses board memory and overall data flow efficiently and executes with very high performance.
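The fixed-format parsing that a feed handler performs is a good fit for in-line FPGA logic because every field sits at a known offset. A hypothetical sketch, with an invented 12-byte message layout (real feed formats differ):

```c
#include <stdint.h>

/* Hypothetical fixed-layout market-data message; real wire formats
 * differ. Fixed offsets and widths are what let this parse run as
 * in-line logic clocked directly off the network interface. */
struct md_update {
    uint32_t symbol_id;   /* bytes 0-3, big-endian on the wire */
    uint32_t price;       /* bytes 4-7, price in ticks         */
    uint32_t quantity;    /* bytes 8-11                        */
};

/* Assemble a 32-bit value from big-endian network bytes. */
uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse one 12-byte UDP payload; returns 0 on success. */
int parse_update(const uint8_t *payload, struct md_update *out)
{
    out->symbol_id = be32(payload);
    out->price     = be32(payload + 4);
    out->quantity  = be32(payload + 8);
    return 0;
}
```

Because there are no loops or data-dependent branches, each message can be decoded in a fixed number of cycles as it arrives.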

Stuff that fits poorly in an FPGA includes high-volume, high-speed sequential algorithms that cannot be parallelized. If an algorithm cannot benefit from a high degree of pipelining, it is not going to go well in an FPGA. Applications that require random access to large amounts of physical memory - a database, for example - may not be appropriate for FPGAs, or may require significant algorithm refactoring to schedule and pipeline the memory accesses. If the application does not need in-line network processing for the benefits of determinism (i.e., predictable latency) and/or low latency, it is probably not worth the trouble to port it to an FPGA.
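The difference between pipeline-friendly and inherently sequential code shows up clearly in C. In the first (illustrative) function below, iterations are independent and can be overlapped; in the second, every iteration must wait for the previous result, so extra hardware buys nothing:

```c
#include <stdint.h>

/* Pipelines well: iterations are independent, so an FPGA can overlap
 * them and retire one result per clock once the pipeline fills. */
void scale_all(const int32_t *in, int32_t *out, int n, int32_t k)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Pipelines poorly: a loop-carried dependency forces each iteration
 * to wait for the last one, no matter how much logic is available. */
int64_t iterate(int64_t x, int n)
{
    for (int i = 0; i < n; i++)
        x = (x * x + 1) % 1000003;   /* next value depends on last */
    return x;
}
```

Algorithms shaped like the second function are the "un-parallelizable sequential" cases that are better left on the CPU.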

What engineering challenges can be encountered when solving problems using FPGAs?

Size restrictions, for one. Unlike CPUs, which can load and execute an essentially unlimited amount of code, processes in FPGA hardware occupy a fixed amount of space on the chip. Place-and-route time can be considerable, with run times of an hour or more being common. This means that the development cycle is different, and simulation/emulation methods of design are important. HDL hand coding takes time; a higher-level tool flow such as a C-to-FPGA compiler can reduce it dramatically - by as much as 50% - but it is still important to learn FPGA coding best practices for increased productivity.

Development starts with a software model of the application, for example a C-language implementation, written with an awareness of the FPGA's limitations, such as its memory architecture. A standard C environment such as GCC/GDB or Visual Studio is then used to validate the algorithm, after which a C-to-FPGA compiler creates a synthesizable, FPGA-compatible version of the application. Iterating on the original C code ensures that refactoring done for hardware compatibility is still reflected in the original model.
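A common refactoring at this stage replaces dynamic allocation and pointer chasing, which map poorly to hardware, with fixed-size structures that a C-to-FPGA compiler can place in on-chip block RAM. A hypothetical sketch of a small, hardware-friendly price-level table (the depth bound and names are invented for illustration):

```c
#include <stdint.h>

#define MAX_LEVELS 16   /* fixed depth chosen for illustration */

/* A fixed-size array instead of a malloc'd linked list, so the
 * compiler can map the table to on-chip block RAM. */
struct level { uint32_t price; uint32_t qty; };

struct book_side {
    struct level levels[MAX_LEVELS];
    int count;
};

/* Insert-or-update a price level. Updates are dropped when the table
 * is full - an explicit bound the pure software model never needed. */
void update_level(struct book_side *side, uint32_t price, uint32_t qty)
{
    for (int i = 0; i < side->count; i++) {
        if (side->levels[i].price == price) {
            side->levels[i].qty = qty;
            return;
        }
    }
    if (side->count < MAX_LEVELS) {
        side->levels[side->count].price = price;
        side->levels[side->count].qty   = qty;
        side->count++;
    }
}
```

Because this version still compiles under GCC or Visual Studio, the refactored model can be validated in software before synthesis.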

Test fixture code is created simultaneously for both the software model and for the hardware version. (C-language FPGA compiler tools make this easier.) Optionally a hardware simulator such as Mentor's ModelSim can be used to validate the hardware model.
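A minimal sketch of such a shared fixture, with illustrative names: the same input vectors drive the validated software model and the hardware version (here a stand-in function; in a real flow it would call into the simulated or co-executed FPGA implementation), and the outputs are compared element by element.

```c
#include <stdint.h>

/* Golden software model, assumed already validated in GCC/GDB. */
int32_t model_step(int32_t x) { return 2 * x + 3; }

/* Stand-in for the hardware version under test; a real fixture
 * would route this call through the simulator or the device. */
int32_t hw_step(int32_t x) { return 2 * x + 3; }

/* Drive both versions with the same vectors; return mismatch count. */
int run_fixture(const int32_t *vectors, int n)
{
    int mismatches = 0;
    for (int i = 0; i < n; i++)
        if (model_step(vectors[i]) != hw_step(vectors[i]))
            mismatches++;
    return mismatches;
}
```

Keeping the fixture in C means the same vectors exercise the model on the workstation and, via the tool flow, the generated hardware.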

Synthesis occurs with FPGA vendor-supplied tools, which map the application to the target hardware device. Information from this process (for example, clock rates and resource constraints) is used to go back and refine or optimize the original application. FPGA firmware and software libraries are provided by the FPGA platform/board vendor and are used to set up FPGA-to-host communications. Note that in many cases these firmware and software libraries are already integrated into higher-level design tools such as Impulse C.
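The details of host-to-FPGA communication are vendor-specific, but a common low-level pattern underneath these libraries is memory-mapped register access over PCIe. A purely illustrative sketch - the register offsets are invented, and `base` stands in for a pointer the vendor driver would normally hand back from an mmap of the device:

```c
#include <stdint.h>

/* Invented register map; real offsets come from the board vendor. */
#define REG_CONTROL 0x00
#define REG_STATUS  0x04
#define REG_RESULT  0x08

void reg_write(volatile uint8_t *base, uint32_t off, uint32_t v)
{
    *(volatile uint32_t *)(base + off) = v;
}

uint32_t reg_read(volatile uint8_t *base, uint32_t off)
{
    return *(volatile uint32_t *)(base + off);
}

/* Typical sequence: start the FPGA kernel, poll the done bit,
 * then read back the result register. */
uint32_t run_once(volatile uint8_t *base)
{
    reg_write(base, REG_CONTROL, 1);           /* start */
    while ((reg_read(base, REG_STATUS) & 1) == 0)
        ;                                      /* wait for done bit */
    return reg_read(base, REG_RESULT);
}
```

Higher-level tool flows wrap this kind of sequence behind stream and shared-memory APIs so application code rarely touches raw registers.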

What does the future look like? We expect to see improvements such as:

  • Increased libraries of ready-to-run parsing environments for finance
  • 10GigE and 40GigE interfaces
  • More ubiquitous TCP/IP HDL stacks
  • More board-level and on-chip memory
  • More configurable NIC applications
  • Development that looks more like software development
  • More available talent - developers with both software and hardware experience

David Buechner is a Vice President at Impulse Accelerated Technologies.
