PCIe Controlled from GPU without host CPU

Comments and questions related to the "Down to the TLP" pages

Postby Guest » Thu Jun 07, 2018 1:26 pm

Good day,

First of all, thank you very much for the post "Down to the TLP: How PCI express devices talk." It really helped me get a better understanding of how things work.

I would like to know if you could point me in the right direction towards writing actual code to send/receive TLPs. The end goal is to share data between the memory of a GPU and our Netronome network cards while bypassing the host CPU. Is this possible, and would I, for example, be able to use PCIe functions directly from an OpenCL 'kernel' running on the GPU?

I was told that the network card can be set up to intercept a specific TLP, but I don't have any knowledge of this.

Any documentation, tutorials or keywords to search for will be greatly appreciated.

Kind regards

Re: PCIe Controlled from GPU without host CPU

Postby support » Thu Jun 07, 2018 2:19 pm


TLPs are bus operations. They carry the command and/or data for reading from or writing to memory space. So unless you happen to implement the PCIe bus interface on an FPGA, you won't get the sense of "sending" or "receiving" TLPs. Rather, you perform a read or write, possibly with a burst of data, through the PCIe bus.
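To illustrate what "performing a write" looks like from software, here is a minimal Python sketch, assuming a Linux host where a device's BAR is exposed as a mappable file (typically a sysfs `resource0` file, which normally requires root). The function names `write_reg32`, `read_reg32` and `demo` are made up for this example, and `demo` uses an ordinary temporary file as a stand-in for the BAR so the code runs without any hardware. On a real BAR the store ends up as a Memory Write TLP generated by the hardware; Python gives no guarantee about the access width, so a C program with a volatile 32-bit store would be the more usual choice.

```python
import mmap
import os
import struct
import tempfile


def write_reg32(resource_path, offset, value):
    """Store one little-endian 32-bit value at `offset` in the mapped file."""
    with open(resource_path, "r+b") as f:
        # Length 0 maps the whole file (for a BAR: the whole BAR).
        with mmap.mmap(f.fileno(), 0) as bar:
            struct.pack_into("<I", bar, offset, value)


def read_reg32(resource_path, offset):
    """Load one little-endian 32-bit value from `offset` in the mapped file."""
    with open(resource_path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as bar:
            return struct.unpack_from("<I", bar, offset)[0]


def demo():
    """Exercise the helpers on a plain 4 KiB file standing in for a BAR."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(b"\x00" * 4096)  # an empty file cannot be mapped
        tmp.close()
        write_reg32(tmp.name, 0x40, 0x12345678)
        return read_reg32(tmp.name, 0x40)
    finally:
        os.unlink(tmp.name)
```

Note that nothing in this code mentions TLPs: the program only reads and writes addresses, and the packets are formed below the software level.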

I suppose the question you're asking is how to transport data directly between two devices that are attached to the same PCIe bus without the processor being involved. This is often referred to as peer-to-peer traffic. Essentially, it means that one of the devices initiates bus transactions that write to or read from an address range that is mapped to the other device, rather than to the processor's memory.
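The idea can be pictured with a toy model of address routing. This is only a sketch: the address ranges, the `Window` class and the `route` function are invented for illustration and don't correspond to any real system, where the ranges are assigned at enumeration and the routing is done by the root complex and switches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """An address range and the party that claims it (all values made up)."""
    owner: str
    base: int
    size: int

    def contains(self, addr):
        return self.base <= addr < self.base + self.size


# Hypothetical address map: host memory plus one BAR for each device.
ADDRESS_MAP = [
    Window("host RAM", 0x0000_0000, 0x8000_0000),
    Window("GPU BAR", 0xC000_0000, 0x1000_0000),
    Window("NIC BAR", 0xFDAF_0000, 0x0001_0000),
]


def route(addr):
    """Return the owner of the range that claims `addr`, or None."""
    for window in ADDRESS_MAP:
        if window.contains(addr):
            return window.owner
    return None


# A NIC doing ordinary DMA targets an address in host RAM ...
print(route(0x0010_0000))   # host RAM
# ... while peer-to-peer means it targets an address inside the GPU's range.
print(route(0xC000_1000))   # GPU BAR
```

From the initiating device's point of view the two cases differ only in the target address, which is why peer-to-peer is possible in principle without either device being aware of TLPs as such.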

There are many variants of this. The typical problem, however, is that if neither side was designed with peer-to-peer traffic in mind, it might be tricky to, for example, make one side write data in the way that the other side expects it to arrive. If you have a NIC on one side and a GPU on the other, you might be able to tell the NIC to write its packets' content to a memory range that belongs to the GPU, so that the GPU stores the written data in its internal RAM. Odds are, however, that the host will still need to be involved in coordinating this operation, so it's not clear how much this saves.

To make things even less pleasant, vendors of GPUs tend to present a software API, and not give away their hardware's internals. This can be a significant obstacle when trying to do unconventional stuff with your GPU, and a lot of effort might be spent on a specific piece of hardware.

Other variants include an FPGA as one of the sides for peer-to-peer traffic, or possibly as the mediator between two other sides. Or, for simple cases, you may use a PCIe bus switch with a DMA controller feature, so the switch itself initiates bus transactions.

In short, not sure if it's worth the effort, but I wouldn't rule out this being possible.

