by support »
Hi,
First of all, the latency figures you gave are consistent with the PCIe delay explanation. The reason the write operation is so quick is that it's a "posted request": a TLP packet is sent, and there's no need to wait for a response. So it's quick. A read operation, on the other hand, requires waiting for the whole round trip: a read request TLP goes out, and a completion TLP has to come back with the data.
As for TLP packetizing, there isn't much to do: Any read or write operation made by the processor ends up as a single TLP packet, carrying that single operation. As far as I know, there isn't a single processor out there that can do better than that. The rationale is simple: If you want efficiency, use DMA. So there's no point in implementing anything smarter on the processor.
So it doesn't look like there is much you can do software-wise to improve this. There is no "kernel TLP packetizing".
As for how DMA can help, I'll give Xillybus' interrupt handling routine as an example: Usually, interrupt service routines (ISRs) read status registers from the hardware to determine the cause of the interrupt. Since I wanted to avoid reads in my driver, I turned the whole concept around: Before the hardware issues an interrupt, it fills a dedicated buffer in host RAM with information about why the interrupt was issued. The ISR reads from this buffer, which is in RAM, so no PCIe reads are made. It does confirm the reception of the interrupt with a write operation to the BAR region, but as you've seen, that's almost costless.
I don't know how well this may fit your application.
DMA may sound a bit scary, but it's actually easier to implement DMA writes in FPGA logic than to respond to read requests from the host: You form a TLP write packet on the FPGA and submit it through the PCIe core's interface, which is supplied by the FPGA's vendor. But I suppose you used some sample design implementing plain register I/O, so implementing anything on the PCIe interface itself is an obstacle.
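For a feel of what "forming a TLP write packet" amounts to, here's a sketch in C of the three header DWs of a 32-bit-address Memory Write TLP, packed per the spec's 3DW header layout. The function name and struct are my own; for simplicity it assumes the write starts DW-aligned with all bytes enabled in the first DW (Last BE would be adjusted for multi-DW writes ending mid-DW). On the FPGA this same packing is done in logic, of course, not in C.

```c
#include <stdint.h>

/* The three header DWs of a 3DW (32-bit address) Memory Write TLP,
 * as they appear on the link, most significant byte first per DW. */
struct mwr_hdr {
    uint32_t dw0, dw1, dw2;
};

static struct mwr_hdr build_mwr_header(uint32_t addr, uint16_t len_dw,
                                       uint16_t requester_id, uint8_t tag)
{
    struct mwr_hdr h;
    /* DW0: Fmt=10b (3DW header with data), Type=00000b (MWr),
     * Length in DWs in the low 10 bits. 0x40 is the Fmt/Type byte. */
    h.dw0 = (0x40u << 24) | (len_dw & 0x3FFu);
    /* DW1: Requester ID, Tag, Last BE (0 here: aligned single-span
     * assumption) and First BE (0xF: all four bytes enabled). */
    h.dw1 = ((uint32_t)requester_id << 16) | ((uint32_t)tag << 8) | 0x0Fu;
    /* DW2: target address; the two lowest bits are reserved (zero). */
    h.dw2 = addr & ~3u;
    return h;
}
```

The payload DWs simply follow the header. Compared with this, answering host reads means tracking tags and generating completion TLPs with the right byte counts, which is why writes are the easier direction.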
Regards,
Eli