I have a project on the ZYNQ platform that uses the AXI PCIe IP and connects to the DDR in the Processing System. One BAR in the PCIe core is set to 512 MB so that the host can read/write the DDR memory space. We have a Linux kernel driver that writes a constant 8-byte value in a for loop, up to 256 MB.

I tested boot images for different PCIe configurations, and the throughput always comes out around 55-66 MB/s regardless of the lane count or link speed. With more than one lane and speeds above 2.5 GT/s we see spikes up to 120 MB/s, but they are not stable.

I used your method to calculate the bandwidth for one lane at 2.5 GT/s (raw bandwidth × 8/30, i.e. 250 MB/s after 8b/10b encoding, with each 8-byte write carried in a TLP of roughly 30 bytes on the wire), and it comes out to about 66 MB/s, so that matches the ideal for x1 at 2.5 GT/s. However, the measured bandwidth does not scale with the lane count or link speed. I have read all of your blogs/forum posts but could not understand this behavior. I see the suggestion that using DMA will improve the bandwidth, but shouldn't the write performance still scale with the link configuration?
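To make the access pattern concrete, here is a minimal sketch of the kind of loop I mean (the names, constants, and the BAR mapping are illustrative only, not our exact driver code):

    #include <linux/io.h>
    #include <linux/ktime.h>
    #include <linux/printk.h>
    #include <linux/io-64-nonatomic-lo-hi.h>  /* writeq fallback on 32-bit hosts */

    #define TEST_LEN     (256UL * 1024 * 1024)   /* 256 MB of writes */
    #define TEST_PATTERN 0x1122334455667788ULL   /* constant 8-byte value */

    /* 'bar' is the BAR mapping obtained elsewhere via pci_iomap()/ioremap() */
    static void bar_write_test(void __iomem *bar)
    {
        ktime_t start;
        size_t off;

        start = ktime_get();
        /* One 8-byte MMIO store per iteration, i.e. one small posted-write
         * TLP at a time -- no DMA and no write combining assumed. */
        for (off = 0; off < TEST_LEN; off += sizeof(u64))
            writeq(TEST_PATTERN, bar + off);

        pr_info("PIO write test: %lu MB in %lld us\n",
                TEST_LEN >> 20, ktime_us_delta(ktime_get(), start));
    }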
Other questions –
1) Does a single TLP use a single lane or multiple lanes if the PCIe is configured for multiple lanes?
2) How did you create the FPGA sniffer, and what data exactly are you putting into it?
Please suggest what might be going on. I would really appreciate your help.