Hello,
I wrote a test to measure how much throughput I get over the PCIe link. I modified the VC707 demo code to count the number of clock cycles (bus_clk) it takes the host to send n 32-bit words into the FIFO (using the 32-bit device files). On the TX1 side, a C program writes n 32-bit words as fast as it can. Once the FPGA has received all n words, it reports back the number of clock cycles the transfer took.
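For reference, the host-side write loop is roughly the following. This is a minimal sketch rather than my exact code: the function name write_all_words is just for illustration, and it assumes the device file behaves like an ordinary file descriptor where write() may accept fewer bytes than requested.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Write n 32-bit words to fd, retrying on partial writes and EINTR.
 * Returns 0 on success, -1 on error (errno is left as set by write()).
 * Illustrative helper, not part of the demo code. */
int write_all_words(int fd, const uint32_t *words, size_t n)
{
    const unsigned char *p = (const unsigned char *)words;
    size_t left = n * sizeof(uint32_t);

    while (left > 0) {
        ssize_t rc = write(fd, p, left);
        if (rc < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error */
        }
        p += rc;            /* skip the bytes that were accepted */
        left -= (size_t)rc;
    }
    return 0;
}
```

The descriptor would come from an open() call such as open("/dev/xillybus_write_32", O_WRONLY). That path is an assumption on my part here and should be replaced with whichever 32-bit device file the demo actually exposes.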
For small transfers such as n = 100, it takes 100 clock cycles to send all the words. So in this case I'm seeing 400 bytes / (100 clock cycles / 250 MHz) = 1 GB/s.
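The same arithmetic as a small helper, in case anyone wants to check it. The function name is made up for illustration, and the 250 MHz figure for bus_clk is my assumption about the demo's default.

```c
#include <stdint.h>

/* Throughput in bytes per second, given a word count, the word size in
 * bytes, the number of clock cycles counted on the FPGA, and the clock
 * frequency in Hz. Illustrative helper only. */
double measured_bytes_per_sec(uint64_t words, uint64_t bytes_per_word,
                              uint64_t cycles, double clk_hz)
{
    double bytes = (double)(words * bytes_per_word);
    return bytes * clk_hz / (double)cycles;
}
```

Calling measured_bytes_per_sec(100, 4, 100, 250e6) corresponds to the case above: 400 bytes in 100 cycles of a 250 MHz clock.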
I think the VC707 demo defaults to 250 MHz with an x8 lane width, right? The TX1 only has 4 Gen2 lanes, so the link between the FPGA and the TX1 should be running at x4. I figured that with 5 GT/s per lane from Gen2 and 4 lanes, I should be seeing rates closer to 2 GB/s. Is there a flaw in my analysis or testing methodology? Thanks!
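Here is how I arrive at the 2 GB/s figure, next to the ceiling a 32-bit stream would have if it accepts at most one word per bus_clk cycle. This is a rough sketch: the function names are made up, the one-word-per-cycle limit is an assumption about the FIFO interface, and the link figure accounts only for Gen2's 8b/10b encoding, not for packet header or flow-control overhead.

```c
/* Raw PCIe payload rate in bytes per second for a Gen1/Gen2 link,
 * which uses 8b/10b encoding (8 data bits per 10 bits on the wire).
 * Ignores TLP/DLLP protocol overhead, so real throughput is lower. */
double pcie_8b10b_bytes_per_sec(unsigned lanes, double transfers_per_sec)
{
    double data_bits = (double)lanes * transfers_per_sec * (8.0 / 10.0);
    return data_bits / 8.0;
}

/* Upper bound for a stream that moves at most one word per clock cycle
 * (an assumption about the FIFO interface, not a documented figure). */
double stream_ceiling_bytes_per_sec(unsigned word_bits, double clk_hz)
{
    return (double)word_bits / 8.0 * clk_hz;
}
```

With 4 lanes at 5 GT/s the first function gives about 2 GB/s, while a 32-bit word per cycle at 250 MHz gives about 1 GB/s, which matches what I measured.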