VC707 speed testing

Re: VC707 speed testing

Post by support »

Hi,

Xilinx's PCIe block in the Xillybus bundle for VC707 is configured to advertise Gen1 (2.5 GT/s) x8. This supplies far more bandwidth than the published 800 MB/s limit, so there's no point in stepping up to Gen2. All Gen2 PCIe interfaces are backward compatible with Gen1.
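To put numbers on that, here is a minimal C sketch of the usual per-direction link arithmetic: transfer rate × lanes × line-encoding efficiency ÷ 8 bits per byte. Gen1 and Gen2 use 8b/10b encoding (Gen3 uses 128b/130b). The function names are made up for illustration, and the result ignores packet (TLP/DLLP) overhead, so real payload throughput is somewhat lower.

```c
/* Per-lane transfer rate in GT/s for PCIe generations 1 to 3
 * (0.0 for anything else). */
double pcie_lane_gt_per_s(int gen)
{
    switch (gen) {
    case 1: return 2.5;
    case 2: return 5.0;
    case 3: return 8.0;
    default: return 0.0;
    }
}

/* Line-encoding efficiency: 8b/10b for Gen1/Gen2, 128b/130b for Gen3. */
double pcie_encoding_efficiency(int gen)
{
    return (gen >= 3) ? 128.0 / 130.0 : 8.0 / 10.0;
}

/* Per-direction bandwidth in bytes/s after line encoding, before
 * TLP/DLLP packet overhead, for a link of the given generation and width. */
double pcie_link_bytes_per_s(int gen, int lanes)
{
    return pcie_lane_gt_per_s(gen) * 1e9 * pcie_encoding_efficiency(gen)
           * lanes / 8.0;
}
```

For example, pcie_link_bytes_per_s(1, 8) comes to about 2.0e9 bytes/s for Gen1 x8, roughly 2.5 times the 800 MB/s figure.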

Regards,
Eli

Re: VC707 speed testing

Post by Guest »

Hi Eli,
I was going through my numbers again, trying to make sense of what I'm seeing, and noticed this when I ran lspci -vvv:
01:00.0 Unassigned class [ff00]: Xilinx Corporation Device ebeb (rev 07)
   Subsystem: Xilinx Corporation Device ebeb
   Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
   Interrupt: pin ? routed to IRQ 549
   Region 0: Memory at 13000000 (64-bit, non-prefetchable) [disabled] [size=128]
   Capabilities: [40] Power Management version 3
      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
      Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
   Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
      Address: 0000000000000000  Data: 0000
   Capabilities: [60] Express (v2) Endpoint, MSI 00
      DevCap:   MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
      DevCtl:   Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
         MaxPayload 128 bytes, MaxReadReq 512 bytes
      DevSta:   CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
      LnkCap:   Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
      LnkCtl:   ASPM Disabled; RCB 64 bytes Disabled- CommClk-
         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
      LnkSta:   Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
      DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported
      DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
      LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
          Compliance De-emphasis: -6dB
      LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
   Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
   Kernel driver in use: xillybus
   Kernel modules: xillybus_pcie


I noticed that LnkCap is at 2.5GT/s. Is the link capability set by the FPGA or by the host? I'm using an NVIDIA TX1, which supports Gen2, so I'm not sure why I'm only getting 2.5GT/s. Any ideas? Thanks
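In lspci output, LnkCap is what the device advertises and LnkSta is what the link actually trained to. In the output above, both report 2.5GT/s, while the width differs (x8 advertised, x4 negotiated). A small helper like the hypothetical one below (not part of Xillybus or pciutils) pulls the speed and width out of such a line so the two can be compared in a script. It assumes the "Speed <n>GT/s, Width x<n>" wording shown above.

```c
#include <stdio.h>
#include <string.h>

/* Extract "Speed <x>GT/s" and "Width x<n>" from one lspci -vvv line such as
 * "LnkSta:   Speed 2.5GT/s, Width x4, ...".
 * Returns 0 on success, -1 if either field is missing. */
int parse_link_line(const char *line, double *speed_gt, int *width)
{
    const char *s = strstr(line, "Speed ");
    const char *w = strstr(line, "Width x");

    if (s == NULL || w == NULL)
        return -1;
    /* %lf stops at the 'G' of "GT/s". */
    if (sscanf(s, "Speed %lfGT/s", speed_gt) != 1)
        return -1;
    if (sscanf(w, "Width x%d", width) != 1)
        return -1;
    return 0;
}
```

Feeding it the LnkCap and LnkSta lines above would yield 2.5 / 8 and 2.5 / 4 respectively.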

Re: VC707 speed testing

Post by support »

Hello,

It's a bit difficult to speculate on short-term throughput, and I'm not sure there's much point in doing so. This falls into territory for which there is no specification.

If you need 1 GB/s, get a revision B core.

Regards,
Eli

Re: VC707 speed testing

Post by Guest »

Hi Eli,
Thanks for your quick response. Do you know why I was able to measure a speed higher than the bandwidth limit? My guess is that counting clock cycles on the FPGA somehow does not capture the overhead of sending the words. Is this true?

Re: VC707 speed testing

Post by support »

Hello,

The common methodology for measuring throughput is to write large amounts of data and measure how much data was written in a given time. But counting arriving data should yield the same, if not more accurate, results.
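A minimal host-side sketch of that methodology in C, assuming a Xillybus write device file such as /dev/xillybus_write_32 from the demo bundle (the path, the function name and the chunk size are assumptions; any writable file descriptor works). It times a long run of write() calls with CLOCK_MONOTONIC and returns the average rate.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write total_bytes to fd in write() calls of at most chunk bytes and
 * return the average rate in bytes/s, or -1.0 on error. */
double measure_write_rate(int fd, size_t total_bytes, size_t chunk)
{
    if (chunk == 0)
        return -1.0;
    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0xA5, chunk); /* constant test pattern */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t done = 0;
    while (done < total_bytes) {
        size_t want = total_bytes - done < chunk ? total_bytes - done : chunk;
        ssize_t n = write(fd, buf, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(buf);
            return -1.0;
        }
        /* The pattern is constant, so a short write just means
         * fewer bytes counted; the next call sends more of it. */
        done += (size_t)n;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                  + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)total_bytes / secs : -1.0;
}
```

The clock only covers the host's write() calls, so data still sitting in buffers when the clock stops is counted as sent. Writing hundreds of megabytes or more makes that error, and any start-up effects, small relative to the total.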

The stated bandwidth limit for Virtex-7 is 800 MB/s, so there's no reason to expect anything much higher in a proper test. Indeed, Xillybus doesn't utilize the full capacity of the PCIe lanes on the bundles currently available for download from the site (revision A IP cores). If higher bandwidths are required, please contact support for access to revision B and XL as necessary.

For an explanation of Xillybus' available bandwidth: http://xillybus.com/doc/xillybus-bandwidth
Xillybus' guidelines for high bandwidth applications and testing: http://xillybus.com/doc/bandwidth-guidelines

Regards,
Eli

VC707 speed testing

Post by Guest »

Hello,
I wrote a test to measure how much throughput I was getting through the PCIe connection. I modified the demo code for the VC707 to count the number of clock cycles (bus_clk) it took the host to send n 32-bit words (using the 32-bit device files) into the FIFO. On the TX1 side, I have a C program that simply writes n 32-bit words as fast as it can. Once the FPGA has received the n words, it reports back the number of clock cycles it took to send them.
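The host side of such a test could look like the sketch below: it writes an array of n 32-bit words to an already-open file descriptor (for example one of the 32-bit Xillybus write device files) and resumes correctly after short writes. The function name is hypothetical, and filling the array with a running counter is only a suggestion so the FPGA side could also check for lost or reordered words.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Write n 32-bit words to fd, resuming after short writes and retrying
 * on EINTR. Returns 0 on success, -1 on error. */
int write_words(int fd, const uint32_t *words, size_t n)
{
    const unsigned char *p = (const unsigned char *)words;
    size_t left = n * sizeof *words;

    while (left > 0) {
        ssize_t r = write(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += r;             /* continue from where the last write stopped */
        left -= (size_t)r;
    }
    return 0;
}
```

Usage: fill uint32_t words[100] with words[i] = i, then call write_words(fd, words, 100) to send 400 bytes.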

For small numbers of words, like 100, it takes 100 clock cycles to send all of them. So in this case I'm seeing 400 bytes / (100 clock cycles / 250 MHz) = 1 GB/s.
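The same arithmetic as a small C function (the name is made up for illustration). Taking the poster's assumption that bus_clk runs at 250 MHz, note that 100 words in 100 cycles is one 32-bit word per clock, so the result is simply 4 bytes × 250 MHz: the most a 32-bit FIFO interface at that clock can accept, whatever the PCIe link behind it is doing.

```c
/* Throughput in bytes/s for n_words words of word_bytes bytes each,
 * received over `cycles` clock cycles of a clock running at clk_hz. */
double fifo_throughput(unsigned long n_words, unsigned word_bytes,
                       unsigned long cycles, double clk_hz)
{
    double bytes = (double)n_words * word_bytes;
    double seconds = (double)cycles / clk_hz;
    return bytes / seconds;
}
```

fifo_throughput(100, 4, 100, 250e6) gives about 1e9 bytes/s; twice as many cycles for the same words would halve it.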

I think that by default the VC707 demo uses 250 MHz with an x8 lane width, right? The TX1 only has 4 Gen2 lanes, so the FPGA and the TX1 should be using 4 lanes. I figured that with 5 GT/s from Gen2 and 4 lanes, I should be seeing rates closer to 2 GB/s. Is there a flaw in my analysis or testing methodology? Thanks!
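The 2 GB/s figure matches Gen2 x4 after 8b/10b encoding (5 GT/s × 4 lanes × 8/10 ÷ 8), before packet overhead. A sustained transfer, though, cannot run faster than the slowest stage in its path. The sketch below just takes the minimum of a set of stage rates; the figures used with it are taken from this thread (the Gen2 x4 expectation, the 2.5GT/s x4 link that LnkSta reports in the lspci output, which works out to 1 GB/s before packet overhead, and the 800 MB/s limit stated for the revision A core), and the function name is hypothetical.

```c
#include <stddef.h>

/* A chain of stages can sustain no more than its slowest stage.
 * Returns the minimum of n rates (bytes/s), or 0.0 if n == 0. */
double bottleneck_rate(const double *rates, size_t n)
{
    if (n == 0)
        return 0.0;
    double min = rates[0];
    for (size_t i = 1; i < n; i++)
        if (rates[i] < min)
            min = rates[i];
    return min;
}
```

With {2.0e9, 1.0e9, 0.8e9} it returns 0.8e9, i.e. the core limit, not the link, sets the ceiling for a long transfer.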
