Xillybus bandwidth can't reach more than 75MB/s

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by support »

Hi,

Just a final check: The bundle's readme file requires that ISE 13.1 is used to generate Xilinx' PCIe IP core (so that the version of PCIe Block Plus generator is 1.14). Just wanted to confirm that you followed that.

Regards,
Eli

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by mjuzwiak »

Hello,

I've done the steps you requested.

$ dd if=/dev/zero of=/dev/xillybus_write_test bs=16kB count=64kB
64000+0 records in
64000+0 records out
1024000000 bytes (1.0 GB) copied, 13.5872 s, 75.4 MB/s


Code:
------- /dev/xillybus_read_test

  Upstream (FPGA to host):
    Data width: 32 bits
    DMA buffers: 32 x 128 kB = 4 MB
    Flow control: Asynchronous, select() and non-blocking read() supported
    Seekable: No

------- /dev/xillybus_write_test

  Downstream (host to FPGA):
    Data width: 32 bits
    DMA buffers: 32 x 128 kB = 4 MB
    Flow control: Asynchronous
    Seekable: No
    FPGA RAM for DMA acceleration: 4 segments x 512 bytes = 2 kB

------- /dev/xillybus_read_8

  Upstream (FPGA to host):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Asynchronous, select() and non-blocking read() supported
    Seekable: No

------- /dev/xillybus_write_8

  Downstream (host to FPGA):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Asynchronous
    Seekable: No
    FPGA RAM for DMA acceleration: None

------- /dev/xillybus_mem_8

  Upstream (FPGA to host):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Synchronous
    Seekable: Yes, with 5 address bits

  Downstream (host to FPGA):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Synchronous
    Seekable: Yes, with 5 address bits
    FPGA RAM for DMA acceleration: None



I updated the BIOS to the latest version, with no result. I also tried assigning the IRQ manually, still no effect.
I'll try another PC.
I'm going on holiday tomorrow for one week; I hope we can continue when I get back.

Thank you very much,
Mathew

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by support »

Hi,

I took a look at the xillybus.v file you sent me. There is indeed nothing special about it. Your pinout also seems OK.

Large DMA buffers can't reduce bandwidth, and in fact there is nothing, except synchronous streams, that can explain that bandwidth. Really.

I would also be quite surprised if it turned out to be a poor PCIe link. But I would make a quick try on another computer, if that is possible for you, even though I wouldn't put my money on this being the issue.

In all previous cases I had of this kind of black magic, it turned out that there was some confusion between old and new files. So to be absolutely sure that there really is a problem, please try this, even though it appears like you've been through this already:

(1) Please create a new custom core for Virtex-5, based upon the default configuration. Change nothing except for the name of xillybus_write_32, to something else, say xillybus_write_test.
(2) Then adopt the downloaded core into a freshly downloaded demo project (replace the files), and adjust it: Rewire the full signal as you previously did, and search-replace write_32 to write_test.
(3) Adjust the UCF file
(4) Implement the project and load it to the FPGA
(5) Run the dd test you did above.

The idea behind this checkup is that we'll be absolutely sure that what is loaded into the FPGA is what we think it is. If the problem persists, we start to suspect the hardware, so the next step is to try using another FPGA board and/or computer.

Regards,
Eli

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by mjuzwiak »

I forgot to log in above.

One more thing I forgot to mention:
I connected PCIE_PERST_B_LS to a DIP switch.

Mathew

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by Guest »

Hello,

I'm really not surprised that I hit a rare case ;)

Clock:
I downloaded the version for ML506 and changed the UCF to:

Code:
INST "*/pcie_ep0/pcie_blk/SIO/.pcie_gt_wrapper_i/GTD[0].GT_i" LOC = GTX_DUAL_X0Y2;
NET  "PCIE_REFCLK_P"       LOC = "AF4"  ;
NET  "PCIE_REFCLK_N"       LOC = "AF3"  ;
NET  "PCIE_PERST_B_LS"     LOC = "AC24"  ;


AF4 and AF3 are taken from my Virtex-5 device's UCF:

Code:
NET  PCIE_CLK_QO_N        LOC="AF3";   # Bank 118, MGTREFCLKN_118, GTP_DUAL_X0Y1
NET  PCIE_CLK_QO_P        LOC="AF4";   # Bank 118, MGTREFCLKP_118, GTP_DUAL_X0Y1


Detailed info: http://www.xilinx.com/support/documenta ... /ug347.pdf (page 43)

dd command:

'file' is a 1.1 GB file previously generated by dd.
Code:
$ dd if=file of=/dev/xillybus_write_32
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 12.2857 s, 87.4 MB/s


and

Code:
$ dd if=/dev/zero of=/dev/xillybus_write_32 bs=16k count=100KB
100000+0 records in
100000+0 records out
1638400000 bytes (1.6 GB) copied, 18.2155 s, 89.9 MB/s


dd consumes ~16% of CPU.

I downloaded the example project for ML506 again, so my xillydemo.v is very close to the example; I just assigned the full signal to 0.
ad 1. Sending it right now.
ad 2. As I said above, I downloaded the example project and, before building, replaced the files with my latest core (with large buffers). The dd results above come from my new core. Could large DMA buffers now cause low bandwidth?
ad 3. Ouch. Should I look for a new motherboard? Can we check the retransmission ratio?

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by support »

Hi,

Let me first say that what you have there is extremely rare. Xillybus is running on a lot of platforms, and I've never seen anything like this.

You mentioned something about a 100 MHz clock to PCIE_REFCLK. The clock you should use is the one coming from the motherboard, not just some oscillator. Have you changed the pinout relative to the demo bundle you downloaded?

I can see that you used 512-byte blocks in your dd attempt. Could you please try it again with bs=16k or so, just to be sure it's for real?
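
The block size can be read off a dd report even when the command line is missing: it is the byte count divided by the record count. A minimal Python sketch, using the figures from the dd report in the previous post (the function name is only for illustration):

```python
def dd_block_size(total_bytes: int, records: int) -> int:
    """Return the block size dd used, given its byte and record counts."""
    if records <= 0:
        raise ValueError("records must be positive")
    if total_bytes % records != 0:
        # A partial last record means the counts are not an exact multiple.
        raise ValueError("byte count is not a multiple of the record count")
    return total_bytes // records

# "2048000+0 records out" and "1048576000 bytes" from the report.
block = dd_block_size(1048576000, 2048000)
print(f"block size: {block} bytes")
```

This works out to 512 bytes, which is dd's default when no bs= option is given.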

Also, please add the dd command you used. As we are in a debugging session, the devil is probably somewhere in some very tiny detail...

I can see a few possibilities:

(1) There is something overlooked in the xillydemo.v file. Could you please send the file you're running with to Xillybus' main email?
(2) For some reason, the stream you're using is synchronous, despite appearances. These things can happen when you've downloaded one IP core and then another, and thought that the new ngc file is used, but it's actually the old one. The data rates you have there are typical for synchronous streams.
(3) Unlikely (unless you're using the wrong reference clock): Your PCIe hardware is faulty at the physical level, causing a high rate of retransmissions of packets. I would expect a lot of other trouble if this was the case.

Regards,
Eli

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by mjuzwiak »

Hello,

I had the same concern (about allocating a huge amount of memory) before; it was the first thing I tested some time ago.
Anyway, I checked it once again: I changed the code to send small chunks of data (512 KB to 1 MB) in a loop. Same result.
I also used dd to copy 1 GB of data; here's the report:

Code:
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 11.7009 s, 89.6 MB/s


and 2nd time

Code:
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 11.8763 s, 88.3 MB/s


I ignore all incoming data in the FPGA; I've assigned user_w_write_32_full to 0.
Maybe I should change the Linux distro? Or could it be a problem with the clock on the FPGA (I don't think so, since PCIe transmission would not work at all in that case)? I connected a differential 100 MHz clock to PCIE_REFCLK_P and PCIE_REFCLK_N.

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by support »

Hi,

First, yes, since the stream is asynchronous, a write() call may return immediately without the data being actually sent to the FPGA. If you want to time the arrival of the data, it's enough to include the close() call, which returns only after the data has been sent (or a timeout).
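
As a rough illustration of timing the transfer with close() included, here is a minimal Python sketch. The function name and the payload size in the note below are assumptions for illustration only; it uses plain os.open(), os.write() and os.close() on the device file.

```python
import os
import time

def timed_write(path: str, payload: bytes) -> float:
    """Write payload to path and return the elapsed seconds, close() included."""
    view = memoryview(payload)
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY)
    try:
        # write() may accept fewer bytes than offered, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
    finally:
        # On an asynchronous stream, this is where the wait for the
        # remaining data to reach the FPGA takes place.
        os.close(fd)
    return time.monotonic() - start
```

For example, timed_write("/dev/xillybus_write_32", bytes(1 << 20)) returns the time for 1 MB of zeros; dividing the payload size by that figure gives a rate that isn't flattered by data still sitting in the DMA buffers.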

The situation you have there is indeed odd. You shouldn't need to allocate DMA buffers with a total larger than 1 MB to achieve full throughput. And the chances that something is wrong with your hardware are extremely slim.

I took a look at the C code. It appears like you're allocating huge buffers in memory. Memory allocation can slow things down significantly, not to mention if pages are flushed to disk. I would suggest allocating a relatively small buffer (say, 512 kB) and using it several times to reach your data transmission goal.
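
The idea of reusing one small buffer can be sketched as follows. This is a Python illustration rather than a rewrite of the C program in question; the function name is made up, and the buffer holds zeros since the content doesn't matter for a bandwidth test.

```python
import os

CHUNK_SIZE = 512 * 1024  # 512 kB, the size suggested above

def send_repeated(path: str, total_bytes: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Send total_bytes to path by writing one reused buffer over and over."""
    buf = memoryview(bytes(chunk_size))  # allocated once, zero-filled
    sent = 0
    fd = os.open(path, os.O_WRONLY)
    try:
        while sent < total_bytes:
            want = min(chunk_size, total_bytes - sent)
            # Count what write() actually accepted; the data is all zeros,
            # so restarting from the buffer's beginning each time is fine.
            sent += os.write(fd, buf[:want])
    finally:
        os.close(fd)
    return sent
```

Memory use stays at one chunk no matter how much data is sent, so neither allocation time nor swapping can distort the measurement.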

I would also suggest using the dd Linux utility for measuring data rates. This is how I check these figures.

Regards,
Eli

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by Guest »

Thank you for the reply.

I've tried all of the devices shown :)
Connecting user_w_md_download_full has no effect; the time is still the same.

I generated a new core and edited xillybus_read_32 and xillybus_write_32 for large buffers (see below). It's now a little faster (~10MB/s).
Each of them now has 64 x 1 MB DMA buffers.
I did some tests:

Code:
data sent - time - bandwidth
64MB - 121ms : 528MB/s
128MB - 776ms : 167MB/s
256MB - 2310ms : 110MB/s
512MB - 5396ms : 94MB/s
1024MB - 11525ms : 89MB/s
2048MB - 23851ms : 85MB/s
4096MB - 48497ms : 84MB/s


I ignore incoming data in the FPGA.
I'm not sure, but the write function probably returns after filling the DMA buffers, not after transferring all the data; that would explain why 64MB is so fast.
I tried on 64-bit Ubuntu 12.04, with the same results.
I'm going crazy with this. Could it be a poor motherboard? It's a Gigabyte GA-M61SME-S2 ( http://www.gigabyte.us/products/product ... id=2507#sp ).

Of course, I can generate a core with ~1GB of DMA buffers, but I'm almost sure it will not fix the problem, and it's not a good way.

My devices now:
Code:
------- /dev/xillybus_read_32 or \\.\xillybus_read_32

  Upstream (FPGA to host):
    Data width: 32 bits
    DMA buffers: 64 x 1 MB = 64 MB
    Flow control: Asynchronous, select() and non-blocking read() supported (on Linux)
    Seekable: No

------- /dev/xillybus_write_32 or \\.\xillybus_write_32

  Downstream (host to FPGA):
    Data width: 32 bits
    DMA buffers: 64 x 1 MB = 64 MB
    Flow control: Asynchronous
    Seekable: No
    FPGA RAM for DMA acceleration: 8 segments x 512 bytes = 4 kB

------- /dev/xillybus_read_8 or \\.\xillybus_read_8

  Upstream (FPGA to host):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Asynchronous, select() and non-blocking read() supported (on Linux)
    Seekable: No

------- /dev/xillybus_write_8 or \\.\xillybus_write_8

  Downstream (host to FPGA):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Asynchronous
    Seekable: No
    FPGA RAM for DMA acceleration: None

------- /dev/xillybus_mem_8 or \\.\xillybus_mem_8

  Upstream (FPGA to host):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Synchronous
    Seekable: Yes, with 5 address bits

  Downstream (host to FPGA):
    Data width: 8 bits
    DMA buffers: 4 x 4 kB = 16 kB
    Flow control: Synchronous
    Seekable: Yes, with 5 address bits
    FPGA RAM for DMA acceleration: None

Re: Xillybus bandwidth can't reach more than 75MB/s

Post by support »

Hello,

Yes, this is the correct place to ask. (:

There is no reason why 150 MB/s shouldn't be reached. What you didn't mention, is where the data is going. The first thing I would look at, is if the destination FIFO is being emptied by the logic at the required rate. The most plausible explanation is that the rate you see is the rate at which data is being fetched from that FIFO. If this is the case, it's the user_w_md_download_full signal that goes high and hence stalls the data. You may try to assign zero to it, rather than connecting it to the FIFO. This will break the system's functionality, but show clearly if this was the issue or not.

It's not 100% clear from your question which of the devices you tried with. Anyhow, for fast data transmission, only 32-bit wide interfaces should be used. It appears like you went for /dev/xillybus_md_download, which is fine.

Besides, the 1:160000 ratio between the data rates is explained by the code you attached: You're counting the 32-bit data items from PC to FPGA, and sending a single byte every time the counter reaches 40000. One byte for each chunk of 40000 words of 4 bytes each: that's exactly the ratio.
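
Spelled out as arithmetic, in a few lines of Python (the variable names are only for illustration):

```python
WORD_BYTES = 4            # each counted item is 32 bits wide
WORDS_PER_REPLY = 40000   # one byte is sent back per this many words

bytes_down = WORDS_PER_REPLY * WORD_BYTES  # PC-to-FPGA bytes per reply byte
bytes_up = 1                               # FPGA-to-PC bytes in the same period
ratio = bytes_down // bytes_up

print(f"{bytes_up}:{ratio}")  # prints 1:160000
```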

I hope this helps.
Eli
