Xillybus to DDR3 RAM to GTX Transceivers

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by mwayne »

Hi,

I only posted a section of the code. There is another part that handles the case where fewer than 500000 bytes are left.
In that case it writes only the remaining bytes, not the full 500000.

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by support »

Hello,

Reading and writing large chunks saves the overhead of operating system calls, so yes, there's some increased efficiency there. On the other hand, there's also a negative effect with large buffers, probably related to the CPU cache: If the chunk is small enough to be kept in the CPU's cache until it's written, it's faster. My own experience is that 64 kB is the optimal point, but YMMV.

Taking a second glance at your C code, I don't think it's written well. At the very least, I would expect bytesleft to be decremented by bytesread, and not by a constant. The whole point is that bytesread might be smaller than the requested length. And then the whole thing with bytesleft >= 500000 doesn't seem to make sense. It looks like this code is working just because all _read() calls always return with bytesread == 500000, which is a common situation when reading from plain files, but is not guaranteed.
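
A minimal sketch of that bookkeeping, assuming POSIX-style read() (on Windows the same logic applies to _read() from <io.h>). The helper name read_fully is illustrative and does not appear in the posted code or in the Xillybus demo programs.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>   /* read(); on Windows use _read() from <io.h> instead */

/*
 * Hypothetical helper: keep calling read() until `want` bytes have arrived
 * or the file ends. Returns the number of bytes actually read (smaller than
 * `want` only at end of file), or -1 on a read error.
 */
ssize_t read_fully(int fd, char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: just retry */
            return -1;             /* genuine error */
        }
        if (n == 0)
            break;                 /* end of file */

        got += (size_t)n;          /* count what was actually read */
    }
    return (ssize_t)got;
}
```

With a helper like this, the main loop can subtract the returned count from bytesleft and pass the same count to allwrite(), so the last, shorter chunk needs no special case.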

Regards,
Eli

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by mwayne »

Ok, thank you.

I downloaded a program (IMDisk) that mounts your RAM as a hard drive. Then I just moved the files from C:\ (my SSD) to the new virtual drive R:\, and this seemed to fix the problem. This lets me preload my bit files to output .... except now I'm of course limited by the amount of RAM I have.

Would fiddling with the size of the DMA buffer in the core or the size of my buffer in the C code (currently 500000 bytes) improve performance? Aren't burst operations faster than a lot of smaller ones?

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by support »

Hello,

Please refer to this page: http://xillybus.com/doc/bandwidth-guidelines

In particular, see item 2, which says "Don't involve the disk or other storage". The problem is that your disk is too slow to keep up with the data rate you want.

The reason it works anyhow most of the time is that any modern operating system has a disk cache in RAM, so most of the time you actually read the data from RAM. Solution: In your program, read the data into a RAM buffer before starting, and write to the device file from the buffer.
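
A rough sketch of the "read it all into RAM first" step, using standard C stdio. The helper name load_file is hypothetical. Two assumptions to keep in mind: the whole file must fit in physical RAM, and ftell() returns a long, so on Windows files larger than 2 GB would need _fseeki64() / _ftelli64() instead.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read an entire file into a freshly allocated buffer.
 * On success returns the buffer (caller must free() it) and stores its
 * length in *size_out; on any failure returns NULL.
 */
unsigned char *load_file(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");      /* "rb": binary mode, matters on Windows */
    unsigned char *data = NULL;
    long size;

    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end, then rewind. */
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    data = malloc(size > 0 ? (size_t)size : 1);
    if (data != NULL && fread(data, 1, (size_t)size, f) != (size_t)size) {
        free(data);                   /* short read: treat as failure */
        data = NULL;
    }

    fclose(f);
    if (data != NULL)
        *size_out = (size_t)size;
    return data;
}
```

Once the buffer is loaded, no disk access happens while the stream is running; the device file is fed from memory only.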

Regards,
Eli

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by mwayne »

Hi,

So, this worked. I was able to make an arbitrary waveform generator that reads from a binary file, writes to an FPGA over PCIe, and outputs the binary bit pattern over the high speed serial transceivers. We are trying to push the limits of what it can do, and have run into a weird problem.

I have something like this in my code:

Code:
bytesleft = streamsize;                                    // number of bytes in the binary file
char device_stream[32] = "\\\\.\\xillybus_qfpdstream";     // my xillybus device

if ((fd_device = _open(device_stream, O_WRONLY | _O_BINARY)) < 0)
    printf("Couldn't open device stream.\n");
fd_stream = _open(filename, _O_RDONLY | _O_BINARY);
while (bytesleft > 0)
{
    if (bytesleft >= 500000)
    {
        bytesread = _read(fd_stream, buf, 500000);
        allwrite(fd_device, buf, bytesread);
        bytesleft = bytesleft - 500000;
    }
}


We change filename to suit our needs, and 90% of the time everything works correctly. However, it seems as if the FIRST time we load up a new file the _read operation is taking much longer than usual and the streaming is interrupted.

For example, if I want the FPGA to output {file1, file1, file2, file2, file3, file3, file4, file4, file1, file1}, then sometimes the first instance of fileX will take longer. Not always, but sometimes. The second, third, etc. always seem to take the correct amount of time. If the same file is called again (e.g., {file4, file4, file1}), the first file1 will sometimes take longer even though it was used earlier.

I don't know if it matters but we are using file sizes ranging from 0.1 Gb to 100 Gb (bits not bytes), and they all seem to do this.

Oh, buf is declared in the main function and passed to _read. I tried making it a static global, but that didn't do anything.

Is there something weird about _read that I'm missing, or is there some weird caching thing where once a file is loaded into RAM Windows knows it's there and doesn't load it again?

Thanks,

Michael

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by support »

Hello,

As for writing the data in a cyclic manner, I suggest taking a look at Xillybus' programming guide for Linux. Or just copy the allwrite() function in the streamwrite.c demo program, which gets the writing done correctly. Just call allwrite() with the entire buffer inside a while loop, and you're done.

The allwrite() function guarantees that all data is indeed written before it returns. A plain _write() call may return after less than what was required -- this is an acceptable result per POSIX standard.
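
A minimal sketch of the idea, written in the spirit of allwrite() rather than copied from the demo. The names write_fully and play_cyclic are hypothetical, and POSIX write() is assumed (on Windows, _write() from <io.h> takes its place).

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>   /* write(); on Windows use _write() from <io.h> instead */

/*
 * Hypothetical stand-in for the demo's allwrite(): loop until all `len`
 * bytes have been accepted. Returns 0 on success, -1 on failure.
 */
int write_fully(int fd, const unsigned char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;             /* no progress: give up rather than spin */

        sent += (size_t)n;         /* advance by what was actually written */
    }
    return 0;
}

/*
 * Cyclic playback: send the same in-memory buffer `repeats` times
 * back to back. Returns 0 on success, -1 if any write fails.
 */
int play_cyclic(int fd, const unsigned char *buf, size_t len, unsigned repeats)
{
    for (unsigned i = 0; i < repeats; i++)
        if (write_fully(fd, buf, len) != 0)
            return -1;
    return 0;
}
```

For endless playback, the for loop would simply become a while (1) around write_fully().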

If the entire file is stored in (physical) RAM, there will be no problem, as long as the stream is asynchronous, which the demo bundle's write_32 is. You may want to generate a custom IP with a larger DMA buffer. Just pick a buffer time of say, 500 ms, and that means that you'll have 500 ms worth of data in the DMA buffers almost all the time, which is usually far more than enough to handle those moments of CPU starvation by the OS. If you pick "data acquisition and playback" as the use of the stream, it will turn asynchronous.
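
To get a feel for the numbers, here is a small worked calculation of how much data a given buffer time represents. It assumes the 3 Gb/s output rate this project targets; the function name dma_buffer_bytes is illustrative, and the IP Core Factory decides the actual allocation.

```c
#include <stdint.h>

/*
 * Bytes of DMA buffer needed to hold `buffer_ms` milliseconds of data
 * at a sustained rate of `bits_per_second`.
 */
uint64_t dma_buffer_bytes(uint64_t bits_per_second, uint64_t buffer_ms)
{
    uint64_t bytes_per_second = bits_per_second / 8;
    return bytes_per_second * buffer_ms / 1000;
}
```

For 3,000,000,000 bits per second and 500 ms this gives 187,500,000 bytes, i.e. about 187.5 MB (roughly 179 MiB).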

Regarding closing files: When a host-to-FPGA file is closed, there's an attempt to flush all data in the stream towards the FPGA before the _close() call returns (or it gives up after 1000 ms). So it's one way to push data forward immediately (which isn't relevant in your case, I guess).

Besides, the *_open signal on the FPGA side will go low as the file descriptor closes, which is yet another possible reason to close a file explicitly.

One way or another, when a process quits, all its file descriptors are closed automatically, so calling _close() as the last thing before quitting a program is more of a coding convention, as it makes no practical difference. The exception is if you run a memory leak test tool on your program, which may complain that memory resources were not returned when the program exited if files remained open. That in turn can send people looking for a missing free() or an undestroyed object, wasting their time.

So close the file. ;)

Regards,
Eli

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by mwayne »

Ah, another question. :)

How does _close behave when closing a xillybus handle? Is it necessary / good programming practice to close a xillybus device after use?

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by mwayne »

Awesome, thank you. Things seem to be working well.

My (hopefully) last question is about repeating the file. Is there any optimal way to interface with the core so that the FPGA outputs the data file in a cyclic fashion? I.e., write the same 3 Gb to the FPGA repeatedly?

Currently I'm just doing something like

bytesread = _read(file, buf, sizeof(buf));

while (1)
{
    _write(xillybus_fd, buf, bytesread);   // xillybus_fd: the opened //./xillybus_stream device
}

Am I likely to encounter problems at the end of one cycle and the beginning of another as the large file is written to xillybus, or will I be OK since the file contents are already stored in an array? Would writing in smaller chunks be better?

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by support »

Hello,

mwayne wrote: If I want to read this large file into the RAM space that xillybus has allocated for it... do I just declare a large char[] array or whatever as normal, write the file into that, and then use the _write function to my //./xillybus_ device, and xillybus handles everything else internally?

Yes, exactly. There's no need to get down to the technicalities.

But since you did:
(1) "Autoset internals" is indeed the preferred choice. It protects you against the issue you cited.
(2) The segment you found in the control panel (128 bytes long) is the memory-mapped register segment (the PCI BAR range). This is where the driver writes to for setting the hardware registers. It has nothing to do with the DMA memory, and you would have had no chance of writing to it, since the application has no access to this region (your program would get a kick in the bottom, exactly as with a null pointer).

In short, plain _write(). That's the whole story.

Regards,
Eli

Re: Xillybus to DDR3 RAM to GTX Transceivers

Post by mwayne »

Hello again, I took a break on this project for a while but have begun again and believe I am almost done.

To recap: The goal of the project was to make a 3 Gb/s function generator by reading a binary file, transmitting that file over PCIe to my ML605 board, and then transmitting that bit pattern out over the high-speed GTX transceivers. I have successfully set up the GTX transceivers and made my custom IP core, and when using the example 'streamwrite.c' file in the windowspack, I can see the ASCII values of the keys I press output over the GTX transceiver. This is great! Now I just need to have the input be the file.

Currently I want to read in one 3 Gb file, and just have it repeat over and over at the output. You had previously suggested just dumping that file directly to RAM and then writing it to the FPGA.

When reading the host programming guide, I found that it says:

Precautions should however be taken to avoid a shortage of kernel RAM. Xillybus’ IP Core Factory’s automatic memory allocation (“autoset internals”) algorithm is designed not to consume more than 50% of the relevant memory pool, i.e. 512 MB, based upon the assumption that a modern PC has more than 1 GB of RAM installed. It’s probably safe to go as high as 75% as well, which can be done by setting the buffer sizes manually.


So, when creating my custom IP core I chose the 'autoset internals' option, so the DMA buffer size is chosen automatically, right? When I go into Control Panel and right click / Properties on the Xillybus generic FPGA item, it says that my Memory Range is 0x0000 0000 F710 0000 to 0x0000 0000 F710 007F. Does that mean I only have 0x80 (128) bytes of DMA buffer space allocated? I was planning on just writing to that address in my C code... but this doesn't seem right.

If I want to read this large file into the RAM space that xillybus has allocated for it... do I just declare a large char[] array or whatever as normal, write the file into that, and then use the _write function to my //./xillybus_ device, and xillybus handles everything else internally?

Thank you!
