Xillybus to DDR3 RAM to GTX Transceivers

Questions and discussions about the Xillybus IP core and drivers

Re: Xillybus to DDR3 RAM to GTX Transceivers

Postby support » Mon Jan 23, 2017 6:15 pm

Hello,

As for writing the data in a cyclic manner, I suggest taking a look at Xillybus' programming guide for Linux. Or just copy the allwrite() function in the streamwrite.c demo program, which gets the writing done correctly. Just call allwrite() with the entire buffer inside a while loop, and you're done.

The allwrite() function guarantees that all data is indeed written before it returns. A plain _write() call may return after writing less than what was requested -- this is an acceptable result per the POSIX standard.
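
To make the short-write point concrete, here is a minimal sketch of such a retry loop. It is not the allwrite() from streamwrite.c, just an illustration of the same idea; the name write_all and its return convention are made up for this example. It uses the POSIX write(); with the Microsoft C runtime the equivalent call is _write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are accepted.
   Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_all(int fd, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t sent = 0;

    assert(data != NULL || len == 0);

    while (sent < len) {
        ssize_t rc = write(fd, p + sent, len - sent);

        if (rc < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, just retry */
            return -1;           /* real error */
        }
        if (rc == 0)
            return -1;           /* no progress; treat as failure */

        sent += (size_t)rc;      /* short write: loop for the rest */
    }
    return 0;
}
```

A call like write_all(fd, buf, n) then either delivers all n bytes or reports a failure, no matter how the individual write() calls split the data.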

If the entire file is stored in (physical) RAM, there will be no problem, as long as the stream is asynchronous, which the demo bundle's write_32 is. You may want to generate a custom IP with a larger DMA buffer. Just pick a buffer time of say, 500 ms, and that means that you'll have 500 ms worth of data in the DMA buffers almost all the time, which is usually far more than enough to handle those moments of CPU starvation by the OS. If you pick "data acquisition and playback" as the use of the stream, it will turn asynchronous.
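
As a rough feel for what a buffer time means in bytes: the amount of data held in the DMA buffers is about the data rate multiplied by the buffer time. A small sketch of that arithmetic follows; the function name and the 200 MB/s figure are illustrative assumptions only, not numbers taken from the demo bundle.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold buffer_ms milliseconds of data at bytes_per_sec. */
uint64_t dma_buffer_bytes(uint64_t bytes_per_sec, uint64_t buffer_ms)
{
    /* Guard against overflow in the multiplication below. */
    assert(buffer_ms == 0 || bytes_per_sec <= UINT64_MAX / buffer_ms);

    return bytes_per_sec * buffer_ms / 1000;
}
```

For example, dma_buffer_bytes(200000000, 500) comes to 100000000, i.e. about 100 MB of buffering for a 200 MB/s stream.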

Regarding closing files: When a host-to-FPGA file is closed, there's an attempt to flush all data in the stream towards the FPGA before the _close() call returns (or it gives up after 1000 ms). So it's one way to push data forward immediately (which isn't relevant in your case, I guess).

Besides, the *_open signal on the FPGA side will go low as the file descriptor closes, which is yet another possible reason to close a file explicitly.

One way or another, when a process quits, all its file descriptors are closed automatically, so calling _close() as the last thing before quitting a program is more of a coding convention, as it makes no practical difference. Unless you run some memory leak test tool on your program, which may complain that resources were not returned when the program exited if files remained open. That in turn can cause people to look for the missing free() or undestroyed object, wasting their time.

So close the file. ;)

Regards,
Eli
support
 
Posts: 623
Joined: Tue Apr 24, 2012 3:46 pm

Re: Xillybus to DDR3 RAM to GTX Transceivers

Postby mwayne » Wed Sep 12, 2018 4:32 pm

Hi,

So, this worked. I was able to make an arbitrary waveform generator that reads from a binary file, writes to an FPGA over PCIe, and outputs the binary bit pattern over the high speed serial transceivers. We are trying to push the limits of what it can do, and have run into a weird problem.

I have something like this in my code

Code:
bytesleft = streamsize;                                // the number of bytes in the binary file
char device_stream[32] = "\\\\.\\xillybus_qfpdstream"; // my xillybus device

if ((fd_device = _open(device_stream, O_WRONLY | _O_BINARY)) < 0)
    printf("Couldn't open device stream.\n");
fd_stream = _open(filename, _O_RDONLY | _O_BINARY);
while (bytesleft > 0)
{
    if (bytesleft >= 500000)
    {
        bytesread = _read(fd_stream, buf, 500000);
        allwrite(fd_device, buf, bytesread);
        bytesleft = bytesleft - 500000;
    }
}


We change filename to suit our needs, and 90% of the time everything works correctly. However, it seems as if the FIRST time we load up a new file the _read operation is taking much longer than usual and the streaming is interrupted.

i.e., if I want the FPGA to output {file1, file1, file2, file2, file3, file3, file4, file4, file1, file1} then sometimes the first instance of fileX will take longer. Not always, but sometimes. The second, third, etc. always seem to take the correct amount of time. If the same file is called again (i.e., {file4, file4, file1}), the first file1 will sometimes take longer even though it was used earlier.

I don't know if it matters but we are using file sizes ranging from 0.1 Gb to 100 Gb (bits not bytes), and they all seem to do this.

Oh, buf is declared in the main function and passed to _read. I tried making it a static global, but that didn't do anything.

Is there something weird about _read that I'm missing, or is there some weird caching thing where once a file is loaded into RAM, Windows knows it's there and doesn't load it again?

Thanks,

Michael
mwayne
 
Posts: 8
Joined: Tue Nov 01, 2016 3:51 pm

Re: Xillybus to DDR3 RAM to GTX Transceivers

Postby support » Wed Sep 12, 2018 5:43 pm

Hello,

Please refer to this page: http://xillybus.com/doc/bandwidth-guidelines

In particular item 2, saying "Don't involve the disk or other storage". The problem is that your disk is too slow to keep up with the data rate you want.

The reason it works anyhow most of the time is that any modern operating system has a disk cache in RAM, so most of the time you actually read the data from RAM. Solution: In your program, read the data into a RAM buffer before starting, and write to the device file from the buffer.
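
A minimal sketch of the "read into RAM first" step, assuming the file fits in memory. load_file is a hypothetical helper name; it uses only standard C (fopen/fread), so it isn't specific to Windows or Linux. It grows the buffer as it goes rather than asking for the file size up front, since ftell() returns a long, which may be too small for multi-gigabyte files.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a heap buffer.
   Returns the buffer (caller frees it) and stores its length in *out_len,
   or returns NULL on any failure. */
unsigned char *load_file(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    size_t cap = (size_t)1 << 20;   /* start with 1 MiB, double as needed */
    size_t len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    for (;;) {
        if (len == cap) {           /* buffer full: grow before reading on */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, cap - len, f);
        len += n;
        if (n == 0)                 /* end of file or read error */
            break;
    }

    int failed = ferror(f);
    fclose(f);
    if (failed) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Once the file is loaded this way, the buffer can be handed to allwrite() with no disk access while streaming.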

Regards,
Eli

Re: Xillybus to DDR3 RAM to GTX Transceivers

Postby mwayne » Wed Sep 12, 2018 8:37 pm

Ok, thank you.

I downloaded a program (ImDisk) that mounts your RAM as a hard drive. Then I just moved the files from C:\ (my SSD) to the new virtual drive R:\, and this seemed to fix the problem. This lets me preload my bit files to output ... except now I'm of course limited by the amount of RAM I have.

Would fiddling with the size of the DMA buffer in the core or the size of my buffer in the C code (currently 500000 bytes) improve performance? Aren't burst operations faster than a lot of smaller ones?

Re: Xillybus to DDR3 RAM to GTX Transceivers

Postby support » Thu Sep 13, 2018 6:15 am

Hello,

Reading and writing large chunks saves the overhead related to system calls of the operating system, so yes, there's some increased efficiency there. On the other hand, there's also a negative effect with large buffers, probably related to the CPU cache: If the chunk is small enough to be kept in the CPU's cache until it's written, it's faster. My own experience is that 64kB is the optimal point, but YMMV.

Taking a second glance at your C code, I don't think it's written well. At the very least, I would expect bytesleft to be decremented by bytesread, and not by a constant. The whole point is that bytesread might be smaller than the requested length. And then the whole thing with bytesleft >= 500000 doesn't seem to make sense. It looks like this code is working just because all _read() calls always return with bytesread == 500000, which is a common situation when reading from plain files, but is not guaranteed.
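
For what it's worth, a loop along these lines doesn't depend on how much each read returns. This is a sketch only: stream_copy and CHUNK are names made up for the example, it uses POSIX read()/write() (on Windows the calls would be _read()/_write()), and the 64 kB chunk is just the figure mentioned above. It counts by what was actually read and stops at end of file.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define CHUNK 65536   /* 64 kB per read; tune to taste */

/* Hypothetical helper: copy everything from fd_in to fd_out.
   Returns the number of bytes copied, or -1 on error. */
long long stream_copy(int fd_in, int fd_out)
{
    static unsigned char buf[CHUNK];
    long long total = 0;

    assert(fd_in >= 0 && fd_out >= 0);

    for (;;) {
        ssize_t nread = read(fd_in, buf, sizeof buf);
        if (nread < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, retry the read */
            return -1;
        }
        if (nread == 0)
            break;                   /* end of file */

        /* Write exactly what was read, retrying on short writes. */
        ssize_t sent = 0;
        while (sent < nread) {
            ssize_t rc = write(fd_out, buf + sent, (size_t)(nread - sent));
            if (rc < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (rc == 0)
                return -1;           /* no progress; give up */
            sent += rc;
        }
        total += nread;              /* count actual bytes, not a constant */
    }
    return total;
}
```

With this structure there is no need for a separate branch for the last, shorter chunk: the final read simply returns fewer bytes, and the one after it returns 0.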

Regards,
Eli

Re: Xillybus to DDR3 RAM to GTX Transceivers

Postby mwayne » Thu Sep 13, 2018 2:54 pm

Hi,

I only posted a section of the code. There is another part that handles the case where the amount left is less than 500000. In that case it only writes that much, not the full 500000.
