Several questions on continuous streaming data

Post by junslee »

Several questions came to mind, as follows.

---------------------------------------------
Background on my setup.
I use:
- Xilinx KC705 FPGA dev. kit,
which has a 2-channel, 14-bit, 250 MHz ADC,
- Upstream (FPGA -> host PC)
- Intel i7-4790K
- DDR3 RAM, 32 GB (dual channel)
- Samsung 960 Pro SSD (expected)
---------------------------------------------

1. Currently, my data rate is 2 * 14 (bit) * 250 (MHz) = 7 Gbps = 875 MB/s (about 834 MiB/s).
I think the ADC should be reduced to 10- or 12-bit accuracy, since the maximum rate of Xillybus rev. A is up to 800 MB/s.
Even then, it would still be around 600 MB/s, which is too fast to save to storage.
What makes it worse is that a width that is not a power of two (10 or 12 bits) doesn't fit
the Xillybus input data bus width, so I could only make it 8 or 16 bits wide.
(8 bits is not accurate enough, and 16 bits produces too much data.)
So for now I pad each sample with two extra 0 bits, making it 16 bits (14 data bits + 00).
Is there any way to load 14-bit data onto a 16-bit bus without losing bandwidth, like a packet in serial communication?

2. If the answer to Q#1 is "No", I would like to use the 16-bit bus.
My data is a continuous stream, and I don't have a real-time processing program yet, nor do I know how to write one.
Hence, I need to store the raw data for at least 1 minute, and for up to 10 minutes if possible.
But I read in the documentation that this is tough due to limitations of the host PC.
I looked at some storage devices and found that a recent PCIe 3.0 NVMe SSD has a write speed of about 2100 MB/s.
Do you think it could sustain a continuous 1 GB/s stream (16-bit, 2 channels, 250 MHz) for a few minutes?
What is your opinion?


3. I read a tutorial about speed testing, which said "Don't load the storage."
Currently, I'm testing an ordinary SSD with a 450 MB/s write speed,
and I get around 200 MB/s with the dd command.
(dd if=\\.\xillybus_read_32 of=ssd storage\first bs=1M count=1000)
Here, does the buffer size mean the user application's memory (buffer) size, as in the host application manual?
Does it affect the speed a lot? I don't understand why it is so slow, considering the small count.
(Should Windows be installed on the primary SSD? CPU usage is low, and it is an octa-core i7.)


4. For Windows users, where can I find the device file (e.g. \\.\xillybus_read_32)?
I couldn't find such a file on any drive.


5. I read a few papers on FPGA-PCIe-GPU processing, which is extremely fast since it doesn't go through the host CPU cache.
I wonder whether you know of this kind of processing, or whether Xillybus could work in this manner.


Thank you for reading such a long post.
junslee
 

Re: Several questions on continuous streaming data

Post by support »

Hello,

1. The common practice is indeed to fill those extra bits with zeros and not care about the wasted bandwidth. You may of course write some piece of logic that packs the data into, say, 32-bit words, to fully utilize the bandwidth. Doing this to save PCIe bandwidth wouldn't make sense, as Xillybus rev. B covers the requirement anyhow. But for the sake of utilizing the hard disk better, this may be a good idea. This packing is not recommended for real-time processing, as it wastes CPU cycles on unpacking.
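As a rough illustration of the packing idea, here is a minimal host-side sketch in Python. The LSB-first bit order and the function names `pack14` and `unpack14` are assumptions made for this example, not something defined by Xillybus; the FPGA logic would have to pack with the same bit order that the host unpacks with.

```python
def pack14(samples):
    """Pack 14-bit unsigned samples, LSB first, into a list of 32-bit words.

    Bit order is an assumption for illustration only.
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    words = []
    for s in samples:
        acc |= (s & 0x3FFF) << nbits
        nbits += 14
        while nbits >= 32:
            words.append(acc & 0xFFFFFFFF)
            acc >>= 32
            nbits -= 32
    if nbits:
        # Leftover bits go into a final word, zero-padded at the top.
        words.append(acc & 0xFFFFFFFF)
    return words


def unpack14(words, count):
    """Recover `count` 14-bit samples from words produced by pack14()."""
    acc = 0
    nbits = 0
    out = []
    it = iter(words)
    while len(out) < count:
        while nbits < 14:
            acc |= next(it) << nbits
            nbits += 32
        out.append(acc & 0x3FFF)
        acc >>= 14
        nbits -= 14
    return out


# 16 samples x 14 bits = 224 bits, which fits exactly in 7 words of 32 bits,
# compared with 8 words when each sample is padded to 16 bits.
samples = [0x3FFF, 0, 0x1234, 0x2AAA] * 4
words = pack14(samples)
print(len(samples), "samples ->", len(words), "words")
```

The saving is 14/16, i.e. 12.5% less data to store, at the cost of the unpacking loop on the host.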

2. You can't use a 16-bit bus, because the interface between the FIFO and Xillybus' IP core runs at 250 MHz, so the theoretical limit of a 16-bit interface stands at 500 MB/s (2 bytes x 250 MHz). And 16-bit interfaces are no good for high bandwidth for other reasons as well (this applies to revision A IP cores; it is no issue on revision B/XL cores).
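The figures in this thread can be checked with a few lines of arithmetic. The sketch below takes 1 MB as 10^6 bytes and 1 GB as 10^9 bytes; the helper names are made up for this example.

```python
def stream_rate_mb_s(channels, bits_per_sample, sample_rate_hz):
    """Payload rate in MB/s, with 1 MB = 10**6 bytes."""
    return channels * bits_per_sample * sample_rate_hz / 8 / 1e6


def capture_size_gb(rate_mb_s, seconds):
    """Total capture size in GB, with 1 GB = 10**9 bytes."""
    return rate_mb_s * seconds / 1e3


# Two ADC channels at 250 MHz, for several sample widths.
for bits in (10, 12, 14, 16):
    rate = stream_rate_mb_s(2, bits, 250e6)
    print(f"{bits:2d}-bit: {rate:6.1f} MB/s, "
          f"1 min = {capture_size_gb(rate, 60):5.1f} GB, "
          f"10 min = {capture_size_gb(rate, 600):5.1f} GB")

# A single 16-bit interface clocked at 250 MHz.
print("16-bit interface limit:", stream_rate_mb_s(1, 16, 250e6), "MB/s")
```

With samples padded to 16 bits, the two channels need twice what a single 16-bit interface can carry, and a 10-minute capture comes to 600 GB.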

I can't comment on storage devices. However, I warmly suggest employing a mechanism for detecting an overflow on the FPGA's FIFO during acquisition. Please refer to section 4 in the Xillybus FPGA designer’s guide. You may not want to stop the stream with an EOF, but you need a way to tell if an overflow occurred. In other words, whether the data you acquired is OK.

3. The buffer size question isn't clear. One way or another, the best performance is achieved with bs=128k or so, probably due to tradeoffs between OS overhead and the size of the processor's RAM cache. But there's no dramatic change as the buffer size changes. Anyhow, if you want to test the SSD speed alone, I suggest taking data from /dev/zero instead.
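For those who prefer to measure from their own program rather than with dd, here is a minimal sketch of a read loop with a 128 KiB block size, along the lines of bs=128k. The function name `measure_read_rate` is made up for this example, and an in-memory buffer stands in for the device file so that the snippet runs anywhere.

```python
import io
import time


def measure_read_rate(src, total_bytes, block_size=128 * 1024):
    """Read up to total_bytes from src in block_size chunks.

    Returns (bytes_read, rate_in_MB_per_s), with 1 MB = 10**6 bytes.
    """
    done = 0
    start = time.perf_counter()
    while done < total_bytes:
        chunk = src.read(min(block_size, total_bytes - done))
        if not chunk:  # EOF: the source has no more data
            break
        done += len(chunk)
    elapsed = time.perf_counter() - start
    rate = done / elapsed / 1e6 if elapsed > 0 else float("inf")
    return done, rate


# Stand-in source: 4 MiB of zeros held in memory.
size = 4 * 1024 * 1024
n, rate = measure_read_rate(io.BytesIO(bytes(size)), size)
print(n, "bytes read")
```

On Windows, the stand-in could presumably be replaced with something like open(r"\\.\xillybus_read_32", "rb", buffering=0), using the device name from the question. Since this loop only reads and discards the data, it measures the stream alone, without the storage in the way.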

4. The Xillybus device files are not represented as files in Windows, but rather as Windows Objects. You may use this tool to find the objects under \\.\ (look under GLOBAL??): https://technet.microsoft.com/en-us/sys ... inobj.aspx

5. Indeed, I can't comment much on direct FPGA to GPU, which is an order of magnitude more difficult to implement. Your required bandwidth is relatively low, so you could simply read the data from Xillybus into buffers, which you'd submit to the GPU. The CPU consumption won't be significant, and you'll be using the "classic" GPU API, saving yourself a major headache (GPU is a headache either way).

Regards,
Eli
support
 

