by support »
Hello,
1. The common practice is indeed to fill those extra bits with zeros and not worry about the wasted bandwidth. You may of course write some logic that packs the data into, say, 32-bit words to fully utilize the bandwidth. This wouldn't make sense for saving PCIe bandwidth, as Xillybus covers the requirement anyhow with rev. B. But for the sake of utilizing the hard disk better, this may be a good idea. This packing is not recommended for real-time processing, as it wastes CPU cycles on unpacking.
2. You can't use a 16-bit bus, because the interface between the FIFO and Xillybus' IP core runs at 250 MHz, so the theoretical limit stands at 2 bytes x 250 MHz = 500 MB/s. 16-bit interfaces are also no good for high bandwidth for other reasons (on revision A IP cores, that is; this is not an issue on revision B/XL cores).
I can't comment on storage devices. However, I warmly suggest employing a mechanism for detecting an overflow of the FPGA's FIFO during acquisition. Please refer to section 4 of the Xillybus FPGA designer’s guide. You may not want to stop the stream with an EOF, but you need a way to tell whether an overflow occurred, in other words, whether the data you acquired is OK.
3. The buffer size question isn't clear. Either way, the best performance is achieved with bs=128k or so, probably due to the tradeoff between OS overhead and the size of the processor's RAM cache. The performance doesn't change dramatically with the buffer size, though. Anyhow, if you want to test the SSD speed alone, I suggest taking data from /dev/zero instead.
4. The Xillybus device files are not represented as files in Windows, but rather as Windows Objects. You may use this tool to find the objects under \\.\ (look under GLOBAL??): https://technet.microsoft.com/en-us/sysinternals/winobj.aspx
5. I indeed can't comment much on direct FPGA-to-GPU transfers, which are an order of magnitude more difficult to implement. Your required bandwidth is relatively low, so you could simply read the data from Xillybus into buffers, which you'd then submit to the GPU. The CPU consumption won't be significant, and you'll be using the "classic" GPU API, saving yourself a major headache (GPU is a headache either way).
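To illustrate the approach in point 5, here is a minimal Python sketch of the host-side loop: read fixed-size chunks from a stream and hand each one to a callback. The names `pump` and `submit` are made up for this example, and `submit` is only a placeholder for whatever your GPU API uses to upload a buffer. The 128 KiB chunk size merely echoes the bs=128k figure from point 3 and is an assumption, not a tuned value.

```python
import io
from typing import BinaryIO, Callable

# Assumed chunk size, echoing the bs=128k figure above; tune as needed.
CHUNK_SIZE = 128 * 1024


def pump(stream: BinaryIO,
         submit: Callable[[bytes], None],
         chunk_size: int = CHUNK_SIZE) -> int:
    """Read from stream until EOF, passing each chunk to submit().

    submit() is a hypothetical hook: in a real application it would
    copy the buffer to the GPU through your GPU API of choice.
    Returns the total number of bytes read.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)  # may be shorter than chunk_size
        if not chunk:                    # an empty read means EOF
            break
        submit(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # An in-memory stream stands in for the device file, so the sketch
    # runs without any hardware attached.
    fake_device = io.BytesIO(bytes(300 * 1024))
    sizes = []
    total = pump(fake_device, lambda chunk: sizes.append(len(chunk)))
    print(total, sizes)
```

With real hardware, the in-memory stream would be replaced by the opened device file, for example `open("/dev/xillybus_read_32", "rb", buffering=0)` on Linux. That device name is an assumption; use the name of your own stream, or the corresponding object under \\.\ on Windows as mentioned in point 4.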
Regards,
Eli