by support »
Hello,
Neither Xillybus nor Xillybus Lite was really designed for low latency, but rather for throughput. That said, some physics researchers have reported latencies of around 50 µs in a system that controlled a quantum physics experiment. I would expect Xillybus Lite to go lower than that, but either way, if you're building a real-time system (it doesn't seem so, but still) the real challenge is the latencies imposed by Linux.
Anyhow, it looks like you're working on a coprocessing application, so you want as much data as possible processed. For that, it's not a good idea to loop on each chunk in software, and even less so to micromanage the flow that way. The best approach is to have one thread (or process) in software continuously pushing data for processing into one stream, and let another software thread collect the results by reading from another stream. Just throw everything you have at one side, and collect on the other. Think mass production in a modern factory.
On the PL (FPGA) side, the logic fetches data at its own pace from one FIFO, and pushes data into another FIFO, again at its own pace. Then let Xillybus' flow control handle the rest: There is no need to wait for the hardware to be ready. Let the logic fetch the data from its FIFO when it's ready. Neither is there a need to wait for reading back processed data. Just let the logic write data to the other FIFO (respecting its full signal, of course) and let the software read as much as it can. The read() call will block (i.e. sleep) until there's data to read.
Even if you're not very comfortable with multi-threading or running two processes (it comes down to a simple fork() call, actually, or just running two separate programs), it's highly recommended to go this way. Quite often people try to make some kind of loop that writes a chunk of data and then reads, and those things tend to go horribly wrong (or slow) because of unnecessary dependencies between the two functions.
Regards,
Eli