Xillybus-Lite or Xillybus for controlling an accelerator



Post by Guest

Hello,

In a project I am working on, I have an accelerator that performs corner detection on an image. The accelerator is controlled by a set of 1-bit signals.
Right now I have grouped these signals into two registers (control_reg, response_reg) and access them for reading and writing through Xillybus-Lite.
The accelerator gets the actual data from an asynchronous 8-bit Xillybus stream and sends the results to an asynchronous 32-bit Xillybus stream.
The thing is that I am having second thoughts about using Xillybus-Lite to control the accelerator, because I need the communication latency to be as low as possible.
So I am thinking of dropping Xillybus-Lite and adding another stream for control. Any thoughts or suggestions as to which is better, or how to make better use of Xillybus-Lite
in my program, would be much appreciated.

The image is partitioned into 12 segments, and the processing currently follows this flow:
    1. check accelerator ready response (read response_reg)
    2. if (ready) send new computation command (write control_reg) else goto 1
    3. check request data response (read response_reg)
    4. if (request data) send segment data else goto 3
    5. check results ready response (read response_reg)
    6. if (results ready) send give results command (write control_reg) else goto 5
    7. check segment done response (read response_reg)
    8. check image done response (read response_reg)
    9. if (segment done and image done) goto 1 and repeat for next image, else if (segment done) goto 3 and repeat for next segment, else goto 7

Thanks.
Guest
 

Re: Xillybus-Lite or Xillybus for controlling an accelerator

Post by support

Hello,

Neither Xillybus nor Xillybus Lite was really designed for low latency, but rather for throughput. That said, there have been some physics researchers who reported latencies of 50 µs or so in a system that controlled some quantum physics thing. I would expect Xillybus Lite to go lower than that, but either way, if you're into a real-time system (it doesn't seem so, but still), the real challenge is the latencies imposed by Linux.

Anyhow, it looks like you're into a coprocessing application, so you want as much data as possible processed. For that, it's not all that good to loop on each chunk with the software, and even less to micromanage the flow that way in software. The best way to do this is to have a thread (or process) in software pushing data for processing into one stream continuously, and let another software thread collect the results by reading from another stream. Just throw all you have to throw on one side, and collect on the other. Think mass-production in a modern factory.

On the PL (FPGA) side, the logic fetches data at its own pace from one FIFO, and pushes data into another FIFO, again at its own pace. Then let Xillybus' flow control handle the rest: There is no need to wait for the hardware to be ready. Let the logic fetch the data from its FIFO when it's ready. Neither is there a need to wait for reading back processed data. Just let the logic write data to the other FIFO (respecting its full signal, of course) and let the software read as much as it can. The read() call will block (i.e. sleep) until there's data to read.
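
A minimal sketch of the reading side in C, to illustrate the point about read() blocking. The helper name read_all() is made up for this illustration and is not part of any Xillybus API; it relies only on the standard behavior of read(), which may return fewer bytes than requested, so the call is repeated until the buffer is full.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `len` bytes from fd, unless EOF or an
 * error comes first. Returns the number of bytes read, or -1 on error. */
ssize_t read_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;
    size_t done = 0;

    while (done < len) {
        /* Sleeps until at least some data is available. */
        ssize_t rc = read(fd, p + done, len - done);

        if (rc < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;      /* real error */
        }
        if (rc == 0)
            break;          /* EOF: no more data will arrive */

        done += (size_t)rc; /* partial read: keep going */
    }
    return (ssize_t)done;
}
```

It would typically be called with a file descriptor opened on the result stream, for example /dev/xillybus_read_32 in the demo bundle (the actual device file name depends on how the IP core was configured). No polling of a status register is involved: the process simply sleeps until the logic has pushed results into its FIFO.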

Even if you're not very comfortable with multi-threading or running two processes (it's down to a simple fork() call, actually, or just running two separate programs), it's highly recommended to go this way. Quite often people try to make some kind of loop that writes a chunk of data and then reads, and those things tend to go horribly wrong (or slow) because of unnecessary dependencies between the two functions.
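
A rough sketch of the fork() approach, under a few assumptions: the function names stream_through() and feed_and_exit() are invented for this example, the two file descriptors are already opened on the write and read streams, and the expected amount of result data is known in advance. The child process does nothing but push data; the parent does nothing but collect results.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side (hypothetical helper): push all input, then exit. */
static void feed_and_exit(int fd_wr, const uint8_t *src, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t rc = write(fd_wr, src + sent, len - sent);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            _exit(1);           /* report failure through exit status */
        }
        sent += (size_t)rc;     /* partial write: keep going */
    }
    close(fd_wr);
    _exit(0);
}

/* Feed `src` into fd_wr from a forked child while the parent collects up to
 * dst_len bytes from fd_rd. The parent's copy of fd_wr is closed.
 * Returns the number of bytes collected, or -1 on failure. */
ssize_t stream_through(int fd_wr, int fd_rd,
                       const uint8_t *src, size_t src_len,
                       uint8_t *dst, size_t dst_len)
{
    size_t got = 0;
    int failed = 0, status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {             /* child: writer only */
        close(fd_rd);
        feed_and_exit(fd_wr, src, src_len);
    }
    close(fd_wr);               /* parent: reader only */

    while (got < dst_len) {
        ssize_t rc = read(fd_rd, dst + got, dst_len - got);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            failed = 1;
            break;
        }
        if (rc == 0)
            break;              /* EOF */
        got += (size_t)rc;
    }

    /* Reap the child and check that it managed to send everything. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        failed = 1;

    return failed ? -1 : (ssize_t)got;
}
```

With the demo bundle's device files, fd_wr would come from opening /dev/xillybus_write_8 with O_WRONLY and fd_rd from /dev/xillybus_read_32 with O_RDONLY; those names are only what the demo bundle uses, so a custom IP core may differ. Because the writer and the reader never wait for each other in software, neither side stalls the other, and the FIFOs plus Xillybus' flow control do the pacing.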

Regards,
Eli

Re: Xillybus-Lite or Xillybus for controlling an accelerator

Post by Guest

Hi Eli,

Thanks for the info. The accelerator was given to me as is; I didn't create it, so I have to use it as given, just like the Xillybus IP core. What I mean is that they told me: this is its interface, this is how it operates, this is how it expects the data, and your job is to integrate it with Xillybus and do some timing tests.

Right now, after some changes in the software (no multi-threading yet), the processing time is acceptable compared with previous measurements from another project that used the accelerator, and if I manage to save 3-4 ms it will be great.

Again, thanks for the info and recommendations.

Regards,
John
Guest
 

