by support »
Hello,
I would suggest taking a look at section 6.6 ("Coprocessing / Hardware acceleration") of the Xillybus host application programming guide for your operating system. It should point you in the right direction.
In short: organize the software so that one thread pushes large chunks of data towards the FPGA for processing, and another thread collects the results, also in large chunks. Sometimes it even makes sense to run these as separate programs (they access different device files anyway).
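To make the two-thread arrangement concrete, here is a minimal Python sketch. It is only an illustration under some assumptions: the function names (feed, collect, run_pipeline) and the 64 KiB chunk size are made up for this example, and it assumes the FPGA returns exactly as many bytes as it receives, as a loopback would. Adapt the expected result size to your own logic.

```python
import os
import threading

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune it for your application


def feed(fd, payload, chunk_size=CHUNK_SIZE):
    """Push payload towards the FPGA in large chunks, then close the stream."""
    view = memoryview(payload)
    try:
        while len(view) > 0:
            # os.write() may accept fewer bytes than offered, so advance
            # by the number of bytes actually written.
            written = os.write(fd, view[:chunk_size])
            view = view[written:]
    finally:
        os.close(fd)


def collect(fd, expected, out, chunk_size=CHUNK_SIZE):
    """Collect up to `expected` bytes of results into the bytearray `out`."""
    try:
        while len(out) < expected:
            block = os.read(fd, min(chunk_size, expected - len(out)))
            if not block:  # end of file: the other side closed the stream
                break
            out.extend(block)
    finally:
        os.close(fd)


def run_pipeline(write_fd, read_fd, payload):
    """Run the feeder and the collector concurrently; return the results."""
    results = bytearray()
    consumer = threading.Thread(
        target=collect, args=(read_fd, len(payload), results))
    producer = threading.Thread(target=feed, args=(write_fd, payload))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return bytes(results)
```

On a real system, the two file descriptors would come from os.open() on the write-side and read-side device files (os.O_WRONLY and os.O_RDONLY respectively). The actual device file names depend on your IP core's configuration, so they are left out here. The same two functions also work as two separate programs, since each one touches only its own device file.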
The point is that, contrary to common belief, even a modern CPU is relatively slow, so if it gets involved in each and every little task, it becomes the bottleneck. Think in terms of an industrial mass-production factory. The term "pipeline" is used a lot in FPGA design for a good reason.
Regards,
Eli