Explore more on xillybus about mem_write

Questions and discussions about the Xillybus IP core and drivers


Post by Guest

Hi Eli,

So far everything is going fantastically with Xillybus. Now I want to explore memory reads and writes further, so that I can download a matrix into the memory and multiply it with the data stream.

However, it seems I can only write 32 × 8 bits into the memory. Is there a way to make the memory bigger, for example 1024 × 32 bits?

Best,
Chongxi
Guest
 

Re: Explore more on xillybus about mem_write

Post by support

Hello,

Yes. Go to the IP Core Factory on the site and generate a custom IP core with a seekable stream that has 32-bit words and more address bits (1024 words need 10 address bits).

Please refer to section 6.1 ("Seekable Streams") in the Host Application Programming Guide for your operating system, in the site's Documentation section. It covers the points specific to 32-bit seekable streams.
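As a rough illustration of loading a 1024 × 32-bit matrix through such a stream, here is a minimal Python sketch: one lseek() to the start address, then one bulk write() of the packed words, rather than a seek per word. It assumes that the file position is counted in bytes, so word i starts at byte offset 4*i, and that words are packed little-endian. Both are assumptions to verify against section 6.1. The helper names write_words and read_words are hypothetical.

```python
import os
import struct

WORD_BYTES = 4  # the custom seekable stream is assumed to be 32 bits wide


def _write_all(fd, data):
    # write() may accept fewer bytes than requested, so loop until done
    view = memoryview(data)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def write_words(path, words, first_word=0):
    """Write 32-bit words to a seekable stream, starting at word address first_word."""
    # Assumption: little-endian byte order within each 32-bit word
    data = struct.pack("<%dI" % len(words), *words)
    fd = os.open(path, os.O_WRONLY)
    try:
        # Assumption: file position is in bytes, so word i begins at byte 4*i
        os.lseek(fd, first_word * WORD_BYTES, os.SEEK_SET)
        _write_all(fd, data)  # one bulk write instead of a seek per word
    finally:
        os.close(fd)


def read_words(path, count, first_word=0):
    """Read count 32-bit words from a seekable stream, starting at first_word."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, first_word * WORD_BYTES, os.SEEK_SET)
        chunks = []
        remaining = count * WORD_BYTES
        while remaining > 0:
            buf = os.read(fd, remaining)
            if not buf:
                raise EOFError("stream ended before %d words were read" % count)
            chunks.append(buf)
            remaining -= len(buf)
    finally:
        os.close(fd)
    return list(struct.unpack("<%dI" % count, b"".join(chunks)))
```

For example, write_words("/dev/xillybus_mem_32", matrix) would load the whole matrix in one call. The device file name is hypothetical; the real one depends on the stream name chosen in the IP Core Factory.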

I should mention that seekable streams aren't efficient for number-crunching applications, as the CPU is heavily involved in setting the address and reading or writing each piece of data synchronously. But as a starting point, it's fine.

Regards,
Eli
support
 
Posts: 802
Joined:

Re: Explore more on xillybus about mem_write

Post by Guest

Hi Eli,

Yeah, that concerns me a bit. I'll try the seekable stream first.

In my application, the host receives the filtered data, generates matrices based on what it receives, and downloads those matrices into the FPGA to be multiplied with the filtered data stream.

In your opinion, what is the most efficient way to do this within the Xillybus framework?

Best,
Chongxi

Re: Explore more on xillybus about mem_write

Post by support

Hello,

I suggest taking a look at section 6.6 ("Coprocessing / Hardware acceleration") of the Xillybus Host Application Programming Guide for your operating system. It should point you in the right direction.

In short: you'll need to organize the software so that one thread pushes large chunks of data towards the FPGA for processing, and another thread collects the results, also in large chunks. Sometimes it even makes sense to do this as separate programs (they access different device files anyhow).
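To make the two-thread layout concrete, here is a minimal Python sketch. It assumes the custom IP core exposes one host-to-FPGA stream and one FPGA-to-host stream, both opened as plain file descriptors; the function name run_pipelined and the chunk size are hypothetical choices. One thread writes the input chunks while the other reads results in large blocks until it has the expected number of bytes or hits end-of-file, so neither side waits for the other per item.

```python
import os
import threading

CHUNK = 64 * 1024  # bytes per read() call; large chunks keep per-call CPU overhead low


def _write_all(fd, data):
    # write() may accept fewer bytes than requested, so loop until done
    view = memoryview(data)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def run_pipelined(to_fpga_fd, from_fpga_fd, chunks, result_bytes):
    """Send chunks on to_fpga_fd while concurrently collecting up to
    result_bytes of results from from_fpga_fd. Closes to_fpga_fd when done."""
    results = []

    def producer():
        try:
            for chunk in chunks:
                _write_all(to_fpga_fd, chunk)
        finally:
            os.close(to_fpga_fd)  # done sending on the host-to-FPGA stream

    def consumer():
        remaining = result_bytes
        while remaining > 0:
            buf = os.read(from_fpga_fd, min(CHUNK, remaining))
            if not buf:  # end-of-file: no more results will arrive
                break
            results.append(buf)
            remaining -= len(buf)

    t_in = threading.Thread(target=consumer)
    t_out = threading.Thread(target=producer)
    t_in.start()
    t_out.start()
    t_out.join()
    t_in.join()
    return b"".join(results)
```

On the real hardware, the descriptors would come from something like os.open("/dev/xillybus_write_32", os.O_WRONLY) and os.open("/dev/xillybus_read_32", os.O_RDONLY), where both device names are hypothetical. Python threads are adequate here because os.read() and os.write() release the GIL while they block; splitting the two loops into separate programs works the same way.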

The point is that, contrary to common belief, even a modern CPU is relatively slow, so if it gets involved in every small task, it becomes the bottleneck. Think in terms of an industrial mass-production factory. The term "pipeline" is used a lot in FPGA design for a good reason.

Regards,
Eli

