Connection between Artix 7 and Jetson Nano lost after a while

Questions and discussions about the Xillybus IP core and drivers

Connection between Artix 7 and Jetson Nano lost after a while

Postby Guest »

Hi,

I am trying to implement a data stream from a custom IP core to the Xillybus IP core. Data is sampled from external inputs and then sent from an Artix-7 to a Jetson Nano TX2 over a single PCIe lane.
Enumeration works fine, and there are no error messages during synthesis or implementation other than the usual Xillybus ones. Timing is also met (although I didn't specify any input or output delays).
After a reboot and configuration of the FPGA, it always works for several kB or MB. After a while, however, the host stops receiving data and the AxiReady pin in the FPGA goes low. From then on I cannot push any more data through, although my own IP core keeps trying to send data to Xillybus.
I have to reload the kernel module to be able to send data again, but the result is always the same after a varying amount of data. One day it works for 50-60 MB without a problem, the next it already crashes after 4-5 MB.

Frequently, the kernel module (pci_tegra) crashes and takes the whole system with it.
Also, double words often go missing, so the data is incomplete. This happens especially after a while, shortly before the link stops accepting data altogether.

The problem exists both with the demo bundle and with the IP Core Factory module that I created.

What could be the source of this problem?

Re: Connection between Artix 7 and Jetson Nano lost after a

Postby support »

Hello,
Guest wrote: Frequently, the kernel module (pci_tegra) crashes and takes the whole system with it.

This is the smoking gun. This should of course never happen, and it implies that the PCIe interface is unstable. The problem is somewhere between Xilinx' PCIe block and the Jetson's PCIe interface. The first questions that come to mind: Is the PCIe block's reference clock generated by the Jetson? Is it the correct clock? Is it connected properly?

Also, how do you connect between the FPGA and the Jetson? A not-so-good cable will get you through enumeration, but then problems can occur randomly, as you mentioned.

Another thing is whether you've made changes to the PCIe block in order to adapt it to your project. If such a change is not made carefully, the timing constraints may become incorrect.

To summarize: Look for really low-level things. Cabling, clocking, signal integrity, PCIe block setup and timing constraints.

If you'd like to assess the health of the PCIe link before it crashes, this might come in handy:

http://billauer.co.il/blog/2011/07/pcie ... yer-error/

Generally speaking, it would have been better to add a few related kernel log outputs to a question of this sort. But the fact that smoke came out from pci_tegra was good enough this time.

Regards,
Eli

Re: Connection between Artix 7 and Jetson Nano lost after a

Postby Guest »

Thanks for the fast answer.

We made changes to the PCIe block constraints, but only the ones you describe for changing which transceiver block is used. These went through without warnings.
The only warning we get is from the RefClk - we currently have to use the one on the original side.
We did not expect problems with that, however, since enumeration and the transmission of several megabytes of data work well for some time.

Regarding timing constraints: We create a second clock with the clock generator wizard, and I added that clock to detach_clocks.xdc. This also seems to work without error messages.

I followed the text in your link, but all I can find out so far is that error bit 0 is set all the time, and that bits 3 and 1 appear as well after the data interface stops working.

Re: Connection between Artix 7 and Jetson Nano lost after a

Postby support »

Hello,

It's not clear what you meant by the Refclk from the "original side". But if you mean that you're using a clock that is generated on the FPGA board instead of the clock that the Jetson generates, that's a recipe for trouble of the sort that you describe. It works at first, but then it doesn't.

PCIe is an excellent protocol, and it's very good at recovering from a whole range of problems. Therefore, a PCIe link often appears to work fine, when in fact there are a lot of problems that are hidden by the protocol.

As for my suggestions in the said link: None of the bits should be set. Bit 0 means Correctable Error. The fact that it's set means that something went wrong (usually errors in low-level packets), but the protocol managed to fix it. Sometimes this bit gets set because of early-stage issues. So I suggest clearing the bits as mentioned on the said web page, and see if bit 0 returns to 1. If it does, there's something that needs fixing.

As for the other two bits that became '1' in correlation with a visible problem, that gives the final confirmation that it's a low-level link problem. It's pointless to analyze why exactly these bits were asserted.
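For reference, here is a minimal Python sketch of how such a status value can be decoded. It assumes that the bits in question are the four error flags in the low bits of the PCIe Device Status register, as laid out in the PCI Express specification. The function and table names are made up for this illustration, and reading the raw value from the device is not shown.

```python
# Error flags in the low four bits of the PCIe Device Status register.
# Assumption: these are the bits discussed in this thread.
DEVICE_STATUS_BITS = {
    0: "Correctable Error Detected",
    1: "Non-Fatal Error Detected",
    2: "Fatal Error Detected",
    3: "Unsupported Request Detected",
}


def decode_device_status(value):
    """Return the names of the error flags that are set in value."""
    return [
        name
        for bit, name in sorted(DEVICE_STATUS_BITS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # Bit 0 only, as reported while the link still appears to work
    print(decode_device_status(0x1))
    # Bits 0, 1 and 3, as reported after the data interface stopped
    print(decode_device_status(0xB))
```

Under that assumption, bit 1 would be a Non-Fatal Error and bit 3 an Unsupported Request. These flags stay set until they are cleared, so clear them first and then check which ones come back.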

As for the changes you made in the timing constraints: I can't comment on those, because you didn't explain exactly what you did and why you did it. However, there shouldn't be a need to add a clock in the timing constraints, so the fact that you did that may indicate that you're hiding a different problem.

And a more general comment: This is FPGA design, not C programming. The goal is not to silence errors or warnings. Unlike programming, it's very much possible to obtain a completely malfunctioning design with the tool's blessing. The FPGA tools assume that you know what you're doing. Errors and warnings are a nice extra, but if you respond to them by going for the first solution that silences them, odds are that you didn't fix the problem at all.

So be sure that you understand exactly every change you made to the timing constraints. In fact, I would suggest not making any changes to the demo bundle, if possible. There's no problem if you connect only the first PCIe lane to the host, and the others remain physically disconnected. The PCIe protocol should automatically negotiate the link down to x1.
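To check what the link actually negotiated, the PCIe Link Status register can be decoded. The following Python sketch assumes the layout in the PCI Express specification: bits 3:0 hold the current link speed code and bits 9:4 hold the negotiated link width. The function name and the speed table are made up for this illustration, and obtaining the raw register value is not shown.

```python
# Link speed codes from the Link Status register, bits 3:0.
# Only the first three generations are listed in this sketch.
LINK_SPEEDS = {1: "2.5 GT/s", 2: "5 GT/s", 3: "8 GT/s"}


def decode_link_status(value):
    """Return (speed, width) extracted from a raw Link Status value."""
    speed_code = value & 0xF          # bits 3:0: current link speed
    width = (value >> 4) & 0x3F       # bits 9:4: negotiated link width
    speed = LINK_SPEEDS.get(speed_code, "unknown")
    return speed, width


if __name__ == "__main__":
    speed, width = decode_link_status(0x0011)
    print("Link is x%d at %s" % (width, speed))
```

A link that has negotiated down to a single lane reports a width of 1 here, regardless of how many lanes the PCIe block was configured for.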

Regards,
Eli