
Am I preventing Xillybus/PCIe from establishing a link?

by Guest
Eli,

I have a specific question after a bunch of setup.

We have two custom boards. One board works pretty well, except that it generally takes two cold starts to get our board to show up in Linux lspci. That is, after the first cold start, the board doesn't show in lspci, then after a second cold start, it does. After a third cold start, it often doesn't show up again; then after a fourth cold start, it does show up in lspci. Note this is a PCIe/104 stack and "cold start" means we power cycle the host computer and FPGA at the same time. Note that we're pretty sure we've eliminated FPGA config speed vs SBC (BIOS) power up.
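
For logging this kind of intermittent behavior across cold starts, a minimal sketch follows. It assumes the output of `lspci -n` has been saved as text after each boot. The vendor:device ID `1234:5678` and the sample output are hypothetical placeholders, not values taken from this thread.

```python
def device_listed(lspci_n_output: str, vendor_device: str) -> bool:
    """Return True if a line of `lspci -n` output carries the given vendor:device ID."""
    wanted = vendor_device.lower()
    for line in lspci_n_output.splitlines():
        fields = line.split()
        # `lspci -n` lines look like: "01:00.0 ff00: 1234:5678 (rev 01)"
        if len(fields) >= 3 and fields[2].lower() == wanted:
            return True
    return False


# Hypothetical captured output, for illustration only.
sample_output = "00:00.0 0600: 8086:0c00 (rev 06)\n01:00.0 ff00: 1234:5678\n"

print("board present:", device_listed(sample_output, "1234:5678"))
```

Running this once per boot and appending the result to a file gives a success/failure record that can be compared between bitstream builds.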

The second custom board, presumably identical to the first, won't get recognized by lspci no matter what we do.

Either way, we've tested numerous things. We did find something concerning in the xillybus_fpga_api.pdf on page 16, section 4.4 Capture control. It says "The capture_en signal works as a write enable signal for the captured data. There are two situations in which capturing should not take place".

Well, we are putting data into a fifo going to the xillybus_ins, and we're doing that right away.

QUESTION: I don't believe this should be the case, but is it possible that there's ANYTHING we might be doing on our end of the fifos that go into xillybus_ins that could cause the x4 link to not come up, and therefore the board to not show up in lspci? I don't think there should be, but because of the sentence quoted above, I'm a little concerned.

Thanks very much,
Helmut

P.S. We meet timing but have been working on confirming and even adding constraints. Improvements but no 100% success yet. We're not sure what other avenues to test...

Re: Am I preventing Xillybus/PCIe from establishing a link?

by support
Hello,

The Xillybus IP core is in a quiescent state until explicitly woken up by the dedicated host driver (and brought back to that state when the driver exits at powerdown / restart). When in quiescent state, the core does absolutely nothing on the PCIe interface, and consequently whatever happens on its application interface side has no significance whatsoever.

So no, whatever craziness might be going on with the application logic, this can't possibly have any effect on the odds of getting the PCIe link recognized.

What you describe is a serious malfunction. Assuming that the FPGA is configured fast enough (I usually suggest keeping the host's reset pin asserted as long as the FPGA pins are high-Z, and having the FPGA deassert it), you should look for a big problem.

For example, if you haven't done that yet, please verify in Vivado's pin report that the PCIe reference clock is placed on the pins that you connect the reference clock to on the board. Because if they're not, odds are that the clock is still picked up by the two floating FPGA pins by virtue of crosstalk, and somehow it works. Or works sometimes, exactly like you described.

Even if you've already checked this, this example illustrates how bold your problem needs to be to cause that kind of unstable behavior. There should be no need to add constraints or make any fine manipulations. PCIe and Xillybus are rock stable out of the box, unless something really bad is going on.

Regards,
Eli

Re: Am I preventing Xillybus/PCIe from establishing a link?

by Guest
Eli,

Thanks for the advice.

I don't think there should be a big problem, because when the link does come up, everything is very solid. Nothing has been changed below the xillybus_ins level.

It therefore sounds like our real problem may simply be the relative timing of FPGA configuration versus host boot. Unfortunately, the FPGA board is already done, and I don't think it has the ability to hold the host in reset.

Please double check my logic. *IFF* I'm correct about no big problem, then am I correct that it's just relative config/boot timing?

Thanks again,
Helmut

Re: Am I preventing Xillybus/PCIe from establishing a link?

by support
Hello,

By "big problem" I didn't mean one that is hard to fix. I meant that it isn't a subtle issue. And the FPGA not being up early enough isn't a subtle issue.

So yes, the FPGA being configured late could explain this as well. Holding the host in reset at boot, possibly manually, could help check this hypothesis.

But there's usually a way to make some kind of cable between the board (some spare pins on a connector?) and the host to get this arranged.

Regards,
Eli

Re: Am I preventing Xillybus/PCIe from establishing a link?

by Guest
Eli,

Oh, wow, an answer already. I asked at around, I forget, 3am EDT my time.

Anyway, I'd like to pursue temporary holding of host reset or permanent modification of the FPGA board. I didn't design the FPGA board, or the third party host board, but I do have decades of EE experience. I just don't have specific knowledge about the resets of which we speak...

QUESTION ONE: In order to hold the host in reset, can that be done THROUGH the PCIe/104 connector? If so, would that be akin to the KC705 signal PCIE_PERST? Or is there a different signal to use while going through the PCIe/104 connector? (In the PCI104_Express_v3_0 spec I find only this same signal, called PERST#. I don't find any other reset. On the KC705 schematic, while there is a bidirectional buffer between PCIE_PERST and the FPGA, it's forced in the to-FPGA direction by wiring the 2DIR pin to ground. So I don't think the KC705 is designed to be able to hold the host in reset via this signal.)

QUESTION TWO: If bypassing the PCIe/104 connector with a separate wire, we do have debug outputs and test points available on the FPGA board. I could search for a place directly on the host to connect a wire, but I'll first ask if you have a recommendation...

QUESTION THREE: Aside from holding the host in reset, I could just work on configuring the FPGA ever faster. To date it's been using SPIx1 and either default clock (unknown, is it 3MHz?) or 3MHz. Faster clocks have been tested sporadically and not consistently enough. The circuit also works at SPIx2. DO YOU BELIEVE that pursuing this route might provide temporary relief, until perhaps a test mod and subsequent board respin could be done?

Thanks again,
Helmut

Re: Am I preventing Xillybus/PCIe from establishing a link?

by support
Hello,

Speeding up the SPI configuration clock is definitely a good idea. Doubling the width as well. In particular, if you see a significant improvement, you know you're on to something.
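
As a rough illustration of why clock rate and bus width matter, here is a back-of-the-envelope sketch of the ideal configuration time: bitstream size divided by clock rate times bus width. The bitstream size below is a hypothetical placeholder (the thread doesn't state the device or bitstream size), the 33 MHz rows are likewise just example rates, and the formula ignores power-on and startup overhead.

```python
def config_time_s(bitstream_bits: int, cclk_hz: float, bus_width: int) -> float:
    """Ideal time to shift in a bitstream, ignoring power-on and startup overhead."""
    return bitstream_bits / (cclk_hz * bus_width)


# Hypothetical bitstream size, for illustration only; substitute your own.
BITSTREAM_BITS = 30_000_000

for cclk_mhz, width in [(3, 1), (3, 2), (33, 1), (33, 2)]:
    t = config_time_s(BITSTREAM_BITS, cclk_mhz * 1e6, width)
    print(f"{cclk_mhz:>3} MHz x{width}: {t:6.2f} s")
```

With this placeholder size, SPI x1 at 3 MHz needs about 10 seconds, and every doubling of clock rate or bus width halves that. The figure to compare against is how long the host takes from power-up until it starts enumerating PCIe, which has to be measured on the actual system.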

Holding reset: What I meant has nothing to do with the PCIE_PERST signal, and I don't think that signal goes through the PCIe 104 connector. I'm talking about the host's master reset signal, which is usually connected to the reset button on a regular PC. I would guess that it's available on some pin header on the PC board.

Just pull the relevant pin high or low, whichever makes the host stay in reset, and connect it to any I/O pin on the FPGA. If you pulled it high, tie that pin to '0' in the FPGA design. As a result, the pin will be high until the FPGA is configured, and then constantly low. This simple trick ensures that the host is held in reset until the FPGA is alive and kicking. Caveat: You can't start the host without the FPGA.

Regards,
Eli

Re: Am I preventing Xillybus/PCIe from establishing a link?

by Guest
Eli,

I'm going down the road of speeding up the config clock. At 18min bitstream build times, it's a bit slow. But things are getting better. (There's no easily hand-accessible reset on the host at this time.)

Just now, I had 5 consecutive successes, 1 failure, then 2 more successes (testing until which time the next bitstream build completed). I have an ILA showing ltssm_state, inside the PCIe logic. All the successes ended in state 0x16, which is "L0" and good. The failure ended in state 0x08, which is "Polling Compliance, Send Pattern".
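
A small sketch for tallying such results, assuming the final ltssm_state value is noted by hand from the ILA after each cold start. The lookup table contains only the two codes identified in this post; any other value is reported as unknown rather than guessed.

```python
from collections import Counter

# Only the two codes identified in this thread; everything else is "unknown".
LTSSM_NAMES = {
    0x16: "L0",
    0x08: "Polling Compliance, Send Pattern",
}


def describe(state: int) -> str:
    """Map a final ltssm_state value to a readable name."""
    return LTSSM_NAMES.get(state, f"unknown (0x{state:02x})")


def tally(final_states):
    """Count how often each final state was observed."""
    return Counter(describe(s) for s in final_states)


# The eight cold starts reported above: 5 successes, 1 failure, 2 successes.
runs = [0x16] * 5 + [0x08] + [0x16] * 2

for name, count in tally(runs).items():
    print(f"{count} x {name}")
```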

My prior research indicated that one should never get into ANY of the Polling Compliance states. Those are for running test equipment. So I'm curious as to why I'm getting into this state. Is it possible that the host BIOS or host Linux is seeing a bad link and putting it through Polling Compliance? Is the Xilinx PCIe IP doing this?

Also, is this failure consistent with the suspected timing race between FPGA config and host boot?

TEXT FROM XILINX POST JUST NOW. Please note "while writing this..."

Also, is this failure consistent with the suspected timing race between FPGA config and host boot? That is, I've been concerned about whether or not the FPGA config is fast enough relative to the host boot. If the FPGA config is too slow, it won't be ready when the host tries to enumerate the PCIe. This will surely lead to the device not showing up in lspci. But will that somehow then leave the FPGA PCIe ltssm_state in state 0x08? Note that I have reason to believe that ltssm_state remains 0x00 until it's probed somehow. I think I saw this by reprogramming the FPGA long after things were running. So if the FPGA is slow to config and misses the host's PCIe enumeration, I would think the ltssm_state would remain 0x00. That's why I'm confused by seeing 0x08 and wondering if this particular failure mode is NOT about the relative speed of FPGA config and host boot.

While writing this, I've been testing a much faster FPGA config clock speed, and still getting intermittent failures on boot. This again speaks *against* this being a relative speed issue.

Thanks many times over,
Helmut

Re: Am I preventing Xillybus/PCIe from establishing a link?

by support
Hello,

The Compliance state is invoked when, during the initial link setup, at least one lane on which a receiver was detected has never detected an exit from Electrical Idle since entering Polling.Active. In other words, at least one of the lanes from the host was completely inactive, indicating that the host didn't attempt training while the FPGA did. This state is intended for tests with inexpensive test equipment, which doesn't behave like a host and therefore doesn't respond with training sequences: when the peripheral gets no response at all to its training sequences, it concludes that it's basically facing a BER meter.
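
The condition described above can be sketched as a simple predicate. This is a simplified model for illustration only: it ignores the timeout and the other entry conditions in the PCIe specification, and the `Lane` type and its field names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    detected_receiver: bool    # a receiver was seen on this lane during Detect
    saw_exit_from_idle: bool   # exit from Electrical Idle seen since Polling.Active


def enters_compliance(lanes) -> bool:
    """True if any lane that detected a receiver never saw an exit from Electrical Idle."""
    return any(l.detected_receiver and not l.saw_exit_from_idle for l in lanes)


# x4 link where the host stays silent on every lane: the endpoint goes to Compliance.
silent_host = [Lane(True, False) for _ in range(4)]
# x4 link where the host sends training sequences on every lane: normal training.
active_host = [Lane(True, True) for _ in range(4)]

print("silent host ->", enters_compliance(silent_host))
print("active host ->", enters_compliance(active_host))
```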

Unfortunately, this info doesn't shed much light on why this happens. It may very well be that the BIOS makes some extra attempts to pick up the PCIe link if it fails on the first go, which sometimes works and sometimes doesn't. If this is the underlying reason for the random behavior, getting slightly better odds for a link bringup doesn't mean you're doing better and vice versa.

If you can't reset the host manually, maybe look at the PERST signal with a scope, against a signal from the FPGA indicating when it's configured. What is the timing relation between them? Often there are several PERST pulses during bootup. Does an observation of this sort add information?
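
Once the scope capture is available, the comparison can be reduced to a few numbers. The sketch below assumes the times of each PERST release and of the FPGA's "configured" indication have been read off the scope relative to power-on. All values shown are hypothetical, and which PERST release is actually followed by enumeration is something the measurement itself has to reveal.

```python
def margins_s(perst_release_times_s, fpga_done_time_s):
    """Margin per PERST release; positive means the FPGA was configured before it."""
    return [t - fpga_done_time_s for t in sorted(perst_release_times_s)]


# Hypothetical scope readings in seconds after power-on, for illustration only.
perst_releases = [0.12, 0.85, 2.40]
fpga_done = 1.10

for release, margin in zip(sorted(perst_releases), margins_s(perst_releases, fpga_done)):
    verdict = "FPGA ready" if margin > 0 else "FPGA NOT ready"
    print(f"PERST released at {release:.2f} s: margin {margin:+.2f} s ({verdict})")
```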

Regards,
Eli

Re: Am I preventing Xillybus/PCIe from establishing a link?

by Guest
Eli,

I'm coming back for more...! Is Xillybus compatible with Tandem PROM configuration? From what I read in PG054 beginning on page 154, the logic directly connecting to the Xilinx PCIe endpoint IP needs to be compatible. Right now, that's the xillybus.v wrapper. I do realize that it's open (not encrypted) code that I could modify if necessary. It would be better, however, for us to leave all that stock. So, bottom line, I think it's appropriate to ask you the compatibility question.

Thanks again, many times over.
Helmut

Re: Am I preventing Xillybus/PCIe from establishing a link?

by support
Hello,

I have no personal experience with using Xillybus on a Tandem-enabled system, but I see no reason why it wouldn't work: As I've mentioned earlier, Xillybus is quiescent until the host driver tells it to wake up, so it won't try anything until long after the transition from phase I to II.

But as the devil is in the details, I unfortunately can't give a definitive answer one way or the other.

Regards,
Eli