Can a CPU freeze due to congestion or unresponsive device?


Re: Can a CPU freeze due to congestion or unresponsive devic

Post by support »

Hello,

Truth be told, I don't know the answer to either of those questions, because I abandoned memory reads very early in the development process, precisely because of this issue with CPU freezes.

But I've seen situations where PCIe devices failed to respond to lspci's configuration queries, and I think the response was all 1's. I'm not sure if I remember correctly or if this matters at all.
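If a failed read does come back as all 1's, software can at least test for that pattern. The following is a minimal sketch under that assumption, not a verified description of any particular platform; both helper names are hypothetical. For configuration space the check is fairly meaningful, because 0xFFFF is not a valid vendor ID. For an ordinary register, all 1's may also be a legitimate value, so the result is only a hint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a 32-bit read returned all 1's.
 * Assumption: a read that nobody answered shows up as 0xFFFFFFFF. */
static inline bool read_is_all_ones(uint32_t value)
{
    return value == UINT32_MAX;
}

/* Hypothetical helper: dword0 is the first dword of a device's
 * configuration space. The vendor ID sits in its low 16 bits, and
 * 0xFFFF is not a valid vendor ID. */
static inline bool vendor_id_is_invalid(uint32_t dword0)
{
    return (dword0 & 0xFFFFu) == 0xFFFFu;
}
```

A caller would pass in whatever value its own read routine returned; how that value is obtained (lspci, sysfs, a driver) is outside this sketch.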

Regards,
Eli

Re: Can a CPU freeze due to congestion or unresponsive devic

Post by Guest »

Thanks for the response Eli.

In the case of a read timeout, what value does the CPU get? And how does the CPU know whether this is an intended value or an error code?

Re: Can a CPU freeze due to congestion or unresponsive devic

Post by support »

Hello,

Let's take them one by one, because these are different scenarios.

According to the PCIe spec's section 2.8, if a PCIe device doesn't answer a memory read request, a timeout mechanism should kick in. The length of this timeout is application-specific, and should be between 50 µs and 50 ms (but it's recommended to make this timeout no less than 10 ms).
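The figures above can be written down as a small range check. This is only a sketch based on the numbers quoted in this post, not on the spec's exact wording, and the enum and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical classification of a completion timeout value. */
enum timeout_verdict {
    TIMEOUT_OUT_OF_RANGE, /* below 50 us or above 50 ms */
    TIMEOUT_ALLOWED,      /* within range, but shorter than 10 ms */
    TIMEOUT_RECOMMENDED   /* within range and at least 10 ms */
};

/* Classify a timeout given in microseconds, using the limits quoted
 * in the post: 50 us minimum, 50 ms maximum, 10 ms recommended floor. */
static inline enum timeout_verdict classify_completion_timeout(uint64_t timeout_us)
{
    if (timeout_us < 50 || timeout_us > 50000)
        return TIMEOUT_OUT_OF_RANGE;
    if (timeout_us < 10000)
        return TIMEOUT_ALLOWED;
    return TIMEOUT_RECOMMENDED;
}
```

For example, a 1 ms timeout (1000 µs) falls inside the allowed window but below the recommended 10 ms.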

So in theory, a CPU should recover gracefully from such a situation. In practice, I've seen a CPU getting stuck exactly in this scenario. This is why Xillybus' IP core never makes any memory reads from the device.

As for memory writes: If the device can't handle write requests, their flow will be stalled at some point, thanks to PCIe's flow control. I'm not sure if there's a timeout mechanism to prevent holding the flow control stalled indefinitely.
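To illustrate why flow control stalls writes rather than dropping them, here is a toy model of credit-based flow control. It is a deliberately simplified sketch: real PCIe tracks separate header and data credits for each type of traffic, while this model uses a single counter, and all names in it are hypothetical.

```c
#include <stdbool.h>

/* Toy model of one direction of a link with a single credit counter. */
struct toy_link {
    unsigned credits;   /* buffer slots the receiver has advertised */
    unsigned in_buffer; /* writes waiting in the receiver's buffer */
    unsigned accepted;  /* total writes that reached the receiver */
};

/* Try to send one write. Without a credit the write is not sent and
 * not dropped either: the sender simply has to wait and try again. */
static inline bool toy_send_write(struct toy_link *l)
{
    if (l->credits == 0)
        return false;
    l->credits--;
    l->in_buffer++;
    l->accepted++;
    return true;
}

/* The receiver consumes up to n buffered writes and hands the freed
 * slots back to the sender as credits. */
static inline void toy_receiver_consume(struct toy_link *l, unsigned n)
{
    if (n > l->in_buffer)
        n = l->in_buffer;
    l->in_buffer -= n;
    l->credits += n;
}
```

In this model, a receiver that never calls toy_receiver_consume() leaves the sender stalled indefinitely once the credits run out, which is the situation the open question about a timeout refers to.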

What I can say from a practical point of view is that I've never seen or heard of any computer getting stuck because someone reconfigured the FPGA while Xillybus' driver was attempting to write to the FPGA's PCIe interface. So sending packets into the void is harmless. What happens if you deliberately prevent write requests from being delivered, I don't know. It's a rather bizarre scenario, though.

Regards,
Eli

Can a CPU freeze due to congestion or unresponsive device?

Post by Guest »

Hello Xillybus,

In the case of an MMIO read:
The CPU simply executes an instruction that loads a register from a memory location (indistinguishable from a load from host DRAM). Is the CPU allowed to take as many cycles as it needs to complete "mov rax, [rbx]", where rbx points to an MMIO address? If so, what happens if a PCIe device takes very long, or forever, to send a completion TLP?

Similarly, in the case of an MMIO write:
If a CPU constantly writes to a device, can the PCIe fabric or the device become so saturated that writes are simply dropped, as in an IP network? And how does the CPU learn anything about saturation of the fabric or device, given that writes are posted?
