Hello Xillybus,
In the case of an MMIO read:
The CPU simply executes an instruction that loads a register from a memory location (indistinguishable from a load from host DRAM). Is the CPU allowed to take as many cycles as it needs to complete "mov rax, [rbx]", where rbx points to an MMIO address? If so, what happens if the PCIe device takes a long time, or forever, to send the completion TLP?
Similarly, in the case of an MMIO write:
If the CPU constantly writes to a device, can the PCIe fabric or the device become so saturated that writes are simply dropped, as in an IP network? And how does the CPU learn anything about saturation of the fabric or the device, given that writes are posted?
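
To make the two cases concrete, here is a minimal C sketch of the access patterns I mean. The names mmio_read32, mmio_write32 and demo_accesses are made up for illustration, and fake_reg is an ordinary variable standing in for a register in a mapped BAR. So this runs without hardware and only shows what the CPU-side code looks like, not how a real device or fabric behaves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: on real hardware 'reg' would point into a mapped PCIe BAR. */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    assert(reg != 0);
    /* An ordinary load instruction. With a real device, the data would
     * arrive in a completion TLP, and this is the load whose latency
     * the read question is about. */
    return *reg;
}

/* Hypothetical helper: same idea for the write direction. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    assert(reg != 0);
    /* An ordinary store instruction. With a real device this would be a
     * posted write, so nothing is returned to the software. */
    *reg = value;
}

/* Issues 'count' back-to-back writes, then one read, against a stand-in register. */
uint32_t demo_accesses(uint32_t count)
{
    volatile uint32_t fake_reg = 0; /* assumption: stands in for a device register */

    for (uint32_t i = 1; i <= count; i++)
        mmio_write32(&fake_reg, i); /* the "constantly writes" case */

    return mmio_read32(&fake_reg);  /* the "mov rax, [rbx]" case */
}
```

Nothing in this code can observe a slow completion or a congested link, which is exactly what I am asking about.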