Hi,
I have been experimenting with peer-to-peer data transfers between two PCIe devices (NVMe drives in my case). I have observed that when one PCIe endpoint sends a read request TLP to another PCIe endpoint, the completions start arriving only after a significant latency. By contrast, when the endpoint sends a read request to the host (Root Complex), the completions arrive almost instantaneously.
As I understand it, PCIe endpoint cores and Root Complex cores are not very different. Their basic architecture is almost the same, so there should be no problem handling read requests in either case.
Any idea?