PCIe 5GT/s vs 2.5GT/s

Questions and discussions about the Xillybus IP core and drivers

PCIe 5GT/s vs 2.5GT/s

Postby Guest »

Eli,

Certainly Xillybus works at 5GT/s, so I'm just asking for your thoughts here. The Xillybus demo for the KC705 defaulted to 2.5GT/s. We left that as-is and got it working, and then got my big project working as well. Just now I switched the big project's Xilinx PCIe Endpoint from 2.5GT/s to 5GT/s, and it's not working. The ltssm_state gets to 0x2D = "Timeout to Detect". This differs from the working state, 0x16 = "L0", and from 0x08 = "Polling.Compliance, Send_Pattern", which would suggest a startup timing issue.
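For quick log reading, the three ltssm_state codes mentioned above can be decoded with a minimal lookup. This is just a sketch covering the codes discussed in this thread; the full state encoding is documented with the Xilinx 7-series PCIe core.

```python
# Minimal ltssm_state decoder, covering only the codes discussed above;
# the full encoding is in the Xilinx 7-series PCIe core documentation.
LTSSM_STATES = {
    0x08: "Polling.Compliance, Send_Pattern",  # possible startup timing issue
    0x16: "L0",                                # link up and working
    0x2D: "Timeout to Detect",                 # the failing state seen here
}

def decode_ltssm(code):
    return LTSSM_STATES.get(code, "unknown (0x%02X)" % code)

print(decode_ltssm(0x2D))  # → Timeout to Detect
```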

I'm currently falling back to the Xillybus demo itself as well as the Xilinx fundamental example/demo to see if 5GT/s works with those.

Thanks in advance,
Helmut
Guest
 

Re: PCIe 5GT/s vs 2.5GT/s

Postby support »

Hello,

Please refer to section 4.5 in "Getting started with the FPGA demo bundle for Xilinx":

http://xillybus.com/downloads/doc/xilly ... xilinx.pdf

It discusses changing the link speed, and how to get through that change correctly.

Regards,
Eli
support
 
Posts: 802
Joined:

Re: PCIe 5GT/s vs 2.5GT/s

Postby Guest »

Eli,

Thanks for the ref. I went through that doc very closely.

I had a working XILINX_DEMO at 2.5GT/s. I converted it to 5GT/s and it WORKED, which proves the board is capable. I did this task twice. The first time, I tried to follow the first half of your doc's advice. The second time, I realized I wasn't changing anything of import, so I essentially changed nothing, and again XILINX_DEMO WORKED at 5GT/s. (Note that XILINX_DEMO is simply a bare-bones pcie_k7_vivado_ex with the pin corrections and related changes applied.)

Then I had a working XILLY_DEMO at 2.5GT/s, which essentially came from your demo, adapted for 4x lanes. That was working. Twice now I've tried to dot every i and cross every t, following your doc to convert that to 5GT/s. Both attempts have failed.

I'm sure this is *my* problem, of course.

I took good notes the second time around. I think I did all the right things.

I am *not* expecting you, Eli, to go through the notes below. But just in case you want to and have the time, I'm providing them now rather than delaying things by first asking whether you want them. Please forgive the presumption.


==============================================================================
Converting XILLY_DEMO_325T from 2.5GT/s to 5GT/s
==============================================================================

1) Copy XILLY_DEMO_325T to XILLY_DEMO_325T5GT and open project
A) Note Maximum Link Speed 2.5GT/s; AXI Interface Freq ***250MHz***; AXI Interface Width 64bit; Ref Clock Freq 100MHz
B) Note Vendor ID 10EE, Device ID EBEB, Revision ID 07, Subsystem Vendor ID 10EE, Subsystem ID EBEB

2) Sources, right click on pcie_7x_0_i and select Re-Customize IP.
3) Change Maximum Link Speed to 5.0 GT/s.
A) See AXI Interface Freq ***remain*** 250. See AXI Interface Width remain 64bit and Ref Clock Freq remain 100MHz
B) Note Vendor ID remains 10EE, Device ID forced to 7024, Revision ID remains 07, Subsystem Vendor ID remains 10EE, Subsystem ID remains EBEB
C) ***Change Device ID back to EBEB.***
D) Click OK then skip IP generation.
E) Customize IP again, confirm settings, CANCEL to get out

4) Run synthesis, including OOC IPs.
5) 6) numbering accidentally skipped
7) Compare XILLY_DEMO_325T\xillybus-eval-zynq-pcie-2.0c\vivado-essentials\pcie_k7_vivado\pcie_k7_vivado.xci with XILLY_DEMO_325T5GT\xillybus-eval-zynq-pcie-2.0c\vivado-essentials\pcie_k7_vivado\pcie_7x_0.xci\pcie_k7_vivado.xci

XILLY_DEMO_325T VALUES:
changed lines:
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.c_gen1">false</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.c_trgt_lnk_spd">0</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.max_lnk_spd">1</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.Link_Speed">2.5_GT/s</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.Trgt_Link_Speed">4&apos;h1</spirit:configurableElementValue>

XILLY_DEMO_325T5GT VALUES:
changed lines:
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.c_gen1">true</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.c_trgt_lnk_spd">2</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.c_gen1">true</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.Link_Speed">5.0_GT/s</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.Trgt_Link_Speed">4&apos;h2</spirit:configurableElementValue>
new lines:
<xilinx:configElementInfo xilinx:referenceId="PARAM_VALUE.Interface_Width" xilinx:valueSource="user"/>
<xilinx:configElementInfo xilinx:referenceId="PARAM_VALUE.Link_Speed" xilinx:valueSource="user"/>
<xilinx:configElementInfo xilinx:referenceId="PARAM_VALUE.Max_Payload_Size" xilinx:valueSource="user"/>
<xilinx:configElementInfo xilinx:referenceId="PARAM_VALUE.Ref_Clk_Freq" xilinx:valueSource="user"/>
<xilinx:configElementInfo xilinx:referenceId="PARAM_VALUE.Trans_Buf_Pipeline" xilinx:valueSource="user"/>
<xilinx:configElementInfo xilinx:referenceId="PARAM_VALUE.Trgt_Link_Speed" xilinx:valueSource="user"/>
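The manual diff in step 7 can also be scripted. Below is a sketch that extracts the spirit:configurableElementValue entries from two .xci texts and reports the differing ones; the namespace URI is the SPIRIT 1685-2009 one used in these Vivado files.

```python
# Sketch: diff the spirit:configurableElementValue entries of two .xci
# files (Vivado .xci is IP-XACT XML with the SPIRIT 1685-2009 namespace).
import xml.etree.ElementTree as ET

SPIRIT = "{http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009}"

def xci_params(xml_text):
    """Map referenceId -> value for every configurableElementValue."""
    root = ET.fromstring(xml_text)
    return {e.get(SPIRIT + "referenceId"): e.text
            for e in root.iter(SPIRIT + "configurableElementValue")}

def xci_diff(xml_a, xml_b):
    """Return {referenceId: (value_a, value_b)} for entries that differ."""
    a, b = xci_params(xml_a), xci_params(xml_b)
    return {k: (a.get(k), b.get(k))
            for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
```

Feeding it the two files' contents should reproduce the changed-lines list above (Link_Speed, Trgt_Link_Speed, and so on) without eyeballing.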

8) Generate Bitstream
9) Open implemented design and check PCIe pinout. Confirmed using PCIe pins _115 (Eli: our custom board is 4x and on GTX's 0-3 instead of standard 4-7.)
10) Write .mcs file
11) Open Hardware Manager, open target, right click device, Add Configuration Memory Device (Man=Micron, Dens=256, Width=x1_x2_x4), choose the 3.3V device: mt25ql256-spi-x1_x2_x4
12) Program Configuration Memory (confirm the .mcs has a very recent timestamp; careful, you must change to the correct project directory to get the correct .mcs!)
13) Cold start and check lspci. The device fails to appear in lspci three times in a row
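As an aside, when the device does enumerate, the negotiated speed and width can be confirmed from the LnkSta line of "lspci -vv". A small parser sketch, assuming the standard lspci output format:

```python
# Sketch: pull the negotiated speed and width out of an "lspci -vv"
# LnkSta line, e.g. "LnkSta: Speed 2.5GT/s, Width x4, ..."
import re

def parse_lnksta(line):
    m = re.search(r"Speed\s+([\d.]+GT/s),\s+Width\s+x(\d+)", line)
    return (m.group(1), int(m.group(2))) if m else None

print(parse_lnksta("LnkSta:\tSpeed 2.5GT/s, Width x4, TrErr- Train-"))
# → ('2.5GT/s', 4)
```

A successful Gen2 bring-up would show "5GT/s" here rather than "2.5GT/s".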
14) Sources window, right click pcie_k7_vivado and select Open IP Example Design, specifying directory M:\SPECTRE\Src\Xillybus and unchecking overwrite existing example project
A) Run Synthesis
B) Open ...\synth_1\runme.log and see:
INFO: [Synth 8-6157] synthesizing module 'pcie_k7_vivado_pipe_clock' [m:/SPECTRE/Src/Xillybus/pcie_k7_vivado_ex/imports/pcie_k7_vivado_pipe_clock.v:67]
Parameter PCIE_ASYNC_EN bound to: FALSE - type: string
Parameter PCIE_TXBUF_EN bound to: FALSE - type: string
Parameter PCIE_CLK_SHARING_EN bound to: FALSE - type: string
Parameter PCIE_LANE bound to: 4 - type: integer
Parameter PCIE_LINK_SPEED bound to: 3 - type: integer
Parameter PCIE_REFCLK_FREQ bound to: 0 - type: integer
Parameter PCIE_USERCLK1_FREQ bound to: 4 - type: integer
Parameter PCIE_USERCLK2_FREQ bound to: 4 - type: integer
Parameter PCIE_OOBCLK_MODE bound to: 1 - type: integer
Parameter PCIE_DEBUG_MODE bound to: 0 - type: integer
Parameter DIVCLK_DIVIDE bound to: 1 - type: integer
Parameter CLKFBOUT_MULT_F bound to: 10 - type: integer
Parameter CLKIN1_PERIOD bound to: 10 - type: integer
Parameter CLKOUT0_DIVIDE_F bound to: 8 - type: integer
Parameter CLKOUT1_DIVIDE bound to: 4 - type: integer
Parameter CLKOUT2_DIVIDE bound to: 4 - type: integer
Parameter CLKOUT3_DIVIDE bound to: 4 - type: integer
Parameter CLKOUT4_DIVIDE bound to: 20 - type: integer
Parameter PCIE_GEN1_MODE bound to: 1'b0

C) XILLY_DEMO_325T5GT source xillybus.v reads:
pcie_k7_8x_pipe_clock #
(
.PCIE_ASYNC_EN ( "FALSE" ), // PCIe async enable
.PCIE_TXBUF_EN ( "FALSE" ), // PCIe TX buffer enable for Gen1/Gen2 only
.PCIE_LANE ( 6'h08 ), // PCIe number of lanes
.PCIE_LINK_SPEED ( 3 ),
.PCIE_REFCLK_FREQ ( 0 ), // PCIe reference clock frequency
.PCIE_USERCLK1_FREQ ( 4 ), // PCIe user clock 1 frequency
.PCIE_USERCLK2_FREQ ( 4 ), // PCIe user clock 2 frequency
.PCIE_DEBUG_MODE ( 0 )
)

D) The two above appear consistent: PCIE_LINK_SPEED==3, PCIE_USERCLK1_FREQ==4, and PCIE_USERCLK2_FREQ==4
E) Note, however, that the example design pcie_k7_vivado_ex has source pcie_k7_vivado_support.v that reads:
pcie_k7_vivado_pipe_clock #
(
.PCIE_ASYNC_EN ( "FALSE" ), // PCIe async enable
.PCIE_TXBUF_EN ( "FALSE" ), // PCIe TX buffer enable for Gen1/Gen2 only
.PCIE_LANE ( LINK_CAP_MAX_LINK_WIDTH ), // PCIe number of lanes
// synthesis translate_off
.PCIE_LINK_SPEED ( 2 ),
// synthesis translate_on
.PCIE_REFCLK_FREQ ( PCIE_REFCLK_FREQ ), // PCIe reference clock frequency
.PCIE_USERCLK1_FREQ ( PCIE_USERCLK1_FREQ ), // PCIe user clock 1 frequency
.PCIE_USERCLK2_FREQ ( PCIE_USERCLK2_FREQ ), // PCIe user clock 2 frequency
.PCIE_DEBUG_MODE ( 0 )
)

F) Note that even though the log has PCIE_LINK_SPEED==3 (which I think means either speed), the source above has PCIE_LINK_SPEED==2 (Gen2 only). That assignment sits between synthesis translate_off/translate_on, however, so it should only affect simulation.
Long ago I tried a quick edit to xillybus.v from 3 to 2, but that didn't appear to help.
Meanwhile, I can calculate that PCIE_USERCLK1_FREQ==4.
More difficult to calculate, it seems that PCIE_USERCLK2_FREQ==4 as well. The code for both user clocks is below:

localparam USER_CLK_FREQ = 3;
localparam USER_CLK2_DIV2 = "FALSE";
localparam USERCLK2_FREQ = (USER_CLK2_DIV2 == "TRUE") ? (USER_CLK_FREQ == 4) ? 3 : (USER_CLK_FREQ == 3) ? 2 : USER_CLK_FREQ: USER_CLK_FREQ;
//HELMUT'S REDUCTION: localparam USERCLK2_FREQ = (USER_CLK2_DIV2 == "TRUE") ? xxxxx : USER_CLK_FREQ;
//HELMUT'S REDUCTION: xxxxx equals: (USER_CLK_FREQ == 4) ? 3 : (USER_CLK_FREQ == 3) ? 2 : USER_CLK_FREQ
//HELMUT'S REDUCTION: xxxxx equals: (USER_CLK_FREQ == 4) ? 3 : yyyyy
//HELMUT'S REDUCTION: yyyyy equals: (USER_CLK_FREQ == 3) ? 2 : USER_CLK_FREQ
//HELMUT'S REDUCTION: resolves to: localparam USERCLK2_FREQ = USER_CLK_FREQ
//HELMUT'S REDUCTION: resolves to: localparam USERCLK2_FREQ = 3
//HELMUT'S REDUCTION: resolves to: PCIE_USERCLK2_FREQ==4
pcie_k7_vivado_support #
(
.LINK_CAP_MAX_LINK_WIDTH ( 4 ), // PCIe Lane Width
.C_DATA_WIDTH ( C_DATA_WIDTH ), // RX/TX interface data width
.KEEP_WIDTH ( KEEP_WIDTH ), // TSTRB width
.PCIE_REFCLK_FREQ ( REF_CLK_FREQ ), // PCIe reference clock frequency
.PCIE_USERCLK1_FREQ ( USER_CLK_FREQ +1 ), // PCIe user clock 1 frequency
.PCIE_USERCLK2_FREQ ( USERCLK2_FREQ +1 ), // PCIe user clock 2 frequency
.PCIE_USE_MODE ("3.0"), // PCIe use mode
.PCIE_GT_DEVICE ("GTX") // PCIe GT device
)
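Helmut's reduction above can be cross-checked by evaluating the same ternary in plain Python, with the constants as in the quoted source (USER_CLK_FREQ = 3, USER_CLK2_DIV2 = "FALSE"):

```python
# Re-evaluate the USERCLK2_FREQ nested ternary from pcie_k7_vivado_ex.
def userclk2_freq(user_clk_freq, user_clk2_div2):
    if user_clk2_div2 == "TRUE":
        if user_clk_freq == 4:
            return 3
        if user_clk_freq == 3:
            return 2
        return user_clk_freq
    return user_clk_freq          # DIV2 disabled: pass through unchanged

USER_CLK_FREQ = 3
USERCLK2_FREQ = userclk2_freq(USER_CLK_FREQ, "FALSE")   # 3
PCIE_USERCLK1_FREQ = USER_CLK_FREQ + 1                  # 4
PCIE_USERCLK2_FREQ = USERCLK2_FREQ + 1                  # 4
```

With DIV2 disabled the ternary collapses to USER_CLK_FREQ, so both PCIE_USERCLK*_FREQ parameters come out as 4, matching the synthesis log and the reduction above.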

Re: PCIe 5GT/s vs 2.5GT/s

Postby support »

Hello,

As you've noted correctly above, there is no change in the pcie_k7_8x_pipe_clock settings, and hence it should be just a matter of changing the link speed, regenerating the PCIe block, and reimplementing; all should then be good. There's no need to analyze the internals further. This has been done before numerous times.

Have you tried the transition from Gen1 to Gen2 with an out-of-the-box Xillybus bundle (the one for 4x, that is)?

Regards,
Eli

Re: PCIe 5GT/s vs 2.5GT/s

Postby Guest »

Eli,

Since my last post on this thread, I have gone back to fundamentals. I started again with the K7/Zynq example from Xillybus, and made only the minimal changes necessary to work at 2.5GT/s on our slightly non-standard target board. Then, I extremely carefully followed the recommended process in the getting started PDF, section 4.5, "Changing the number of PCIe lanes and/or link speed".

That extremely careful process resulted in failure: the resulting 5GT/s build doesn't show up in lspci. The very last attempt even included the previously proven method of having the FPGA hold the SBC in reset.

Well, I know Xillybus will work at 5GT/s in general, but I am simply unable to achieve this result as described above. This leads me to two different paths forward, and I'd like your advice on them, please.

1) What else can I try along the path that's already failed?

2) I want to try a new path that compares the Xillybus demo that fails at 5GT/s to the Xilinx example that works at 5GT/s.

Yes, I believe I've mentioned it here before: The Xilinx example that comes with the Xilinx core, and that's examined in the Xillybus getting started section 4.5, can itself be built and run, of course. When I do this, adapting it for our slightly non-standard target board, it works at 2.5GT/s. When I upgrade it to 5GT/s, it works at that speed as well. So I find myself with a Xilinx example that works at 5GT/s, but a Xillybus demo that fails.

So my #2 path is to compare these two. The failing Xillybus demo is, in part, a wrapper around the exact same Xilinx core; the working Xilinx example is a different wrapper around that same core. So one wrapper is failing while the other is working. I am very skilled at taking baby steps to migrate a working case toward a failing case, finding out along the way what's wrong and how to make the failing case work. This is what I'd like to do for my #2 path. I believe this will involve some close looking at the pipe clock, of course. My strong belief and hope is that I will discover the problem, and thus be able to make the Xillybus demo, as well as all my existing projects, suddenly begin working at the desired 5GT/s speed.

At this point, I'll repeat that when I say "failing Xillybus demo", this is in no way an indictment of Xillybus. It's simply the state of affairs for my current copy of the demo, adapted for our board. Eventually I'll have the "working Xillybus demo". It will then be working because either I found a minor nuance that I misunderstood from the getting started guide, or I found an additional thing that the getting started guide didn't mention, perhaps simply because of our slightly non-standard target board.

With all of the above said, do you have any advice, Eli, for either path #1 or path #2?

Thanks again,
Helmut

Re: PCIe 5GT/s vs 2.5GT/s

Postby support »

Hello,

At this late stage, I wonder why the 5 GT/s is necessary. I suppose you're using a x4 setting, so with Gen1 x 4 you're at 800 MB/s. Do you have any intention to go higher than that, with a revision B IP core, maybe? Otherwise, I can't see the point.
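The 800 MB/s figure can be sanity-checked with back-of-the-envelope arithmetic: Gen1 runs 2.5 GT/s per lane with 8b/10b line coding, so the raw payload rate is 250 MB/s per lane, or 1000 MB/s for x4 before protocol overhead; roughly 800 MB/s is a realistic sustained figure.

```python
# Raw payload bandwidth of PCIe Gen1: 2.5 GT/s per lane, 8b/10b encoded,
# i.e. 2.0 Gb/s = 250 MB/s of payload per lane before protocol overhead.
def gen1_raw_mb_per_s(lanes):
    gt_per_s = 2.5e9                    # transfers/s per lane, 1 bit each
    payload_bits = gt_per_s * 8 / 10    # 8b/10b line coding
    return payload_bits / 8 / 1e6 * lanes

print(gen1_raw_mb_per_s(4))  # → 1000.0 (MB/s raw; ~800 MB/s sustained)
```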

I don't remember if you've already done this: Compare the parameters of the example PCIe block with the one used by Xillybus. In particular, I've seen issues with the "Device Class" PCIe configuration entry, which is set to 0xff on Xillybus (unclassified), while Xilinx' examples tend to pick 0x05 (Memory Controller, which is strictly speaking a false declaration). Some computers have gone crazy on the 0xff class, but working fine with Gen1 and failing with Gen2 because of this would set a new level of peculiarity.
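On a Linux host, the class comparison suggested above can be read from the sysfs "class" file, which holds a string like "0xff0000". A small sketch that decodes the base class byte (the device path in the comment is a placeholder for the actual BDF):

```python
# Sketch: decode the PCI base class from the sysfs "class" string, e.g.
# "0xff0000" (base class 0xff, unclassified, the Xillybus setting) or
# "0x058000" (0x05, Memory Controller, as in Xilinx examples).
def base_class(class_str):
    return int(class_str, 16) >> 16

assert base_class("0xff0000") == 0xff  # Xillybus: unclassified
assert base_class("0x058000") == 0x05  # Xilinx example: Memory Controller

# On a live system (path is a placeholder for the device's BDF):
# with open("/sys/bus/pci/devices/0000:03:00.0/class") as f:
#     print(hex(base_class(f.read().strip())))
```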

Otherwise, I would go for the #2 path myself. This whole situation is really odd.

Regards,
Eli

