Error in Xillybus demo: Lost sync with interrupt messages.


Re: Error in Xillybus demo: Lost sync with interrupt message

Post by tsegorah »

support wrote:So what I can suggest at this point is to edit the driver to swap the byte order of the data as it's interpreted by the ISR. I suppose one of the kernel built-ins cpu_to_be32()/be32_to_cpu()/cpu_to_le32()/le32_to_cpu() will do the trick.

Yes, it works now, thank you. We've swapped the bytes in xillybus_isr.

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by support »

Hello,

Actually, it's much worse. It gets stuck really early.

Sorry. This was a completely silly idea. One can't swap the data just like that, because the TLP header information gets messed up.

So what I can suggest at this point is to edit the driver to swap the byte order of the data as it's interpreted by the ISR. I suppose one of the kernel built-ins cpu_to_be32()/be32_to_cpu()/cpu_to_le32()/le32_to_cpu() will do the trick. Since all processing of the data from the message buffer is done in the ISR, that should be enough.
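As a rough illustration of the byte-order fix suggested above, here is a user-space sketch; it is not the driver's code. It assembles a 32-bit word from a byte buffer as little-endian regardless of the host's byte order, which is the effect le32_to_cpu() has on a big-endian CPU. The helper name load_le32 is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the Xillybus driver): build a 32-bit
 * value from four bytes stored least-significant byte first. The result
 * is the same on little-endian and big-endian hosts. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, the bytes 01 00 40 02 read natively on a big-endian CPU give 0x01004002, while load_le32() returns 0x02400001.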

The data may arrive swapped as well, but if you use a 32-bit stream, I guess swapping of the data wires will help.

Unfortunately, big Endian is quite uncommon, so Xillybus doesn't support it so well.

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by tsegorah »

I've swapped the wires, but the error looks the same:
Code:
[  110.411128] xilibus: xilibus core init
[  110.443063] xilibus: xillybus_pcie_init
[  110.443107] xilibus pcie: xilly_probe
[  110.443129] pci 0000:00:00.0: enabling device (0106 -> 0107)
[  110.443144] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[  110.443158] xilibus pcie: map BAR 0 (first 128 bytes)
[  110.443266] xilibus pcie: DMA set mask 32
[  110.443272] xilibus: xillybus_endpoint_discovery
[  110.443278] xilibus: setup channels. Channel count 0
[  110.443285] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[  110.443296] xilibus: dma_addr low 0x7977000
[  110.443300] xilibus: dma_addr high 0x0
[  110.542474] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by support »

Hello,

Yes, it did mess up things, pretty much as expected.

Edit: Note that I suggested something bad below. Don't try it, as it won't help.

What I can suggest as the next attempt, is to swap the byte ordering in xillybus.v. The related signals to swap are trn_td and trn_rd.

Note however that the swapping should be done for each 32-bit word separately. So in the instantiation of pcie_v6_4x it goes:

Code:
.trn_td( { trn_td[39:32], trn_td[47:40], trn_td[55:48], trn_td[63:56], trn_td[7:0], trn_td[15:8], trn_td[23:16], trn_td[31:24] } ),


And the same with trn_rd.

Or maybe swap the whole 64-bit word:

Code:
.trn_td( { trn_td[7:0], trn_td[15:8], trn_td[23:16], trn_td[31:24], trn_td[39:32], trn_td[47:40], trn_td[55:48], trn_td[63:56] } ),


I wouldn't normally, but it looks from your debug output that the 32-bit words are swapped as well. In case you printed them out in the order they appear in memory.
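To make the difference between the two concatenations concrete, here is a small C model of them. It is only a sketch for checking the byte permutations on a host machine; the function names are invented for this example and nothing here comes from xillybus.v.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of the first concatenation: bytes are reversed inside each
 * 32-bit half, and the halves stay where they are. */
static uint64_t swap_each_32(uint64_t v)
{
    uint32_t hi = (uint32_t)(v >> 32);
    uint32_t lo = (uint32_t)v;
    return ((uint64_t)swap32(hi) << 32) | swap32(lo);
}

/* Model of the second concatenation: all eight bytes are reversed,
 * which also exchanges the two 32-bit halves. */
static uint64_t swap_whole_64(uint64_t v)
{
    uint32_t hi = (uint32_t)(v >> 32);
    uint32_t lo = (uint32_t)v;
    return ((uint64_t)swap32(lo) << 32) | swap32(hi);
}
```

With the input 0x0102030405060708, the first variant gives 0x0403020108070605 and the second gives 0x0807060504030201.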

The problem is that big Endian machines tend to make some correction by themselves.

It may be possible to work this out in the driver, but fixing it in the Verilog code is the once-and-for-all solution, once you get it right.

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by tsegorah »

tsegorah wrote:[ 302.472256] xilibus: 1004002 470000b0

If we swap the bytes manually, will these values be right?
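One way to check this by hand is sketched below: byte-reverse the logged words and extract the message counter. The assumption that the counter occupies the top four bits of the second word is inferred from the log itself (the unswapped word 470000b0 is reported as "fpga counter 4"); it is not taken from the driver source. The function names are made up for this example.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Assumed layout, inferred from the log output in this thread:
 * the message counter is bits 31..28 of the second word. */
static unsigned msg_counter(uint32_t word1)
{
    return (unsigned)((word1 >> 28) & 0xfu);
}
```

Under that assumption, the swapped words are 02400001 and b0000047, and msg_counter(0xb0000047) is 0xb. That is the counter value the driver reports expecting ("instead of b"), which suggests the swapped values are the plausible ones.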

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by tsegorah »

Now the log looks like this:
Code:
[   98.420459] xilibus pcie: xilly_probe
[   98.420532] pci 0000:00:00.0: enabling device (0106 -> 0107)
[   98.420550] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[   98.420565] xilibus pcie: map BAR 0 (first 128 bytes)
[   98.420671] xilibus pcie: DMA set mask 32
[   98.420677] xilibus: xillybus_endpoint_discovery
[   98.420682] xilibus: setup channels. Channel count 0
[   98.420689] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[   98.420702] xilibus: dma_addr low 0x7982000
[   98.420706] xilibus: dma_addr high 0x0
[   98.420723] xilibus: xillybus_isr
[   98.420731] xilibus: sizeof(msg_buffer) = 4096
[   98.420736] xilibus: 0 0
[   98.420740] xilibus: fpga counter 0; own counter 11
[   98.420750] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=0, channel=000, dir=0, bufno=000, data=0000000
[   98.420757] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 0 (instead of b) on entry 0
[   98.520494] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.
[  143.601923] xilibus: xillybus_pcie_init
[  143.601958] xilibus pcie: xilly_probe
[  143.601980] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[  143.601994] xilibus pcie: map BAR 0 (first 128 bytes)
[  143.602102] xilibus pcie: DMA set mask 32
[  143.602108] xilibus: xillybus_endpoint_discovery
[  143.602113] xilibus: setup channels. Channel count 0
[  143.602119] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[  143.602130] xilibus: dma_addr low 0x78f8000
[  143.602134] xilibus: dma_addr high 0x0
[  143.701496] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.
[  251.060279] xilibus: xillybus_pcie_init
[  251.060313] xilibus pcie: xilly_probe
[  251.060335] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[  251.060349] xilibus pcie: map BAR 0 (first 128 bytes)
[  251.060459] xilibus pcie: DMA set mask 32
[  251.060465] xilibus: xillybus_endpoint_discovery
[  251.060503] xilibus: setup channels. Channel count 0
[  251.060511] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[  251.060524] xilibus: dma_addr low 0x7830000
[  251.060528] xilibus: dma_addr high 0x0
[  251.160494] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by support »

Hello,

There's an issue with big Endian, because the PCI/PCIe bus is big Endian, and then there's a whole mess with swapping the byte ordering back and forth (as most architectures are little Endian).

One thing you could try, is to change the line in xillybus_core.c saying

Code:
iowrite32(1, endpoint->registers + fpga_endian_reg);


to

Code:
iowrite32(0x1000000, endpoint->registers + fpga_endian_reg);


That will flip the FPGA's endian correction mechanism. It might do the trick. But sometimes it can mess something else up.
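The two constants are byte-reversed images of each other: storing 1 in big-endian byte order produces the same four bytes as storing 0x1000000 in little-endian order. The sketch below only demonstrates that relationship in user space; it says nothing about how the FPGA interprets the register, and the helper names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: write v to p least-significant byte first. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: write v to p most-significant byte first. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Both store_be32(buf, 1) and store_le32(buf, 0x1000000) leave the bytes 00 00 00 01 in buf.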

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by tsegorah »

support wrote:Hello,

Is the processor running in big Endian mode, by any chance?


Yes it is.

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by support »

Hello,

I can see that you've edited the code of the driver as well. Have you made any changes except for adding debug output? Something related to cache coherency, maybe?

Are you sure that you haven't changed anything other than the clock frequency and the device ID? Have you compared the .xco files to verify this?

Is the processor running in big Endian mode, by any chance?

Besides, does this always happen, even when loading the driver on a system just powered up?

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Post by tsegorah »

support wrote:Hello,

To begin with, I don't suggest focusing on the FPGA side. The IP core is heavily used, and is stable. The chances for a problem on that side are very slim.

What is less common is the processor you're using. Unfortunately, you've supplied partial information. That error never occurs by itself. Could you please supply all messages produced by Xillybus' driver in the kernel log when the problem occurred?

Also -- does the design meet timing constraints? Have you made any manipulations to the driver, the processor's PCIe driver, interface or anything else? Regardless, I would suggest connecting just any PCIe device to the port (a PCIe Ethernet card, for example) and see if it works properly.

Regards,
Eli

We use a custom PCB, so we changed the .ucf file. Here it is:
=======================================================================
#CONFIG PART = xc6vlx240t-ff1156-1;

# The location constraints for REFCLK are implicitly given by the choice
# of the input buffer.

#NET "PCIE_REFCLK_P" LOC = V6;
#NET "PCIE_REFCLK_N" LOC = V5;
#INST "xillybus_ins/pcieclk_ibuf" LOC = IBUFDS_GTXE1_X0Y7;
NET "PCIE_REFCLK_P" LOC = AD8;

#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[0].GTX" LOC = GTXE1_X0Y15;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[1].GTX" LOC = GTXE1_X0Y14;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[2].GTX" LOC = GTXE1_X0Y13;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[3].GTX" LOC = GTXE1_X0Y12;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_block_i" LOC = PCIE_X0Y1;

#INST "xillybus_ins/pcie/pcie_clocking_i/mmcm_adv_i" LOC = MMCM_ADV_X0Y7;

NET "PCIE_REFCLK_P" TNM_NET = "SYSCLK";
NET "*/pcie/pcie_clocking_i/clk_125" TNM_NET = "CLK_125";
NET "*/pcie/TxOutClk_bufg" TNM_NET = "TXOUTCLKBUFG";

TIMESPEC TS_SYSCLK = PERIOD "SYSCLK" 100 MHz HIGH 50 % PRIORITY 100;
TIMESPEC TS_CLK_125 = PERIOD "CLK_125" TS_SYSCLK * 1.25 HIGH 50 % PRIORITY 1;
TIMESPEC TS_TXOUTCLKBUFG = PERIOD "TXOUTCLKBUFG" 100 MHz HIGH 50 % PRIORITY 100;

PIN "*/pcie/trn_reset_n_int_i.CLR" TIG;
PIN "*/pcie/trn_reset_n_i.CLR" TIG;
PIN "*/pcie/pcie_clocking_i/mmcm_adv_i.RST" TIG;

#NET "PCIE_PERST_B_LS" TIG;
#NET "PCIE_PERST_B_LS" LOC = AE13 | IOSTANDARD = LVCMOS25 | PULLUP | NODELAY ;

# DS12
#NET "GPIO_LED[0]" LOC = "AC22";
# DS11
#NET "GPIO_LED[1]" LOC = "AC24";
# DS9
#NET "GPIO_LED[2]" LOC = "AE22";
# DS10
#NET "GPIO_LED[3]" LOC = "AE23";

# PlanAhead Generated physical constraints

NET "PCIE_RX_P[1]" LOC = AD3;
NET "PCIE_RX_P[0]" LOC = AE5;
#NET "PCIE_RX_P[3]" LOC = AF3;
#NET "PCIE_RX_P[2]" LOC = AG5;
NET "PCIE_TX_P[1]" LOC = AG1;
NET "PCIE_TX_P[0]" LOC = AH3;
#NET "PCIE_TX_P[3]" LOC = AJ1;
#NET "PCIE_TX_P[2]" LOC = AK3;

NET "GPIO_LED[3]" LOC = H36;
NET "GPIO_LED[2]" LOC = G39;
NET "GPIO_LED[1]" LOC = F37;
NET "GPIO_LED[0]" LOC = C38;
===================================================

Only lanes 0 and 1 are connected on the PCB.

We added some messages to the kernel log:
===============================================================================
[ 302.471883] xilibus: xillybus_pcie_init
[ 302.471920] xilibus pcie: xilly_probe
[ 302.471943] pci 0000:00:00.0: enabling device (0106 -> 0107)
[ 302.471958] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[ 302.471991] xilibus pcie: map BAR 0 (first 128 bytes)
[ 302.472119] xilibus pcie: DMA set mask 32
[ 302.472126] xilibus: xillybus_endpoint_discovery
[ 302.472131] xilibus: setup channels. Channel count 0
[ 302.472138] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[ 302.472149] xilibus: dma_addr low 0x79a8000
[ 302.472153] xilibus: dma_addr high 0x0
[ 302.472169] xilibus: xillybus_isr
[ 302.472176] xilibus: sizeof(msg_buffer) = 4096
[ 302.472181] xilibus: 1004002 470000b0
[ 302.472186] xilibus: fpga counter 4; own counter 11
[ 302.472196] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472203] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472214] xilibus: xillybus_isr
[ 302.472218] xilibus: sizeof(msg_buffer) = 4096
[ 302.472222] xilibus: 1004002 470000b0
[ 302.472226] xilibus: fpga counter 4; own counter 11
[ 302.472234] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472241] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472248] xilibus: xillybus_isr
[ 302.472252] xilibus: sizeof(msg_buffer) = 4096
[ 302.472256] xilibus: 1004002 470000b0
[ 302.472261] xilibus: fpga counter 4; own counter 11
[ 302.472268] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472275] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472282] xilibus: xillybus_isr
[ 302.472286] xilibus: sizeof(msg_buffer) = 4096
[ 302.472291] xilibus: 1004002 470000b0
[ 302.472295] xilibus: fpga counter 4; own counter 11
[ 302.472303] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472310] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472317] xilibus: xillybus_isr
[ 302.472321] xilibus: sizeof(msg_buffer) = 4096
[ 302.472326] xilibus: 1004002 470000b0
[ 302.472330] xilibus: fpga counter 4; own counter 11
[ 302.472337] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472344] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472351] xilibus: xillybus_isr
[ 302.472355] xilibus: sizeof(msg_buffer) = 4096
[ 302.472360] xilibus: 1004002 470000b0
[ 302.472364] xilibus: fpga counter 4; own counter 11
[ 302.472372] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472378] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472385] xilibus: xillybus_isr
[ 302.472389] xilibus: sizeof(msg_buffer) = 4096
[ 302.472394] xilibus: 1004002 470000b0
[ 302.472398] xilibus: fpga counter 4; own counter 11
[ 302.472406] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472413] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472420] xilibus: xillybus_isr
[ 302.472424] xilibus: sizeof(msg_buffer) = 4096
[ 302.472428] xilibus: 1004002 470000b0
[ 302.472432] xilibus: fpga counter 4; own counter 11
[ 302.472440] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472447] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472454] xilibus: xillybus_isr
[ 302.472458] xilibus: sizeof(msg_buffer) = 4096
[ 302.472462] xilibus: 1004002 470000b0
[ 302.472467] xilibus: fpga counter 4; own counter 11
[ 302.472474] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472481] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472488] xilibus: xillybus_isr
[ 302.472492] xilibus: sizeof(msg_buffer) = 4096
[ 302.472496] xilibus: 1004002 470000b0
[ 302.472501] xilibus: fpga counter 4; own counter 11
[ 302.472508] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472515] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472522] xilibus: xillybus_isr
[ 302.472526] xilibus: sizeof(msg_buffer) = 4096
[ 302.472531] xilibus: 1004002 470000b0
[ 302.472535] xilibus: fpga counter 4; own counter 11
[ 302.472543] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472549] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472556] xillybus_pcie 0000:01:00.0: Lost sync with interrupt messages. Stopping.
[ 302.571480] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.
====================================================================================

We've changed the clock to 100 MHz and regenerated the Xilinx core.
We've also changed the device ID in the Xilinx core and the driver from EBEB to 6012, and the subsystem ID to 6012 too.
The design meets the timing constraints.
The CPU and the FPGA are on the same PCB, so we can't connect another device.