Error in Xillybus demo: Lost sync with interrupt messages.

Postby tsegorah »

Good day. We are trying to test Xillybus on a Virtex 6 1140 and an NXP QorIQ P2020 running embedded Linux 3.12.20.
We get the kernel error: Lost sync with interrupt messages.
The error appears when we try to load the PCIe driver.
Messages on AXI from Linux to FPGA:
All messages consist of two 64-bit words, and the first of them is always 80000040_0000000F.
The second word is:
80000040_01000000
80000028_00508807
8000002C_00000000
80000024_00000080
80000008_04000000
80000020_00000000
80000008_01000000
80000008_01000000
80000008_01000000
80000008_01000000
80000008_01000000
80000008_01000000
80000008_01000000
80000008_01000000
80000008_01000000
80000008_01000000
When the core gets the message 80000020_00000000, it sends an interrupt and a message of 3 64-bit words:
40000002_010000FF
XXXXX000_01004002
470000B0_470000B0
where XXXXX changes between loadings of the driver and can be, for example, 079A7 or 2FB55.
This message appears not once, but 11 times during loading.
What is happening? What can we do about this problem?
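
Editorial note: the three-word message quoted above reads like a PCIe memory-write TLP with a 3-DW header (two header DWs, one address DW, then the payload). The sketch below decodes the first two DWs using the standard TLP header layout for memory requests. The struct and function names are made up for illustration, and reading the dump this way is an assumption that nobody in the thread confirms.

```c
#include <stdint.h>

/* Selected fields from the first two DWs of a PCIe memory-request TLP header. */
struct tlp_hdr {
    unsigned fmt;          /* DW0 bits 31:29: header size and payload flag */
    unsigned type;         /* DW0 bits 28:24 */
    unsigned length_dw;    /* DW0 bits 9:0: payload length in DWs */
    unsigned requester_id; /* DW1 bits 31:16: bus, device, function */
    unsigned tag;          /* DW1 bits 15:8 */
    unsigned last_be;      /* DW1 bits 7:4: byte enables of the last DW */
    unsigned first_be;     /* DW1 bits 3:0: byte enables of the first DW */
};

/* Hypothetical helper: split the two header DWs into their fields. */
static struct tlp_hdr tlp_decode(uint32_t dw0, uint32_t dw1)
{
    struct tlp_hdr h;

    h.fmt = (dw0 >> 29) & 0x7;
    h.type = (dw0 >> 24) & 0x1f;
    h.length_dw = dw0 & 0x3ff;
    h.requester_id = (dw1 >> 16) & 0xffff;
    h.tag = (dw1 >> 8) & 0xff;
    h.last_be = (dw1 >> 4) & 0xf;
    h.first_be = dw1 & 0xf;
    return h;
}
```

Calling tlp_decode(0x40000002, 0x010000FF) gives fmt 2 (3-DW header with data), type 0 (memory request), a length of 2 DWs and requester ID 0x0100, that is bus 01, device 00, function 0, which agrees with the 0000:01:00.0 address in the kernel log posted later in the thread. Under this reading, XXXXX000 is the DMA target address and 01004002, 470000B0 are the two payload DWs.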

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby support »

Hello,

To begin with, I don't suggest focusing on the FPGA side. The IP core is heavily used, and is stable. The chances for a problem on that side are very slim.

What is less common is the processor you're using. Unfortunately, you've supplied partial information. That error never occurs by itself. Could you please supply all messages produced by Xillybus' driver in the kernel log when the problem occurred?

Also -- does the design meet timing constraints? Have you made any manipulations to the driver, the processor's PCIe driver, interface or anything else? Regardless, I would suggest connecting just any PCIe device to the port (a PCIe Ethernet card, for example) and see if it works properly.

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby tsegorah »

We use a custom PCB, so we changed the .ucf file. Here it is:
=======================================================================
#CONFIG PART = xc6vlx240t-ff1156-1;

# The location constraints for REFCLK are implicitly given by the choice
# of the input buffer.

#NET "PCIE_REFCLK_P" LOC = V6;
#NET "PCIE_REFCLK_N" LOC = V5;
#INST "xillybus_ins/pcieclk_ibuf" LOC = IBUFDS_GTXE1_X0Y7;
NET "PCIE_REFCLK_P" LOC = AD8;

#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[0].GTX" LOC = GTXE1_X0Y15;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[1].GTX" LOC = GTXE1_X0Y14;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[2].GTX" LOC = GTXE1_X0Y13;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[3].GTX" LOC = GTXE1_X0Y12;
#INST "xillybus_ins/pcie/pcie_2_0_i/pcie_block_i" LOC = PCIE_X0Y1;

#INST "xillybus_ins/pcie/pcie_clocking_i/mmcm_adv_i" LOC = MMCM_ADV_X0Y7;

NET "PCIE_REFCLK_P" TNM_NET = "SYSCLK";
NET "*/pcie/pcie_clocking_i/clk_125" TNM_NET = "CLK_125";
NET "*/pcie/TxOutClk_bufg" TNM_NET = "TXOUTCLKBUFG";

TIMESPEC TS_SYSCLK = PERIOD "SYSCLK" 100 MHz HIGH 50 % PRIORITY 100;
TIMESPEC TS_CLK_125 = PERIOD "CLK_125" TS_SYSCLK * 1.25 HIGH 50 % PRIORITY 1;
TIMESPEC TS_TXOUTCLKBUFG = PERIOD "TXOUTCLKBUFG" 100 MHz HIGH 50 % PRIORITY 100;

PIN "*/pcie/trn_reset_n_int_i.CLR" TIG;
PIN "*/pcie/trn_reset_n_i.CLR" TIG;
PIN "*/pcie/pcie_clocking_i/mmcm_adv_i.RST" TIG;

#NET "PCIE_PERST_B_LS" TIG;
#NET "PCIE_PERST_B_LS" LOC = AE13 | IOSTANDARD = LVCMOS25 | PULLUP | NODELAY ;

# DS12
#NET "GPIO_LED[0]" LOC = "AC22";
# DS11
#NET "GPIO_LED[1]" LOC = "AC24";
# DS9
#NET "GPIO_LED[2]" LOC = "AE22";
# DS10
#NET "GPIO_LED[3]" LOC = "AE23";

# PlanAhead Generated physical constraints

NET "PCIE_RX_P[1]" LOC = AD3;
NET "PCIE_RX_P[0]" LOC = AE5;
#NET "PCIE_RX_P[3]" LOC = AF3;
#NET "PCIE_RX_P[2]" LOC = AG5;
NET "PCIE_TX_P[1]" LOC = AG1;
NET "PCIE_TX_P[0]" LOC = AH3;
#NET "PCIE_TX_P[3]" LOC = AJ1;
#NET "PCIE_TX_P[2]" LOC = AK3;

NET "GPIO_LED[3]" LOC = H36;
NET "GPIO_LED[2]" LOC = G39;
NET "GPIO_LED[1]" LOC = F37;
NET "GPIO_LED[0]" LOC = C38;
===================================================

Only lanes 0 and 1 are connected on the PCB.

We added some debug messages to the driver. Here is the kernel log:
===============================================================================
[ 302.471883] xilibus: xillybus_pcie_init
[ 302.471920] xilibus pcie: xilly_probe
[ 302.471943] pci 0000:00:00.0: enabling device (0106 -> 0107)
[ 302.471958] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[ 302.471991] xilibus pcie: map BAR 0 (first 128 bytes)
[ 302.472119] xilibus pcie: DMA set mask 32
[ 302.472126] xilibus: xillybus_endpoint_discovery
[ 302.472131] xilibus: setup channels. Channel count 0
[ 302.472138] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[ 302.472149] xilibus: dma_addr low 0x79a8000
[ 302.472153] xilibus: dma_addr high 0x0
[ 302.472169] xilibus: xillybus_isr
[ 302.472176] xilibus: sizeof(msg_buffer) = 4096
[ 302.472181] xilibus: 1004002 470000b0
[ 302.472186] xilibus: fpga counter 4; own counter 11
[ 302.472196] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472203] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472214] xilibus: xillybus_isr
[ 302.472218] xilibus: sizeof(msg_buffer) = 4096
[ 302.472222] xilibus: 1004002 470000b0
[ 302.472226] xilibus: fpga counter 4; own counter 11
[ 302.472234] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472241] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472248] xilibus: xillybus_isr
[ 302.472252] xilibus: sizeof(msg_buffer) = 4096
[ 302.472256] xilibus: 1004002 470000b0
[ 302.472261] xilibus: fpga counter 4; own counter 11
[ 302.472268] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472275] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472282] xilibus: xillybus_isr
[ 302.472286] xilibus: sizeof(msg_buffer) = 4096
[ 302.472291] xilibus: 1004002 470000b0
[ 302.472295] xilibus: fpga counter 4; own counter 11
[ 302.472303] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472310] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472317] xilibus: xillybus_isr
[ 302.472321] xilibus: sizeof(msg_buffer) = 4096
[ 302.472326] xilibus: 1004002 470000b0
[ 302.472330] xilibus: fpga counter 4; own counter 11
[ 302.472337] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472344] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472351] xilibus: xillybus_isr
[ 302.472355] xilibus: sizeof(msg_buffer) = 4096
[ 302.472360] xilibus: 1004002 470000b0
[ 302.472364] xilibus: fpga counter 4; own counter 11
[ 302.472372] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472378] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472385] xilibus: xillybus_isr
[ 302.472389] xilibus: sizeof(msg_buffer) = 4096
[ 302.472394] xilibus: 1004002 470000b0
[ 302.472398] xilibus: fpga counter 4; own counter 11
[ 302.472406] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472413] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472420] xilibus: xillybus_isr
[ 302.472424] xilibus: sizeof(msg_buffer) = 4096
[ 302.472428] xilibus: 1004002 470000b0
[ 302.472432] xilibus: fpga counter 4; own counter 11
[ 302.472440] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472447] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472454] xilibus: xillybus_isr
[ 302.472458] xilibus: sizeof(msg_buffer) = 4096
[ 302.472462] xilibus: 1004002 470000b0
[ 302.472467] xilibus: fpga counter 4; own counter 11
[ 302.472474] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472481] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472488] xilibus: xillybus_isr
[ 302.472492] xilibus: sizeof(msg_buffer) = 4096
[ 302.472496] xilibus: 1004002 470000b0
[ 302.472501] xilibus: fpga counter 4; own counter 11
[ 302.472508] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472515] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472522] xilibus: xillybus_isr
[ 302.472526] xilibus: sizeof(msg_buffer) = 4096
[ 302.472531] xilibus: 1004002 470000b0
[ 302.472535] xilibus: fpga counter 4; own counter 11
[ 302.472543] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=1, channel=001, dir=0, bufno=004, data=70000b0
[ 302.472549] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 4 (instead of b) on entry 0
[ 302.472556] xillybus_pcie 0000:01:00.0: Lost sync with interrupt messages. Stopping.
[ 302.571480] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.
====================================================================================

We've changed the clock to 100 MHz and regenerated the Xilinx core.
We've also changed the device ID in the Xilinx core and in the driver from EBEB to 6012, and the subsystem ID to 6012 as well.
The design meets its constraints.
The CPU and the FPGA are on the same PCB, so we can't connect another device.
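
Editorial note: the "Malformed message" lines in the log above show how the driver splits the two logged words (1004002 470000b0) into fields. The sketch below is a user-space model of that split. The shifts and masks are an assumption, chosen because they reproduce the values printed in the log; the struct and function names are hypothetical and are not taken from the driver source.

```c
#include <stdint.h>

/* Fields named after the labels in the "Malformed message" log line. */
struct xilly_msg {
    unsigned opcode;
    unsigned channel;
    unsigned dir;
    unsigned bufno;
    unsigned counter;  /* logged as "fpga counter" */
    uint32_t data;
};

/* Split a two-word message into fields (layout inferred from the log). */
static struct xilly_msg decode_msg(uint32_t w0, uint32_t w1)
{
    struct xilly_msg m;

    m.opcode = (w0 >> 24) & 0xff;
    m.bufno = (w0 >> 12) & 0x3ff;
    m.channel = (w0 >> 1) & 0x7ff;
    m.dir = w0 & 1;
    m.counter = (w1 >> 28) & 0xf;
    m.data = w1 & 0x0fffffff;
    return m;
}
```

With w0 = 0x01004002 and w1 = 0x470000b0 this yields opcode 1, channel 1, dir 0, bufno 4, data 0x70000b0 and counter 4, the same values the log reports. The driver expected counter 0xb ("own counter 11"), hence the NACK.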

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby support »

Hello,

I can see that you've edited the code of the driver as well. Have you made any changes except for adding debug output? Something related to cache coherency, maybe?

Are you sure that you haven't changed anything other than the clock frequency and the device ID? Have you compared the .xco files to verify this?

Is the processor running in big Endian mode, by any chance?

Besides, does this always happen, even when loading the driver on a system just powered up?

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby tsegorah »

support wrote: Is the processor running in big Endian mode, by any chance?

Yes, it is.

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby support »

Hello,

There's an issue with big Endian hosts, because the PCI/PCIe bus is little Endian, and then there's a whole mess with swapping the byte ordering back and forth (most architectures are little Endian, so they never run into this).

One thing you could try, is to change the line in xillybus_core.c saying

iowrite32(1, endpoint->registers + fpga_endian_reg);


to

iowrite32(0x1000000, endpoint->registers + fpga_endian_reg);


That will flip the FPGA's endian correction mechanism. It might do the trick. But sometimes it can mess something else up.
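
Editorial note: the replacement constant 0x1000000 is simply the value 1 with its four bytes reversed, which a small user-space sketch can confirm. The function name is made up for illustration, and how the FPGA interprets the register is not shown anywhere in this thread.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32_model(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

bswap32_model(1) evaluates to 0x01000000, so the two iowrite32() calls differ only in the byte order in which the value 1 reaches the FPGA.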

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby tsegorah »

Now the log looks like this:
[   98.420459] xilibus pcie: xilly_probe
[   98.420532] pci 0000:00:00.0: enabling device (0106 -> 0107)
[   98.420550] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[   98.420565] xilibus pcie: map BAR 0 (first 128 bytes)
[   98.420671] xilibus pcie: DMA set mask 32
[   98.420677] xilibus: xillybus_endpoint_discovery
[   98.420682] xilibus: setup channels. Channel count 0
[   98.420689] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[   98.420702] xilibus: dma_addr low 0x7982000
[   98.420706] xilibus: dma_addr high 0x0
[   98.420723] xilibus: xillybus_isr
[   98.420731] xilibus: sizeof(msg_buffer) = 4096
[   98.420736] xilibus: 0 0
[   98.420740] xilibus: fpga counter 0; own counter 11
[   98.420750] xillybus_pcie 0000:01:00.0: Malformed message (skipping): opcode=0, channel=000, dir=0, bufno=000, data=0000000
[   98.420757] xillybus_pcie 0000:01:00.0: Sending a NACK on counter 0 (instead of b) on entry 0
[   98.520494] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.
[  143.601923] xilibus: xillybus_pcie_init
[  143.601958] xilibus pcie: xilly_probe
[  143.601980] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[  143.601994] xilibus pcie: map BAR 0 (first 128 bytes)
[  143.602102] xilibus pcie: DMA set mask 32
[  143.602108] xilibus: xillybus_endpoint_discovery
[  143.602113] xilibus: setup channels. Channel count 0
[  143.602119] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[  143.602130] xilibus: dma_addr low 0x78f8000
[  143.602134] xilibus: dma_addr high 0x0
[  143.701496] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.
[  251.060279] xilibus: xillybus_pcie_init
[  251.060313] xilibus pcie: xilly_probe
[  251.060335] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[  251.060349] xilibus pcie: map BAR 0 (first 128 bytes)
[  251.060459] xilibus pcie: DMA set mask 32
[  251.060465] xilibus: xillybus_endpoint_discovery
[  251.060503] xilibus: setup channels. Channel count 0
[  251.060511] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[  251.060524] xilibus: dma_addr low 0x7830000
[  251.060528] xilibus: dma_addr high 0x0
[  251.160494] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby tsegorah »

tsegorah wrote:[ 302.472256] xilibus: 1004002 470000b0

If we swap the bytes manually, will these values be correct?
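
Editorial note: a quick way to check this idea is to byte-swap the two logged words and look at the counter field again. The sketch below assumes the counter sits in the top four bits of the second word, which is inferred from the earlier log (0x470000b0 was reported as "fpga counter 4"); the function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Message counter: top four bits of the second word (inferred from the log). */
static unsigned msg_counter(uint32_t w1)
{
    return (w1 >> 28) & 0xf;
}
```

swap_bytes32(0x01004002) is 0x02400001 and swap_bytes32(0x470000b0) is 0xb0000047, so msg_counter() on the swapped second word returns 0xb. That is the value the driver was waiting for ("instead of b" in the log), which suggests the data arrives intact but byte-reversed within each 32-bit word.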

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby support »

Hello,

Yes, it did mess up things, pretty much as expected.

Edit: Note that I suggested something bad below. Don't try it, as it won't help.

What I can suggest as the next attempt, is to swap the byte ordering in xillybus.v. The related signals to swap are trn_td and trn_rd.

Note however that the swapping should be done for each 32-bit word separately. So in the instantiation of pcie_v6_4x it goes:

.trn_td( { trn_td[39:32],  trn_td[47:40], trn_td[55:48], trn_td[63:56], trn_td[7:0],  trn_td[15:8], trn_td[23:16], trn_td[31:24] } ),


And the same with trn_rd.

Or maybe swap the whole 64-bit word:

.trn_td( { trn_td[7:0],  trn_td[15:8], trn_td[23:16], trn_td[31:24], trn_td[39:32],  trn_td[47:40], trn_td[55:48], trn_td[63:56]  } ),


I wouldn't normally suggest this, but judging from your debug output, the 32-bit words seem to be swapped as well, assuming you printed them out in the order they appear in memory.

The problem is that big Endian machines tend to make some correction by themselves.

It may be possible to work this out in the driver, but fixing it in the Verilog code is the once-and-for-all solution, once you get it right.
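
Editorial note: purely to illustrate what the two Verilog concatenations above do to the byte order (the edit note at the top of this post says the approach itself won't help), here is a user-space C model of both on a 64-bit value. The function names are made up for illustration.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* First variant: reverse the bytes inside each 32-bit half separately. */
static uint64_t swap_each_dword(uint64_t x)
{
    return ((uint64_t)swap32((uint32_t)(x >> 32)) << 32) |
           swap32((uint32_t)x);
}

/* Second variant: reverse all eight bytes of the 64-bit word. */
static uint64_t swap_qword(uint64_t x)
{
    return ((uint64_t)swap32((uint32_t)x) << 32) |
           swap32((uint32_t)(x >> 32));
}
```

For the input 0x0102030405060708, swap_each_dword() returns 0x0403020108070605 and swap_qword() returns 0x0807060504030201. The second variant additionally exchanges the two 32-bit halves, which is the difference between the two instantiations.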

Regards,
Eli

Re: Error in Xillybus demo: Lost sync with interrupt message

Postby tsegorah »

I've swapped the wires, but the error looks the same:
[  110.411128] xilibus: xilibus core init
[  110.443063] xilibus: xillybus_pcie_init
[  110.443107] xilibus pcie: xilly_probe
[  110.443129] pci 0000:00:00.0: enabling device (0106 -> 0107)
[  110.443144] xilibus pcie: mem start 0x80000000, end 0x8000007f, len 128
[  110.443158] xilibus pcie: map BAR 0 (first 128 bytes)
[  110.443266] xilibus pcie: DMA set mask 32
[  110.443272] xilibus: xillybus_endpoint_discovery
[  110.443278] xilibus: setup channels. Channel count 0
[  110.443285] xilibus: [0] is_writebuf 1,channelnum 0, bufsize 1024,bufnum 1
[  110.443296] xilibus: dma_addr low 0x7977000
[  110.443300] xilibus: dma_addr high 0x0
[  110.542474] xillybus_pcie 0000:01:00.0: No response from FPGA. Aborting.