Communication errors occasionally occur when sending messages
I've been testing simple communication on a 2-node bus, with a PC with a Kvaser Memorator Pro 2xHs v2 as the first node and an Altera Cyclone V SoC with the CTU_CAN_FD IP instantiated in the FPGA as the second.
Setup on both nodes:
# ip link set can0 up type can \
bitrate 1000000 \
dbitrate 1000000 \
fd on
The problem occurs intermittently (it is relatively easy to reproduce) when the CTUCANFD node tries to send a message on the bus. I get errors from the driver in the kernel log and communication gets stuck.
It doesn't always trigger on the first try, so try sending a few messages like this:
[CTUCANFD NODE] # cansend can0 012#00.cc.ee
The message is correctly received on the other node:
[KVASER NODE] # candump can1
can1 012 [3] 00 CC EE
Then, after some time and for no apparent reason, when I try to send another message (FD or normal), I get errors from the driver:
[CTUCANFD NODE] # cansend can0 012##100.cc.ee
However, cansend returns 0 and no message is received by the other node on the bus.
Kernel log on the CTUCANFD node:
[ 338.750082] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 96, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.763340] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.769160] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000014, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.782487] ctucanfd c0041000.ctu_can_fd can0: Fault conf: state = 3
[ 338.788985] ctucanfd c0041000.ctu_can_fd can0: bus_off
[ 338.795853] ctucanfd c0041000.ctu_can_fd can0: bus-off
[ 338.800995] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.814323] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.820136] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.833460] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.839272] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.852596] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.858408] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.871732] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.877545] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.890870] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.896683] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.910007] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.915817] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.929142] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.934953] ctucanfd c0041000.ctu_can_fd can0: ctucan_err_interrupt: ISR = 0x00000004, rxerr 256, txerr 0, error type 0, pos 2, ALC id_field 0, bit 0
[ 338.948277] ctucanfd c0041000.ctu_can_fd can0: error_warning
[ 338.976765] ctucanfd c0041000.ctu_can_fd can0: ctucan_interrupt: stuck interrupt (isr=0x00000004), stopping
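The ISR values in the log above (0x00000004 and 0x00000014) differ by a single bit. A minimal sketch for listing the set bits follows. The names given to bit 2 and bit 4 are an assumption, inferred only from the messages the driver prints next to each value (error_warning after 0x04, an additional "Fault conf" after 0x14), not taken from the CTU CAN FD register map. decode_isr is a hypothetical helper, not part of the driver or can-utils.

```shell
#!/bin/sh
# Hypothetical helper: list the bits set in an ISR value copied from the log.
# Bit names are ASSUMED from the neighbouring log messages, not from the
# register documentation; unknown bits are printed as "bitN".
decode_isr() {
    isr=$(( $1 ))          # accepts hex such as 0x00000014
    bit=0
    out=""
    while [ "$isr" -ne 0 ]; do
        if [ $(( isr & 1 )) -eq 1 ]; then
            case $bit in
                2) name="error_warning" ;;       # assumed from "error_warning"
                4) name="fault_confinement" ;;   # assumed from "Fault conf"
                *) name="bit$bit" ;;
            esac
            out="$out${out:+ }$name"
        fi
        isr=$(( isr >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

decode_isr 0x00000004   # prints: error_warning
decode_isr 0x00000014   # prints: error_warning fault_confinement
```

With this reading, the final "stuck interrupt (isr=0x00000004)" line would mean that the same bit 2 is still asserted after the driver has handled it repeatedly.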
Interface stats:
[CTUCANFD NODE] # ip -d -s link show can0
4: can0: <NO-CARRIER,NOARP,UP,ECHO> mtu 72 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can <FD> state ERROR-WARNING (berr-counter tx 0 rx 256) restart-ms 0
bitrate 1000000 sample-point 0.740
tq 20 prop-seg 18 phase-seg1 18 phase-seg2 13 sjw 1
ctu_can_fd: tseg1 2..190 tseg2 1..63 sjw 1..31 brp 1..8 brp-inc 1
dbitrate 1000000 dsample-point 0.740
dtq 20 dprop-seg 18 dphase-seg1 18 dphase-seg2 13 dsjw 1
ctu_can_fd: dtseg1 2..94 dtseg2 1..31 dsjw 1..31 dbrp 1..2 dbrp-inc 1
clock 50000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 10000 0 1 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
80028 10007 0 9000 0 0
TX: bytes packets errors dropped carrier collsns
24 7 0 0 0 0
On the Kvaser node, some errors are detected as well:
[KVASER NODE] # ip -d -s link show can1
13: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 32) restart-ms 0
bitrate 1000000 sample-point 0.750
tq 12 prop-seg 29 phase-seg1 30 phase-seg2 20 sjw 1
kvaser_usb_kcan: tseg1 1..255 tseg2 1..32 sjw 1..16 brp 1..4096 brp-inc 1
dbitrate 1000000 dsample-point 0.750
dtq 12 dprop-seg 29 dphase-seg1 30 dphase-seg2 20 dsjw 1
kvaser_usb_kcan: dtseg1 1..255 dtseg2 1..32 dsjw 1..16 dbrp 1..4096 dbrp-inc 1
clock 80000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 117 0 1 1 1 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
959 126 117 0 0 0
TX: bytes packets errors dropped carrier collsns
4 4 0 2 0 0
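As a cross-check of the bit timing reported by ip on both nodes, the sample point and bitrate can be recomputed from the segment values. This is only a sketch: the helper names are hypothetical, and the prescaler (brp 1 on both nodes) is an assumption derived from the clock and tq values shown (50 MHz with tq 20 ns, 80 MHz with tq 12, which I read as 12.5 ns shown without the fraction).

```shell
#!/bin/sh
# Sample point in permille: (sync + prop-seg + phase-seg1) / quanta per bit.
# The sync segment is always 1 time quantum.
sample_point_permille() {
    prop=$1; seg1=$2; seg2=$3
    echo $(( (1 + prop + seg1) * 1000 / (1 + prop + seg1 + seg2) ))
}

# Bitrate in bit/s from controller clock (Hz), prescaler and segment lengths.
bitrate() {
    clock=$1; brp=$2; prop=$3; seg1=$4; seg2=$5
    echo $(( clock / (brp * (1 + prop + seg1 + seg2)) ))
}

# CTUCANFD node: clock 50000000, prop-seg 18, phase-seg1 18, phase-seg2 13
echo "ctucanfd: sp=$(sample_point_permille 18 18 13) bitrate=$(bitrate 50000000 1 18 18 13)"
# Kvaser node: clock 80000000, prop-seg 29, phase-seg1 30, phase-seg2 20
echo "kvaser: sp=$(sample_point_permille 29 30 20) bitrate=$(bitrate 80000000 1 29 30 20)"
```

This prints sp=740 for the CTUCANFD node and sp=750 for the Kvaser node, both at 1000000 bit/s, which matches the ip output. The nominal and data phases use the same values on each node, so the two nodes differ only slightly in sample point.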
From this point on, communication is stuck until I manually disable and re-enable the failing interface on the CTUCANFD node. Even after that, every further message sent from the CTUCANFD node triggers the same error (even if I reset all interfaces, not just the failing one).