Beaglebone Black System Crash using SocketCAN


BC-OSU
Hello all,

I'm having some strange issues with SocketCAN on a Beaglebone Black. The problem seems to go deep into the kernel or the drivers, which is well beyond my knowledge, so I'm looking for help here.

I'm trying to write a program that intercepts traffic on a CAN bus and also sends a simple CAN frame every second or so. However, I'm getting odd and unpredictable crashes at runtime. The code I'm using to read messages is identical to the code I found in the can-utils "cangen" program. I can usually trigger the system crash quite easily by loading the CAN bus heavily (80%, or 6500 msgs per second), but sometimes the system runs fine under that load for a long time (20 mins). However, I don't think bus load is the cause of the crash: even when the CAN bus is only lightly loaded, the Beaglebone Black still sometimes crashes. I was able to capture an error report of the crash over the serial console. Here's the system output:

[  864.029581] BUG: soft lockup - CPU#0 stuck for 22s! [main:1119]
[  864.036222] BUG: scheduling while atomic: main/1119/0x40010100
[  864.045930] Kernel panic - not syncing: softlockup: hung tasks
[  864.052066] [<c0013598>] (unwind_backtrace+0x0/0xe0) from [<c061b230>] (panic+0x84/0x1e0)
[  864.060643] [<c061b230>] (panic+0x84/0x1e0) from [<c0093200>] (watchdog_timer_fn+0x120/0x164)
[  864.069587] [<c0093200>] (watchdog_timer_fn+0x120/0x164) from [<c005cf2c>] (__run_hrtimer+0xec/0x1e4)
[  864.079253] [<c005cf2c>] (__run_hrtimer+0xec/0x1e4) from [<c005d960>] (hrtimer_interrupt+0x108/0x25c)
[  864.088924] [<c005d960>] (hrtimer_interrupt+0x108/0x25c) from [<c00247c0>] (omap2_gp_timer_interrupt+0x20/0x30)
[  864.099501] [<c00247c0>] (omap2_gp_timer_interrupt+0x20/0x30) from [<c0093c20>] (handle_irq_event_percpu+0x60/0x214)
[  864.110530] [<c0093c20>] (handle_irq_event_percpu+0x60/0x214) from [<c0093e10>] (handle_irq_event+0x3c/0x5c)
[  864.120832] [<c0093e10>] (handle_irq_event+0x3c/0x5c) from [<c0096810>] (handle_level_irq+0xd4/0xec)
[  864.130405] [<c0096810>] (handle_level_irq+0xd4/0xec) from [<c0093658>] (generic_handle_irq+0x20/0x30)
[  864.140158] [<c0093658>] (generic_handle_irq+0x20/0x30) from [<c000e15c>] (handle_IRQ+0x64/0x8c)
[  864.149366] [<c000e15c>] (handle_IRQ+0x64/0x8c) from [<c0008760>] (omap3_intc_handle_irq+0x60/0x74)
[  864.158847] [<c0008760>] (omap3_intc_handle_irq+0x60/0x74) from [<c0623280>] (__irq_svc+0x40/0x50)
[  864.168230] Exception stack(0xde29fd40 to 0xde29fd88)
[  864.173524] fd40: 00008000 00000044 fa1d0000 fa1d00b0 de31d540 00000010 00000022 c099c3e4
[  864.182094] fd60: de31d000 c09cb80c c09cd640 c094c0c0 00745a2f de29fd88 bf007014 bf00e018
[  864.190658] fd80: 40000113 ffffffff
[  864.194320] [<c0623280>] (__irq_svc+0x40/0x50) from [<bf00e018>] (c_can_plat_read_reg_aligned_to_16bit+0x18/0x24 [c_can_platform])
[  864.206628] [<bf00e018>] (c_can_plat_read_reg_aligned_to_16bit+0x18/0x24 [c_can_platform]) from [<bf007014>] (c_can_read_reg32+0x14/0x30 [c_can])
[  864.220299] [<bf007014>] (c_can_read_reg32+0x14/0x30 [c_can]) from [<bf008158>] (c_can_poll+0x66c/0x818 [c_can])
[  864.230970] [<bf008158>] (c_can_poll+0x66c/0x818 [c_can]) from [<c053b780>] (net_rx_action+0x6c/0x1bc)
[  864.240727] [<c053b780>] (net_rx_action+0x6c/0x1bc) from [<c0042d30>] (__do_softirq+0xfc/0x22c)
[  864.249845] [<c0042d30>] (__do_softirq+0xfc/0x22c) from [<c0043120>] (irq_exit+0x44/0x84)
[  864.258417] [<c0043120>] (irq_exit+0x44/0x84) from [<c000e160>] (handle_IRQ+0x68/0x8c)
[  864.266715] [<c000e160>] (handle_IRQ+0x68/0x8c) from [<c0008760>] (omap3_intc_handle_irq+0x60/0x74)
[  864.276195] [<c0008760>] (omap3_intc_handle_irq+0x60/0x74) from [<c0623280>] (__irq_svc+0x40/0x50)
[  864.285578] Exception stack(0xde29fe70 to 0xde29feb8)
[  864.290869] fe60:                                     df259010 df259010 04e404e3 00000001
[  864.299438] fe80: 60000013 00000000 00000007 00000000 de698407 df259010 de5b8800 a0000013
[  864.308006] fea0: 00000c1b de29feb8 c0622f70 c0622f74 60000013 ffffffff
[  864.314940] [<c0623280>] (__irq_svc+0x40/0x50) from [<c0622f74>] (_raw_spin_unlock_irqrestore+0x10/0x14)
[  864.324877] [<c0622f74>] (_raw_spin_unlock_irqrestore+0x10/0x14) from [<c035a3b4>] (uart_write+0xc8/0xd4)
[  864.334906] [<c035a3b4>] (uart_write+0xc8/0xd4) from [<c0344e2c>] (n_tty_write+0x238/0x394)
[  864.343660] [<c0344e2c>] (n_tty_write+0x238/0x394) from [<c034236c>] (tty_write+0x184/0x214)
[  864.352505] [<c034236c>] (tty_write+0x184/0x214) from [<c00f5c00>] (vfs_write+0xa8/0x178)
[  864.361077] [<c00f5c00>] (vfs_write+0xa8/0x178) from [<c00f5ee0>] (sys_write+0x38/0x64)
[  864.369466] [<c00f5ee0>] (sys_write+0x38/0x64) from [<c000d880>] (ret_fast_syscall+0x0/0x30)
[  864.378309] drm_kms_helper: panic occurred, switching back to text console

Sometimes I get this error message and a system crash just by stopping my program that uses SocketCAN.

Here are some pieces of my software, in case it comes down to a simple software error on my part or in can-utils. I've tried reading and writing in various ways with little difference. I've also tried removing everything except recvmsg, and the problem still persists.

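#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <linux/can.h>
#include <linux/can/raw.h>

const int canfd_on = 1;
volatile int keepRunning = 1;

void *logthread(void *arg);     /* defined elsewhere, not shown */

/* Sends an empty frame with ID 0xAA once per second. */
void *txcanthread(void *arg) {
        int cansock = *(int *)arg;
        struct can_frame txmsg;
        while(keepRunning)
        {
                sleep(1);
                memset(&txmsg, 0, sizeof(txmsg));
                txmsg.can_id  = 0xAA;
                txmsg.can_dlc = 0;
                /* c_can is a classic CAN controller, so send a classic frame */
                if (write(cansock, &txmsg, sizeof(txmsg)) != sizeof(txmsg))
                        perror("write");
        }
        return NULL;
}

int main()
{
        fd_set rdfs;
        char ctrlmsg[CMSG_SPACE(sizeof(struct timeval)) + CMSG_SPACE(sizeof(__u32))];
        struct iovec iov;
        struct msghdr msg;
        struct canfd_frame frame;
        int s, nbytes, rc = 0;
        struct sockaddr_can addr;
        struct ifreq ifr;
        s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
        if (s < 0)
        {
                perror("socket");
                return 1;
        }
        strcpy(ifr.ifr_name, "can0" );
        if (ioctl(s, SIOCGIFINDEX, &ifr) < 0)
        {
                perror("SIOCGIFINDEX");
                return 1;
        }
        memset(&addr, 0, sizeof(addr));
        addr.can_family = AF_CAN;
        /* bind to can0 itself; index 0 means "any interface", on which write() fails */
        addr.can_ifindex = ifr.ifr_ifindex;
        setsockopt(s, SOL_CAN_RAW, CAN_RAW_FD_FRAMES, &canfd_on, sizeof(canfd_on));

        if(bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        {
                perror("Problem binding socket");
                return 1;
        }
        pthread_t txthread, logging;
        pthread_create(&txthread, NULL, txcanthread, &s);
        pthread_create(&logging, NULL, logthread, NULL);

        unsigned long msgsRecv = 0;
        memset(&msg, 0, sizeof(msg));
        iov.iov_base = &frame;
        msg.msg_name = &addr;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = &ctrlmsg;

        while(keepRunning){
                FD_ZERO(&rdfs);
                FD_SET(s, &rdfs);
                /* block until the socket is readable */
                if (select(s + 1, &rdfs, NULL, NULL, NULL) < 0)
                {
                        perror("select");
                        rc = 1;
                        break;
                }
                if (FD_ISSET(s, &rdfs))
                {
                        iov.iov_len = sizeof(frame);
                        msg.msg_namelen = sizeof(addr);
                        msg.msg_controllen = sizeof(ctrlmsg);
                        msg.msg_flags = 0;
                        nbytes = recvmsg(s, &msg, 0);

                        /* check for errors first: a negative nbytes would be
                           converted to unsigned in the size comparison below */
                        if (nbytes < 0)
                        {
                                perror("recvmsg");
                                rc = 1;
                                break;
                        }
                        //Don't allow the translation of bad reads
                        if ((size_t)nbytes < sizeof(struct can_frame))
                        {
                                fprintf(stderr, "read: incomplete CAN frame\n");
                                rc = 1;
                                break;
                        }
                        printf("%lu\n", msgsRecv);
                        msgsRecv++;
                }
        }
        /* stop the worker threads before closing the socket */
        keepRunning = 0;
        pthread_join(txthread, NULL);
        pthread_join(logging, NULL);
        close(s);
        return rc;
}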

I would greatly appreciate any feedback on what I can do to resolve this issue.

Thanks

Re: Beaglebone Black System Crash using SocketCAN

reubenp
Not sure if you're still working on this? It's a bug in the SocketCAN driver (c_can), which has been fixed in later versions of the kernel. I managed to patch mine, and I no longer seem to get the error.

Install the Linux headers:
apt-get install linux-headers-$(uname -r)
cd /usr/src/----/include/linux/can/
wget https://raw.githubusercontent.com/torvalds/linux/master/include/linux/can/led.h

mkdir ~/c_can/
cd ~/c_can/

Now get the files.
One commit for c_can_platform.c:
wget https://raw.githubusercontent.com/torvalds/linux/5e946e56231b5c765598cd3103faf3c7a0121812/drivers/net/can/c_can/c_can_platform.c
Another for the other two files:
wget https://raw.githubusercontent.com/torvalds/linux/7ee330c7b3b738847bf297912b371bbcec3bc994/drivers/net/can/c_can/c_can.h
wget https://raw.githubusercontent.com/torvalds/linux/7ee330c7b3b738847bf297912b371bbcec3bc994/drivers/net/can/c_can/c_can.c

Create a Makefile (e.g. with nano):

obj-m += c_can.o
obj-m += c_can_platform.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Run make.

This should give you the .ko files as output. Now install them:
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can.ko
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can_platform.ko

From memory, you'll also need to rmmod c_can and c_can_platform, then modprobe them again.

Hopefully that works!

Reuben

Re: Beaglebone Black System Crash using SocketCAN

hound
I managed to patch my kernel as you described.
But how do I start CAN now?
Previously I set up CAN as described here:
http://www.embedded-things.com/bbb/enable-canbus-on-the-beaglebone-black/
But now I get this error:
Cannot find device "can0"

I use this image: bone-debian-7.9-lxde-4gb-armhf-2015-11-12-4gb
Can you help me, please?

Re: Beaglebone Black System Crash using SocketCAN

reubenp
Could you post the output of running "dmesg"? This could be any of a number of different issues.

Re: Beaglebone Black System Crash using SocketCAN

hound
You wrote in your post:
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can.ko
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can_platform.ko

I tried this instead:
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can.ko
mv c_can_platform.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can_platform.ko

Now can0 starts successfully, but I still get an error:
Message from syslogd@beaglebone at Feb 11 07:56:20 ...
kernel:[ 175.316738] 7d40: 00000000 07c107c1 00000000 df0f6100 00000000 df246000 c0859758 c0033677
kernel:[ 175.325262] 7d60: df247de3 00000002 df246000 00000001 c070cf5c 07c107c1 00000000 c0889f34
kernel:[ 175.333793] 7d80: df247ec0 c08d44d8 df247ec0 df246000 c0859758 0000000b df247de3 00000002
kernel:[ 175.342317] 7da0: 20000193 c000f96d df246240 0000000b c03676aa 00000020 00000000 00000004
kernel:[ 175.350854] 7dc0: 37000000 20393438 31306166 30336620 38312033 28203364 39313838 64362029
kernel:[ 175.359377] 7de0: 00203061 00000000 00000000 c04d0c01 c06fee98 df247e14 df246000 00001028
kernel:[ 175.367895] 7e00: f9e0b02c df247ec0 c085a184 00000010 c085537c 00000000 00000000 c0008439
kernel:[ 175.376427] 7e20: 00000000 00000001 00000007 00000000 00000000 f9e0b02c c04d6967 c04d6967
kernel:[ 175.384946] 7e40: c04d6968 60000033 df0f6100 c004d551 ffffffff df0f6100 c0d60fc0 c0849fc0
kernel:[ 175.393471] 7e60: df247e80 df08c0c0 de62f040 c04d5c93 00000037 df247eb8 df246000 00000000
kernel:[ 175.401998] 7e80: 00000037 c003558f c0849728 c0074a4d c0849fc0 c0849fc0 191f9f36 00000025
kernel:[ 175.410519] 7ea0: c0849fc0 c0849fc0 0000745f c03676a2 800000b3 ffffffff df247ef4 c04d6bb5
kernel:[ 175.419049] 7ec0: 60000013 0000002c f9e0b000 f9e0b02c df23dc10 00000065 60000013 00004000
kernel:[ 175.427578] 7ee0: 00000010 c085537c 00000000 00000000 00000000 df247f08 c04d6991 c03676a2
kernel:[ 175.436100] 7f00: 800000b3 ffffffff c036767d df246000 df237220 00000001 df237200 df008840
kernel:[ 175.444624] 7f20: c085537c c0074a97 00000000 c0074979 df071db0 00000000 df237200 c00749f9
kernel:[ 175.453148] 7f40: 00000000 00000000 00000000 c00454ab 00000000 00000000 00000000 df237200
kernel:[ 175.461677] 7f60: 00000000 00000000 dead4ead ffffffff ffffffff df247f74 df247f74 00000001
kernel:[ 175.470204] 7f80: 00010001 dead4ead ffffffff ffffffff df247f90 df247f90 00000000 df071db0
kernel:[ 175.478735] 7fa0: c0045441 00000000 00000000 c000c8fd 00000000 00000000 00000000 00000000
kernel:[ 175.487264] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel:[ 175.495795] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
kernel:[ 175.616976] Code: 4770 bf00 f8d0 3274 (f853) 0c20


Re: Beaglebone Black System Crash using SocketCAN

hound
In reply to this post by reubenp
I use an SN65HVD232 transceiver. I get these errors when no CAN devices are connected to the bus.

Re: Beaglebone Black System Crash using SocketCAN

reubenp
What happens if you try to bring can0 up with something like this?
sudo ip link set can0 up type can bitrate 500000
sudo ifconfig can0 up

Re: Beaglebone Black System Crash using SocketCAN

hound
Now it works. I just reinstalled the OS and followed the steps in your post, except that I ran:
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can.ko
mv c_can_platform.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can_platform.ko

instead of:
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can.ko
mv c_can.ko /lib/modules/3.8.13-bone63/kernel/drivers/net/can/c_can/c_can_platform.ko

I have now connected two BBBs to each other and I'm trying to send and receive data between them.
I'm no longer getting any kernel errors and the OS doesn't hang. But sometimes data is not sent.
For the command 'ip -details -statistics link show can0' I get:
RTNETLINK answers: Device or resource busy
3: can0: <NO-CARRIER,NOARP,UP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT qlen 10
    link/can
    can state BUS-OFF (berr-counter tx 249 rx 0) restart-ms 0
    bitrate 500000 sample-point 0.875
    tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
    c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 24000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          2          15         7        
    RX: bytes  packets  errors  dropped overrun mcast  
    261        37       0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns
    376        47       0       6       0       0      

Then I restart the CAN interface:
ifconfig can0 down
ifconfig can0 up

After that, data is sent successfully, but after a few transmissions it stops being sent again.
Why is this happening?

Re: Beaglebone Black System Crash using SocketCAN

hound
I've solved the problem. It was just my mistake: I had not connected the two BBBs to each other correctly.
But I still don't understand how to find out about errors through the SocketCAN API.
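
One way to learn about bus errors from a program is to ask the raw socket for error frames. What follows is a minimal sketch, not a complete program: it assumes an already bound CAN_RAW socket, and the function names enable_error_frames and describe_error are made up for illustration. Once the error filter is set, the driver's error reports arrive through the normal read path as frames with CAN_ERR_FLAG set in can_id.

```c
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/raw.h>
#include <linux/can/error.h>

/* Ask the raw socket to deliver every class of error frame.
 * Returns the setsockopt() result (0 on success, -1 on failure). */
int enable_error_frames(int s)
{
        can_err_mask_t err_mask = CAN_ERR_MASK;

        return setsockopt(s, SOL_CAN_RAW, CAN_RAW_ERR_FILTER,
                          &err_mask, sizeof(err_mask));
}

/* Hypothetical helper: name the most severe condition in a received frame. */
const char *describe_error(const struct can_frame *f)
{
        if (!(f->can_id & CAN_ERR_FLAG))
                return "data frame";
        if (f->can_id & CAN_ERR_BUSOFF)
                return "bus-off";
        if (f->can_id & CAN_ERR_CRTL) {
                /* data[1] carries the controller status bits */
                if (f->data[1] & (CAN_ERR_CRTL_TX_PASSIVE | CAN_ERR_CRTL_RX_PASSIVE))
                        return "error-passive";
                if (f->data[1] & (CAN_ERR_CRTL_TX_WARNING | CAN_ERR_CRTL_RX_WARNING))
                        return "error-warning";
                return "controller problem";
        }
        if (f->can_id & CAN_ERR_ACK)
                return "no ACK received";
        if (f->can_id & CAN_ERR_BUSERROR)
                return "bus error";
        if (f->can_id & CAN_ERR_RESTARTED)
                return "controller restarted";
        return "other error";
}
```

In a receive loop, each frame read from the socket could be passed to describe_error() and logged whenever the result is not "data frame". Which error classes are actually reported depends on the driver.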

Re: Beaglebone Black System Crash using SocketCAN

IMB
Hello,

I am struggling to get can0 (P9.24/26) working on my Beaglebone Black with kernel 3.8.13. I have been dealing with this for many weeks without success. I started with 3.8.13-bone47 (where I enabled the PRU and UART 2 and 5), but as I couldn't install the Linux headers I had to upgrade to 3.8.13-bone80.

I followed your steps above and:

I can get can0 enabled, but when I try candump/cangen on can0, it doesn't work; I always get bus-off.

But if I do the same with a virtual CAN interface, vcan0, candump/cansend work properly.


A couple more questions:
1) Can I connect can0 (P9.24/26) directly between two Beaglebones, or do I need transceivers/resistors?
2) Is it possible to test can0 with only one Beaglebone using candump/cangen, or should I use two Beaglebones and run candump on one and cangen on the other?

I am quite desperate because I don't know whether it doesn't work because of a bug in the kernel (as I read), or because I am doing something wrong.

Many thanks for your support, and apologies for the inconvenience.

Looking forward to hearing from you.

Best regards

Re: Beaglebone Black System Crash using SocketCAN

hound
Hello.
1. No, you cannot. You have to use transceivers and termination resistors; without them can0 won't work. vcan works fine, though.
I used two BBBs with SN65HVD232 transceivers and resistors.

Sometimes I get bus-off; in that case I just restart the can0 interface.
I hope this helps.


(In reply to IMB's message of 14.09.2016, 16:45.)


-- 
Best regards,
Василий Близнецов
"Satellite Solutions"
www.satsol.ru


Re: Beaglebone Black System Crash using SocketCAN

Tim
Have you terminated your bus? You must fit termination resistors across the A and B legs (CAN_H and CAN_L), typically 120 ohms at each end of the bus.
Bus-off is the result of excessive errors. It is meant to disable the offending device on a vehicle CAN bus so that it cannot disrupt normal operation.
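
To make "excessive errors" concrete: a CAN controller keeps a transmit error counter (TEC) and a receive error counter (REC), the values that ip shows as "berr-counter tx ... rx ...". The sketch below classifies the controller state from those counters using the standard CAN fault-confinement thresholds (warning at 96, error-passive at 128, bus-off when the TEC reaches 256). The enum and function names are invented for illustration, and the "+8 per failed transmission" step is the usual rule, which has a few exceptions in the specification.

```c
/* Illustrative names; these are not kernel or can-utils identifiers. */
enum bus_state { ERROR_ACTIVE, ERROR_WARNING, ERROR_PASSIVE, BUS_OFF };

/* Classify the controller state from its error counters. */
enum bus_state classify(unsigned tec, unsigned rec)
{
        if (tec >= 256)
                return BUS_OFF;         /* only the TEC can cause bus-off */
        if (tec >= 128 || rec >= 128)
                return ERROR_PASSIVE;
        if (tec >= 96 || rec >= 96)
                return ERROR_WARNING;
        return ERROR_ACTIVE;
}

/* How many consecutive failed transmissions (TEC += 8 each) it takes
 * to go from the given TEC to bus-off. */
unsigned failures_until_bus_off(unsigned tec)
{
        unsigned n = 0;

        while (tec < 256) {
                tec += 8;
                n++;
        }
        return n;
}
```

Starting from a clean counter, failures_until_bus_off(0) is 32, so a node whose transmissions keep failing, for example because of bad wiring or missing termination, can reach bus-off after only a few dozen attempts.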