CAN too many RX errors - any way to increase throughput?

amrbekhit
Posts: 8
Joined: Mon Jun 23, 2014 1:20 pm

Tue Oct 07, 2014 4:03 pm

Hello all,

I've got a system designed around the Aria G25 (AT91SAM9G25) SoM. I've got two MCP2515 chips connected to the CPU, each one on one of the SPI ports. The system works fine - both CAN ports work and I can send and receive messages.

Here's my DTS declaration for the SPI ports and the CAN chips:

Code:

spi0: spi@f0000000 {
    status = "okay";
    cs-gpios = <&pioA 14 0>, <&pioA 7 0>, <0>, <0>;

    can0: can@0 {
        compatible = "microchip,mcp2515";
        reg = <0>;
        spi-max-frequency = <10000000>;    // 10 MHz
        clock-names = <&mcp251x_clock>, "mcp251x_clock";
        interrupt-parent = <&pioA>;
        interrupts = <3 0x2>;
    };

    device@1 {
        compatible = "spidev";
        spi-max-frequency = <5000000>;     // 5 MHz
        reg = <1>;
    };
};

spi1: spi@f0004000 {
    status = "okay";
    cs-gpios = <&pioA 8 0>, <0>, <0>, <0>;

    can1: can@0 {
        compatible = "microchip,mcp2515";
        reg = <0>;
        spi-max-frequency = <10000000>;    // 10 MHz
        clock-names = <&mcp251x_clock>, "mcp251x_clock";
        interrupt-parent = <&pioA>;
        interrupts = <2 0x2>;
    };
};

As suggested in the forum post here (https://groups.google.com/forum/#!msg/a ... 697jxGhqwJ), I've hardcoded the oscillator frequency of the MCP2515s in the driver source code.

Unfortunately, I'm running into a problem where packets are dropped when they are sent in rapid succession. The problem doesn't seem to be the number of packets, but rather how fast they are sent. I have a device that sends 9 CAN messages in quick succession every second. The amount of data isn't great, but I lose quite a few of the packets.

As a test, I connected a CAN analyser to my board and transmitted a message every 10 ms. By regularly checking the stats in ifconfig, I can tell that no packets are lost. As soon as I start my Python code, which monitors the CAN bus and processes the data, I start to get dropped packets. Using the command suggested here (http://unix.stackexchange.com/questions ... every-time) for getting total CPU usage, the system is running at about 50% usage with the CAN analyser sending messages every 10 ms and my Python program running.
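The "regularly checking the stats" step can be made repeatable by polling the per-interface counters that Linux exposes under /sys/class/net/<iface>/statistics and printing how much each one changed per interval. The following is a minimal sketch, not part of the original setup: the interface name can0 is an assumption (the post does not name the interface), and the function names read_counters, deltas and watch are made up for illustration.

```python
import time
from pathlib import Path

# Counters published by the kernel for every network interface.
COUNTERS = ("rx_packets", "rx_dropped", "rx_errors", "rx_over_errors")


def read_counters(iface, names=COUNTERS, root="/sys/class/net"):
    """Return the current value of each named counter for one interface."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in names}


def deltas(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def watch(iface="can0", interval=1.0, samples=10):
    """Print the per-interval change of each counter, `samples` times."""
    prev = read_counters(iface)
    for _ in range(samples):
        time.sleep(interval)
        cur = read_counters(iface)
        print(deltas(prev, cur))
        prev = cur
```

Calling watch("can0") once with the Python application stopped and once with it running would show in which interval the drop or overrun counters start to climb.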

At 50%, there seems to be enough CPU time to handle more packets, so I'm curious as to why I'm still getting dropped packets. 

What can I do to increase the throughput of the CAN system to minimise data loss?

Thanks,

Amr

blue_z
Location: USA
Posts: 1974
Joined: Thu Apr 19, 2007 10:15 pm

Re: CAN too many RX errors - any way to increase throughput?

Tue Oct 07, 2014 9:59 pm

Your post omits many salient details, starting with the Linux version.
Is preemption enabled or not?
What I/O scheduler is configured (or tried)?
What is the SPI data rate?
amrbekhit wrote: At 50%, there seems to be enough CPU time to handle more packets, so I'm curious as to why I'm still getting dropped packets.
Apparently you're (mis)using a metric that doesn't pinpoint the problem.
You should be careful when using a metric that is an average over an unspecified time interval.
It's like using a low-bandwidth scope to see if a glitch is happening. You have a tool, but it doesn't have the resolving power to be useful.
CPU load also tells you nothing about interrupt and lock-acquisition latencies.
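The point about averaging can be shown with made-up numbers. The sketch below uses a synthetic busy/idle trace, not a measurement from this system: over a 1 s window the load reads 50%, yet individual 10 ms windows are completely saturated. The function name windowed_load and the shape of the trace are assumptions chosen only for illustration.

```python
def windowed_load(busy_flags, window):
    """Split a per-millisecond busy (1) / idle (0) trace into windows
    and return the average load of each window."""
    loads = []
    for start in range(0, len(busy_flags), window):
        chunk = busy_flags[start:start + window]
        loads.append(sum(chunk) / len(chunk))
    return loads


# Synthetic 1 s trace: every 100 ms the CPU is busy for 50 ms, then idle for 50 ms.
trace = ([1] * 50 + [0] * 50) * 10

one_second = windowed_load(trace, 1000)  # one average over the whole second
ten_ms = windowed_load(trace, 10)        # one hundred 10 ms averages

print("1 s average:", one_second)
print("worst 10 ms window:", max(ten_ms))
print("best 10 ms window:", min(ten_ms))
```

A frame that arrives during one of the saturated windows sees no spare CPU at all, even though the long-interval figure suggests half the machine is idle.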
amrbekhit wrote: What can I do to increase the throughput of the CAN system to minimise data loss?
Check out this paper on what one outfit did to improve Linux CAN performance.
But increasing throughput may not reduce data loss. You should be looking at improving message integrity and reliability.

Regards

amrbekhit
Posts: 8
Joined: Mon Jun 23, 2014 1:20 pm

Re: CAN too many RX errors - any way to increase throughput?

Tue Oct 07, 2014 11:16 pm

Hi blue_z,

Thanks for your response.

- I'm using Linux kernel 3.15.1
- Regarding preemption, the Preemption Model in the kernel is set to "No forced preemption (Server)".
- For I/O schedulers, the Deadline and CFQ I/O schedulers are both built into the kernel. The default I/O scheduler is set to No-op.
- The SPI data rate according to the device tree is set to 10 MHz:

Code:

spi-max-frequency = <10000000>;      // 10 MHz

Is there anything there that particularly stands out?

Thanks for linking that paper - I'll have a look through it.

Amr

blue_z
Location: USA
Posts: 1974
Joined: Thu Apr 19, 2007 10:15 pm

Re: CAN too many RX errors - any way to increase throughput?

Thu Oct 09, 2014 1:38 am

amrbekhit wrote: Is there anything there that particularly stands out?
You might eventually want to try different settings, such as a preemptive kernel.
But that probably won't help solve dropped CAN packets.
You could try using a simpler app than your Python program, such as the candump utility from can-utils, and then start instrumenting code to determine where packets are getting dropped.
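If a Python test is still wanted, a stripped-down receiver that only counts frames can help separate "the socket layer drops frames" from "the application's processing is too slow". This is a rough sketch under assumptions rather than a replacement for candump: the interface name can0, the 1 MiB receive buffer request and the helper names parse_frame and count_frames are all illustrative. It relies on the raw SocketCAN support in Python's standard socket module (Linux only).

```python
import socket
import struct

# Layout of a classic struct can_frame: 32-bit ID, 1-byte length,
# 3 padding bytes, 8 data bytes (16 bytes in total).
CAN_FRAME_FMT = "=IB3x8s"
CAN_FRAME_SIZE = struct.calcsize(CAN_FRAME_FMT)

CAN_EFF_FLAG = 0x80000000  # set when the frame uses a 29-bit identifier
CAN_EFF_MASK = 0x1FFFFFFF
CAN_SFF_MASK = 0x000007FF


def parse_frame(raw):
    """Decode one raw can_frame into (identifier, payload bytes)."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    mask = CAN_EFF_MASK if can_id & CAN_EFF_FLAG else CAN_SFF_MASK
    return can_id & mask, data[:length]


def count_frames(iface="can0", limit=1000, rcvbuf=1 << 20):
    """Receive `limit` frames doing as little work per frame as possible."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    # Ask for a larger socket buffer; the kernel may cap the value.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((iface,))
    seen = 0
    try:
        while seen < limit:
            parse_frame(sock.recv(CAN_FRAME_SIZE))
            seen += 1
    finally:
        sock.close()
    return seen
```

If the interface counters still show drops with a loop this small, the loss is more likely happening below the application, for example between the MCP2515 and the driver.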

Regards

BillBoyd
Posts: 17
Joined: Fri Aug 15, 2014 10:24 pm

Re: CAN too many RX errors - any way to increase throughput?

Thu Oct 09, 2014 9:46 pm

It is possible in CAN to get messages on the bus back to back that approach 100% bus loading. This can happen when several CAN nodes are ready to send their messages roughly at the same time and they get sent in a burst according to the message priority (the ID is the priority value)...one right after the last...

The CAN spec does not evenly space out CAN messages. A CAN system needs to be able to handle this (or not, if you have periodic messages and don't mind dropped frames). Even though you can have an average CAN bus loading of, let's say, 20%, these bursts can surprise you.
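To put rough numbers on such a burst, the arithmetic below estimates how long back-to-back frames occupy the bus. It is a sketch under stated assumptions: the thread never gives the bit rate, so 500 kbit/s is a guess, and 8-byte payloads with standard 11-bit identifiers are assumed. The function names are illustrative. The bit counts follow the classic CAN frame layout: 44 fixed bits plus 3 interframe bits plus 8 bits per data byte, with worst-case bit stuffing adding one bit per four bits of the stuffed region.

```python
def frame_bits(data_bytes, worst_case_stuffing=False):
    """Bits on the wire for one standard-ID CAN data frame,
    including the 3-bit interframe space."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("classic CAN carries 0 to 8 data bytes")
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        # Stuffing applies from start-of-frame through the CRC field
        # (34 bits plus the data field).
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


def burst_time_us(frames, data_bytes, bitrate, worst_case_stuffing=False):
    """Microseconds needed to send `frames` frames back to back."""
    return frames * frame_bits(data_bytes, worst_case_stuffing) * 1e6 / bitrate


# Assumed values: 9 frames per burst (from the original post),
# 8 data bytes and 500 kbit/s (both guesses).
per_frame = burst_time_us(1, 8, 500_000)
burst = burst_time_us(9, 8, 500_000)
print(f"one frame: {per_frame:.0f} us, nine-frame burst: {burst:.0f} us")
```

Under those assumptions a frame can arrive roughly every 0.2 ms and the whole nine-frame burst is over in about 2 ms. The MCP2515 has only two receive buffers, so the interrupt handler and SPI transfer have to drain one within roughly two frame times, a latency budget that an average CPU load figure says nothing about.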

I once (twice) worked at a company that used several CAN nodes in a Linux system. I believe they used ioctl calls directly to the CAN controllers to speed things up. Not dropping CAN messages was always a challenge, and I think the biggest reason was the various latencies of Linux.
Perhaps you can get some clues from this information.

Bob Boys
ARM
