
All times are UTC + 1 hour [ DST ]




 Post subject: AT91SAM9G20 Linux based units fail to run Linux
PostPosted: Tue Oct 05, 2010 12:02 am 

Joined: Wed Jan 09, 2008 5:09 pm
Posts: 207
Location: Mounds View, MN
I have gotten some units back from the field that failed to boot to Linux. Initially the units were fixed by removing the backup battery and then re-installing the battery. But since we are seeing more units with this failure mode, I started researching the issue.

I have found what is causing the problem, and it is repeatable. For some reason, the units that fail have the RTTINCIEN bit set. When the system attempts to initialize Linux, it cannot assign an IRQ to the PIT, and since the PIT generates the basic timing for Linux, Linux locks up. If I reboot the failing unit and use u-boot to clear the RTTINCIEN bit, the unit successfully initializes Linux and everything is fine. Pulling the battery fixed this issue because the bit is cleared when VDDBU is lost.
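
For reference, clearing the bit amounts to a read-modify-write of the RTT Mode Register (RTT_MR). The following is a minimal C sketch, not the code used on these units: the helper name `rtt_disable_irqs` is hypothetical, and the bit positions (ALMIEN at bit 16, RTTINCIEN at bit 17) are taken from the AT91SAM9 datasheets and should be checked against the datasheet for the exact part.

```c
#include <stdint.h>

/* Assumed RTT_MR bit positions (AT91SAM9 family); verify for your part. */
#define RTT_MR_ALMIEN    (1u << 16)  /* alarm interrupt enable */
#define RTT_MR_RTTINCIEN (1u << 17)  /* increment interrupt enable */

/*
 * Hypothetical helper: clear both RTT interrupt enables while leaving
 * the prescaler field (bits 15:0) untouched.
 * rtt_mr must point at the RTT Mode Register.
 */
static inline void rtt_disable_irqs(volatile uint32_t *rtt_mr)
{
    uint32_t mr = *rtt_mr;                        /* read current mode */
    mr &= ~(RTT_MR_ALMIEN | RTT_MR_RTTINCIEN);    /* drop both enables */
    *rtt_mr = mr;                                 /* write it back */
}
```

On the target, the caller would pass the address of RTT_MR from the part's memory map. On a host, the function can be exercised against an ordinary variable.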

I have done a cursory review of the rtc_at91sam code and there are some situations when the RTTINCIEN bit is set, but I am not sure why it stays set on some of my units in the field. I wonder, though, if maybe Linux should disable all interrupts at the very beginning, instead of assuming that registers are in the proper state.

_________________
Tim Barr
Multitech Inc.


 
 Post subject: Re: AT91SAM9G20/45 Linux based units fail to run Linux
PostPosted: Tue May 15, 2012 8:11 am 

Joined: Wed May 25, 2011 9:44 am
Posts: 32
Hi,

I'm facing the same issue on my custom board design based on the AT91SAM9G45-EK. I'm using Linux 3.0.4.

==============================

The error is as follows:

Uncompressing Linux... done, booting the kernel.
Linux version 3.0.4 (gcc version 4.2.0 20070413 (prerelease) (CodeSourcery Sourcery G++ Lite 2007q1-10)) #70 Thu Dec 1 09:55:16 IST 2011
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
CPU: VIVT data cache, VIVT instruction cache
Machine: Atmel AT91SAM9M10G45-EK
Ignoring unrecognised tag 0x54410008
Memory policy: ECC disabled, Data cache writeback
Clocks: CPU 400 MHz, master 133 MHz, main 12.000 MHz
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 62720
Kernel command line: console=ttyS0,115200 rootfstype=ext3 root=/dev/mmcblk0p2 ro rootdelay=3 ip=dhcp
PID hash table entries: 1024 (order: 0, 4096 bytes)
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 128MB 128MB = 256MB total
Memory: 255900k/255900k available, 6244k reserved, 0K highmem
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)
DMA : 0xffa00000 - 0xffe00000 ( 4 MB)
vmalloc : 0xd0800000 - 0xfee00000 ( 742 MB)
lowmem : 0xc0000000 - 0xd0000000 ( 256 MB)
modules : 0xbf000000 - 0xc0000000 ( 16 MB)
.init : 0xc0008000 - 0xc0025000 ( 116 kB)
.text : 0xc0025000 - 0xc03949a0 (3519 kB)
.data : 0xc0396000 - 0xc03b6f80 ( 132 kB)
.bss : 0xc03b6f80 - 0xc03d0310 ( 101 kB)
NR_IRQS:192
AT91: 160 gpio irqs in 5 banks
irq 1: nobody cared (try booting with the "irqpoll" option)
[<c0031124>] (unwind_backtrace+0x0/0xf4) from [<c006a60c>] (__report_bad_irq+0x24/0xb4)
[<c006a60c>] (__report_bad_irq+0x24/0xb4) from [<c006a878>] (note_interrupt+0x1dc/0x240)
[<c006a878>] (note_interrupt+0x1dc/0x240) from [<c00690b0>] (handle_irq_event_percpu+0xa0/0x1ac)
[<c00690b0>] (handle_irq_event_percpu+0xa0/0x1ac) from [<c00691e4>] (handle_irq_event+0x28/0x38)
[<c00691e4>] (handle_irq_event+0x28/0x38) from [<c006b458>] (handle_level_irq+0x80/0xdc)
[<c006b458>] (handle_level_irq+0x80/0xdc) from [<c0068da4>] (generic_handle_irq+0x28/0x30)
[<c0068da4>] (generic_handle_irq+0x28/0x30) from [<c0025030>] (asm_do_IRQ+0x30/0x90)
[<c0025030>] (asm_do_IRQ+0x30/0x90) from [<c002b9b4>] (__irq_svc+0x34/0x60)
Exception stack(0xc0397f00 to 0xc0397f48)
7f00: 00000000 c0396000 20000053 00000000 00000002 c03a7078 00000000 c0398020
7f20: 70004000 c03bbbc0 c0396000 0000000a c039cdc0 c0397f48 c0025040 c00414cc
7f40: 20000053 ffffffff
[<c002b9b4>] (__irq_svc+0x34/0x60) from [<c00414cc>] (__do_softirq+0x3c/0x110)
[<c00414cc>] (__do_softirq+0x3c/0x110) from [<c0025040>] (asm_do_IRQ+0x40/0x90)
[<c0025040>] (asm_do_IRQ+0x40/0x90) from [<c002b9b4>] (__irq_svc+0x34/0x60)
Exception stack(0xc0397f88 to 0xc0397fd0)
7f80: 00008001 20000053 03014584 00000000 c002176c c03b6f80
7fa0: c0020950 c0398020 70004000 41069265 7002006c 00000000 00000000 c0397fd0
7fc0: c039edc0 c00088fc 20000053 ffffffff
[<c002b9b4>] (__irq_svc+0x34/0x60) from [<c00088fc>] (start_kernel+0x19c/0x3bc)
[<c00088fc>] (start_kernel+0x19c/0x3bc) from [<7000803c>] (0x7000803c)
handlers:
[<c0036144>] at91sam926x_pit_interrupt
Disabling IRQ #1
Console: colour dummy device 80x30
console [ttyS0] enabled
Calibrating delay loop...

===============================

I printed some register values in the AT91 Bootstrap code to find out which interrupt is causing the problem. I found that, in my case, the ALR interrupt bit in RTC_IMR is set.

After reading the AT91SAM9G45 datasheet, I found that this interrupt (IRQ 1 in the log above) is generated by the System Controller, a set of peripherals that share one interrupt line.

What is the exact cause? Can I work around this by resetting some of the interrupt bits? And how do I find out which bits need to be reset?

Once the units are in the field, such issues are disastrous. Any input is welcome.

PJ


 
 Post subject: Re: AT91SAM9G20 Linux based units fail to run Linux
PostPosted: Tue May 15, 2012 9:54 am 

Joined: Wed May 25, 2011 9:44 am
Posts: 32
I did some more testing, and it turns out that the ALR bit of the RTC is not the cause.

The error appeared because of some other interrupt source on IRQ 1, and it was not RTTINCIEN, as I'm observing that bit.

If we remove the battery and plug it back in, the board starts booting again, and since I'm not observing all the interrupt registers, the actual cause is lost.
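
One way to avoid losing the cause is to capture the relevant registers early in the bootstrap, before anything clears them, and then check which sources are both enabled and pending. A rough C sketch under several assumptions follows: the struct and function names are hypothetical, only three of the peripherals that share the system interrupt (PIT, RTT, RTC) are covered, and the bit positions come from the AT91SAM9 datasheets and should be verified for the exact part. Reading RTT_SR is expected to clear its flags on these parts, so the snapshot should be taken once and stored.

```c
#include <stdint.h>

/* Assumed bit positions (AT91SAM9 family); verify for your part. */
#define PIT_MR_PITIEN    (1u << 25)  /* PIT interrupt enable */
#define PIT_SR_PITS      (1u << 0)   /* PIT period elapsed */
#define RTT_MR_ALMIEN    (1u << 16)  /* RTT alarm interrupt enable */
#define RTT_MR_RTTINCIEN (1u << 17)  /* RTT increment interrupt enable */
#define RTT_SR_ALMS      (1u << 0)   /* RTT alarm status */
#define RTT_SR_RTTINC    (1u << 1)   /* RTT increment status */

/* Hypothetical flags naming the sources this sketch knows about. */
enum sysirq_source {
    SYSIRQ_PIT = 1u << 0,
    SYSIRQ_RTT = 1u << 1,
    SYSIRQ_RTC = 1u << 2
};

/* Hypothetical snapshot of register values read once at boot. */
struct sysirq_snapshot {
    uint32_t pit_mr, pit_sr;
    uint32_t rtt_mr, rtt_sr;
    uint32_t rtc_imr, rtc_sr;
};

/* Return the set of sources that are enabled AND pending. */
static unsigned sysirq_asserting(const struct sysirq_snapshot *s)
{
    unsigned hits = 0;

    if ((s->pit_mr & PIT_MR_PITIEN) && (s->pit_sr & PIT_SR_PITS))
        hits |= SYSIRQ_PIT;

    if (((s->rtt_mr & RTT_MR_ALMIEN) && (s->rtt_sr & RTT_SR_ALMS)) ||
        ((s->rtt_mr & RTT_MR_RTTINCIEN) && (s->rtt_sr & RTT_SR_RTTINC)))
        hits |= SYSIRQ_RTT;

    /* RTC_IMR and RTC_SR use the same bit positions per event. */
    if (s->rtc_imr & s->rtc_sr)
        hits |= SYSIRQ_RTC;

    return hits;
}
```

Printing the returned flags on the debug console before the kernel starts would show which of these sources is holding the shared line, even after the battery has been pulled and the board boots again.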


 
 Post subject: Re: AT91SAM9G20 Linux based units fail to run Linux
PostPosted: Wed May 16, 2012 7:31 pm 

Joined: Wed Jan 09, 2008 5:09 pm
Posts: 207
Location: Mounds View, MN
Since pulling the battery fixes the issue, you can probably narrow the IRQs to check down to only the peripherals that are powered by VDDBU.

_________________
Tim Barr
Multitech Inc.


 
 Post subject: Re: AT91SAM9G20/45 Linux based units fail to run Linux
PostPosted: Fri May 18, 2012 12:18 pm 

Joined: Wed May 25, 2011 9:44 am
Posts: 32
Is it possible to find out the exact cause of the error from the kernel dump?

-Prasant


 
 Post subject: Re: AT91SAM9G20 Linux based units fail to run Linux
PostPosted: Tue May 22, 2012 10:32 am 

Joined: Wed May 25, 2011 9:44 am
Posts: 32
Hi,

Well, it actually turned out to be the ALR interrupt in the RTC. The workaround for now is to clear the RTC status register in the bootstrap code (that I can do).
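
A minimal C sketch of such a bootstrap workaround is shown here, assuming the RTC register layout from the AT91SAM9G45 datasheet (RTC_SCCR at offset 0x1C, RTC_IDR at offset 0x24, event bits 0 to 4). The function name `rtc_quiesce` is hypothetical, and this is not the actual code added to the bootstrap. It masks the RTC interrupts in addition to clearing the status flags.

```c
#include <stdint.h>

/* Assumed RTC register offsets (AT91SAM9G45); verify for your part. */
#define RTC_SCCR_OFF   0x1Cu  /* Status Clear Command Register */
#define RTC_IDR_OFF    0x24u  /* Interrupt Disable Register */

/* ACKUPD | ALARM | SEC | TIMEV | CALEV (bits 0..4), assumed layout. */
#define RTC_ALL_EVENTS 0x1Fu

/*
 * Hypothetical helper: mask every RTC interrupt source, then clear any
 * status flags left over from before the reset.
 * rtc_base must point at the first RTC register.
 */
static inline void rtc_quiesce(volatile uint32_t *rtc_base)
{
    rtc_base[RTC_IDR_OFF / 4]  = RTC_ALL_EVENTS;  /* disable sources */
    rtc_base[RTC_SCCR_OFF / 4] = RTC_ALL_EVENTS;  /* clear stale flags */
}
```

Because the RTC driver enables the interrupts it needs when it probes, masking everything this early should be harmless, but that is an assumption to confirm on the actual kernel version.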

So it looks like the AT91_RTC driver in Linux forgets to clear the ALR interrupt before exiting. Is it possible to fix that?

There is still a possibility of other interrupts creating the same issue. How can this be made foolproof?

-PJ


 
 Post subject: Re: AT91SAM9G20 Linux based units fail to run Linux
PostPosted: Tue May 22, 2012 10:53 pm 

Joined: Wed Jan 09, 2008 5:09 pm
Posts: 207
Location: Mounds View, MN
Yeah, we ended up adding code in Linux to clear some of the interrupt enables during boot. This does not seem to cause a problem, because the appropriate drivers set the interrupt enables properly during boot. I think the interrupt enables end up in the wrong state when the unit is powered down without using the shutdown command. Usually the shutdown command leaves the enables in the proper state.

_________________
Tim Barr
Multitech Inc.


 
 Post subject: Re: AT91SAM9G20 Linux based units fail to run Linux
PostPosted: Thu May 24, 2012 6:52 am 

Joined: Wed May 25, 2011 9:44 am
Posts: 32
Hi Tim,

The unit was shut down using 'reboot', so there was a proper shutdown. The RTC driver is buggy; at least in my case, that's the observation.

Yeah, as a workaround, we will also reset some of the registers in the bootstrap stage.

-PJ


 