Linux RT crash while handling an interrupt

pawel8542
Posts: 6
Joined: Thu Jul 21, 2016 1:33 pm

Linux RT crash while handling an interrupt

Mon Sep 14, 2020 9:13 pm

Hello

I am using a 4.14-rt kernel compiled with PREEMPT_RT_FULL on a SAMA5D44 processor.
I am using GPIO interrupts. In addition, I wrote my own module for the timer counters (TC).

In the counter module I register the interrupt as follows:

Code:

TCB[id].Irq = platform_get_irq(pdev, 0);
if (TCB[id].Irq < 0)
    return -EINVAL;

result = request_irq(TCB[id].Irq,
    tcbIrq,      // pointer to the handler function
    IRQF_TIMER,  // not threaded: good performance, but sometimes hangs
    // IRQF_NO_SUSPEND | IRQF_NOBALANCING, // threaded: weak performance, but does not hang
    "tcb_irq",   // shown in /proc/interrupts to identify the owner
    &TCB[id]);

tcbIrq() calls tcbRunNotify(), which sometimes calls:
sysfs_notify_dirent(tcb_kernfs_node);

Sometimes, while an interrupt from GPIO is being handled (a separate thread is created for this), the interrupt from the TCB has to be handled as well.
If sysfs_notify_dirent() also has to be called at that point, the system exits with an undefined instruction exception.
The important part of the call stack is shown in the screenshot below. Elsewhere, I confirmed that this happens while an irq from gpiolib is being handled (not shown in the screenshot).
Is there any other way to pass information to sysfs that prevents the system from crashing? Maybe the kernel should be compiled a bit differently?
Any ideas would be appreciated.

regards
Paweł

Image

blue_z
Location: USA
Posts: 2115
Joined: Thu Apr 19, 2007 10:15 pm

Re: Linux RT crash while handling an interrupt

Tue Sep 15, 2020 3:48 am

Your post is full of misuse of jargon/terminology.
There's minor stuff like "SAMA5D44 processor" (it's a SoC, the Cortex-A5 is the processor) and "module" (not every module is a kernel driver).
But then there's mention of "crash", "hang", and "system exit" (that's a new one!) that makes understanding your post difficult.
Does each of those terms refer to different symptoms or the same kernel oops?

pawel8542 wrote: Important part of callstack is shown in the screenshot below.
Whatever that screenshot is, you haven't properly identified it, and I don't recognize it.
If there was a kernel oops or panic, then post the stacktrace.
See Bug hunting

pawel8542 wrote: Is there any other way to pass information to sysfs to prevent the system from crashing?
Are you trying to turn this into an XY problem?

Regards

pawel8542
Posts: 6
Joined: Thu Jul 21, 2016 1:33 pm

Re: Linux RT crash while handling an interrupt

Tue Sep 15, 2020 3:53 pm

blue_z wrote:
Tue Sep 15, 2020 3:48 am
Your post is full of misuse of jargon/terminology.
Of course, you are right. Posts on the forum should be readable for everyone, and I was in a hurry. Sorry.

blue_z wrote:
Tue Sep 15, 2020 3:48 am
If there was a kernel oops or panic, then post the stacktrace.
Unfortunately, I can't see any stacktrace after the problem occurs.
I think it is enabled in the kernel configuration:
CONFIG_STACKTRACE=y
Below is the stacktrace from a hardware debugger. I can't present it in the same form as the one in your link.
However, I can determine the values of all function arguments, local variables, call lines, etc.

Code:

__loop_delay(asm)
panic()
__die(inline)
die()
arm_notify_die()
uaccess_restore(inline)
do_undefinstr()
__und_svc_fault(asm)
exception
rt_spin_lock_slowlock_locked()
arch_local_irq_restore(inline)
rt_spin_lock_slowlock()
rt_spin_lock()
kernfs_notify()
tcbRunNotify()
tcbProccessIrq()
tcbIrq()
__read_once_size(inline)
static_key_count(inline)
static_key_false(inline)
trace_irq_handler_exit(inline)
__handle_irq_event_percpu()
handle_irq_event_percpu()
handle_irq_event()
cond_unmask_eoi_irq(inline)
handle_fasteoi_irq()
generic_handle_irq()
__handle_domain_irq()
aic5_handle()
__irq_svc(asm)
exception
try_to_wake_up()
wake_up_process()
rt_unlock_idle_list(inline)
wake_up_worker()
__need_more_worker(inline)
insert_work()
list_empty(inline)
__queue_work()
queue_work_on()
queue_work(inline)
schedule_work(inline)
kernfs_notify()
gpio_sysfs_irq()
irq_finalize_oneshot(inline)
irq_forced_thread_fn()
irq_thread()
kthread()
ret_from_fork(asm)
ret_fast_syscall(asm)

blue_z wrote:
Tue Sep 15, 2020 3:48 am
Are you trying to turn this into an XY problem?
OK, let's try it this way.
I have my own TCB driver. The driver performs some simple actions when a timer interrupt occurs. I need precise time measurement, so I don't want to use a separate thread to handle this interrupt. After finishing some part of the work, I want to inform the application running in user space about it.
How can I do that?
Sending the notification to the application is not time-critical. A delay does not bother me.
What matters is that the interrupt handler executes quickly.

best regards
Paweł

blue_z
Location: USA
Posts: 2115
Joined: Thu Apr 19, 2007 10:15 pm

Re: Linux RT crash while handling an interrupt

Wed Sep 16, 2020 1:52 am

pawel8542 wrote: Below is the stacktrace from a hardware debugger.
The call sequence tcbRunNotify() -> kernfs_notify() -> rt_spin_lock() indicates the problem.
Apparently you have not accounted for the ramifications of using the RT patchset.
Even though kernfs_notify() is documented as "callable from any context", that claim is presumably only for the mainline kernel (i.e. without the RT patchset).

Because you are using PREEMPT_RT_FULL, a non-threaded interrupt handler must not take ordinary spinlocks (spinlock_t).
With PREEMPT_RT_FULL a spin_lock() is really a sleeping lock built on an rt_mutex, which requires a context that is allowed to sleep.
Your invocation of kernfs_notify(), which normally calls spin_lock_irqsave(), now has the effect of calling rt_spin_lock() from an interrupt context that cannot sleep.
Digging into the undefined-instruction exception is probably a waste of time since your "own module for counters" clearly does not conform to PREEMPT_RT_FULL restrictions.

pawel8542 wrote: I have my own TCB driver. The driver performs some simple actions when a timer interrupt occurs. I need precise time measurement, so I don't want to use a separate thread to handle this interrupt. After finishing some part of the work, I want to inform the application running in user space about it.
How can I do that?
kernfs_notify() cannot be called from an RT non-threaded interrupt handler, but presumably is callable from a threaded interrupt handler, e.g. a bottom half of a two-part TCB driver.
Apparently you can use raw_spin_lock() (which would not be converted to a mutex by the RT patch) to protect the critical region (a single word or circular buffer?) shared by the timer driver halves to pass the timing information.
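
The hand-off suggested above (a small circular buffer shared by the two halves of the driver) could look roughly like the following. This is only a user-space model in C11, not kernel code: the names tcb_ring, tcb_ring_push(), tcb_ring_pop() and TCB_RING_SIZE are made up for illustration, and C11 atomics stand in for the raw_spin_lock() protection mentioned above. It assumes exactly one producer (the non-threaded handler) and one consumer (the threaded half).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TCB_RING_SIZE 8u  /* hypothetical capacity, must be a power of two */

/* Hypothetical hand-off buffer between the two driver halves. */
struct tcb_ring {
    uint32_t slot[TCB_RING_SIZE];
    atomic_uint head;  /* written only by the producer (top half) */
    atomic_uint tail;  /* written only by the consumer (bottom half) */
};

/* Top half: store one timing sample. Never blocks; returns false when full. */
static bool tcb_ring_push(struct tcb_ring *r, uint32_t sample)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == TCB_RING_SIZE)
        return false;  /* full: the sample is dropped */

    r->slot[head & (TCB_RING_SIZE - 1u)] = sample;
    /* Release ordering publishes the slot contents before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Bottom half: fetch the oldest sample. Returns false when empty. */
static bool tcb_ring_pop(struct tcb_ring *r, uint32_t *sample)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;  /* nothing pending */

    *sample = r->slot[tail & (TCB_RING_SIZE - 1u)];
    /* Release ordering frees the slot only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

In a real driver the non-threaded handler would do the push and then wake the threaded half, and only the threaded half would pop samples and call sysfs_notify_dirent(). The push never blocks, which is the property the hard interrupt handler needs under PREEMPT_RT_FULL.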

Regards

pawel8542
Posts: 6
Joined: Thu Jul 21, 2016 1:33 pm

Re: Linux RT crash while handling an interrupt

Thu Sep 17, 2020 11:55 am

blue_z wrote:
Wed Sep 16, 2020 1:52 am
kernfs_notify() cannot be called from an RT non-threaded interrupt handler, but presumably is callable from a threaded interrupt handler, e.g. a bottom half of a two-part TCB driver.
Thank you for this idea.
In the end, I did it as follows.
I declared:

Code:

/* Tasklet body: runs later in softirq context, outside the hard interrupt handler */
void tcbRunNotifyTask(unsigned long unused)
{
	tcbRunNotify();
}
DECLARE_TASKLET(tcbRunNotifyTaskName, tcbRunNotifyTask, 0);

The direct call to tcbRunNotify() was changed to:

Code:

tasklet_schedule(&tcbRunNotifyTaskName);

The problem appears to be resolved. At least my test passes correctly. Thank you.
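
One property of tasklets worth keeping in mind with this approach: tasklet_schedule() only queues the tasklet if it is not already pending, so several interrupts that arrive before the tasklet runs result in a single call of the tasklet function. The following is a small user-space model of that behaviour, not kernel code; notify_model, notify_model_schedule() and notify_model_run() are hypothetical names, and an atomic flag stands in for the kernel's TASKLET_STATE_SCHED bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tasklet and its "pending" state. */
struct notify_model {
    atomic_bool scheduled;  /* models the TASKLET_STATE_SCHED bit */
    unsigned runs;          /* how often the notify function has run */
};

/* Models tasklet_schedule(): true only if this call queued a new run. */
static bool notify_model_schedule(struct notify_model *m)
{
    /* atomic_exchange returns the previous value of the flag. */
    return !atomic_exchange(&m->scheduled, true);
}

/* Models the softirq side: clear the pending flag, then run once. */
static bool notify_model_run(struct notify_model *m)
{
    if (!atomic_exchange(&m->scheduled, false))
        return false;  /* nothing was pending */
    m->runs++;         /* stands in for tcbRunNotify() */
    return true;
}
```

For a plain "something changed" notification this coalescing is usually harmless. If user space needs to know how many events occurred, the driver would have to keep its own counter rather than rely on the number of notifications.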

best regards
Paweł
