The preemptive, non-blocking QK kernel is specifically designed to execute non-blocking active objects. QK runs active objects in the same way as a prioritized interrupt controller (such as the NVIC in ARM Cortex-M) runs interrupts, using a single stack (the MSP on Cortex-M). This section explains how the preemptive, non-blocking QK kernel works on ARM Cortex-M.
The ARM Cortex-M architecture is designed primarily for traditional real-time kernels that use multiple per-thread stacks. Therefore, the implementation of a non-blocking, single-stack kernel like QK is a bit more involved on Cortex-M than on other CPUs and works as follows:
The QK port uses the PendSV_Handler() and NMI_Handler() exception handlers. NOTE: QK uses only the CMSIS-compliant exception and interrupt names, such as PendSV_Handler, NMI_Handler, etc.
NOTE: The QK port specifically does not use the SVC exception (Supervisor Call). This makes the QK ports compatible with various "hypervisors" (such as mbed uVisor or Nordic SoftDevice), which use the SVC exception.
The "synchronous preemption" occurs when one (low-priority) thread is preempted by another (high-priority) thread. QK handles this case as a regular function call. This function call happens inside the QActive post function:
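The idea of treating synchronous preemption as a regular function call can be illustrated with a small host-side simulation. This is a minimal sketch, not the QK source: the names post(), activate(), pending_ and currPrio_ are hypothetical, and the interrupt locking that a real kernel needs around the shared data is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PRIO 8U                      /* hypothetical number of priority levels */

typedef void (*ThreadFn)(void);          /* one run-to-completion step of a thread */

static ThreadFn thread_[MAX_PRIO + 1U];  /* handler per priority (0 = idle) */
static uint8_t  pending_[MAX_PRIO + 1U]; /* queued events per priority */
static uint8_t  currPrio_;               /* priority of the running thread */
static char     trace_[32];              /* records the order of execution */
static uint8_t  traceLen_;

static void trace(char c) {
    if (traceLen_ < sizeof(trace_) - 1U) {
        trace_[traceLen_++] = c;
        trace_[traceLen_] = '\0';
    }
}

/* Run all ready threads whose priority is above the current one. */
static void activate(void) {
    uint8_t const saved = currPrio_;
    for (uint8_t p = MAX_PRIO; p > saved; --p) {
        while (pending_[p] != 0U) {
            --pending_[p];
            currPrio_ = p;
            thread_[p]();                /* nested C call on the same stack */
        }
    }
    currPrio_ = saved;                   /* resume the preempted priority */
}

/* Post an event; preempt synchronously if the recipient has higher priority. */
static void post(uint8_t prio) {
    ++pending_[prio];
    if (prio > currPrio_) {
        activate();                      /* synchronous preemption */
    }
}

static void lowThread(void)  { trace('a'); post(2U); trace('b'); }
static void highThread(void) { trace('H'); }

/* Low-priority thread (1) posts to high-priority thread (2) in mid-step. */
static char const *demo(void) {
    thread_[1] = &lowThread;
    thread_[2] = &highThread;
    traceLen_  = 0U;
    trace_[0]  = '\0';
    currPrio_  = 0U;
    post(1U);
    assert(currPrio_ == 0U);             /* back at the idle level */
    return trace_;
}
```

In this sketch the high-priority step runs to completion in the middle of the low-priority step, and the low-priority step then simply continues after post() returns, which is what a nested function call on a single stack provides for free.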
The "asynchronous preemption" occurs when an interrupt posts an event to a thread of higher priority than the currently executing thread. In ARM Cortex-M, this preemption is handled in the PendSV exception handler:
The QK "activator" enables interrupts and calls the Low-priority thread (a regular C function call). The Low-priority thread (active object) starts running.
NOTE: The QK activator must run in the thread context, while PendSV executes in the exception context. The change of the context is accomplished by returning from the PendSV exception directly to the QK "activator". To return directly to the QK activator, PendSV synthesizes an exception stack frame, which contains the exception return address set to QK_activate_().
NOTE: The NMI exception is pended while interrupts are still disabled. This is not a problem, because NMI cannot be masked by disabling interrupts, so it runs without any problems.
The QF header file for the ARM Cortex-M port is located in /ports/arm-cm/qk/gnu/qf_port.h. This file is almost identical to the one in the QV port, except that in the QK port it includes the qk_port.h header file instead of qv_port.h. The most important function of qk_port.h is specifying interrupt entry and exit.
Listing: qk_port.h header file for ARM Cortex-M
QK_ISR_CONTEXT() returns true when the code executes in the ISR context and false otherwise. The macro takes advantage of the ARM Cortex-M register IPSR, which is non-zero when the CPU executes an exception (or interrupt) and is zero when the CPU is executing thread code. NOTE: QK needs to distinguish between ISR and thread contexts, because threads need to perform a synchronous context switch (when a higher-priority thread becomes ready to run), while ISRs should not do that.
QK_get_IPSR() obtains the IPSR register and returns it to the caller. This function is defined explicitly for the GNU-ARM toolchain, but many other toolchains provide this function as an intrinsic, built-in facility.
The QK_ISR_ENTRY() macro notifies QK about entering an ISR. The macro is empty, because the determination of the ISR vs. thread context is performed independently in the QK_ISR_CONTEXT() macro (see above).
The QK_ISR_EXIT() macro notifies QK about exiting an ISR. The QK_sched_() function returns a non-zero value if this is the case. NOTE: Because the priority of the PendSV exception is the lowest of all interrupts, it is actually triggered only after all nested interrupts exit. The PendSV exception is then entered through the efficient tail-chaining process, which eliminates restoring and re-entering the interrupt context.
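The interplay between the ISR-context check and the ISR exit can be sketched on a host machine with stand-ins for the hardware registers. This is an illustration under stated assumptions, not the contents of qk_port.h: mock_IPSR and mock_ICSR are hypothetical variables that replace the IPSR register and SCB->ICSR, sched() is a simplified stand-in for QK_sched_(), and all other names are invented for the example. The only hardware fact used is that PENDSVSET is bit 28 of ICSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for hardware state (assumptions for illustration). */
static uint32_t mock_IPSR;   /* 0 in Thread mode, exception number otherwise */
static uint32_t mock_ICSR;   /* replaces SCB->ICSR */

#define ICSR_PENDSVSET (UINT32_C(1) << 28)  /* PENDSVSET bit of ICSR */

static uint8_t readyPrio_;   /* highest priority with a queued event */
static uint8_t currPrio_;    /* priority of the interrupted thread */

/* Same test as described for QK_ISR_CONTEXT(): IPSR != 0 means ISR. */
static bool isr_context(void) {
    return mock_IPSR != 0U;
}

/* Simplified scheduler: non-zero result means "preemption needed". */
static uint8_t sched(void) {
    return (readyPrio_ > currPrio_) ? readyPrio_ : 0U;
}

/* ISR exit: pend PendSV only when a higher-priority thread is ready. */
static void isr_exit(void) {
    if (sched() != 0U) {
        mock_ICSR |= ICSR_PENDSVSET;
    }
}

/* Simulate one ISR that posts an event to the thread of priority 'posted'
 * while the thread of priority 'running' is interrupted.
 * Returns true when the ISR exit pended PendSV.
 */
static bool simulate_isr(uint8_t running, uint8_t posted) {
    mock_ICSR  = 0U;
    currPrio_  = running;
    readyPrio_ = running;
    mock_IPSR  = 15U;                 /* exception number of SysTick */
    assert(isr_context());
    if (posted > readyPrio_) {        /* event posted from the ISR */
        readyPrio_ = posted;
    }
    isr_exit();
    mock_IPSR  = 0U;                  /* back in Thread mode */
    assert(!isr_context());
    return (mock_ICSR & ICSR_PENDSVSET) != 0U;
}
```

For example, simulate_isr(2U, 5U) reports that PendSV was pended, while simulate_isr(5U, 2U) does not, because the posted event belongs to a thread of lower priority than the interrupted one.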
The QK port to ARM Cortex-M requires coding the PendSV and NMI exceptions in assembly. This ARM Cortex-M-specific code, as well as the QK initialization (QK_init()), is located in the file ports/arm-cm/qk/gnu/qk_port.c.
qk_port.s contains common code for all Cortex-M variants (Architecture v6M and v7M) as well as options with and without the VFP. The CPU variants are distinguished by conditional compilation, when necessary.
Listing: QK_init() function in qk_port.c file
5 Exception priorities of SysTick, PendSV and Debug are set to QF_BASEPRI.
NOTE: The exception priority of PendSV is later changed to 0xFF in step [8].
Listing: PendSV_Handler() and Thread_ret() functions in qk_port.c file
naked means that the GNU-ARM compiler won't generate any entry/exit code for this function.
PendSV_Handler is a CMSIS-compliant name of the PendSV exception handler. The PendSV_Handler exception is always entered via tail-chaining from the last nested interrupt.
For the ARMv6-M architecture (Cortex-M0/M0+)...
Otherwise, for the ARMv7-M architecture (Cortex-M3/4/7) and when the __ARM_FP macro is defined...
NOTE: The symbol __ARM_FP is defined by the GNU-ARM compiler when the compile options indicate that the ARM FPU is used.
NOTE: In the presence of the FPU (Cortex-M4F/M7), the EXC_RETURN[4] bit carries the information about the stack frame format used, where EXC_RETURN[4] == 0 means that the stack contains room for the S0-S15 and FPSCR registers in addition to the usual R0-R3,R12,LR,PC,xPSR registers. This information must be preserved in order to properly return from the exception at the end.
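The meaning of the EXC_RETURN[4] bit can be made concrete with a small decoder. This is only an illustrative sketch: the function and macro names are hypothetical, and the frame sizes assume the ARMv7-M layout, in which the extended frame holds the 8 basic words plus S0-S15, FPSCR and one reserved word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN[4]: 1 = basic frame, 0 = extended (FPU) frame */
#define EXC_RETURN_BIT4 (UINT32_C(1) << 4)

/* True when the exception frame also holds S0-S15 and FPSCR. */
static bool frame_has_fpu_context(uint32_t exc_return) {
    return (exc_return & EXC_RETURN_BIT4) == 0U;
}

/* Size of the stacked exception frame in 32-bit words. */
static uint32_t frame_size_words(uint32_t exc_return) {
    /* basic:    R0-R3, R12, LR, PC, xPSR              ->  8 words
     * extended: basic + S0-S15 + FPSCR + 1 reserved   -> 26 words
     */
    return frame_has_fpu_context(exc_return) ? 26U : 8U;
}

/* Usage example with two common EXC_RETURN values. */
static void exc_return_demo(void) {
    /* 0xFFFFFFF9: Thread mode, main stack, basic frame */
    assert(!frame_has_fpu_context(UINT32_C(0xFFFFFFF9)));
    /* 0xFFFFFFE9: Thread mode, main stack, extended frame */
    assert(frame_has_fpu_context(UINT32_C(0xFFFFFFE9)));
}
```

Because the two frame formats differ in size, losing this bit would make the final exception return unstack the wrong number of words, which is why the handler has to keep the original EXC_RETURN value around.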
NOTE: The value moved to BASEPRI must be identical to the QF_BASEPRI macro defined in qf_port.h.
NOTE: The PendSV exception handler can be preempted by an interrupt, which might pend PendSV exception again. This would trigger PendSV incorrectly again immediately after calling QK activator.
The following code [14-23] fabricates an exception stack frame, to perform an exception-return to the QK activator without destroying the original exception stack frame of the PendSV exception. This is necessary to preserve the context of the preempted code.
The address of QK_activate_() is loaded into r2. This will be pushed to the stack as the PC register value.
The address of QK_activate_() in r2 is adjusted to be half-word aligned instead of being an odd THUMB address. NOTE: This is necessary, because the value will be loaded directly to the PC, which cannot accept odd values.
The address of the Thread_ret() function is loaded into r1. This will be pushed to the stack as the lr register value. NOTE: The address of the Thread_ret label must be a THUMB address, that is, the least-significant bit of this address must be set (this address must be an odd number). This is essential for the correct return of the QK activator with setting the THUMB bit in the PSR. Without the LS-bit set, the ARM Cortex-M CPU will clear the T bit in the PSR and cause the Hard Fault. The GNU-ARM assembler/linker will synthesize the correct THUMB address of the Thread_ret label only if this label is declared with the .type Thread_ret, function attribute (see step [23]).
Hi memory
            (optionally S0-S15, FPSCR), if EXC_RETURN[4]==0
            xPSR
            pc (interrupt return address)
            lr
            r12
            r3
            r2
            r1
            r0
            EXC_RETURN (pushed in step [7] if FPU is present)
old SP -->  "aligner" (pushed in step [7] if FPU is present)
            xPSR == 0x01000000
            PC == QK_activate_
            lr == Thread_ret
            r12 don't care
            r3 don't care
            r2 don't care
            r1 don't care
    SP -->  r0 don't care
Low memory
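The contents of the synthesized frame can be modeled in C to make the address adjustments explicit. This is a host-side sketch, not the assembly from the port: the BasicFrame type, make_frame() and the addresses in the usage example are hypothetical, and only the three fields that matter (xPSR, PC, lr) are filled in.

```c
#include <assert.h>
#include <stdint.h>

/* Basic exception frame, lowest address (r0 at SP) first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} BasicFrame;

#define XPSR_T_BIT UINT32_C(0x01000000)   /* THUMB state bit in xPSR */

/* Build the frame that makes the exception return "call" the activator.
 * activator_addr: address of the activator as a C function pointer
 *                 value (odd, because it is a THUMB function)
 * return_addr:    address of the label the activator returns to
 */
static BasicFrame make_frame(uint32_t activator_addr, uint32_t return_addr) {
    BasicFrame f = {0};                        /* r0-r3, r12: don't care */
    f.xpsr = XPSR_T_BIT;                       /* only the T bit is set */
    f.pc   = activator_addr & ~UINT32_C(1);    /* PC must be even */
    f.lr   = return_addr | UINT32_C(1);        /* lr must be odd */
    return f;
}

/* Usage example with made-up flash addresses. */
static void frame_demo(void) {
    BasicFrame const f = make_frame(UINT32_C(0x08000123),
                                    UINT32_C(0x08000200));
    assert(f.pc   == UINT32_C(0x08000122));
    assert(f.lr   == UINT32_C(0x08000201));
    assert(f.xpsr == UINT32_C(0x01000000));
    assert(sizeof(BasicFrame) == 32U);         /* 8 words */
}
```

The asymmetry is the point of the sketch: the stacked PC is loaded directly and therefore has bit 0 cleared, while the stacked lr is later used for an ordinary function return and therefore needs bit 0 set.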
21-22 The special exception-return value 0xFFFFFFF9 is synthesized in r0 (two instructions are used to make the code compatible with Cortex-M0, which has no barrel shifter).
NOTE: The r0 register is used instead of lr because the Cortex-M0 instruction set cannot manipulate the high registers (r8-r15).
NOTE: The exception-return value is consistent with the synthesized stack frame with the lr[4] bit set to 1, which means that the FPU registers are not included in this stack frame.
23 PendSV exception returns using the special value of the r0 register of 0xFFFFFFF9 (return to Privileged Thread mode using the Main Stack pointer). The synthesized stack frame actually causes a function call to the QK_activate_() function in C.
NOTE: The return from the PendSV exception just executed switches the ARM Cortex-M core to the Privileged Thread mode. The QK_activate_() function internally re-enables interrupts before launching any thread, so the threads always run in the Thread mode with interrupts enabled and can be preempted by interrupts of any priority.
NOTE: In the presence of the FPU, the exception-return to the QK activator does not change any of the FPU status bits, such as CONTROL.FPCA or LSPACT.
The Thread_ret function is the place where the QK activator QK_activate_() returns to, because this return address is pushed to the stack in step [16]. Please note that the address of the Thread_ret label must be a THUMB address.
NOTE: Clearing the CONTROL.FPCA bit occurs with interrupts disabled, so it is protected from a context switch.
Listing: NMI_Handler() function in qk_port.c file
NMI_Handler is the CMSIS-compliant name of the NMI exception handler. This exception is triggered after returning from the QK activator in step [31] of the previous listing. The job of NMI is to discard its own stack frame and cause the exception-return to the original preempted thread context. The stack contents just after entering NMI are shown below:
Hi memory
            (optionally S0-S15, FPSCR), if EXC_RETURN[4]==0
            xPSR
            pc (interrupt return address)
            lr
            r12
            r3
            r2
            r1
            r0
old SP -->  EXC_RETURN (pushed in PendSV [7] if FPU is present)
            "aligner" (pushed in PendSV [7] if FPU is present)
            xPSR don't care
            PC don't care
            lr don't care
            r12 don't care
            r3 don't care
            r2 don't care
            r1 don't care
    SP -->  r0 don't care
Low memory
The ARM Cortex-M CPU is designed to use regular C functions as exception and interrupt service routines (ISRs). The GNU-ARM compiler provides the __attribute__((__interrupt__)) designation that will guarantee the 8-byte stack alignment.
Typically, ISRs are application-specific (with the main purpose to produce events for active objects). Therefore, ISRs are not part of the generic QP port, but rather part of the BSP (Board Support Package).
The following listing shows an example of the SysTick_Handler() ISR (from the DPP example application). This ISR calls the QF_TICK_X() macro to perform QF time-event management.
The ISR calls QK_ISR_ENTRY() before calling any QP API.
The ISR calls QK_ISR_EXIT() right before exiting to let the QK kernel schedule an asynchronous preemption, if necessary.
If you have the Cortex-M4F CPU and your application uses the hardware FPU, the FPU should be enabled, because it is turned off out of reset. The CMSIS-compliant way of turning the FPU on looks as follows:
SCB->CPACR |= (0xFU << 20);
Depending on whether or not you use the FPU in your ISRs, the "Vanilla" QP port allows you to configure the FPU in various ways, as described in the following sub-sections.
If you use the FPU in only a single thread (active object) and none of your ISRs use the FPU, you can set up the FPU not to use the automatic state preservation and not to use the lazy stacking feature as follows:
FPU->FPCCR &= ~((1U << FPU_FPCCR_ASPEN_Pos) | (1U << FPU_FPCCR_LSPEN_Pos));
With this setting, the Cortex-M4F processor handles the ISRs in exactly the same way as Cortex-M0-M3, that is, only the standard interrupt frame with R0-R3,R12,LR,PC,xPSR is used. This scheme is the fastest and incurs no additional CPU cycles to save and restore the FPU registers.
If you use the FPU in more than one of the threads (active objects) or in any of your ISRs, you should set up the FPU to use the automatic state preservation and the lazy stacking feature as follows:
FPU->FPCCR |= (1U << FPU_FPCCR_ASPEN_Pos) | (1U << FPU_FPCCR_LSPEN_Pos);
This is actually the default setting of the hardware FPU and is recommended for the QK port, because it is safer in view of code evolution. Future changes to the application can easily introduce FPU use in multiple active objects, which would be unsafe if the FPU context was not preserved automatically.
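The effect of the FPU settings discussed above on the register bits can be checked on a host machine by operating on plain integers instead of the memory-mapped registers. This is a sketch under stated assumptions: the MOCK_ macros and the three helper functions are hypothetical names, while the bit positions (ASPEN = bit 31, LSPEN = bit 30, CP10/CP11 access = bits 20-23 of CPACR) and the FPCCR reset value 0xC0000000 follow the ARMv7-M FPU register definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_FPCCR_ASPEN_Pos 31U  /* automatic state preservation enable */
#define MOCK_FPCCR_LSPEN_Pos 30U  /* lazy state preservation enable */
#define MOCK_FPCCR_RESET     UINT32_C(0xC0000000)  /* both bits set */

#define MOCK_FPCCR_MASK ((UINT32_C(1) << MOCK_FPCCR_ASPEN_Pos) \
                       | (UINT32_C(1) << MOCK_FPCCR_LSPEN_Pos))

/* CPACR value after granting full access to coprocessors CP10 and CP11. */
static uint32_t cpacr_enable_fpu(uint32_t cpacr) {
    return cpacr | (UINT32_C(0xF) << 20);
}

/* FPCCR value for "FPU used in one thread only, never in ISRs". */
static uint32_t fpccr_single_user(uint32_t fpccr) {
    return fpccr & ~MOCK_FPCCR_MASK;
}

/* FPCCR value for "FPU used in several threads and/or in ISRs". */
static uint32_t fpccr_shared(uint32_t fpccr) {
    return fpccr | MOCK_FPCCR_MASK;
}

/* Usage example: the shared setting equals the reset default. */
static void fpu_config_demo(void) {
    assert(cpacr_enable_fpu(0U) == UINT32_C(0x00F00000));
    assert(fpccr_single_user(MOCK_FPCCR_RESET) == 0U);
    assert(fpccr_shared(0U) == MOCK_FPCCR_RESET);
    assert(fpccr_shared(MOCK_FPCCR_RESET) == MOCK_FPCCR_RESET);
}
```

On the target, the same masks would be applied to SCB->CPACR and FPU->FPCCR; the helpers here only demonstrate which bits each configuration touches.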
QK can very easily detect the situation when no events are available, in which case QK calls the QK_onIdle() callback. You can use QK_onIdle() to suspend the CPU to save power, if your CPU supports such a power-saving mode. Please note that QK_onIdle() is called repetitively from an endless loop, which is the QK idle-thread. The QK_onIdle() callback is called with interrupts enabled (which is in contrast to the QV_onIdle() callback used in the non-preemptive configuration).
The THUMB-2 instruction set used exclusively in ARM Cortex-M provides a special instruction WFI (Wait-for-Interrupt) for stopping the CPU clock, as described in the "ARMv7-M Reference Manual" [ARM 06a]. The following listing shows the QK_onIdle() callback that puts ARM Cortex-M into a low-power mode.
Listing: QK_onIdle() for ARM Cortex-M
The QK_onIdle() callback is called with interrupts enabled.
The WFI instruction is generated using inline assembly.
The bsp.c file included in the examples/arm-cm/dpp_ek-tm4c123gxl/qk directory contains special instrumentation (an ISR designed for testing) for convenient testing of various preemption scenarios in QK.
The technique described in this section will allow you to trigger an interrupt at any machine instruction and observe the preemption it causes. The interrupt used for the testing purposes is the GPIOA interrupt (INTID == 0). The ISR for this interrupt is shown below:
GPIOPortA_IRQHandler(), as all interrupts in the system, invokes the macros QK_ISR_ENTRY() and QK_ISR_EXIT(), and also posts an event to the Table active object, which has higher priority than any of the Philo active objects.
The figure below shows how to trigger the GPIOA interrupt from the CCS debugger. From the debugger you need to first open the register window and select NVIC registers from the drop-down list (see right-bottom corner of Figure 6). You scroll to the NVIC_SW_TRIG register, which denotes the Software Trigger Interrupt Register in the NVIC. This write-only register is useful for software-triggering various interrupts by writing various masks to it. To trigger the GPIOA interrupt you need to write 0x00 to the NVIC_SW_TRIG by clicking on this field, entering the value, and pressing the Enter key.
The general testing strategy is to break into the application at an interesting place for preemption, set breakpoints to verify which path through the code is taken, and trigger the GPIO interrupt. Next, you need to free-run the code (don’t use single stepping) so that the NVIC can perform prioritization. You observe the order in which the breakpoints are hit. This procedure will become clearer after a few examples.
The first interesting test is verifying the correct tail-chaining to the PendSV exception after the interrupt nesting occurs, as shown in Synchronous Preemption in QK. To test this scenario, you place a breakpoint inside the GPIOPortA_IRQHandler() and also inside the SysTick_Handler() ISR. When the breakpoint is hit, you remove the original breakpoint and place another breakpoint at the very next machine instruction (use the Disassembly window) and also another breakpoint on the first instruction of the PendSV_Handler() handler. Next you trigger the GPIOA interrupt per the instructions given in the previous section. You hit the Run button.
The pass criteria of this test are as follows:
The first breakpoint hit is the one inside the GPIOPortA_IRQHandler() function, which means that the GPIO ISR preempted the SysTick ISR.
The second breakpoint hit is the one in SysTick_Handler(), which means that the SysTick ISR continues after the GPIOA ISR completes.
The last breakpoint hit is the one in the PendSV_Handler() exception handler, which means that the PendSV exception is tail-chained only after all interrupts are processed.
You need to remove all breakpoints before proceeding to the next test.
The next interesting test is verifying that threads can preempt each other. You set a breakpoint anywhere in the Philosopher state machine code. You run the application until the breakpoint is hit. After this happens, you remove the original breakpoint and place another breakpoint at the very next machine instruction (use the Disassembly window). You also place a breakpoint inside the GPIOPortA_IRQHandler() interrupt handler and on the first instruction of the PendSV_Handler() handler. Next you trigger the GPIOA interrupt per the instructions given in the previous section. You hit the Run button.
The pass criteria of this test are as follows:
The first breakpoint hit is the one inside the GPIOPortA_IRQHandler() function, which means that the GPIO ISR preempted the Philo thread.
The second breakpoint hit is the one in the PendSV_Handler() exception handler, which means that the PendSV exception is activated before the control returns to the preempted Philosopher thread.
After hitting the breakpoint in PendSV_Handler(), you single step into QK_activate_(). You verify that the activator invokes a state handler from the Table state machine. This proves that the Table thread preempts the Philo thread.
In order to test the FPU, the Board Support Package (BSP) for the Cortex-M4F EK-TM4C123GXL board uses the FPU in the following contexts:
QK_onIdle() callback (QP priority 0)
BSP_random() function called from all five Philo active objects (QP priorities 1-5)
BSP_displayPhiloStat() function called from the Table active object (QP priority 6)
SysTick_Handler() ISR (priority above all threads)
To test the FPU, you could step through the code in the debugger and verify that the expected FPU-type exception stack frame is used and that the FPU registers are saved and restored by the "lazy stacking feature" when the FPU is actually used.
Next, you can selectively comment out the FPU code at various levels of priority and verify that the QK context switching works as expected with both types of exception stack frames (with and without the FPU).
Other interesting tests that you can perform include changing priority of the GPIOA interrupt to be lower than the priority of SysTick to verify that the PendSV is still activated only after all interrupts complete.
In yet another test you could post an event to a Philosopher active object rather than the Table active object from the GPIOPortA_IRQHandler() function to verify that the QK activator will not preempt the Philosopher thread by itself. Rather, the next event will be queued and the Philosopher thread will process the queued event only after completing the current event processing.