modm API documentation
ARM Cortex-M Fault Reporters

Classes

class  modm::platform::FaultReporter
 

Functions

void modm_hardfault_entry ()
 

Detailed Description

lbuild module: modm:platform:fault

This module manages data storage for core dumps provided by the modm:crashcatcher module to investigate HardFault events via offline post-mortem debugging. The data is stored in the volatile memory designated for the heap.

This works as follows:

  1. A HardFault occurs and is intercepted by CrashCatcher.
  2. CrashCatcher calls into this module to store the core dump in the heap as defined by the linker script's .table.heap section, effectively overwriting the heap, and then reboots the device.
  3. On reboot, only the remaining heap memory is initialized, leaving the core dump data intact.
  4. The application has no limitations other than a reduced total heap size! It may access the report data at any time and use all hardware to send out this report.
  5. After the application clears the report and reboots, the heap will once again be fully available.

Restrictions on HardFault Entry

A HardFault is a serious bug. If one occurs, your application is most likely compromised in some way. Here are some important points to note.

  1. The HardFault has a hardcoded priority of -1 and only the NMI and the Reset exceptions have a higher priority (-2 and -3). This means ALL device interrupts have a LOWER priority!
  2. The HardFault is a synchronous exception: it will NOT wait for anything to complete, especially not the currently executing interrupt (if any).
  3. There are many reasons for the HardFault exception to be raised (e.g. accessing invalid memory, executing undefined instructions, dividing by zero) making it very difficult to recover in a generic way. It is therefore reasonable to abandon execution (=> reboot) rather than resuming execution in an increasingly unstable application.

On HardFault entry, this module calls the function modm_hardfault_entry(), which can be overridden by the application to put the device's hardware in a safe mode. This can be as simple as disabling power to external components. However, its execution should be strictly time bound and must NOT depend on other interrupts completing: they won't, and waiting for them will cause a deadlock.

extern "C" void modm_hardfault_entry()
{
    Board::MotorDrivers::disable();
    // return from this function as fast as possible
}

After this function returns, this module will generate the coredump into the heap and reboot the device.
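The time bound described above can be enforced with a capped polling loop. This is a minimal sketch, not modm API: the `hw` namespace, `requestDriversOff()`, `driversAreOff()` and `kMaxPollIterations` are hypothetical names, and the hardware is simulated so the example is self-contained. The point is that the wait is limited by a loop counter and reads a status flag directly, instead of relying on an interrupt or a timer tick.

```cpp
#include <cstdint>

// Hypothetical hardware hooks, NOT part of modm. They are simulated here;
// replace them with your own register accesses.
namespace hw
{
static int polls_until_off = 3;   // simulated hardware latency in polls

inline void requestDriversOff() { /* e.g. clear an enable pin */ }

inline bool driversAreOff()
{
    if (polls_until_off > 0) { --polls_until_off; return false; }
    return true;
}
}

// Upper bound on polling so that the caller always returns.
constexpr uint32_t kMaxPollIterations = 1000;

// Returns true if the hardware confirmed the safe state within the bound.
inline bool enterSafeState()
{
    hw::requestDriversOff();
    for (uint32_t ii = 0; ii < kMaxPollIterations; ++ii)
    {
        // Poll a status flag directly; never wait on an interrupt here.
        if (hw::driversAreOff()) return true;
    }
    return false;   // give up so that the core dump can proceed
}
```

Calling a function like `enterSafeState()` from modm_hardfault_entry() keeps the handler bounded even if the hardware never acknowledges the request.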

Reporting the Fault

In order to recover from the HardFault, the device is rebooted with a smaller heap. Once the main() function is reached, the application code should check FaultReporter::hasReport() and then initialize only the bare minimum of hardware needed to send this report to the developer.

To access the report, use the FaultReporter::begin() and FaultReporter::end() functions, which return const_iterators over the actual core dump data. This also allows a FaultReporter instance to be used in a range-based for loop.

Remember to call FaultReporter::clearAndReboot() to clear the report, reboot the device and reclaim the full heap.

int main()
{
    if (FaultReporter::hasReport()) // Check first after boot
    {
        Application::partialInitialize(); // Initialize only the necessary
        reportBegin();
        for (const uint8_t data : FaultReporter::buildId())
            reportBuildId(data); // send each byte of Build ID
        for (const uint8_t data : FaultReporter())
            reportData(data); // send each byte of data
        reportEnd(); // end the report
        FaultReporter::clearAndReboot(); // clear the report and reboot
        // never reached
    }
    // Normal initialization
    Application::initialize();
}

The application is able to use the heap. However, depending on the report size (controllable via the report_level option), the heap may be much smaller than normal. Make sure your application can deal with that.

For complex applications which perhaps communicate asynchronously (CAN, Ethernet, Wireless) it may not be possible to send the report in one piece or at the same time. The report data remains available until you reboot, even after you've cleared the report.

int main()
{
    const bool faultReport{FaultReporter::hasReport()};
    FaultReporter::clear(); // only clear report but do not reboot
    Application::initialize();
    while (true)
    {
        doOtherStuff();
        if (faultReport and applicationReady)
        {
            // Still valid AFTER clear, but BEFORE reboot
            const auto id = FaultReporter::buildId();
            auto begin = FaultReporter::begin();
            auto end = FaultReporter::end();
            //
            Application::sendReport(id, begin, end);
            // reboot when report has been fully sent
        }
    }
}

Coredump via GDB

If you encounter a HardFault while debugging and did not include this module, or if you simply want to store the current system state for later analysis or to share with other developers, you can call the modm_coredump function inside GDB to generate a coredump.txt file. Note that this coredump file contains all volatile memories including the heap, so this method is strongly recommended if you can attach a debugger.

Consult your chosen build system module for additional integrations.

Using the Fault Report

The fault report contains a core dump generated by CrashCatcher and is meant to be used by CrashDebug to present the memory view to the GDB debugger. For this, you must use the ELF file that corresponds to the device's firmware and copy the coredump data, formatted as hexadecimal values, into a text file. Then call the debugger like this:

arm-none-eabi-gdb -tui executable.elf -ex "set target-charset ASCII" \
    -ex "target remote | CrashDebug --elf executable.elf --dump coredump.txt"

Note that the FaultReporter::buildId() contains the GNU Build ID, which can help you find the right ELF file:

arm-none-eabi-readelf -n executable.elf
Displaying notes found in: .build_id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 59f08f7a37a7340799d9dba6b0c092bc3c9515c5

Post-Mortem Debugging with SCons

The modm:build:scons module provides a few helper methods for working with fault reports. You still need to copy the coredump data manually, however, the firmware selection is automated.

The SCons build system will automatically cache the ELF file for the build id for every firmware upload (using scons artifact). When a fault is reported, you can tell SCons the firmware build id and it will use the corresponding ELF file automatically.

# Copy data into coredump.txt
touch coredump.txt
# Start postmortem debugging of executable with this build id
scons debug-coredump firmware=59f08f7a37a7340799d9dba6b0c092bc3c9515c5

Module Options

modm:platform:fault:report_level: Fault Report Level

This module will try to store as much data as is available in the heap and any leftover data will be discarded. This means the application may not have any heap available after a reboot.

You can control how much data is generated by choosing one of the report levels core, core+stack, or core+stack+data.

It is strongly recommended to choose a report level that generates less data than your heap size. The scons size output displays this very prominently: if the Data size is smaller than your Heap size, you're good to use the core+stack+data setting:

Data: 5.2 KiB (26.0% used) = 2285 B static (11.2%) + 3040 B stack (14.8%)
(.bss + .data + .fastdata + .noinit + .stack)
Heap: 14.8 KiB (74.0% available)
(.heap1)

If the Heap size is smaller than the Data size, you may need to switch to using only the core+stack setting:

Data: 11.2 KiB (56.0% used) = 8429 B static (41.2%) + 3040 B stack (14.8%)
(.bss + .data + .fastdata + .noinit + .stack)
Heap: 8.8 KiB (44.0% available)
(.heap1)
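The comparison between the two size outputs above can be written down as a small decision function. This is only a sketch of the rule of thumb, not part of modm: `ReportLevel` and `chooseReportLevel()` are hypothetical names, any overhead of the core dump format is ignored, and the fallback to the core level when even the stack does not fit is an assumption that the text above does not state.

```cpp
#include <cstdint>

// Mirrors the three values of the report_level option.
enum class ReportLevel { Core, CoreStack, CoreStackData };

// Picks the largest level whose payload is smaller than the heap.
// All sizes are in bytes; report format overhead is ignored (assumption).
constexpr ReportLevel
chooseReportLevel(uint32_t staticBytes, uint32_t stackBytes, uint32_t heapBytes)
{
    // Data size as shown by scons size: static memory plus stack.
    if (staticBytes + stackBytes < heapBytes) return ReportLevel::CoreStackData;
    if (stackBytes < heapBytes) return ReportLevel::CoreStack;
    return ReportLevel::Core;   // assumed fallback, not stated in the text
}
```

With the first output above (2285 B static, 3040 B stack, roughly 15155 B heap) this selects core+stack+data. With the second output (8429 B static, 3040 B stack, roughly 9011 B heap) it selects core+stack.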

Generated with: core+stack+data in [core, core+stack, core+stack+data]

Function Documentation

void modm_hardfault_entry ( )

Called first after a HardFault occurred. Use this to put your hardware in a safe mode, since generating and storing the fault report may take a second or two before rebooting.

Warning
This is executed directly in the HardFault handler, which has a higher priority than all device interrupts, so no device interrupt will be able to fire during this time. BEWARE OF DEADLOCKS!!!