This is the last post in my series (see part 1 and part 2) about watchdog timers (WDTs). After this one I will stop bugging you with the topic, though you've probably already got the impression that I have a certain weakness for WDTs. Anyway, let's take this to the next level and find out how to improve the solution from my last post even further.
Right now our application pings the watchdog hardware directly, i.e. a watchdog module/component linked into the main executable does everything required to prevent a timeout. That was easy to do because it only required us to insert two function calls for (de)initialization and one for pinging inside our main loop. Unfortunately, it also introduced tight coupling between the main application and the watchdog handling logic (see figure 1).
Other drawbacks of this approach are:
- loss of flexibility and reuse (OK, reuse is possible but only on the code-level)
- adding support for new WDT hardware requires rebuilding the application or may have other side effects
- it's harder to verify/test the watchdog handling logic
To overcome these drawbacks, we put the watchdog handling logic into a separate executable and introduce a small, lightweight layer that wraps the interprocess communication (IPC) that is now required. I'll leave the choice of an appropriate IPC API to you. It should be non-blocking and easy to use (SOAP is probably a bad idea). Admittedly, this adds some complexity to the big picture (see figure 2), but at the same time it removes every dependency of the main application on the WDT hardware, so the application can run even when no watchdog is present at all.
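To make the idea of a thin IPC wrapper more concrete, here is a minimal sketch of what the layer on the main application's side might look like. It assumes UDP datagrams on the loopback interface as the IPC mechanism, which is only one of many possible choices. The class name `WatchdogClient`, the port number and the `PING` payload are all made up for illustration.

```python
import socket

# Hypothetical address the watchdog process listens on (an assumption,
# not something prescribed by this post).
WATCHDOG_ADDR = ("127.0.0.1", 47000)


class WatchdogClient:
    """Thin IPC wrapper linked into the main application."""

    def __init__(self, addr=WATCHDOG_ADDR):
        self._addr = addr
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # A ping must never stall the main loop.
        self._sock.setblocking(False)

    def ping(self):
        """Send one heartbeat; return False instead of raising on failure."""
        try:
            self._sock.sendto(b"PING", self._addr)
            return True
        except OSError:
            # Socket unusable or send buffer full: the main application
            # simply keeps running.
            return False

    def close(self):
        self._sock.close()
```

The main loop would call `ping()` once per iteration and ignore the result. Because datagrams are fire-and-forget, the application behaves the same whether or not a watchdog process is listening, which is exactly the decoupling we are after.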
Having made these architectural changes, we're now ready to introduce more logic to improve the WDT behavior. We can even add a user interface to the newly created watchdog application. Please note that the term watchdog application/process now refers to a piece of software sitting on top of the WDT hardware.
The non-obvious outcome of inserting a watchdog process between the main app and the WDT hardware is that we get an additional high-level fallback layer without degrading the original WDT functionality. Both applications and the OS remain under WDT control (remember, this is a piece of hardware hooked up to the reset line). This gives us more opportunities to take action when the main app stops pinging the watchdog process, e.g. we could try to kill and restart the main application process once and, if that doesn't help, initiate a shutdown sequence, which is healthier than a hard reset. And that's not all! Further enhancements may include:
- Restore the factory settings of the main application after a reboot in order to undo user changes (works only if the watchdog process starts first).
- Log all actions/errors or display a warning message on a local display. I recommend using the operating system log.
- Send or broadcast a message before shutdown over the network to a monitoring instance if available.
- <put your feature here>
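The escalation path described before the list (restart the main application once, then shut down) can be sketched as a small decision class inside the watchdog process. This is only an illustration under a few assumptions: the names `Supervisor` and `Action` are invented, timestamps are passed in by the caller, and feeding the hardware WDT, killing processes and shutting down are left to whoever calls `tick()`.

```python
import enum


class Action(enum.Enum):
    NONE = "nothing to do"
    RESTART_APP = "kill and restart the main application"
    SHUTDOWN = "initiate an orderly shutdown"


class Supervisor:
    """Decides what the watchdog process should do on each tick."""

    def __init__(self, timeout_s, max_restarts=1):
        self.timeout_s = timeout_s
        self.max_restarts = max_restarts
        self.restarts = 0
        self.last_ping = None

    def on_ping(self, now):
        """Record a heartbeat received from the main application."""
        self.last_ping = now

    def tick(self, now):
        """Return the action to take at time `now` (seconds)."""
        if self.last_ping is None:
            self.last_ping = now  # start timing from the first tick
        if now - self.last_ping <= self.timeout_s:
            return Action.NONE
        if self.restarts < self.max_restarts:
            self.restarts += 1
            self.last_ping = now  # give the restarted app a fresh window
            return Action.RESTART_APP
        return Action.SHUTDOWN
```

The restart counter is deliberately never reset here, so the main application gets exactly one second chance. Whether a long healthy period should earn it another one is a policy decision I leave to you.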
I'm pretty sure you'll have more features to add, but always remember:
Keep it simple! It's definitely not the right application to bloat with features.