Sunday, October 4. 2009
With the growing popularity of single board computers (SBCs), especially those based on x86-compatible CPUs, more and more applications pave their way into the embedded systems domain. What we're experiencing is the reach of traditional desktop-like applications into areas that were restricted to fully custom-developed, RTOS-driven hardware only a few months ago.
SBC with enclosure
Vice versa, there's a complementary trend of traditional embedded applications moving from custom hardware and niche software platforms to single board computers powered by mainstream operating systems such as Linux or Windows Embedded. Driven by narrow market windows, cost pressure and short technology cycles, product makers try to keep up with the pace by using pre-designed and verified hardware platforms and/or Open Source software frameworks that hide the underlying complexity from their development teams.
This convergence from both sides of the embedded spectrum has gained momentum with the introduction of the Intel Atom architecture (and its compatibles) early last year. Its low TDP in combination with well-known interface standards such as USB, Gigabit Ethernet or SATA has led to fanless and compact board designs, perfectly suited for single board computers used in industrial applications.
So-called computer-on-modules (COMs) remove the burden of designing an entire processor platform from hardware engineers and shift the focus back to their core business: implementing solutions for their customers. A carrier or base board with a standardized connector to the processor module, e.g. ETX/COM Express, provides application-specific functionality through interfaces like USB or PCI Express. The same applies to the software development approach: a commodity Linux distribution replaces board support packages, embedded network stacks and error-prone third-party libraries. It should be obvious that x86 compatibility gives you access to a vast amount of existing software and one of the largest developer communities today.
Computer On Module
Let's take an image processing system (which could be used for quality inspection on an assembly line) as an example. The task is to capture N images per second, perform feature detection on each image, store the result and assert a signal in case of missing features to sort out items not conforming to spec (see following picture).
The traditional embedded approach would be to connect a camera module, e.g. via Camera Link, to your custom-designed PCB equipped with a decent digital signal processor (DSP) and an output interface to connect to the real world, e.g. serial/parallel GPIO. While there is nothing wrong with that, the questions are: Is it worth designing (and maintaining!) fully custom hardware and software? Do we need hard real-time? How fast can you deliver the product? And at what cost? Basing our system on a single board computer instead leads to the following simplified system-level architecture.
We use a CCD-based camera module that supports continuous streaming over USB, like the Lumenera Lm075. Isochronous transfers ensure a bounded transmission latency. Drivers and an SDK are available, so image processing algorithms can be developed and tested right from the beginning on any PC with USB connectivity. Yes, this is a huge benefit! If you've ever had to set up a vendor-specific DSP development environment, you know what I'm talking about... "Hell! Why can't I connect to the evaluation board!"
The good news is that current x86 architectures implement powerful DSP-like operations such as Streaming SIMD Extensions (http://en.wikipedia.org/wiki/SSE2) and are capable of dealing with double-precision numbers, which is still not a given for most DSPs. Examples of how to leverage these instructions for algorithm acceleration can be found here (Using SSE for image processing).
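To give a taste of what SSE2 buys you for pixel work, here is a minimal sketch (my own illustration, not taken from the linked article) computing the per-pixel absolute difference of two 8-bit grayscale rows, a typical building block for change or defect detection:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

/* Absolute difference of two 8-bit grayscale rows, 16 pixels per iteration.
 * |a-b| is computed branch-free as saturating (a-b) OR saturating (b-a).
 * For simplicity this sketch requires width to be a multiple of 16. */
void absdiff_u8_sse2(const uint8_t *a, const uint8_t *b,
                     uint8_t *out, size_t width)
{
    for (size_t i = 0; i < width; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i d  = _mm_or_si128(_mm_subs_epu8(va, vb),
                                  _mm_subs_epu8(vb, va));
        _mm_storeu_si128((__m128i *)(out + i), d);
    }
}
```

One loop iteration handles 16 pixels at once; a scalar version would need 16 loads, compares and stores for the same work.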
Once the core algorithm is implemented, the outcome of each processed image needs to be (1) logged and (2), in case of errors, signaled to external hardware dealing with the failing item, e.g. by sorting it out. The first task can be accomplished by simply logging to a file or, thanks to the Ethernet port, by storing the results in an SQL database. External signaling is slightly harder but can be done via SuperIO chipsets or GPIO pins if available. Usually, this involves another level conversion or buffering, e.g. to 24V for industrial automation systems. And we're done...
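For task (1), file-based logging really is that simple. A minimal sketch (function name and CSV column layout are my own, purely illustrative) could look like this:

```c
#include <stdio.h>
#include <time.h>

/* Append one inspection result to a CSV log file.
 * "pass" is nonzero for conforming items; "score" is whatever quality
 * metric the feature detection produces.
 * Returns 0 on success, -1 on I/O error. */
int log_result(const char *path, unsigned long item_id, int pass, double score)
{
    FILE *f = fopen(path, "a");   /* append keeps earlier results intact */
    if (!f)
        return -1;
    fprintf(f, "%ld,%lu,%d,%.3f\n", (long)time(NULL), item_id, pass, score);
    return fclose(f) == 0 ? 0 : -1;
}
```

Swapping this out for an SQL INSERT over the network later on doesn't touch the rest of the processing pipeline.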
Well, what I've left out for the sake of brevity is the timing part. The critical path from image acquisition to external signaling needs to be carefully evaluated and fine-tuned to meet your timing requirements. Hard real-time is only available with RT-patched kernels, but it can be accomplished as well.
The computing architecture described in this post is not restricted to image processing applications; in fact, it can be applied to many problems in the embedded systems domain. It integrates perfectly into corporate IT networks thanks to its Ethernet capability and can be administered and monitored remotely. The benefit of an architecture like the one in this example is that the overall system design effort shifts noticeably to the software side, where your expertise lies: implementing core business logic on top of a proven and standardized platform.
Saturday, January 24. 2009
This is the last post in my series (see part 1 and part 2) about watchdog timers (WDTs). After this one I will stop bugging you with the topic. But you've probably already got the impression that I have a certain weakness for WDTs. Anyway, let's take this to the next level and find out how to improve the solution from my last post even further.
Right now we have our application directly pinging the watchdog hardware, i.e. there's a watchdog module/component linked into the main executable doing everything required to prevent a timeout. That was easy to do because it only required us to insert two function calls for (de)initialization and one for pinging inside our main loop. Unfortunately, it also introduced a tight coupling between the main application and the watchdog handling logic (see figure 1).
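To make that coupling concrete, here is a sketch of the current shape. The wdt_* calls are hypothetical stand-ins for a real driver API and are stubbed out here; the point is that they are woven directly into the application's main loop:

```c
#include <stdbool.h>

/* Hypothetical driver calls -- a real implementation would talk to the
 * SuperIO/chipset WDT hardware. These stubs just count pings so the
 * sketch stays self-contained. */
static int g_pings;
static bool wdt_init(unsigned timeout_s) { (void)timeout_s; g_pings = 0; return true; }
static void wdt_ping(void)               { g_pings++; }
static void wdt_close(void)              { /* disable the timer on clean exit */ }

/* The tightly coupled shape: three watchdog calls embedded in the app. */
int run_main_loop(int iterations)
{
    if (!wdt_init(30))       /* e.g. a 30 s hardware timeout */
        return -1;
    for (int i = 0; i < iterations; i++) {
        /* ... do one unit of application work ... */
        wdt_ping();          /* reload the counter before it expires */
    }
    wdt_close();
    return g_pings;
}
```

Every build of the application now depends on the WDT driver, which is exactly the coupling the next paragraphs set out to remove.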
Other drawbacks of this approach are:
To overcome these drawbacks, we put the watchdog handling logic into a separate executable and introduce a small, lightweight layer that wraps the interprocess communication (IPC) which is now required. I'll leave the choice of an appropriate IPC API to you; it should be non-blocking and easy to use (SOAP is probably a bad idea). Admittedly, we've added slightly more complexity to the big picture (see figure 2), but at the same time it removes all dependencies of the main application on the WDT hardware, and the application can even run when there is no watchdog present at all.
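As an illustration of the "non-blocking and easy to use" requirement, the application-side ping wrapper can be tiny. This sketch uses a UNIX datagram socket (my choice for the example, not a prescription):

```c
#include <sys/socket.h>
#include <string.h>
#include <errno.h>

/* Fire-and-forget ping over an already connected datagram socket.
 * Non-blocking on purpose: if the watchdog process is wedged and its
 * receive queue fills up, the main application must not stall -- a lost
 * ping simply surfaces as a timeout on the watchdog side, which is the
 * correct outcome. Message format is illustrative.
 * Returns 1 if sent, 0 if dropped (queue full), -1 on error. */
int wd_ping_send(int fd)
{
    static const char msg[] = "PING";
    ssize_t n = send(fd, msg, sizeof msg - 1, MSG_DONTWAIT);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* queue full: drop rather than block the main loop */
    return n == (ssize_t)(sizeof msg - 1) ? 1 : -1;
}
```

Note that the wrapper deliberately treats a full queue as a non-error: blocking here would make the watchdog layer itself a potential cause of hangs.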
With these architectural changes in place, we're now ready to introduce more logic to improve the WDT behavior. We can even add a user interface to the newly created watchdog application. Please note that the term watchdog application/process now refers to a piece of software sitting on top of the WDT hardware.
The non-obvious outcome of inserting a watchdog process between the main app and the WDT hardware is that we get an additional high-level fallback layer without degrading the original WDT functionality. Both applications plus the OS remain under WDT control (remember, this is a piece of hardware hooked up to the reset line). This gives us more options for taking action when the main app stops pinging the watchdog process: e.g. we could try to kill and restart the main application process once, and ultimately initiate a shutdown sequence, which is healthier than a hard reset. And that's not all! Further enhancements may include:
I'm pretty sure you'll have more features to add, but always remember: keep it simple! This is definitely not the right application to bloat with features.
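The kill-restart-shutdown escalation described above can be captured in a tiny policy function. The step names and the one-retry threshold below are illustrative:

```c
/* Escalation policy sketch for the watchdog process: on each missed ping
 * deadline from the main application, take the next, more drastic step.
 * The hardware WDT keeps running underneath as the last-resort fallback. */
typedef enum {
    ACT_NONE,           /* pings arriving: just keep feeding the hardware WDT */
    ACT_RESTART_APP,    /* first miss: kill & restart the main app once      */
    ACT_SHUTDOWN        /* still silent: orderly shutdown beats a hard reset */
} wd_action;

wd_action next_action(int missed_deadlines)
{
    switch (missed_deadlines) {
    case 0:  return ACT_NONE;
    case 1:  return ACT_RESTART_APP;
    default: return ACT_SHUTDOWN;
    }
}
```

Keeping the policy in one small pure function also makes it trivial to unit-test, which matters for a component whose whole job is reliability.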
Sunday, January 18. 2009
While my last post focused on the hardware side of watchdog timers (WDTs), I will now discuss more high-level software concepts of WDTs. So, if you haven't read part 1, now is the time! Done? Here we go.
I'm assuming we have a software application that requires a full-fledged operating system (OS) and relies on a bunch of peripheral hardware besides the CPU, e.g. hard disks, network adapters etc. Let's call it a server, since that might be a valid use case. OK. Our application is supposed to run 24/7, somewhere deeply buried at a customer's site, and it's extremely costly to send out tech support staff for on-site fixes, just to discover that, e.g., someone played around with our settings or temporarily disabled the air conditioning for maintenance, causing the system to lock up.
Being smart and having read part 1, we pull out this timer thing, connect it to the reset line of our server, and we're set, right? Wrong. The problem with this approach is its simplicity and tempting ease of implementation. I understand that you want to get things done and push the box out the door. But this approach brings further implications we haven't dealt with yet. Remember, our software is not running on a microcontroller. A hard reset should only be the last resort, since it puts a lot more stress on all components than a safe shutdown. And what if there really is a broken piece of hardware, or the air conditioning runs amok and starts heating? It will result in an endless reboot-reset cycle, causing even more harm.
Let's tackle the "endless reboot-reset cycle" problem first, since the fix can be applied without changes to the hard reset implementation. The idea is quite simple: we extend the WDT by counting the number of timeouts. If a timeout occurs, our application is obviously not running the way it was intended to. So we maintain a counter or some flags (ideally in hardware) and additionally log the time in a non-volatile way. Observing more than X timeouts within a timeframe of less than Y seconds (insert appropriate values for your application) will power down the system. Then human intervention is required for a restart.
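Here is a minimal sketch of that X-timeouts-in-Y-seconds check. It keeps the timestamps in memory only; as described above, a real implementation would persist them in non-volatile storage so they survive the resets being counted. X and Y are example values:

```c
#include <stdbool.h>
#include <time.h>

#define MAX_TIMEOUTS   3      /* X: tolerated timeouts ...       */
#define WINDOW_SECONDS 3600   /* ... within Y seconds (examples) */

/* Ring buffer of the last MAX_TIMEOUTS timeout timestamps. */
static time_t stamps[MAX_TIMEOUTS];
static int    count;

/* Record one WDT timeout. Returns true once more than MAX_TIMEOUTS
 * timeouts have occurred within WINDOW_SECONDS, i.e. when the system
 * should power down instead of rebooting yet again. */
bool record_timeout(time_t now)
{
    int    idx    = count % MAX_TIMEOUTS;
    time_t oldest = stamps[idx];    /* stamp from MAX_TIMEOUTS events ago */
    stamps[idx] = now;
    count++;
    return count > MAX_TIMEOUTS && (now - oldest) < WINDOW_SECONDS;
}
```

The ring buffer means only the X most recent timestamps are kept, which is all the "more than X in less than Y" rule needs.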
I've tried to illustrate this in a loose UML-style statechart (see figure), with the application process on the left-hand side and the watchdog process on the right-hand side. The dashed line denotes concurrency. State names are written in bold and underlined. Lines with arrows denote state transitions; transitions can be conditional (with a label) or unconditional (without a label). Including the WDT reload/ping of the application process was a bit tricky; note the arrow overlapping the dashed line. I've found no better notation, so if somebody can shed light on that issue, feel free to comment.
In my next post I will cover the "safe shutdown" issue and why it is important to split this functionality from your core software.
Thursday, January 8. 2009
(Delayed) Happy New Year everyone! My New Year's resolution for this blog is to keep posts shorter than the last one. Looking over it again made me feel that it was a bit too heavy.
Now back to the topic. I'm going to blog about watchdog timers (WDTs) today because it was one of the main topics that kept me busy this week. Watchdogs are very common in embedded systems as a means of recovering from system/software hangs. These non-interactive systems need a mechanism to automatically reboot or recover from failure states without human intervention. So how does it work? The idea is quite simple. As the name says, it's a timer, which ultimately boils down to a counter. That counter is part of a simple, dedicated circuit running independently of the processor and continuously counting down, e.g. at a speed of 1 tick per second. If the counter reaches 0, a signal is asserted and ... something has to happen. Usually you'll find the following two scenarios:
I'm sure you've already figured out what to do. To prevent timeouts, one has to periodically reload the counter to signal that the program is still alive. This is also referred to as pinging.
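The whole mechanism fits in a few lines. Here's a toy software model of the counter described above, just to make the reload/timeout interplay explicit (real WDTs are hardware, of course):

```c
#include <stdbool.h>

/* Minimal model of a watchdog timer: a free-running down-counter that
 * "fires" when it hits zero, and a reload ("ping") that puts it back at
 * its start value. */
typedef struct {
    int  reload_value;   /* e.g. 30 ticks at 1 tick per second */
    int  counter;
    bool fired;
} wdt_model;

void wdt_start(wdt_model *w, int reload_value)
{
    w->reload_value = reload_value;
    w->counter      = reload_value;
    w->fired        = false;
}

void wdt_tick(wdt_model *w)     /* one timer tick elapses */
{
    if (!w->fired && --w->counter <= 0)
        w->fired = true;        /* timeout: reset/NMI would be asserted */
}

void wdt_reload(wdt_model *w)   /* the periodic ping from software */
{
    w->counter = w->reload_value;
}
```

As long as wdt_reload() is called more often than reload_value ticks apart, the fired flag never comes up; stop pinging and it fires within one timeout period.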
WDTs can be found in nearly every microcontroller today, on server mainboards, and on dedicated extension cards with programming support.
In my case it happened to be a Super Micro X7SBE motherboard with an on-board watchdog. Since the board is equipped with an ICH9R chipset, which contains WDT functionality (as part of Intel's TCO logic), I expected the BIOS to use that one. About a dozen reboots later, I figured out that it's actually the watchdog in the W83627HG I/O chip. Of course, this is not documented in the mainboard manuals, and there's also no driver for the device on the drivers disc (except, perhaps, the winio library). So, if you're planning to use an on-board WDT: watch out! Don't walk into this trap.
Saturday, November 29. 2008
Granted, I've left out the important numbers in the title. To be more precise: this post is only for those of you maintaining an existing PCB design containing one or more Xilinx Virtex-II devices.
It often happens that you're forced to touch the board again because of discontinued parts (no, not the Virtex-II), minor circuit improvements or feature requests from your customers. So why not upgrade the Virtex-II to a package-compatible Spartan-3A at the same time?
No way!? Why would you do that? Valid objections, but before I explain why it can make sense in certain situations, I want to introduce the prerequisites for the upgrade:
If these (heavy) requirements apply to your design this could be your migration path:
With the extension of the Spartan-3A family in August 2008, Xilinx now ships the entire device family in the FT256 package. The FT256 is slightly thinner but FG256 "compatible". This makes it possible to replace a small to mid-range Virtex-II with a large Spartan-3A.
In the table above I compared the logic, multiplier and DCM resources for you, to help find the appropriate replacement candidates (see third column). The second column lists all Spartan-3A devices that have the same amount of CLB logic as their Virtex-II "counterpart" but fewer multiplier/BlockRAM resources. So if your design doesn't use 100% of these non-CLB resources, they might be an alternative, too. That's why I've called them potential upgrade parts.
Now back to the question: why? I know it's a lot of work and not as easy as just swapping the FPGA in the next board revision. It requires adjusting the core voltage, changing the layout, rewriting constraints, updating production specs etc. But it will pay off in the following points:
I consider these valid reasons for small- to large-volume products, and especially for battery-powered applications. Please note that this post is not meant to be a HOWTO or guide, nor can I guarantee that it works for all designs. I wanted to share the idea and the research work with you. If you find this interesting or want to exchange experiences, I'd be glad to hear from you.