Tuesday, November 2, 2010
Qt DevDays 2010 (#qtdd2010)
This year I attended Qt DevDays again and found a much bigger venue than in previous years. The event is still growing: roughly 1000 attendees this year, and according to the organizers the figures double every two years. DevDays moved out of Munich, halfway to the airport, which is not too bad because you no longer have to spend an hour traveling into the city centre. So I was on time for the keynotes and even had a chance to grab a coffee beforehand.
Two main topics dominated the event: Qt Quick (Qt User Interface Creation Kit) and the planned Open Governance model for Qt. The first, technical topic was demonstrated by Lars Knoll in his keynote and presented in several sessions on both days. Basically, it's NOKIA's technology for addressing app development across all its mobile platforms. The web page reads: "Qt Quick (Qt User Interface Creation Kit) is a high-level UI technology that allows developers and UI designers to work together to create animated, touch-enabled UIs and lightweight applications." This is an interesting strategic shift towards lightweight applications, a.k.a. apps, for mobile (NOKIA) devices. Because C++ is too heavy for that, they've added QML, "an easy to use, declarative language", deeply integrated with the Qt Creator IDE, of course. Obviously, the Qt technology stack will be driven by trends in the mobile world, and feature development for niche *nix platforms will phase out.

The Open Governance model was discussed in the keynote by NOKIA CTO Rich Green and in the Qt Labs. It's good to see a company like NOKIA embracing open source and its community. Not only will discussions move into the public, the QA process will be opened as well. In the Labs, I got the impression that many details still need to be fleshed out. It's not implementation-ready, but you can clearly identify the trend. And I don't believe it will be easy, because this means true change.

Speaking of QA reminds me of something: on day three I listened to Rohan McGovern's talk "The Qt Continuous Integration System". It was a really impressive and very interesting presentation. Thumbs up, Rohan! It confirmed my experience with QA and CI: choose your tools wisely, dedicate time to it and prepare for a lot of work!

Tuesday, May 18, 2010
Programming massively parallel systems (FPGA vs. GPU)
Modern Graphics Processing Units (GPUs) are experiencing a big hype around their stream processing capabilities. In fact, GPU companies like AMD (ATI) and NVIDIA invest serious money to transform their former rasterizer hardware into general-purpose computation engines. General-Purpose computing on GPUs (GPGPU) is en vogue and attracts individuals and businesses from completely different application areas. The term GPGPU has become synonymous with number crunching, high-performance/supercomputing, financial Monte Carlo simulation, etc.
Where does that popularity come from? The top three reasons for the fast adoption of GPGPU are:
Graphics Processing Units (GPUs)
The evolution of GPUs is rooted in the 3D graphics pipeline and was heavily disrupted by the introduction of shaders. Shaders are small programs written in a low-level, assembler-like language. As the name implies, shaders were used to process pixels and vertices before they were rasterized into graphics memory. Figure 1 sketches the general dataflow. Big/small arrows indicate the bandwidth capacities between pipeline stages. The vertex and pixel processing stages are composed of several sub-steps, which I've omitted for brevity.
Note that the GPU can reach its peak performance only if data is routed from/to its tightly connected video memory. The bottleneck is obviously the interface to system memory. But this is no penalty for a graphics system: the asymmetric throughput is optimized for generating high-resolution images from 3D primitives at frame rates greater than 30 fps. (To define a rectangle of arbitrary size you need two coordinate pairs (x1, y1) and (x2, y2), but the number of pixels covered by the rendered rectangle on screen (in display memory) is proportional to width times height. You get the idea.) Textures and vertices are uploaded in advance before being fed into the pipeline. Each pipeline stage increases the amount of data by adding information like surface normals, texture coordinates etc. Since the fixed-function pipeline was limited to a certain set of predefined operations, a more flexible, programmable pipeline was derived by introducing vertex and pixel shaders (see figure 2).
Shaders were a big win for GPU developers. Programmers regained control over the graphics pipeline, and GPU vendors were forced to disclose information about the underlying hardware, yielding better and more efficient software. Pixel shaders are often called kernels because they apply the same set of operations to every pixel without knowledge of neighboring pixels.
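To illustrate the kernel idea, here is a minimal Python sketch. It is not shader code, and the grayscale conversion, its integer BT.601-style weights and the function names are my own illustrative assumptions. The kernel sees exactly one pixel, so the image can be cut into arbitrary chunks and processed in any order with the same result. The thread pool only demonstrates that partitioning; in CPython it won't speed up this CPU-bound work, whereas a GPU runs such kernels on many shader processors at once.

```python
from concurrent.futures import ThreadPoolExecutor

Pixel = tuple[int, int, int]  # (r, g, b), each channel 0..255


def grayscale_kernel(p: Pixel) -> int:
    """Per-pixel kernel: it sees only its own pixel, never the neighbours."""
    r, g, b = p
    # Integer approximation of the BT.601 luma weights 0.299/0.587/0.114.
    return (299 * r + 587 * g + 114 * b) // 1000


def run_serial(image: list[Pixel], kernel) -> list[int]:
    return [kernel(p) for p in image]


def run_chunked(image: list[Pixel], kernel, workers: int = 4) -> list[int]:
    # Pixels are independent, so the image can be split into arbitrary chunks
    # that are processed by different workers without any coordination.
    size = max(1, len(image) // workers)
    chunks = [image[i:i + size] for i in range(0, len(image), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: [kernel(p) for p in chunk], chunks))
    # pool.map preserves chunk order, so concatenating restores pixel order.
    return [value for part in parts for value in part]


image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255), (10, 20, 30)]
print(run_serial(image, grayscale_kernel))  # [76, 149, 29, 255, 18]
```

Whichever way the image is chunked, `run_chunked` returns the same list as `run_serial`; that order independence is exactly what lets the hardware scale by adding more processing units.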
Vertex shaders work in a similar way but process vertices instead. This data independence makes shaders very powerful, because multiple kernels can run at the same time without interfering with each other. Thus shaders implicitly scale very well (in terms of parallelization) when more processing hardware is added. And that's exactly what GPU manufacturers did to meet the demands of their customers: they added more shader processors, introduced vector operations and supported standard floating-point formats. They effectively created a new platform for data-parallel stream processing.

Field Programmable Gate Arrays (FPGAs)
FPGAs never had a fixed processing pipeline. These devices were not designed for signal or image processing in the first place. Early FPGAs served as glue logic or replaced standard logic ICs; hence they are categorized as programmable logic. The internal layout of FPGAs is dominated by a large routing matrix capable of connecting thousands of tiny processing elements called lookup tables (LUTs) or function generators (FGs). They are arranged in a rectangular grid, sometimes called the fabric (see figure 3). Each LUT or FG has one or more dedicated flip-flops (FFs) for storing data or building deeply pipelined designs. A LUT can implement any logic function of n inputs (where n is vendor-dependent and ranges from 3 to 6). It can also have accompanying multiplexers or fast carry lines to support arithmetic operations. Both the routing matrix and the LUT functions are freely programmable, enabling the FPGA developer to build complex logic. Additional on-chip memory (several Mbit in size) can be used for FIFOs or buffering. More recently, multiply-accumulate (MAC) units were added to the fabric to win market share from traditional DSPs. High-end devices contain approx. 1000 MAC units, and running all of them in parallel leads to very impressive performance figures.
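A LUT is essentially a tiny memory: the n inputs form an address into 2^n configuration bits, which is why it can implement any n-input function. The Python model below is a sketch under that simplification; the class and method names are hypothetical, and the carry logic, multiplexers and flip-flops mentioned above are left out. Configured as a 4-input XOR, it holds the 16-bit truth table 0x6996.

```python
from typing import Callable


class LUT:
    """Model of an n-input FPGA lookup table holding 2**n configuration bits."""

    def __init__(self, n: int, truth_table: int):
        if not 0 <= truth_table < (1 << (1 << n)):
            raise ValueError("truth table must fit into 2**n bits")
        self.n = n
        self.truth_table = truth_table  # bit i = output for input pattern i

    @classmethod
    def from_function(cls, n: int, f: Callable[..., int]) -> "LUT":
        # "Program" the LUT: evaluate f for every input combination once
        # and store each result as one configuration bit.
        bits = 0
        for i in range(1 << n):
            inputs = [(i >> k) & 1 for k in range(n)]  # input k = bit k of i
            if f(*inputs):
                bits |= 1 << i
        return cls(n, bits)

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        # The inputs simply form an address into the configuration bits.
        index = sum((bit & 1) << k for k, bit in enumerate(inputs))
        return (self.truth_table >> index) & 1


xor4 = LUT.from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
maj3 = LUT.from_function(3, lambda a, b, c: a + b + c >= 2)
print(hex(xor4.truth_table), hex(maj3.truth_table))  # 0x6996 0xe8
print(xor4(1, 0, 1, 1), maj3(1, 0, 1))  # 1 1
```

`from_function` mimics, in a very simplified way, what synthesis tools do when they map a piece of logic onto a LUT: evaluate it for all 2^n input combinations and store the results as configuration bits. Evaluating the LUT afterwards is a single lookup, no matter how complex the original function was.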
In theory, these building blocks enable you to build entire CPUs or even GPUs, as long as there are enough LUTs. But the evolution of FPGAs went a different way. Programmability comes at a high cost: imagine all the transistors needed just to configure the routing matrix and the LUTs. And who would start implementing a CPU when you can get microcontrollers with full development toolchain support for a few dollars? FPGA vendors reacted by placing commonly used blocks like memory controllers, CPU cores, Ethernet MACs and high-speed serial transceivers (e.g. for PCIe) close to the fabric. No more bandwidth bottlenecks or memory shortage. This clever mix of high-speed IO and arithmetic/logic units turned FPGAs into today's reconfigurable systems-on-chip (see figure 4). Many ventures tried, but two companies dominate the market today: Xilinx and Altera. Compared to GPU vendors, these companies operate in a completely different market segment, and their target audience is not the end customer. They compete with ASICs, ASSPs and digital signal processors (DSPs). You will find their FPGAs in many electronic systems, primarily in embedded systems, e.g. backbone telecom routers. FPGAs are complex semiconductors tailored for specific applications; however, these devices remain freely programmable and reconfigurable. It's the designer who defines the internal operation, and that makes them so valuable and unique.

Summary
Both technologies, GPU and FPGA, are built for stream processing. Depending on the problem you're facing, one or the other may solve it better. Development cycles are usually shorter for GPUs because of their well-known programming model. FPGAs perform better in real-time environments where latencies need to be low. The most significant aspect is that a compiled FPGA design results in real hardware. But it's not only technical factors that constrain your design decision. It requires developers with very specific skills and experience.
And those are hard to get...
Posted by Andreas Kaiser in Computing architectures, Development, Programmable logic at 22:14
Monday, March 15, 2010
Altium NanoBoard 3000 - In The Box Experience
This series of posts is dedicated to the Altium NanoBoard 3000. I had the opportunity to play with it and would like to share my experience with you.
The NanoBoard 3000 was released by Altium Ltd. in September 2009. It's a rapid prototyping platform for digital electronic designs, consisting of an evaluation board, design software and royalty-free IP for use in the onboard FPGA. In short, all the tools you need to start implementing your ideas. You can stop reading now, buy it from e.g. Newark for $395 and get started. The 3000 is part of Altium's NanoBoard family, a complementary product line alongside the well-known EDA tools. Maybe you're already working with Altium Designer.
When I received the delivery and removed the outer packing, I was impressed, and I hadn't even opened the box yet. It reminded me of a consumer lifestyle product; it could have been a mobile phone or a high-end notebook. Apple is well known for this kind of great packaging. Not bad for an evaluation board!
Opening the box reveals the board and the software. The black PCB with gold contacts has an undeniable elegance. Underneath is a separate box containing the accessories: desktop stand, speaker board, IR remote (yes, a remote control) and the power supply.
The Quickstart Guide provides instructions for mounting the desktop stand and connecting the speaker board. After 10 minutes I had the final setup sitting on my desk. The software is shipped on a DVD. Installation on Windows completed successfully after a couple of minutes. No license is required: the eval board is your license/dongle.
Even before working with it, this is clearly a highlight among the evaluation kits on my shelf. It's obvious that this product was designed by a team of professionals with a precise, shared vision in mind. Every detail has been fine-tuned, from the packaging to the PCB. A good example of holistic product design. Thumbs up. And now you know why the in-the-box experience is worth considering. More details in the next post...

Sunday, February 28, 2010
Embedded Events and Exhibitions 2010
In preparation for next week's embedded world, I've compiled a list of all events related to embedded computing and/or electronics this year.
I'm sure it's not complete, so feel free to leave a comment and I'll extend the list.
embedded world 2010: Nuremberg, Germany, March 2 - 4, 2010
electronic displays 2010: Nuremberg, Germany, March 3 - 4, 2010
IIC China Spring 2010: Shenzhen, China, March 4 - 5, 2010; Chengdu, China, March 11 - 12, 2010; Shanghai, China, March 15 - 16, 2010
DATE 10: Dresden, Germany, March 8 - 12, 2010
Intel Developer Forum 2010: Beijing, April 13 - 14, 2010
ESC Silicon Valley: San Jose, California, USA, April 26 - 29, 2010
The Embedded Masterclass: Cambridge, UK, May 6, 2010; Reading, UK, May 11, 2010
ESEC, 13th Embedded Systems Expo: Tokyo, Japan, May 12 - 14, 2010
ERTS2 2010: Toulouse, France, May 19 - 21, 2010
TechEd North America: New Orleans, Louisiana, USA, June 7 - 11, 2010
ESC India: Bangalore, India, July 21 - 23, 2010
IIC China Fall 2010: Wuhan, September 13 - 14, 2010; Dongguan, September 16 - 17, 2010; Xi'an, September 20 - 21, 2010
EIE-2010: Moscow, Russia, October 26 - 28, 2010
TechEd Europe: Berlin, Germany, November 1 - 5, 2010
electronica: Munich, Germany, November 9 - 12, 2010

Sunday, October 4, 2009
x86 Single board computers gaining momentum
With the growing popularity of single board computers (SBCs), especially those based on x86-compatible CPUs, more and more applications are finding their way into the embedded systems domain. What we're experiencing is traditional desktop-like applications reaching into areas that, only a few months ago, were restricted to fully custom-developed, RTOS-driven hardware.
(Image: SBC with enclosure)
Vice versa, there's a complementary trend of traditional embedded applications moving from custom hardware and niche software platforms to single board computers powered by mainstream operating systems, e.g. Linux or Windows Embedded. Driven by narrow market windows, cost pressure and short technology cycles, product makers try to keep up with the pace by using pre-designed and verified hardware platforms and/or open source software frameworks that hide the underlying complexity from their development teams. This convergence from both sides of the embedded spectrum has gained momentum with the introduction of the Intel Atom architecture (and its compatibles) early last year. Its low TDP, in combination with well-known interface standards such as USB, Gigabit Ethernet or SATA, has led to fanless and compact board designs, perfectly suited for single board computers used in industrial applications. So-called computer-on-modules remove the burden of designing an entire processor platform from hardware engineers and shift the focus back to their core business: implementing solutions for their customers. A carrier or base board with a standardized connector to the processor module, e.g. ETX or COM Express, provides application-specific functionality through interfaces like USB or PCI Express. The same applies to the software development approach: a commodity Linux distribution replaces board support packages, embedded network stacks and error-prone third-party libraries. It should be obvious that x86 compatibility gives you access to a huge amount of existing software and one of the largest developer communities today.
(Image: Computer On Module)
Let's take an image processing system (it could be used for quality inspection on an assembly line) as an example.
The task is to capture N images every second, perform feature detection on each image, store the result and assert a signal in case of missing features, so that items not conforming to spec are sorted out (see following picture). The traditional embedded approach would be to connect a camera module, e.g. via Camera Link, to a custom-designed PCB equipped with a decent digital signal processor (DSP) and an output interface to the real world, e.g. serial/parallel GPIO. While there is nothing wrong with that, the question is: is it worth designing (and maintaining!) full custom hardware/software? Do we need hard real-time? How fast can you deliver the product? And at what cost? Basing our system on a single board computer instead leads to the following simplified system-level architecture.
We use a CCD-based camera module that supports continuous streaming over USB, like the Lumenera Lm075. Isochronous transfers ensure a bounded transmission latency. Drivers and an SDK are available, so image processing algorithms can be developed and tested right from the beginning on any PC with USB connectivity. Yes, this is a huge benefit! If you've ever had to set up a vendor-specific DSP development environment, you know what I'm talking about... "Hell! Why can't I connect to the evaluation board!" The good news is that current x86 architectures implement powerful DSP operations like the Streaming SIMD Extensions (http://en.wikipedia.org/wiki/SSE2) and can handle double-precision numbers, which most DSPs still can't. Examples of how to leverage these instructions for algorithm acceleration can be found here (Using SSE for image processing). Once the core algorithm is implemented, the outcome of each processed image needs to be (1) logged and (2), in case of errors, signaled to the external hardware dealing with the failing item, e.g. sorting it out.
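The linked SSE examples aren't reproduced here. As a minimal sketch, assume the "feature check" is simply comparing each captured 8-bit grayscale frame against a golden reference image using the sum of absolute differences (SAD); this is my stand-in, not the feature detection the post has in mind. SSE2 has an instruction for exactly this, PSADBW (intrinsic `_mm_sad_epu8`), which compares 16 bytes at once and returns two partial sums, one per 8-byte half. The Python below only models that loop structure (16 pixels per step plus a scalar tail); it is not SIMD code, and the function names, the reference image and the threshold are hypothetical.

```python
def psadbw(a: bytes, b: bytes) -> tuple[int, int]:
    """Software model of SSE2 PSADBW on one 16-byte block: the absolute
    differences of the low and high 8 bytes are summed separately."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("PSADBW operates on 16-byte blocks")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sum(diffs[:8]), sum(diffs[8:])


def sad(frame: bytes, reference: bytes) -> int:
    """Sum of absolute differences between two 8-bit grayscale images."""
    if len(frame) != len(reference):
        raise ValueError("images must have the same size")
    full = len(frame) - len(frame) % 16
    total = 0
    # Main loop: 16 pixels per step, as an SSE2 implementation would do it.
    for i in range(0, full, 16):
        low, high = psadbw(frame[i:i + 16], reference[i:i + 16])
        total += low + high
    # Scalar tail for the remaining (fewer than 16) pixels.
    total += sum(abs(x - y) for x, y in zip(frame[full:], reference[full:]))
    return total


def passes(frame: bytes, reference: bytes, threshold: int) -> bool:
    # Hypothetical pass/fail rule: small deviations from the golden image are OK.
    return sad(frame, reference) <= threshold


golden = bytes(range(40))  # hypothetical 40-pixel reference image
captured = bytearray(golden)
captured[3] += 10  # three pixels deviate from the reference
captured[20] -= 5
captured[35] += 1
print(sad(bytes(captured), golden))  # 16
```

In C, the loop body would load 16 pixels from each image with `_mm_loadu_si128`, call `_mm_sad_epu8` and accumulate both 64-bit halves of the result, keeping the same scalar tail for image sizes that are not a multiple of 16.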
Logging, the first task, can be accomplished by simply writing to a file or, thanks to the Ethernet port, by storing the result in an SQL database. External signaling is slightly harder but can be done via SuperIO chipsets or GPIO pins, if available. Usually this involves additional level conversion or buffering, e.g. to 24 V for industrial automation systems. And we're done... Well, what I've left out for the sake of brevity is the timing. The critical path from image acquisition to external signaling needs to be carefully evaluated and fine-tuned to meet your timing requirements. Hard real-time is only available with RT-patched kernels, but it can be accomplished as well. The computing architecture described in this post is not restricted to image processing; in fact, it can be applied to many problems in the embedded systems domain. It integrates perfectly into corporate IT networks thanks to its Ethernet capability and can be administered and monitored remotely. The benefit of an architecture like the one in this example is that the overall system design effort shifts noticeably to the software side, where your expertise lies: implementing core business logic on a proven and standardized platform.
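The timing aspect is easier to reason about with the per-frame loop written down. The sketch below makes several assumptions: `check`, `reject` and `log` are hypothetical callables standing in for the feature detection, the SuperIO/GPIO output and the file/SQL logging; the per-frame budget is simply the frame period 1/N; and latency is measured only from the start of processing, so image acquisition and the USB transfer are not included. Plain Python on a stock kernel gives no real-time guarantees, so this shows the structure only.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Result:
    frame: int
    ok: bool
    latency_s: float
    deadline_missed: bool


def run_inspection(frames: Iterable[bytes],
                   check: Callable[[bytes], bool],
                   reject: Callable[[int], None],
                   log: Callable[[str], None],
                   frames_per_second: float) -> list[Result]:
    budget_s = 1.0 / frames_per_second  # time available per frame
    results = []
    for n, frame in enumerate(frames):
        start = time.perf_counter()  # ideally this would be the capture time
        ok = check(frame)
        if not ok:
            reject(n)  # (2) signal first: this is on the critical path
        latency = time.perf_counter() - start
        # (1) log afterwards, outside the measured critical path
        log(f"frame={n} ok={ok} latency_ms={latency * 1e3:.3f}")
        results.append(Result(n, ok, latency, latency > budget_s))
    return results


# Stand-ins for the camera, the feature check, the GPIO output and the log.
frames = [bytes(8), bytes([255] * 8), bytes(8)]
rejected: list[int] = []
log_lines: list[str] = []
results = run_inspection(frames, check=lambda f: max(f) < 128,
                         reject=rejected.append, log=log_lines.append,
                         frames_per_second=25)
print(rejected)  # [1]
```

The sketch signals before it logs so that a slow file or database write never delays the reject output; the `deadline_missed` flag is a cheap way to collect evidence about the critical path before deciding whether an RT-patched kernel is actually needed.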