<?xml version="1.0" encoding="utf-8" ?>

<rss version="0.91" >
<channel>
<title>streaming cores</title>
<link>http://blog.streamingcores.com/</link>
<description>[embedded] system engineering</description>
<language>en</language>
<image>
        <url>http://blog.streamingcores.com/templates/default/img/s9y_banner_small.png</url>
        <title>RSS: streaming cores - [embedded] system engineering</title>
        <link>http://blog.streamingcores.com/</link>
        <width>100</width>
        <height>21</height>
    </image>

<item>
    <title>Qt DevDays 2010 (#qtdd2010)</title>
    <link>http://blog.streamingcores.com/index.php?/archives/21-Qt-DevDays-2010-qtdd2010.html</link>

    <description>
        This year I&#039;ve attended Qt DevDays again, and found a much bigger venue than in previous years. The event is still growing: roughly 1000 attendees this year. According to the organizers, figures double every two years. DevDays moved outside of Munich, halfway to the airport, which is not too bad because you don&#039;t have to spend an hour traveling into the city centre anymore. So I was on time to listen to the keynotes and had a chance to grab a coffee beforehand.&lt;br /&gt;
&lt;br /&gt;
Two main topics dominated the event: Qt Quick (Qt User Interface Creation Kit) and the planned Open Governance model for Qt. The first, technical topic was demonstrated by Lars Knoll in his keynote and presented in several sessions on both days. Basically it&#039;s NOKIA&#039;s technology to address app development for all its mobile platforms. The &lt;a href=&quot;http://qt.nokia.com/products/whats-new-in-qt&quot; title=&quot;What&#039;s new in Qt&quot;&gt;web page&lt;/a&gt; reads:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;Qt Quick (Qt User Interface Creation Kit) is a high-level UI technology that allows developers and UI designers to work together to create animated, touch-enabled UIs and lightweight applications.&lt;/blockquote&gt;&lt;br /&gt;
&lt;br /&gt;
This is an interesting strategic change towards lightweight applications, a.k.a. apps, to address mobile (NOKIA) devices. Because C++ is too heavy for that, they&#039;ve added QML, &quot;an easy to use, declarative language&quot;, deeply integrated with the Qt Creator IDE of course. Obviously, the Qt technology stack will be driven by trends in the mobile world. Feature development for niche *nix platforms will phase out.&lt;br /&gt;
&lt;br /&gt;
The Open Governance model was discussed in the keynote by NOKIA CTO Rich Green and in the Qt Labs. It&#039;s good to see a company like NOKIA embracing open source and their community. It&#039;s not only discussion that will move into the public, also the QA process will be opened. In the Labs, I got the impression that many details need to be fleshed out. It&#039;s not implementation ready but you can clearly identify the trend. And I don&#039;t believe it will be easy because this means true change.&lt;br /&gt;
&lt;br /&gt;
Speaking of QA reminds me of something: on day three I listened to Rohan McGovern&#039;s talk about &quot;The Qt Continuous Integration System&quot;. It was a really impressive and very interesting presentation. Thumbs up, Rohan! It confirmed my experience with QA and CI: choose your tools wisely, dedicate time to it and prepare for a lot of work!&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>Programming massively parallel systems (FPGA vs. GPU)</title>
    <link>http://blog.streamingcores.com/index.php?/archives/20-Programming-massively-parallel-systems-FPGA-vs.-GPU.html</link>

    <description>
        Modern Graphics Processing Units (GPUs) are enjoying a big hype around their stream processing capabilities. In fact, GPU companies like &lt;a href=&quot;http://www.amd.com&quot; title=&quot;AMD&quot;&gt;AMD&lt;/a&gt; (ATI) and &lt;a href=&quot;http://www.nvidia.com&quot; title=&quot;NVIDIA&quot;&gt;NVIDIA&lt;/a&gt; invest serious money to transform their former rasterizer hardware into general purpose computation engines. General Purpose GPU (GPGPU) is en vogue and attracts individuals and businesses from completely different application areas. The term GPGPU is synonymous with number crunching, high-performance computing/supercomputing, financial Monte Carlo simulation, etc.&lt;br /&gt;
&lt;br /&gt;
Where does that popularity come from? The top three reasons for the fast adoption of GPGPU are:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;Mainstream availability of the hardware.&lt;br /&gt;
FPGAs are unknown to many people, although they&#039;re powering many products. By contrast, almost everybody who has used a computer knows that graphics adapters exist and where to get one.&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;Low costs (&lt;a href=&quot;http://en.wikipedia.org/wiki/Commercial_off-the-shelf&quot; title=&quot;Commercial Off The Shelf&quot;&gt;COTS&lt;/a&gt;).&lt;br /&gt;
GPUs are cheap. Their market is large and very price-sensitive. This is probably not true for top-notch products but even high-end GPUs are low priced compared to super-computing hardware or big FPGA devices.&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;Well known programming model.&lt;br /&gt;
&lt;a href=&quot;http://www.nvidia.com/object/cuda_home_new.html&quot; title=&quot;CUDA&quot;&gt;NVIDIA CUDA&lt;/a&gt;, &lt;a href=&quot;http://developer.amd.com/gpu/ATIStreamSDK&quot; title=&quot;ATI Stream&quot;&gt;ATI Stream&lt;/a&gt; and &lt;a href=&quot;http://www.khronos.org/opencl/&quot; title=&quot;OpenCL&quot;&gt;Khronos OpenCL&lt;/a&gt; are C-style languages. Their model is similar to traditional imperative/sequential programming languages. It fits the mindset of many programmers and benefits from a large developer community. The FPGA development flow is fundamentally different, requires hardware knowledge and has a steep learning curve.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Graphics Processing Units (GPUs)&lt;/h3&gt;&lt;br /&gt;
&lt;br /&gt;
The evolution of GPUs is rooted in the 3D graphics pipeline and was heavily disrupted by the introduction of shaders. Shaders are small programs written in a low-level, assembler-like language. As the name implies, shaders were used to process pixels and vertices before they were rasterized into graphics memory. Figure 1 sketches the general dataflow. Big and small arrows indicate bandwidth capacities between pipeline stages. Vertex and pixel processing stages are composed of several sub-steps, which I&#039;ve omitted for brevity.&lt;br /&gt;
&lt;br /&gt;
&lt;div align=&quot;center&quot;&gt;&lt;img src=&quot;http://streamingcores.com/images/blog18-1gpgpu.png&quot; alt=&quot;GPU dataflow&quot;/&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
Note that the GPU can reach its peak performance only if the data is routed from/to its tightly connected video memory. The bottleneck is obviously the interface to system memory. But this constitutes no penalty for a graphics system. The asymmetric throughput is optimized for generating high resolution images with frame rates greater than 30 fps from 3D primitives. (To define a rectangle of arbitrary size you need two coordinate pairs (x1, y1) and (x2, y2), but the number of pixels covered by the rendered rectangle on screen (in display memory) is proportional to width times height. You get the idea.) Textures and vertices are uploaded in advance before being fed into the pipeline. Each pipeline stage increases the amount of data by adding information like surface normals, texture coordinates etc.&lt;br /&gt;
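&lt;br /&gt;
To make the asymmetry concrete, here is a minimal sketch in Python that compares the size of a rectangle description with the number of pixels it covers. The function name and the 640 by 480 example are assumptions picked for illustration, not figures from any particular GPU.&lt;br /&gt;
```python
# Illustrative sketch: input size vs. output size when rasterizing a rectangle.
# The helper name and the example resolution are assumptions for illustration.

def rectangle_io_sizes(x1, y1, x2, y2):
    """Return (input_values, covered_pixels) for an axis-aligned rectangle."""
    input_values = 4  # two coordinate pairs: (x1, y1) and (x2, y2)
    width = abs(x2 - x1)
    height = abs(y2 - y1)
    covered_pixels = width * height  # proportional to width times height
    return input_values, covered_pixels

inputs, pixels = rectangle_io_sizes(0, 0, 640, 480)
print(inputs, pixels)
```
Four input values fan out into several hundred thousand pixel writes, which is why a narrow path from system memory is tolerable while the path to video memory has to be wide.&lt;br /&gt;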
&lt;br /&gt;
Since the fixed-function pipeline was limited to a certain set of pre-defined operations a more flexible and programmable pipeline was derived by introducing vertex and pixel shaders (see figure 2). Shaders were a big win for GPU developers. Programmers regained control over the graphics pipeline and GPU vendors were forced to disclose information about the underlying hardware yielding better and more efficient software.&lt;br /&gt;
&lt;br /&gt;
&lt;div align=&quot;center&quot;&gt;&lt;img src=&quot;http://streamingcores.com/images/blog18-2gpgpu.png&quot; alt=&quot;GPU shader pipeline&quot; /&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
Pixel shaders are often called kernels because they apply the same set of operations to every pixel without knowledge of neighboring pixels. (Vertex shaders work in a similar way but process vertices instead.) This data independence makes them very powerful because multiple kernels can run at the same time without interference. Thus shaders implicitly scale very well (in terms of parallelization) when more hardware processing resources are added. And that&#039;s exactly what GPU manufacturers did to meet the demands of their customers: they added more shader processors, introduced vector operations and supported standard floating point formats. They effectively created a new platform for data parallel stream processing.&lt;br /&gt;
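&lt;br /&gt;
The kernel idea can be sketched in a few lines of Python. This is only a model of the concept, not shader code: the function names and the tiny 2x2 image are assumptions made up for illustration.&lt;br /&gt;
```python
# Illustrative sketch of the kernel idea: one function applied to every pixel
# independently. Names and the tiny 2x2 image are assumptions for illustration.

def invert_kernel(pixel):
    """Per-pixel kernel: sees only its own (r, g, b) input, no neighbors."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)

def run_kernel(kernel, image):
    """Apply the kernel to each pixel; the order of evaluation does not matter."""
    return [[kernel(pixel) for pixel in row] for row in image]

image = [
    [(0, 0, 0), (255, 255, 255)],
    [(10, 20, 30), (200, 100, 50)],
]
result = run_kernel(invert_kernel, image)
print(result)
```
Because no call depends on the result of another, the work could be split across any number of processing units without changing the output. That property is what lets the hardware scale by simply adding shader processors.&lt;br /&gt;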
&lt;br /&gt;
&lt;h3&gt;Field Programmable Gate Arrays (FPGAs)&lt;/h3&gt;&lt;br /&gt;
&lt;br /&gt;
FPGAs never had a fixed processing pipeline. These devices were not designed for signal or image processing in the first place. Early FPGAs served as glue logic or replaced standard logic ICs. Hence they are categorized as programmable logic.&lt;br /&gt;
&lt;br /&gt;
The internal layout of FPGAs is dominated by a large routing matrix capable of connecting thousands of tiny processing elements called lookup tables (LUT) or function generators (FG). They are arranged in a rectangular grid sometimes called fabric (see figure 3). Each LUT or FG has one or more dedicated flip-flops (FF) for storing data or creating deeply pipelined designs. A LUT can generate every logic function of n inputs (where n is vendor dependent and ranges from 3 to 6). It can also have accompanying multiplexers or fast lines for carry signal propagation to support arithmetic operations. The routing matrix and LUT functions are freely programmable, enabling the FPGA developer to build complex logic. Additional on-chip memory (several Mbit in size) can be used for FIFOs or buffering purposes. Lately, Multiply-Accumulate (MAC) units were added to the fabric to take market share from traditional DSPs. High-end devices contain approx. 1000 MAC units. Running all of them in parallel leads to very impressive performance specs.&lt;br /&gt;
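&lt;br /&gt;
Why a LUT can implement any logic function of its inputs becomes clear once you model it as a small truth table addressed by the input bits. The following Python sketch shows this; the class and its methods are hypothetical and do not correspond to any vendor tool or API.&lt;br /&gt;
```python
# Illustrative model of an n-input lookup table (LUT).
# The class and its methods are hypothetical, not a vendor API.

class LUT:
    def __init__(self, n_inputs, truth_table):
        # One output bit per input combination: 2**n entries in total.
        if len(truth_table) != 2 ** n_inputs:
            raise ValueError("truth table must have 2**n entries")
        self.n_inputs = n_inputs
        self.truth_table = list(truth_table)

    def evaluate(self, *bits):
        # The input bits form the address into the table (first bit is the MSB).
        if len(bits) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in bits:
            address = address * 2 + (1 if bit else 0)
        return self.truth_table[address]

# "Programming" the same 3-input LUT as two different logic functions.
and3 = LUT(3, [0, 0, 0, 0, 0, 0, 0, 1])   # output is 1 only for input 1,1,1
xor3 = LUT(3, [0, 1, 1, 0, 1, 0, 0, 1])   # odd parity of the three inputs
print(and3.evaluate(1, 1, 1), xor3.evaluate(1, 0, 1))
```
Changing the function means changing only the table contents, not the structure. In this simplified model, that is all that &quot;programming&quot; a LUT amounts to.&lt;br /&gt;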
&lt;br /&gt;
&lt;div align=&quot;center&quot;&gt;&lt;img src=&quot;http://streamingcores.com/images/blog18-3fpga.png&quot; alt=&quot;FPGA routing matrix&quot; /&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
In theory, these building blocks enable you to build entire CPUs or even GPUs as long as there are enough LUTs. But the evolution of FPGAs went a different way. Programmability comes at a high cost. Imagine all the transistors needed to configure the routing matrix or the LUTs. And who will start to implement a CPU if you can get microcontrollers with full development toolchain support for single-digit bucks? FPGA vendors reacted by placing commonly used blocks like memory controllers, CPU cores, Ethernet MACs and high-speed serial transceivers (e.g. for PCIe) close to the fabric. No more bandwidth bottlenecks or memory shortage. This clever mix of high-speed IO and arithmetic/logic units turned FPGAs into today&#039;s reconfigurable Systems-on-a-Chip (see figure 4).&lt;br /&gt;
&lt;br /&gt;
&lt;div align=&quot;center&quot;&gt;&lt;img src=&quot;http://streamingcores.com/images/blog18-4fpga.png&quot; alt=&quot;FPGA System-on-a-Chip&quot; /&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
Many ventures tried but two companies dominate the market today: &lt;a href=&quot;http://www.xilinx.com&quot; title=&quot;Xilinx&quot;&gt;Xilinx&lt;/a&gt; and &lt;a href=&quot;http://www.altera.com&quot; title=&quot;ALTERA&quot;&gt;Altera&lt;/a&gt;. Compared to GPU vendors, these companies operate in a completely different market segment and their target audience is not the end customer. They compete with &lt;a href=&quot;http://en.wikipedia.org/wiki/Application-specific_integrated_circuit&quot; title=&quot;Application Specific Integrated Circuit&quot;&gt;ASIC&lt;/a&gt;s, &lt;a href=&quot;http://en.wikipedia.org/wiki/Application_specific_standard_product&quot; title=&quot;Application Specific Standard Product&quot;&gt;ASSP&lt;/a&gt;s and digital signal processors (DSP). You will find their FPGAs in many electronic systems, primarily in embedded systems e.g. backbone telecom routers. FPGAs are complex semiconductors tailored for specific applications. However, these devices remain freely programmable and reconfigurable. It&#039;s the designer defining the internal operation and that makes them so valuable and unique.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;
&lt;br /&gt;
Both technologies, GPU and FPGA, are built for stream processing. Depending on the problem you&#039;re facing one or the other may solve it better or worse. Development cycles are usually shorter for GPUs because of their well-known programming model. FPGAs perform better in real-time environments when latencies need to be low. The most significant aspect is that a compiled FPGA design results in real hardware.&lt;br /&gt;
&lt;br /&gt;
But it&#039;s not only technical factors constraining your design decision. It requires developers with very specific skills and experience. And those are hard to get...&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>Altium NanoBoard 3000 - In The Box Experience</title>
    <link>http://blog.streamingcores.com/index.php?/archives/19-Altium-NanoBoard-3000-In-The-Box-Experience.html</link>

    <description>
        This series of posts is dedicated to the Altium NanoBoard 3000. I had the opportunity to play with it and like to share my experience with you.&lt;br /&gt;
&lt;br /&gt;
The NanoBoard was released in September 2009 by &lt;a href=&quot;http://www.altium.com&quot; title=&quot;ALTIUM Website&quot;&gt;Altium Ltd.&lt;/a&gt; It&#039;s a rapid prototyping platform for digital electronic designs consisting of an evaluation board, design software and royalty-free &lt;a href=&quot;http://en.wikipedia.org/wiki/Semiconductor_intellectual_property_core&quot; title=&quot;IP Core&quot;&gt;IP&lt;/a&gt; for use in the onboard FPGA. In short, all the tools you need to start implementing your ideas. You can stop reading now, buy it from e.g. &lt;a href=&quot;http://www.newark.com/altium/12-400-nb3000xn-01/nanoboard-3000-with-xilinx-spartan/dp/10R0248&quot;&gt;Newark&lt;/a&gt; for $395 and get started. The 3000 is part of Altium&#039;s NanoBoard family, a complementary product line besides the well-known EDA tools. Maybe you&#039;re already working with Altium Designer.&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;&lt;img src=&quot;http://streamingcores.com/images/blog17-nanoboard3000_1.jpg&quot; alt=&quot;Nanoboard Box&quot; /&gt;&lt;/center&gt;&lt;br /&gt;
&lt;br /&gt;
When I received the delivery and removed the outer packing I was impressed. Note, I had not opened the box yet. It reminded me of some consumer lifestyle product. It could have been a mobile phone or high-end notebook. Apple is well known for this kind of great packaging. Not bad for an evaluation board!&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;&lt;img src=&quot;http://streamingcores.com/images/blog17-nanoboard3000_2.jpg&quot; alt=&quot;&quot; /&gt;&lt;/center&gt;&lt;br /&gt;
&lt;br /&gt;
Opening the box reveals the board + software. The black PCB with gold contacts has an undeniable elegance. Underneath is a separate box containing accessories, desktop stand, speaker board, IR remote (yes, a remote control) and the power supply.&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;&lt;img src=&quot;http://streamingcores.com/images/blog17-nanoboard3000_3.jpg&quot; alt=&quot;&quot; /&gt;&lt;/center&gt;&lt;br /&gt;
&lt;center&gt;&lt;img src=&quot;http://streamingcores.com/images/blog17-nanoboard3000_4.jpg&quot; alt=&quot;&quot; /&gt;&lt;/center&gt;&lt;br /&gt;
&lt;br /&gt;
The Quickstart Guide provides instructions for mounting the desktop stand and connecting the speaker board. After 10 minutes I had the final setup sitting on my desk. The software is shipped on a DVD. Installation on Windows completed successfully after a couple of minutes. No license is required. The eval board is your license/dongle.&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;&lt;img src=&quot;http://streamingcores.com/images/blog17-nanoboard3000_5.jpg&quot; alt=&quot;&quot; /&gt;&lt;/center&gt;&lt;br /&gt;
&lt;br /&gt;
Without having worked with it, this is clearly a highlight among the evaluation kits on my shelf. It&#039;s obvious that this product was designed by a team of professionals with a precise and common vision in mind. Every detail has been fine-tuned, from the packaging to the PCB. A good example for holistic product design. Thumbs up.&lt;br /&gt;
&lt;br /&gt;
And now you know why the in-the-box experience is worth considering. More details in the next post... 
    </description>
</item>
<item>
    <title>Embedded Events and Exhibitions 2010</title>
    <link>http://blog.streamingcores.com/index.php?/archives/18-Embedded-Events-and-Exhibitions-2010.html</link>

    <description>
        In preparation for next week&#039;s embedded world, I&#039;ve compiled a list of all events related to embedded computing and/or electronics for this year.&lt;br /&gt;
&lt;br /&gt;
I&#039;m sure it&#039;s not complete. So feel free to leave a comment and I&#039;ll extend the list.&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.embedded-world.de/en/&quot;&gt;embedded world 2010&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Nuremberg, Germany&lt;/em&gt;&lt;br /&gt;
March 2 - 4, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.electronic-displays.de/&quot;&gt;electronic displays 2010&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Nuremberg, Germany&lt;/em&gt;&lt;br /&gt;
March 3 - 4, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.english.iic-china.com/&quot;&gt;IIC China Spring 2010&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Shenzhen, China&lt;/em&gt;&lt;br /&gt;
March 4 - 5, 2010&lt;br /&gt;
&lt;em&gt;Chengdu, China&lt;/em&gt;&lt;br /&gt;
March 11 - 12, 2010&lt;br /&gt;
&lt;em&gt;Shanghai, China&lt;/em&gt;&lt;br /&gt;
March 15 - 16, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.date-conference.com/&quot;&gt;DATE 10&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Dresden, Germany&lt;/em&gt;&lt;br /&gt;
March 8 - 12, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.intel.com/IDF/&quot;&gt;Intel Developer Forum 2010&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Beijing&lt;/em&gt;&lt;br /&gt;
April 13 - 14, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://esc-sv09.techinsightsevents.com/&quot;&gt;ESC Silicon Valley&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;San Jose, California, USA&lt;/em&gt;&lt;br /&gt;
April 26 - 29, 2010 &lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.embedded-masterclass.com/&quot;&gt;The Embedded Masterclass&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Cambridge, UK&lt;/em&gt;&lt;br /&gt;
May 6, 2010&lt;br /&gt;
&lt;em&gt;Reading, UK&lt;/em&gt;&lt;br /&gt;
May 11, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.esec.jp/en/&quot;&gt;ESEC, 13th Embedded Systems Expo&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Tokyo, Japan&lt;/em&gt;&lt;br /&gt;
May 12 - 14, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.erts2010.org/&quot;&gt;ERTS&lt;sup&gt;2&lt;/sup&gt; 2010&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Toulouse, France&lt;/em&gt;&lt;br /&gt;
May 19 - 21, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.microsoft.com/events/techednorthamerica/&quot;&gt;TechEd North America &lt;/a&gt;&lt;br /&gt;
&lt;em&gt;New Orleans, Louisiana, USA&lt;/em&gt;&lt;br /&gt;
June 7 - 11, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.esc-india.com/&quot;&gt;ESC India&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Bangalore, India&lt;/em&gt;&lt;br /&gt;
July 21 - 23, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.english.iic-china.com/&quot;&gt;IIC China Fall 2010&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Wuhan&lt;/em&gt;&lt;br /&gt;
September 13 - 14, 2010&lt;br /&gt;
&lt;em&gt;Dongguan&lt;/em&gt;&lt;br /&gt;
September 16 - 17, 2010&lt;br /&gt;
&lt;em&gt;Xi&#039;an&lt;/em&gt;&lt;br /&gt;
September 20 - 21, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.module-2010.ru/en/welcome&quot;&gt;EIE-2010&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Moscow, Russia&lt;/em&gt;&lt;br /&gt;
October 26 - 28, 2010 &lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.msteched.com/online/home.aspx&quot;&gt;TechEd Europe&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Berlin, Germany&lt;/em&gt;&lt;br /&gt;
November 1 - 5, 2010&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://www.electronica.de/en/home&quot;&gt;electronica&lt;/a&gt;&lt;br /&gt;
&lt;em&gt;Munich, Germany&lt;/em&gt;&lt;br /&gt;
November 9 - 12, 2010&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>X86 Single board computers gaining momentum</title>
    <link>http://blog.streamingcores.com/index.php?/archives/17-X86-Single-board-computers-gaining-momentum.html</link>

    <description>
        With the growing popularity of single board computers (SBC), especially those based on x86-compatible CPUs, more and more applications are making their way into the embedded systems domain. What we&#039;re experiencing is the reach of traditional desktop-like applications into areas that were restricted to fully custom-developed and &lt;a href=&quot;http://en.wikipedia.org/wiki/Real-time_operating_system&quot; title=&quot;Real-time operating system&quot;&gt;RTOS&lt;/a&gt;-driven hardware until a few months ago.&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://www.eurotech-inc.com/images/sbc/Rugged-Proteus-Back-View-large.jpg&quot; alt=&quot;&quot; width=&quot;320&quot; height=&quot;408&quot; /&gt;&lt;br /&gt;
&lt;em&gt;SBC with enclosure&lt;/em&gt;&lt;br /&gt;
&lt;br /&gt;
Conversely, there&#039;s a complementary trend of traditional embedded applications moving from custom hardware and niche software platforms to single board computers powered by mainstream operating systems, e.g. Linux or Windows Embedded. Driven by narrow market windows, cost pressure and short technology cycles, product makers try to keep up with the pace by using pre-designed and verified hardware platforms and/or Open Source software frameworks that hide the underlying complexity from their development teams.&lt;br /&gt;
&lt;br /&gt;
This convergence from both sides of the embedded spectrum has gained momentum with the introduction of the Intel Atom architecture (and its compatibles) early last year. Its low &lt;a href=&quot;http://en.wikipedia.org/wiki/Thermal_design_power&quot; title=&quot;Thermal design power&quot;&gt;TDP&lt;/a&gt; in combination with well-known interface standards, e.g. USB, Gigabit Ethernet or SATA, has led to fanless and compact board designs, perfectly suited for single board computers used in industrial applications.&lt;br /&gt;
&lt;br /&gt;
So-called computer-on-modules remove the burden of designing an entire processor platform from hardware engineers and shift the focus back to their core business: implementing solutions for their customers. A carrier or base board connected to the processor module through a standardized connector, e.g. via &lt;a href=&quot;http://en.wikipedia.org/wiki/ETX_(form_factor)&quot; title=&quot;ETX&quot;&gt;ETX&lt;/a&gt;/&lt;a href=&quot;http://www.picmg.org/pdf/COM_Express_tutorial.pdf&quot; title=&quot;COM Express Tutorial&quot;&gt;COM Express&lt;/a&gt;, provides application specific functionality through interfaces like USB or PCI Express. The same applies to the software development approach: a commodity Linux distribution replaces board support packages, embedded network stacks and error-prone third-party libraries. It should be obvious that x86 compatibility gives you access to a huge amount of existing software and one of the largest developer communities today.&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://upload.wikimedia.org/wikipedia/commons/8/83/Colibri_Intel_XScale_PXA270_Single_Board_Computer_Module.jpg&quot; alt=&quot;&quot; width=&quot;320&quot; height=&quot;240&quot;/&gt;&lt;br /&gt;
&lt;em&gt;Computer On Module&lt;/em&gt;&lt;br /&gt;
&lt;br /&gt;
Let&#039;s take an image processing system (which could be used for quality inspection on an assembly line) as an example. The task is to capture N images every second, perform a feature detection on each image, store the result and assert a signal in case of missing features to sort out items not conforming to spec (see following picture).&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;&lt;img src=&quot;http://streamingcores.com/images/blog15-concept.png&quot; alt=&quot;&quot; width=&quot;480&quot; height=&quot;347&quot;/&gt;&lt;/center&gt;&lt;br /&gt;
The traditional embedded approach would be to connect a camera module, e.g. via &lt;a href=&quot;http://en.wikipedia.org/wiki/Camera_Link&quot; title=&quot;Camera Link&quot;&gt;Camera Link&lt;/a&gt;, to your custom-designed PCB equipped with a decent digital signal processor (DSP) and an output interface to connect to the real world, e.g. serial/parallel GPIO. While there is nothing wrong with that, the question is: Is it worth designing (and maintaining!) fully custom hardware and software? Do we need hard real-time? How fast can you deliver the product? And at what cost? Basing our system on a single board computer instead leads to the following simplified system level architecture.&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;&lt;img src=&quot;http://streamingcores.com/images/blog15-systemlevel.png&quot; alt=&quot;&quot; width=&quot;640&quot; height=&quot;277&quot; /&gt;&lt;/center&gt;&lt;br /&gt;
We use a CCD-based camera module that supports continuous streaming over USB, like the &lt;a href=&quot;http://www.lumenera.com/products/industrial-cameras/lm075.php&quot; title=&quot;Lumenera Lm075&quot;&gt;Lumenera Lm075&lt;/a&gt;. &lt;a href=&quot;http://www.beyondlogic.org/usbnutshell/usb4.htm#Isochronous&quot; title=&quot;Isochronous transfers&quot;&gt;Isochronous transfers&lt;/a&gt; ensure a bounded transmission latency. Drivers and SDK are available so that image processing algorithms can be developed and tested right from the beginning on every PC with USB connectivity. Yes, this is a huge benefit! If you&#039;ve ever had to set up a vendor specific DSP development environment, you know what I&#039;m talking about... &quot;Hell! Why can&#039;t I connect to the evaluation board!&quot;&lt;br /&gt;
&lt;br /&gt;
The good news is that current x86 architectures implement powerful DSP operations like Streaming SIMD Extensions (http://en.wikipedia.org/wiki/SSE2) and can handle double-precision numbers, which most DSPs still cannot. Examples of how to leverage these instructions for algorithm acceleration can be found here (&lt;a href=&quot;http://ecee.colorado.edu/~ecen5033/ecen5033/code/Using-SSE-for-Image-Processing/Using%20SSE%20and%20IPP%20to%20Accelerate%20Algorithms.pdf&quot; title=&quot;Using SSE for image processing paper&quot;&gt;Using SSE for image processing&lt;/a&gt;).&lt;br /&gt;
&lt;br /&gt;
Once the core algorithm is implemented, the outcome of each processed image needs to be (1) logged and (2) in case of errors signaled to external hardware dealing with the failing item e.g. sorting it out. The first task can be accomplished by simple logging to a file or, thanks to the Ethernet port, by storage in a SQL database. External signaling is slightly harder but can be done via &lt;a href=&quot;http://en.wikipedia.org/wiki/Super_I/O&quot; title=&quot;Super IO&quot;&gt;SuperIO chipsets&lt;/a&gt; or GPIO pins if available. Usually, this involves another level conversion or buffering e.g. to 24V for industrial automation systems. And we&#039;re done...&lt;br /&gt;
&lt;br /&gt;
Well, what I&#039;ve left out for the sake of brevity is the timing part. The critical path from image acquisition to external signaling needs to be carefully evaluated and fine-tuned to meet your timing requirements. Hard real-time is only available with RT-patched kernels but can be accomplished as well.&lt;br /&gt;
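&lt;br /&gt;
The processing steps above can be sketched as a simple loop in Python. This is only a skeleton under stated assumptions: camera access, the feature detection itself and the GPIO signaling are stubbed out, and every function name is hypothetical rather than a call from a real camera SDK or driver.&lt;br /&gt;
```python
# Illustrative sketch of the inspection loop described above.
# Camera access, feature detection and GPIO signaling are stubbed out;
# every function name here is hypothetical, not a real SDK call.
import time

def detect_features(image, required):
    """Stand-in for the real algorithm: which required features are missing?"""
    return [name for name in required if name not in image["features"]]

def inspect(frames, required, log, reject):
    """Process each frame: log the result and signal items that fail."""
    for image in frames:
        start = time.perf_counter()
        missing = detect_features(image, required)
        elapsed = time.perf_counter() - start
        log.append((image["id"], missing, elapsed))  # (1) log the outcome
        if missing:
            reject(image["id"])                      # (2) signal external hardware

# Fake frames standing in for images streamed from the camera.
frames = [
    {"id": 1, "features": {"label", "cap"}},
    {"id": 2, "features": {"label"}},
]
log, rejected = [], []
inspect(frames, ["label", "cap"], log, rejected.append)
print(rejected)
```
Recording the elapsed time per frame is a first step towards evaluating the critical path. In a real system the reject callback would drive an output pin, and the log would go to a file or database.&lt;br /&gt;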
&lt;br /&gt;
The computing architecture described in this post is not restricted to image processing applications; in fact, it can be applied to many problems in the embedded systems domain. It integrates perfectly into corporate IT networks thanks to its Ethernet capability and can be administered and monitored remotely. The benefit of using an architecture similar to the given example is that the overall system design effort shifts noticeably to the software side, where your expertise is located: the implementation of core business logic on a proven and standardized platform.&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>Recruiting the right people </title>
    <link>http://blog.streamingcores.com/index.php?/archives/16-Recruiting-the-right-people.html</link>

    <description>
        Recruiting people is always fun. Really. Not because I&#039;m a fan of the &lt;a href=&quot;http://www.google.com/search?q=technical+interview+puzzles&quot; title=&quot;Puzzles&quot;&gt;puzzles&lt;/a&gt; that you need to solve during the interview process. That&#039;s quite typical for big companies like Google. No, the reason is pretty simple: I like to talk to people. From time to time I need to help out HR folks at a career fair and so it happens that I talk to graduates or young professionals seeking a job in the tech sector. The atmosphere there is more relaxed than in the sitting-in-the-office-and-being-observed interview situation. People show up at your booth, you start with some small talk and suddenly you&#039;re in a conversation. (Or not, depending on your counterpart.)&lt;br /&gt;
&lt;br /&gt;
Neither a résumé nor a telephone interview can reach the level of face-to-face communication. You can look into the person&#039;s eyes, find out how they think about technologies, what&#039;s important to them and what they expect from their future employer. Wait a minute, this is an interview?! Right. You haven&#039;t noticed? Good. Of course, this is just an early stage of the whole procedure but it&#039;s a very effective way to filter out the 50% of résumés sent to you by applicants who proved unable to read and understand your job description.&lt;br /&gt;
&lt;br /&gt;
OK, you&#039;re dedicating an entire day to talking to applicants. But how can you know whether it&#039;s worth scheduling a follow-up interview? I&#039;ve developed a gut feeling over the years and some techniques:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;You&#039;re looking for &lt;strong&gt;self-motivated&lt;/strong&gt; people. It&#039;s when you ask for the last project and they get excited. They&#039;re saying: I love what I do and I&#039;m 100% convinced. See also &lt;a href=&quot;http://en.wikipedia.org/wiki/Motivation#Intrinsic_motivation&quot; title=&quot;Intrinsic motivation&quot;&gt;intrinsic motivation&lt;/a&gt;.&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;You need &lt;strong&gt;curious&lt;/strong&gt; people. Do they ask the right questions? If they do not have a serious interest in your products it&#039;s a waste of time for both sides.&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;You want &lt;strong&gt;quality&lt;/strong&gt;. I don&#039;t know about you, but my threshold is very high. I found it very useful to ask for samples from their last projects. This gives me the chance to discuss important parts of it in detail in order to get a better understanding of their problem solving skills.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
Yes, it&#039;s June 2009, the world is in the middle of a recession, the news is full of layoffs, entire industries are collapsing and I&#039;m writing about hiring. But each downturn also has its winners. This is a chance for small and medium sized companies with long-term strategies and innovative products on their roadmaps. It&#039;s the chance to attract the best and most creative people by demonstrating your strength: You are still hiring. And it feels good.&lt;br /&gt;
&lt;br /&gt;
In times of economic growth and rising stock markets, when finding good engineers is much harder, you may end up in the following situation: the candidate is fresh from college, seems to fit the job and you&#039;re getting into the &quot;salary topics&quot;. Now things are getting a bit odd. It turns out that there is another offer from big-company with this and that bonus etc., everything is urgent, headhunter, &lt;em&gt;younameit&lt;/em&gt;.&lt;br /&gt;
Emergency stop!&lt;br /&gt;
You can play the money game, but is it worth playing? Remember technique #1? It&#039;s not! I&#039;m not saying &quot;Don&#039;t pay decent salaries.&quot; The opposite is true. You need to pay decent salaries in order to get the best people. It gets even harder: you also need to offer top-quality working conditions (but that&#039;s another story)! I&#039;m saying you&#039;re not a consulting shop driven by two-year projects, throwing ridiculous amounts of money at applicants just because of urgent resource requirements.&lt;br /&gt;
Let him go! This can be very disappointing, but you should be thankful that you found out so early.&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>embedded world 2009 - personal retrospective</title>
    <link>http://blog.streamingcores.com/index.php?/archives/15-embedded-world-2009-personal-retrospective.html</link>

    <description>
        You can guess it from the title. I&#039;ve attended embedded world this year (again). This is my personal review of Europe&#039;s biggest exhibition about embedded technologies, held in Nuremberg (Germany) from March 3rd to 5th.&lt;br /&gt;
&lt;br /&gt;
I like the narrow focus on embedded hardware, software and services. The whole event fits in only four pavilions but has plenty to offer. Unlike some big events (e.g. &lt;a href=&quot;http://www.electronica.de/en&quot; title=&quot;electronica&quot;&gt;electronica&lt;/a&gt;, &lt;a href=&quot;http://www.cebit.de/homepage_e&quot; title=&quot;Cebit&quot;&gt;Cebit&lt;/a&gt;) it&#039;s still growing: a 25% increase in exhibitors compared to last year. (I&#039;ve found no official statistics about the visitor numbers, yet.) &lt;br /&gt;
&lt;br /&gt;
As last year, I took the train and arrived in Nuremberg around 11am. Passing by the first busy booths gave me a good feeling. It seemed as if the recession had been banished from these pavilions. Maybe I&#039;m wrong, but the general mood of the visitors didn&#039;t show a sign of crisis. Of course, everybody is aware of the exceptional situation and I expect more people to get laid off in the electronics industry, but despite all the bad news the booths were busy.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve noticed an interesting trend in the computer-on-module (&lt;a href=&quot;http://en.wikipedia.org/wiki/Computer-on-module&quot;&gt;COM&lt;/a&gt;) business. Multiple vendors are jumping on the bandwagon of increased system integration. In order to reduce the number of components and save PCB space and power, they are starting to offer modules with on-board FPGAs. The FPGA is connected to the chipset through a single PCIe lane and provides external (serial) connectivity for I2C, CAN, Ethernet etc. Does this sound contradictory to reducing power? Wait. The best news is that you have access to the unused FPGA fabric and can insert your logic there. Even better, get rid of the CAN core and friends and occupy it all!&lt;br /&gt;
&lt;br /&gt;
In the software pavilion I went to the Trolls (aka &lt;a href=&quot;http://www.qtsoftware.com&quot;&gt;Qt Software&lt;/a&gt;) and talked about the latest Qt release. Starting with version 4.5 the Qt framework is also offered as an &lt;a href=&quot;http://www.gnu.org/copyleft/lesser.html&quot;&gt;LGPL&lt;/a&gt;ed package. That means you can link your closed source applications to the library without paying any license fees. That&#039;s good news for independent developers and micro &lt;a href=&quot;http://en.wikipedia.org/wiki/Independent_software_vendor&quot;&gt;ISV&lt;/a&gt;s. I was wondering how Qt Software is making money now. I got a lengthy explanation which can be summarized as: More developers, more mobile applications, higher Nokia phone sales. Nokia pays the bills. Additionally, the Qt Extended stand-alone product is discontinued. The last release will be 4.4.3. All Qt Extended features will be moved into the Qt framework later.&lt;br /&gt;
&lt;br /&gt;
That&#039;s all for now. In case things should get worse next year we can still ask Gov. Schwarzenegger for a keynote...erm &lt;a href=&quot;http://www.youtube.com/watch?v=PfXlITVuW6E&quot;&gt;click here&lt;/a&gt;. 
    </description>
</item>
<item>
    <title>New FPGAs for high-volume applications</title>
    <link>http://blog.streamingcores.com/index.php?/archives/14-New-FPGAs-for-high-volume-applications.html</link>

    <description>
        Recently &lt;a href=&quot;http://www.xilinx.com&quot;&gt;Xilinx&lt;/a&gt; caught my attention with the announcement of their &lt;a href=&quot;http://www.xilinx.com/products/v6s6.htm&quot; title=&quot;Spartan-6 / Virtex-6&quot;&gt;next generation Virtex and Spartan FPGA platforms&lt;/a&gt;. Having worked with the Spartan series I was really excited to see the version number double (!) from 3 to 6. Yes, say hello to Spartan-6. And...Virtex-6 of course but I&#039;m focusing on the low-cost family for now since it got the overall (and long awaited) face lift.&lt;br /&gt;
&lt;br /&gt;
The first three things I&#039;ve noticed were (1) the addition of serial transceivers, (2) an increased MUL : LUT ratio and (3) memory controller/&lt;a href=&quot;http://en.wikipedia.org/wiki/PCIe&quot; title=&quot;PCIexpress&quot;&gt;PCIe&lt;/a&gt; endpoint blocks. Finally! Finally, Xilinx made it and added built-in support for high-speed serial communication. No need to look over with envy at Altera or Lattice with their low-cost families anymore...wait a minute. &lt;a href=&quot;http://www.altera.com&quot;&gt;Altera&lt;/a&gt; announced new &lt;a href=&quot;http://www.altera.com/products/devices/arria-fpgas/arria-ii-gx/&quot;&gt;Arria II GX&lt;/a&gt; devices and &lt;a href=&quot;http://www.latticesemi.com&quot;&gt;Lattice&lt;/a&gt; is about to release the &lt;a href=&quot;http://www.latticesemi.com/products/fpga/ecp3/&quot;&gt;ECP3M&lt;/a&gt; family. But don&#039;t expect a comparison now. More or less, the three points apply to all low-cost families across vendors:&lt;br /&gt;
&lt;br /&gt;
(1) The Spartan-6 platform is split into two classes: seven devices without high-speed serial I/O and four devices with 2 / 4 / 8 GTP low-power transceivers/PCIe. While Altera and Lattice are pushing the second generation of low-cost devices with serial transceivers out of the door, it&#039;s Xilinx&#039;s first device in the price-sensitive market. And it&#039;s time. High-speed serial connectivity is becoming &lt;em&gt;the&lt;/em&gt; emerging standard in complex embedded systems. And believe me, systems will get even more complex and demand more bandwidth in the future. To overcome complexity and reduce development time, vendors need to support a well-established communication standard. And what &lt;a href=&quot;http://en.wikipedia.org/wiki/USB&quot; title=&quot;Universal Serial Bus&quot;&gt;USB&lt;/a&gt; is for the desktop market, PCIe will become for embedded/industrial applications.&lt;br /&gt;
&lt;br /&gt;
(2) Marketing folks identified mass market audio and video applications as typical innovation areas. These applications are driven by digital signal processing (DSP) and require lots of computing power. So give &#039;em multipliers + accumulators aka DSP slices aka (sys)DSP blocks! That&#039;s my second point: an increased number of multipliers per logic resource. Up to 182 DSP48A1 slices in the biggest Spartan-6 devices are waiting for your DSP algorithms. And DSP is everywhere, especially inside today&#039;s FPGAs. Or are you (mis)using them for glue-logic?&lt;br /&gt;
&lt;br /&gt;
(3) Memory controller blocks are just another step towards systems-on-chip. They&#039;ll come in handy when synthesizing a soft CPU core into the fabric or when large amounts of data need to be buffered. Lattice already had them built into the ECP2M programmable I/O. They really set the benchmark in the price-sensitive market then. I can imagine that Xilinx lost some customers to Lattice here...&lt;br /&gt;
&lt;br /&gt;
However, competition goes on. You can meet all three FPGA vendors at &lt;a href=&quot;http://www.embedded-world.de/en/&quot;&gt;embedded world&lt;/a&gt; in Nuremberg this week from Tuesday to Thursday (3.-5 March 2009) and give them feedback on their latest technology.&lt;br /&gt;
&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>About templates and optimization</title>
    <link>http://blog.streamingcores.com/index.php?/archives/13-About-templates-and-optimization.html</link>

    <description>
        Aside from reverse engineering FPGA bitstreams I&#039;ve finished (Yes!) another pending programming project last week. I was helping out a fellow programmer, let&#039;s call him Bob, who was busy doing other things and had no time to hack away a set of high-prioritized items from his TODO list. So I pulled these items from his list over to mine. I have plenty of experience doing these troubleshooting jobs and I always feel like a surgeon carefully inspecting the innards (source code) of the patient (application) before cutting out tumors (bugs) or implanting organs (patches)...&lt;br /&gt;
&lt;br /&gt;
OK, to come back to the problem: I was supposed to extend a desktop application that was designed and written from the ground up by Bob two years ago in C++ (using &lt;a href=&quot;http://www.qtsoftware.com/&quot; title=&quot;Qt cross-platform development framework&quot;&gt;Qt&lt;/a&gt;). Hmmm...C++ and templates. I can hear your brain working now. Right, we&#039;re not talking about document, XML or whatever templates here. It&#039;s the well-known C++ programming language feature (see &lt;a href=&quot;http://en.wikipedia.org/wiki/Template_metaprogramming&quot; title=&quot;Metaprogramming&quot;&gt;Template Metaprogramming&lt;/a&gt;). I use it regularly. It&#039;s a great feature if the compiler fully supports it. If you&#039;re familiar with templates feel free to skip the next paragraph.&lt;br /&gt;
&lt;br /&gt;
Templates implement the concept of parameterized types in C++ (&lt;a href=&quot;http://mindview.net/Books/TICPP/ThinkingInCPP2e.html&quot; title=&quot;Thinking in C++&quot;&gt;Bruce Eckel, Thinking In C++&lt;/a&gt;). It&#039;s a syntax extension that tells the compiler how to define a type from a generic type declaration. Wow...that was abstract. How does it work? The first time you instantiate a template with a given type (as parameter) in your code, the compiler inserts this parameter into your template declaration and creates an entirely new type. Ideally templates are designed in such a way that you can throw almost any type at them (as parameter). Yes, even user-defined types. Think of it as another way of reusing code besides the &lt;a href=&quot;http://en.wikipedia.org/wiki/Object-oriented_programming&quot; title=&quot;Object Oriented Programming&quot;&gt;OOP&lt;/a&gt; based inheritance approach. The syntax can also be applied to function definitions (aka function templates). Templates are perfectly suited for container classes like lists, queues etc. of arbitrary objects. Personally, I&#039;ve used templates in digital signal processing algorithms to create, amongst other things, filters that are independent of the number format. Other classic/good examples are the &lt;a href=&quot;http://www.sgi.com/tech/stl/&quot; title=&quot;Standard Template Library&quot;&gt;C++ Standard Template Library&lt;/a&gt; (STL), the &lt;a href=&quot;http://www.boost.org/&quot; title=&quot;BOOST C++ Libraries&quot;&gt;Boost C++ libraries&lt;/a&gt; or Intel&#039;s &lt;a href=&quot;http://www.threadingbuildingblocks.org/&quot; title=&quot;Threading Building Blocks&quot;&gt;Threading Building Blocks&lt;/a&gt; (TBB).&lt;br /&gt;
&lt;br /&gt;
The challenge with templates is that it&#039;s very convenient and easy to use them but significantly harder to create them in a structured and maintainable way. Did I mention that templates are entirely implemented in header files? That may become an issue as I&#039;ll explain later. Now guess what happened...the app I was working on was full of templates. Driven by the idea of increasing runtime performance Bob applied the template concept to nearly 100% of the data path related code. His intention was to let the compiler optimize e.g. (inline) nested function calls introduced by the object hierarchy and flatten them out at compile time. No virtual function lookups and stuff in the binary...anymore. Highly instruction level optimized code. The perfect solution for maximum performance...&lt;br /&gt;
&lt;br /&gt;
&lt;em&gt;Problem1:&lt;/em&gt; The template concept is no silver bullet for your performance problems. The 80/20 rule still applies: 80 percent of the execution time is spent in 20 percent of your code. (Or was it 90/10?) Use a profiler and analyze your code carefully &lt;strong&gt;before&lt;/strong&gt; delving into advanced and unreadable templates of templates of template constructs.&lt;br /&gt;
&lt;br /&gt;
&lt;em&gt;Problem2:&lt;/em&gt; While it might be a (good) academic exercise to create an entirely template based object hierarchy to gain a deeper understanding of the language concept it&#039;s probably a bad idea to apply this pattern to your entire framework. First, this makes it hard or even impossible to put your framework later in a library and second, the guy next door (patching your stuff) appreciates readable code.&lt;br /&gt;
&lt;br /&gt;
&lt;em&gt;Problem3:&lt;/em&gt; Have you ever tried to debug and/or patch this stuff? When 98% of the files are header files you&#039;ll trigger a recompile of the entire project just because you&#039;ve changed a variable name from i to tableIdx, not to mention the weird error messages you might get in seemingly unrelated sections of the code (although compilers have got better and precompiled headers may reduce the pain a bit). Yes, I&#039;m exaggerating here but you get the point.&lt;br /&gt;
&lt;br /&gt;
Don&#039;t get me wrong. This is no rant against templates! Letting the compiler write type-safe code for you is of great value. Also, generic datatypes tremendously reduce the amount of duplicated code. But be careful when using this powerful language feature. Use it only where applicable i.e. to maximize code reuse, restrict it to subsets of your framework, keep in mind who might read/use the code and please please don&#039;t misuse it for your global optimization strategies.&lt;br /&gt;
&lt;br /&gt;
You want to know the end of the story? Well, just another surgical intervention.&lt;br /&gt;
&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>Stepping into parallel computing</title>
    <link>http://blog.streamingcores.com/index.php?/archives/12-Stepping-into-parallel-computing.html</link>

    <description>
        While writing my last post I remembered how I came across parallel programming for the first time. I was at university and at that time looking into cracking passwords from hashes, e.g. on UNIX-like systems from /etc/shadow, using &lt;a href=&quot;http://www.openwall.com/john/&quot;&gt;John the Ripper&lt;/a&gt;. BTW when saying cracking I mean the real &lt;a href=&quot;http://en.wikipedia.org/wiki/Brute_force_attack&quot; title=&quot;Brute Force Attack&quot;&gt;brute-force&lt;/a&gt; approach, no dictionaries and stuff. The SUN workstations were too slow at that time. So I bought a bunch of 3G base station processor boards on eBay (see picture), each armed with 4 DSPs and a PowerPC. Pretty hot stuff at that time (and ridiculously cheap, about 10 EUR each). They had been used by a huge German company (starting with S) in a development project and were sold off at the end.&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://streamingcores.com/images/blog09-photo.jpg&quot; width=&quot;640&quot; height=&quot;438&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
I had no clue about FPGAs at that time and DSPs were my natural choice. I was young, I knew how to program in C, I was using an open source password cracking tool and the DSPs simply matched my number &lt;a href=&quot;http://en.wikipedia.org/wiki/Number_crunching&quot; title=&quot;Number Crunshing&quot;&gt;crunching requirements&lt;/a&gt;. What more did I need? Erm...board manuals, schematics, professional development tools (yep, no gcc port for TMS320C6X available), debugging cables, probably a VME backplane and, as I found out, more knowledge about these architectures. Finally I managed to power the board using an old PC power supply and connected to the console port. Woohoo...it was still working and the bootloader was looking for an FTP server to pull the OS image from. My first steps into the field of parallel computing...&lt;br /&gt;
&lt;br /&gt;
To bring the story to an end: I never ran any code on the DSPs due to the lack of board manuals, schematics, professional development tools...you name it. However, I learned that (1) certain computing tasks can be broken down to a restricted set of arithmetic operations that need to run at the maximum achievable speed and (2) for a certain class of problems it makes sense to split the work among multiple processors (or cores) to finish computations in less time.&lt;br /&gt;
&lt;br /&gt;
From today&#039;s perspective I&#039;ve discovered another interesting aspect which is more related to the overall system concept. If you take a closer look at the architecture of the telco blade, you&#039;ll find similarities to modern processor architectures. There are specialized processing units (4 DSPs) grouped around a central processing unit (PowerPC) on the PCB. Does this remind you of something? No? Replace PowerPC with PPE, DSPs with SPEs and PCB with die and you&#039;ll get pretty close to the &lt;a href=&quot;http://en.wikipedia.org/wiki/Cell_(microprocessor)&quot; title=&quot;Cell Broadband Engine&quot;&gt;CellBE architecture&lt;/a&gt; (see figure below, taken from &quot;Introduction to the Cell Broadband Engine&quot;).&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://streamingcores.com/images/blog09-figure.jpg&quot; width=&quot;640&quot; height=&quot;468&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
Lessons learned from application-specific processor boards led to mainstream processor architectures on a single die. I&#039;m wondering if the telecommunications industry is still the number one driver of processor evolution. Or is it the gaming industry with its demand for high bandwidth graphics hardware that is most influential on modern processor designs? 
    </description>
</item>
<item>
    <title>Great multi-core resources from Intel</title>
    <link>http://blog.streamingcores.com/index.php?/archives/11-Great-multi-core-resources-from-Intel.html</link>

    <description>
        Today I want to point you to the parallel programming resources from Intel. Whether you&#039;re new to multi-core development or you&#039;re looking for a specific answer to a concurrent programming question, this website is the place to start:&lt;br /&gt;
&lt;br /&gt;
&lt;a href=&quot;http://software.intel.com/en-us/multi-core/&quot; title=&quot;Intel Software Network&quot;&gt;http://software.intel.com/en-us/multi-core/&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Even experienced programmers will find it useful to browse through the whitepapers section or take a closer look at the technology demos (source code included). I know that all content is focused on Intel&#039;s technology but I like the way Intel cares about their developer community. Erm...I must admit that I&#039;m a little biased towards their technology. Speaking of community, there are also several blogs from Intel employees covering all sorts of programming topics from threading to the latest Intel tools or libraries.&lt;br /&gt;
&lt;br /&gt;
Specifically, I want to highlight the great weekly &lt;a href=&quot;http://www.blogtalkradio.com/MulticoreSoftware&quot; title=&quot;Multicore Software Podcast&quot;&gt;podcast&lt;/a&gt; hosted by Aaron Tersteeg. He&#039;s the community manager for Threading for Multi-Core. In his podcasts (approx. 15 minutes in length) &lt;a href=&quot;http://software.intel.com/en-us/blogs/author/aaron-tersteeg/&quot; title=&quot;Aaron&#039;s Blog&quot;&gt;Aaron&lt;/a&gt; interviews his guests about their relation to parallel programming. I listen to it regularly. It&#039;s short (but long enough for the daily ride to the office) and gives you a good start into related topics e.g. &lt;a href=&quot;http://www.khronos.org/opencl/&quot; title=&quot;The Open CL Standard&quot;&gt;OpenCL&lt;/a&gt; or &lt;a href=&quot;http://en.wikipedia.org/wiki/Functional_programming&quot; title=&quot;Functional programming information&quot;&gt;functional programming&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
A last tip for today: Check out the &lt;a href=&quot;http://software.intel.com/en-us/articles/smoke-game-technology-demo/&quot; title=&quot;Smoke Game technology demo&quot;&gt;Smoke Game technology demo&lt;/a&gt;!&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://software.intel.com/file/9782&quot; alt=&quot;&quot; /&gt; 
    </description>
</item>
<item>
    <title>Software backed watchdog timers (WDT part 3)</title>
    <link>http://blog.streamingcores.com/index.php?/archives/10-Software-backed-watchdog-timers-WDT-part-3.html</link>

    <description>
        This is the last post in my series (see &lt;a href=&quot;http://blog.streamingcores.com/index.php?/archives/8-Watchdog-timers-WDT.html&quot; title=&quot;WDT Part 1&quot;&gt;part 1&lt;/a&gt; and &lt;a href=&quot;http://blog.streamingcores.com/index.php?/archives/9-Somewhat-intelligent-watchdog-timers-WDT-part-2.html&quot; title=&quot;WDT Part 2&quot;&gt;part 2&lt;/a&gt;) about watchdog timers (WDTs). After that one I will stop bugging you with the topic. But probably you&#039;ve already got the impression I have a certain weakness for WDTs. Anyway let&#039;s take this to the next level and find out how to improve the solution from my last post even further.&lt;br /&gt;
&lt;br /&gt;
Right now we have our application directly pinging the watchdog hardware i.e. there&#039;s a watchdog module/component linked into the main executable doing all the things required to prevent a timeout. That was easy to do because it only required us to insert two function calls for (de)initialization and one for pinging inside our mainloop. Unfortunately it also introduced a tight coupling between the main application and the watchdog handling logic (see figure 1).&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://streamingcores.com/images/blog07-figure1.png&quot; alt=&quot;Figure 1&quot; width=&quot;320&quot; height=&quot;284&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
Other drawbacks of this approach are:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;loss of flexibility and reuse (OK, reuse is possible but only on the code-level)&lt;/li&gt;&lt;li&gt;adding support for new WDT hardware requires rebuilding the application or may have other side effects&lt;/li&gt;&lt;li&gt;it&#039;s harder to verify/test the watchdog handling logic&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;
To overcome these drawbacks we put the watchdog handling logic into a separate executable and introduce a small and light-weight layer that wraps the interprocess communication (&lt;a href=&quot;http://en.wikipedia.org/wiki/Inter-process_communication&quot; title=&quot;InterProcess Communication&quot;&gt;IPC&lt;/a&gt;) which is now required. I&#039;ll leave the choice of an appropriate IPC API to you. It should be non-blocking and easy to use (&lt;a href=&quot;http://en.wikipedia.org/wiki/SOAP_(protocol)&quot; title=&quot;Simple Object Access Protocol&quot;&gt;SOAP&lt;/a&gt; is probably a bad idea). Certainly we&#039;ve added slightly more complexity to the big picture (see figure 2), but at the same time the change removes any dependency of the main application on the WDT hardware, and the application can run even when there is no watchdog present at all.&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://streamingcores.com/images/blog07-figure2.png&quot; alt=&quot;Figure 2&quot; width=&quot;320&quot; height=&quot;284&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
Having made these architectural changes we&#039;re now ready to introduce more logic to improve the WDT behavior. We can even add a user interface to the newly created watchdog application. Please note that the term watchdog application/process now refers to a piece of software sitting on top of the WDT hardware.&lt;br /&gt;
&lt;br /&gt;
The nonobvious outcome of inserting a watchdog process between the main app and the WDT hardware is that we&#039;re getting an additional high-level fallback layer without degrading the original WDT functionality. Both applications + OS remain under WDT control (remember this is a piece of hardware hooked up to the reset line). This gives us more opportunities to take action when the main app stops pinging the watchdog process e.g. we could try to kill and restart the main application process once and ultimately initiate a shutdown sequence which is healthier than the hard reset. And that&#039;s not all! Further enhancements may include:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;Restore the factory settings of the main application after a reboot in order to undo user changes (works only if the watchdog process starts first).&lt;/li&gt;&lt;li&gt;Log all actions / errors or display a warning message on a local display. I recommend using the operating system log.&lt;/li&gt;&lt;li&gt;Send or broadcast a message before shutdown over the network to a monitoring instance if available.&lt;/li&gt;&lt;li&gt;&lt;em&gt;&amp;lt;put your feature here&amp;gt;&lt;/em&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
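The escalation described before the list (try to restart the main application once, then initiate a shutdown) might be sketched roughly as follows. This is only a minimal Python model under my own assumptions: the class name Supervisor, its parameters and the returned action strings are hypothetical, time is passed in explicitly, and the caller is expected to map the actions onto real process handling, IPC and WDT hardware access.&lt;br /&gt;
&lt;br /&gt;
```python
class Supervisor:
    """Watchdog process policy: restart the main app once, then shut down.

    Hypothetical sketch; names and action strings are illustration only.
    """

    def __init__(self, ping_timeout, max_restarts=1):
        self.ping_timeout = ping_timeout  # seconds without a ping we tolerate
        self.max_restarts = max_restarts  # restart attempts before giving up
        self.restarts = 0
        self.last_ping = 0.0

    def on_ping(self, now):
        """Called whenever a ping from the main application arrives over IPC."""
        self.last_ping = now

    def check(self, now):
        """Called periodically; returns the action the watchdog process takes."""
        if self.ping_timeout > now - self.last_ping:
            return "ping_hardware"  # main app is alive, keep the WDT fed
        if self.max_restarts > self.restarts:
            self.restarts += 1
            self.last_ping = now    # give the restarted app a fresh deadline
            return "restart_app"
        return "shutdown"           # orderly shutdown instead of a hard reset


supervisor = Supervisor(ping_timeout=5)
supervisor.on_ping(0)
for t in (3, 6, 8, 12):
    print(t, supervisor.check(t))
```
&lt;br /&gt;
Keeping the policy free of any hardware or IPC calls is a deliberate choice in this sketch: it can be tested on a desk without a watchdog present, which matches the testability argument made above.&lt;br /&gt;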
I&#039;m pretty sure you&#039;ll have more features to add, but always remember: &lt;strong&gt;Keep it simple! It&#039;s definitely not the right application to bloat with features.&lt;/strong&gt; 
    </description>
</item>
<item>
    <title>Somewhat intelligent watchdog timers (WDT part 2) </title>
    <link>http://blog.streamingcores.com/index.php?/archives/9-Somewhat-intelligent-watchdog-timers-WDT-part-2.html</link>

    <description>
        While my last post was focused on the hardware-side of watchdog timers (WDTs) I will now discuss more high-level/software concepts of WDTs. So, if you haven&#039;t read &lt;a href=&quot;http://blog.streamingcores.com/index.php?/archives/8-Watchdog-timers-WDT.html&quot; title=&quot;WDT (part 1)&quot;&gt;part 1&lt;/a&gt;, now is the time! Done? Here we go.&lt;br /&gt;
&lt;br /&gt;
I&#039;m assuming we have a software application that requires a full-fledged operating system (OS) and relies on a bunch of peripheral hardware besides a CPU e.g. harddisks, network adaptors etc. Let&#039;s call it server since that might be a valid &lt;a href=&quot;http://en.wikipedia.org/wiki/Use_case&quot; title=&quot;Use case&quot;&gt;use case&lt;/a&gt;. OK. Our application is supposed to run 24/7 somewhere deeply buried at a customer&#039;s site and it&#039;s extremely costly to send out tech-support staff for on-site fixes, just to discover that e.g. someone played around with our settings or temporarily disabled the air conditioning for maintenance, causing the system to lock up.&lt;br /&gt;
&lt;br /&gt;
Being smart and having read part 1 we pull out this timer thing connected to the reset line of our server and we&#039;re set, right? Wrong. The problem with this approach is its simplicity and tempting ease of implementation. I understand that you want to get things done and push the box out of the door. But this approach has further implications we haven&#039;t dealt with yet. Remember, our software is &lt;strong&gt;not&lt;/strong&gt; running on a microcontroller. A hard reset should only be the last resort since it puts a lot more stress on all components than a safe shutdown. And what if there is really a broken piece of hardware or the air conditioning runs amok and starts heating? It will result in an endless reboot-reset cycle causing even more harm.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s tackle the &quot;endless reboot-reset cycle&quot; problem first since the fix can be applied without changing the hard reset implementation. The idea is quite simple: we extend the WDT by counting the number of timeouts. If a timeout occurs our application is obviously not running the way it was intended to. So we maintain a counter or some flags (ideally this is done in hardware) and additionally log the time in a non-volatile way. Observing more than X timeouts in a timeframe of less than Y seconds (insert appropriate values for your application) will power down the system. Now, human intervention is required for a restart.&lt;br /&gt;
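&lt;br /&gt;
To make the X/Y rule a bit more concrete, here is a minimal Python sketch of the decision logic. It is only a model under my own assumptions: the class name TimeoutGuard and the action strings are hypothetical, the in-memory deque stands in for the non-volatile log mentioned above, and the current time is passed in as a plain number of seconds.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


class TimeoutGuard:
    """Decide between another reset and a power-down based on recent timeouts.

    Hypothetical sketch; a real system would keep the log in non-volatile
    storage (ideally in hardware) so that it survives the reset.
    """

    def __init__(self, max_timeouts, window_seconds):
        self.max_timeouts = max_timeouts      # X in the text
        self.window_seconds = window_seconds  # Y in the text, must be positive
        self.timestamps = deque()             # stand-in for the non-volatile log

    def record_timeout(self, now):
        """Log one watchdog timeout and return the action to take."""
        self.timestamps.append(now)
        # Forget timeouts that fell out of the observation window.
        while now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.max_timeouts:
            return "power_down"  # human intervention required from here on
        return "reset"


guard = TimeoutGuard(max_timeouts=3, window_seconds=60)
print([guard.record_timeout(t) for t in (0, 10, 20, 30)])
```
&lt;br /&gt;
With X = 3 and Y = 60, four timeouts at 0, 10, 20 and 30 seconds yield three resets followed by a power-down, while timeouts that are spread further apart than the window keep resulting in a plain reset.&lt;br /&gt;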
&lt;br /&gt;
I&#039;ve tried to illustrate this in a loose &lt;a href=&quot;http://en.wikipedia.org/wiki/Unified_Modeling_Language&quot; title=&quot;Unified Modeling Language&quot;&gt;UML&lt;/a&gt;-style statechart (see figure), with the application process on the left-hand side and the watchdog process on the right-hand side. The dashed line denotes concurrency. State names are written in bold + underline. Lines with arrows denote state transitions. Transitions can be conditional (with label) or unconditional (without label). Including the WDT reload/ping of the application process was a bit tricky. Note the arrow that overlaps the dashed line. I&#039;ve found no better notation. If somebody can shed light on that issue, feel free to comment.&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://streamingcores.com/images/blog06-figure1.png&quot; alt=&quot;watchdog statechart&quot; width=&quot;480&quot; height=&quot;416&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
In my next post I will cover the &quot;safe shutdown&quot; issue and why it is important to split this functionality from your core software.&lt;br /&gt;
&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>Watchdog timers (WDT)</title>
    <link>http://blog.streamingcores.com/index.php?/archives/8-Watchdog-timers-WDT.html</link>

    <description>
        (Delayed) Happy New Year everyone! My New Year&#039;s resolution for this blog is to keep posts shorter than the last one. Looking over it again made me feel that it was a bit too heavy.&lt;br /&gt;
&lt;br /&gt;
Now back to the topic. I&#039;m going to blog about watchdog timers (WDT) today because it was one of the main topics that kept me busy this week. Watchdogs are very common in embedded systems to resolve system/software hangs. These non-interactive systems need a mechanism to automatically reboot or recover from failure states without human intervention. So how does it work? The idea is quite simple. As the name says, it&#039;s a timer, which itself boils down to a counter. That counter is part of a dedicated, simple circuit running independently of the processor and continuously counting down e.g. at a speed of 1 tick per second. If the counter reaches 0 a signal is asserted and ... something has to happen. Usually you&#039;ll find the following two scenarios:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;The hard way: the signal is hard-wired to the reset line and boom ... the system restarts.&lt;/li&gt;&lt;li&gt;An interrupt is triggered and a handler can perform error logging / cleanup operations or initiate a safe shutdown. What if the handler fails? Well, meanwhile the counter is reloaded and starts counting down again but this time going for the hard reset.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
I&#039;m sure you&#039;ve already figured out what to do. To prevent timeouts your program has to periodically reload the counter to signal that it is still alive. This is also referred to as pinging.&lt;br /&gt;
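&lt;br /&gt;
The countdown and the ping can be modelled in a few lines of Python. This is only a software sketch of what a real WDT does in hardware; the class name WatchdogTimer and its methods are hypothetical and the expired flag stands in for the asserted reset or interrupt signal.&lt;br /&gt;
&lt;br /&gt;
```python
class WatchdogTimer:
    """Software model of a countdown watchdog (hypothetical names).

    A real WDT is a hardware counter that runs independently of the CPU.
    """

    def __init__(self, reload_value):
        self.reload_value = reload_value
        self.counter = reload_value
        self.expired = False

    def tick(self):
        """Advance the timer by one tick, as the independent circuit would."""
        if self.expired:
            return
        self.counter -= 1
        if self.counter == 0:
            # In hardware: assert the reset line or raise an interrupt.
            self.expired = True

    def ping(self):
        """Reload the counter; called periodically by the healthy application."""
        if not self.expired:
            self.counter = self.reload_value


wdt = WatchdogTimer(reload_value=5)
for second in range(20):
    if second % 3 == 0:  # the application pings every third second
        wdt.ping()
    wdt.tick()
print("expired while pinging:", wdt.expired)

for second in range(5):  # the application hangs and stops pinging
    wdt.tick()
print("expired after hang:", wdt.expired)
```
&lt;br /&gt;
As long as the ping interval stays shorter than the reload value the counter never reaches 0; once the pings stop, the timer expires within at most one reload period.&lt;br /&gt;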
&lt;br /&gt;
WDTs can be found in nearly every microcontroller today, on server mainboards and on dedicated &lt;a href=&quot;http://www.quancom.de/qprod01/eng/produkte/uebersicht/watchdog.htm&quot; title=&quot;Watchdog PCI/PCIe cards&quot;&gt;expansion cards&lt;/a&gt; with programming support.&lt;br /&gt;
&lt;br /&gt;
In my case it happened to be a &lt;a href=&quot;http://www.supermicro.com/products/motherboard/xeon3000/3210/x7sbe.cfm&quot; title=&quot;Super Micro X7SBE&quot;&gt;Super Micro X7SBE motherboard&lt;/a&gt; with an on-board watchdog. Since the board is equipped with an ICH9R chipset, which contains WDT functionality (as part of Intel&#039;s TCO logic), I expected the BIOS to use that one. About a dozen reboots later I figured out that it actually uses the watchdog in the W83627HG I/O chip. Of course, this is not documented in the mainboard manuals, and there&#039;s no driver for the device on the driver disc either (unless you count the winio library). So, if you&#039;re planning to use an on-board WDT: Watch out! Don&#039;t walk into this trap.&lt;br /&gt;
&lt;br /&gt;
&lt;img alt=&quot;&quot; src=&quot;http://upload.wikimedia.org/wikipedia/commons/thumb/c/cc/Buldog_angielski_000pl.jpg/150px-Buldog_angielski_000pl.jpg&quot; width=&quot;150&quot; height=&quot;215&quot; border=&quot;0&quot;/&gt;&lt;br /&gt;
&lt;br /&gt;
 
    </description>
</item>
<item>
    <title>Upgrading from Virtex to Spartan</title>
    <link>http://blog.streamingcores.com/index.php?/archives/7-Upgrading-from-Virtex-to-Spartan.html</link>

    <description>
        Granted, I&#039;ve left out the important numbers in the title. To be more precise: this post is only for those of you maintaining an existing PCB design containing one or more &lt;a href=&quot;http://www.xilinx.com/support/mysupport.htm#Virtex-II&quot; title=&quot;Xilinx Virtex-II documentation&quot;&gt;Xilinx Virtex-II devices&lt;/a&gt;.&lt;br /&gt;
It often happens that you&#039;re forced to touch the board again because of discontinued parts (no, &lt;strong&gt;not&lt;/strong&gt; the Virtex-II), minor circuit improvements or feature requests from your customers. So why not upgrade the Virtex-II to a package-compatible &lt;a href=&quot;http://www.xilinx.com/products/spartan3a/&quot; title=&quot;Spartan-3A products&quot;&gt;Spartan-3A&lt;/a&gt; at the same time?&lt;br /&gt;
No way!? Why should you do that? Valid points, but before I explain why it could make sense in certain situations, I want to introduce the prerequisites for the upgrade:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;it works only for the &lt;strong&gt;FG256&lt;/strong&gt; package (Virtex-II devices from XC2V40 to XC2V1000)&lt;/li&gt;&lt;li&gt;the system frequency must not exceed 300 MHz&lt;/li&gt;&lt;li&gt;your VHDL/Verilog design contains no (or only a few) device-specific instantiations or physical constraints&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
If your design meets these (demanding) requirements, this could be your migration path:&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;&lt;tr&gt;&lt;th&gt;Original part&lt;/th&gt;&lt;th&gt;Potential upgrade part&lt;/th&gt;&lt;th&gt;Upgrade part&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;XC2V40&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;XC3S200A&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;XC2V80&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;XC3S200A&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;XC2V250&lt;/td&gt;&lt;td&gt;XC3S400A/XC3S700A*&lt;/td&gt;&lt;td&gt;XC3S1400A&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;XC2V500&lt;/td&gt;&lt;td&gt;XC3S700A*&lt;/td&gt;&lt;td&gt;XC3S1400A&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;XC2V1000&lt;/td&gt;&lt;td&gt;XC3S1400A*&lt;/td&gt;&lt;td&gt;N/A&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;
Since the extension of the Spartan-3A family in August 2008, Xilinx ships the entire device family in the &lt;strong&gt;FT256&lt;/strong&gt; package. The &lt;a href=&quot;http://www.xilinx.com/support/documentation/package_specs/ft256.pdf&quot; title=&quot;FT256/FTG256 package&quot;&gt;FT256&lt;/a&gt; is slightly thinner but &lt;a href=&quot;http://www.xilinx.com/support/documentation/package_specs/fg256.pdf&quot; title=&quot;FG256/FGG256 package&quot;&gt;FG256&lt;/a&gt; &quot;compatible&quot;. This makes it possible to replace a small to mid-range Virtex-II with a large Spartan-3A.&lt;br /&gt;
&lt;br /&gt;
In the table above I&#039;ve compared the logic, multiplier and DCM resources for you to find the appropriate replacement candidates (see third column). The second column lists all Spartan-3A devices that have the same amount of CLB logic as their Virtex-II &quot;counterpart&quot; but fewer multiplier/BlockRAM resources. So if your design doesn&#039;t use 100% of these non-CLB resources, they might be an alternative, too. That&#039;s why I&#039;ve called them potential upgrade parts.&lt;br /&gt;
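&lt;br /&gt;
If you prefer a lookup over reading the table, the selection rule can be sketched in a few lines of Python. The part names are copied from the table above (without the asterisks); the names MIGRATION and candidates and the flag uses_all_non_clb_resources are hypothetical and only illustrate the rule described in this post.&lt;br /&gt;
&lt;br /&gt;
```python
# Migration table copied from the post; keys are Virtex-II parts in FG256.
# "potential" parts match the CLB logic but have fewer multiplier/BlockRAM
# resources; "upgrade" is the full replacement (None where the table says N/A).
MIGRATION = {
    "XC2V40": {"potential": [], "upgrade": "XC3S200A"},
    "XC2V80": {"potential": [], "upgrade": "XC3S200A"},
    "XC2V250": {"potential": ["XC3S400A", "XC3S700A"], "upgrade": "XC3S1400A"},
    "XC2V500": {"potential": ["XC3S700A"], "upgrade": "XC3S1400A"},
    "XC2V1000": {"potential": ["XC3S1400A"], "upgrade": None},
}


def candidates(part, uses_all_non_clb_resources):
    """Return the Spartan-3A candidates for a Virtex-II part, smallest first.

    Potential parts are only offered when the design leaves some of the
    multiplier/BlockRAM resources unused.
    """
    entry = MIGRATION[part.upper()]
    result = []
    if not uses_all_non_clb_resources:
        result.extend(entry["potential"])
    if entry["upgrade"] is not None:
        result.append(entry["upgrade"])
    return result


if __name__ == "__main__":
    print(candidates("XC2V250", uses_all_non_clb_resources=False))
    print(candidates("XC2V1000", uses_all_non_clb_resources=True))
```
&lt;br /&gt;
An empty result means the table offers no replacement for that combination, so the design would have to free up some non-CLB resources first.&lt;br /&gt;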
&lt;br /&gt;
Now back to the question: Why? I know it&#039;s a lot of work and not as easy as just replacing the FPGA in the next board revision. It requires adjusting the core voltage, changing the layout, rewriting constraints, updating production specs and so on. But it will pay off in the following ways:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;lower costs per board&lt;/li&gt;&lt;li&gt;lower power consumption (due to reduced core voltage)&lt;/li&gt;&lt;li&gt;less heat dissipation&lt;/li&gt;&lt;li&gt;more logic/CLB resources for future extensions&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
I consider these valid reasons for small- to large-volume products and especially for battery-powered applications. Please note that this post is not meant to be a HOWTO or guide, nor can I guarantee that it works for all designs. I wanted to share the idea and research work with you. If you find this interesting or want to exchange experiences, I would be glad to hear from you. 
    </description>
</item>

</channel>
</rss>
