Recently I read a post by Steve Leibson that referred to a very passionate ongoing debate in the "ARM Based Group" on LinkedIn, where a few industry veterans argued the pros and cons of 8-bit versus 32-bit MCUs. I did think about jumping in myself and adding to the debate, but instead I thought I'd step back and try to clear the air about where my colleagues and I at ARM think the benefits of 32-bit lie for your average current 8-bit user. First, let's list out the main areas of debate in the discussion: that 8-bit is always lower power, that most applications can be handled by 8-bit MCUs, and that 8-bit is better at handling interrupts.
Here are my thoughts on each, for what they are worth, one at a time:
8-bit is always lower power. I struggle with this for lots of reasons. First, the bus width doesn't tell you much about power; the amount of data and instructions fetched from flash and RAM does. A few things that count in favour of ARM Cortex™-M processors here are their dense code (which means fewer expensive flash accesses when running your program) and their flexible support for bytes, half words and words. Flexible memory access means that the variables you read and write are always the size you need them to be; you aren't forced into all of your variables suddenly growing to 32 bits in size.
On top of that, the simplicity argument that some people make is very subjective. The ARM architecture is simple; in fact, that is the point: no awkward banking of memory, registers or state, no fiddly hardware stacks and the like.
So to say that an 8-bit MCU will always be lower power just feels too simplistic to me and to be completely honest, is just not true.
Most applications can be handled by 8-bit MCUs. When I chat to software developers I start and end my story by talking about software productivity and reuse. It is often the case that the cost of the MCU is swamped by the cost of developing the firmware. This is why many of them love their Cortex-M processor-based MCUs, even for fairly simple applications. Trace, detailed profiling, very productive debuggers and great code reuse all count for a lot when you're part of a small team under a lot of pressure to get the firmware for your next product complete.
As an example, I had the pleasure of joining Freescale for their FTF event back in June, where I met one developer who was particularly excited about the prospect of no longer having to use PICs in his designs, because he was tired of the unproductive and inefficient world they locked him into.
8-bit is better at handling interrupts. This is another one that surprises me, because no matter how I add things up, the time spent processing the interrupt dominates any calculations that I do. As a result, the debate about five cycles here and ten there to get to the handler disappears into nothing when the few hundred instructions that process the interrupt are so much more efficient on an ARM MCU.

Some conclusions. What interested me in the LinkedIn discussion was that arguments about the processor core and arguments about specific devices were mixed together. This can make it really hard for a user to unpick and understand; the debate was particularly poor in this regard, and I found myself struggling to keep up. Let's try to separate these things out to give a more transparent debate about the pros and cons of the latest MCU technology; I believe end users will thank us for it.
Another topic that also seems to rarely come up and yet can make a big impact is code footprint and flash memory usage. I've lost count of the number of times I've had one of my silicon partners tell me that design wins have been won (and fortunately never lost!) based purely on the flash memory size for an OEM's codebase.
Closing remark. To those who think 8-bit is still the right choice for many embedded system developers: take a close look at the Freescale Kinetis L, STMicroelectronics STM32 F0, NXP LPC11xx, Energy Micro Zero Gecko and other ARM MCUs. I know what I'd rather be using; what about you?
I agree. Perhaps 8-bit may have some advantages, but I think those would be small. One advantage is that there isn't really a 32-bit boundary problem. Another advantage: Memory usage for storing code would usually be lower.
But I think I'd always pick a 32-bit solution if I could. It's far easier to emulate an 8-bit processor on a 32-bit processor than to emulate a 32-bit processor on an 8-bit processor, especially if you need to write 16 bits in one go.
About energy consumption... someone recently succeeded in getting an 8-bit microcontroller to run Linux. I forget how long it took to boot, but it was probably more than 4 hours before he saw the command-line prompt. I know, not a fair example, but it makes my point clearer...
In cases like that, it would be much better to wake up from sleep, get the job done, and issue a sleep instruction.
I think most designers (and certainly most embedded programmers) would prefer to work with 32-bit+ machines (and ARM instruction sets). In most cases the total power consumed over time is lower with the latest generation of small microcontrollers (i.e. M0/M0+). This is especially true if you have a "bursty" application where the processor can often be sleeping, and only wakes up to do some work before returning to sleep mode.

However, I still need to use 8-bit machines in some applications where the instantaneous power available is very small (< 3.0 mA) and the operation is continuous rather than "bursty"; where the processor needs to remain awake and active the majority of the time.
The lowest active power I've been able to get in a Cortex-M0 class processor (doing real work, with peripherals on and running) is about 1.2 mA running at 1.89 MHz, while my older 8-bitter is about half of that for the same functionality. So while I think that 32-bit class machines will replace 8-bit for the majority of applications (as they should, IMHO...), there are still a small number of cases where the older 8-bitters are the right answer...
Just a thought: did you try setting up a timer interrupt to do a slice of the work on each invocation, and then using wfi in your main loop?
Yes, I typically write all interrupt driven code, and my main() is often just a "sleep();" call...
In this particular instance (a 4-20 mA current loop bridge with a HART modem interface) the bridge is constantly updating the line from the (isolated) host interface, and since it is line powered, you only get about 3 mA as a total power budget to play with, most of which is chewed up by the line interface circuitry and the host isolation transceivers. So by the time you get the clock speed low enough to be within the power budget, there is not really much time that the processor can sleep...
Here's a real-world example.
I just found the smallest LZ4 decompression code I could. It's a 6502 version, which uses 132 bytes of code. An optimized version of the same code would use 118 bytes plus some zero-page memory.
I translated this code into Cortex-M0 code, just for fun. I don't know if it will work, but it comes to 44 lines of Cortex-M0 instructions. Quick math tells us that on a Cortex-M0 you can multiply the number of instruction lines by two to get the code size in bytes (if you're writing clean assembly, since most Thumb instructions are 16 bits wide). So we get 88 bytes for an LZ4 decompression routine. This could be made even shorter on a Cortex-M3 or M4.
I don't doubt that if your application is computationally intense, the Cortex (and 32-bitters in general) would win every time compared to 8-bitters. My real-world example (built and measured with an ammeter) shows that for applications doing medium-rate data transfer without much computation, there are still a small number of cases where the older 8-bitters use less instantaneous power.