Recently I read a post by Steve Leibson that referred to a very passionate ongoing debate in the "ARM Based Group" on LinkedIn, where a few industry veterans argued the pros and cons of 8-bit versus 32-bit MCUs. I did think about jumping in myself and adding to the debate, but instead I thought I'd step back and try to clear the air about where my colleagues and I at ARM think the benefits of 32-bit lie for your average current 8-bit user. First, let's list out the main areas of debate in the discussion: that 8-bit is always lower power, that most applications can be handled by 8-bit MCUs, and that 8-bit is better at handling interrupts.
Here are my thoughts on each, for what they are worth, one at a time:
8-bit is always lower power. I struggle with this for lots of reasons. First, the bus width doesn't tell you much about power; the amount of data and instructions fetched from flash and RAM does. A few things that count in favour of ARM Cortex™-M processors here are their dense code (which means fewer expensive flash accesses when running your program) and their flexible support for bytes, half words and words. Flexible memory access means that the variables you read and write are always the size you need them to be; you aren't forced into all of your variables suddenly growing to 32 bits in size.
On top of that, the simplicity argument that some people make is very subjective. The ARM architecture is simple; in fact, that is the point: no awkward banking of memory, registers or state, no fiddly hardware stacks and the like.
So to say that an 8-bit MCU will always be lower power just feels too simplistic to me and to be completely honest, is just not true.
Most applications can be handled by 8-bit MCUs. When I chat to software developers I start and end my story by talking about software productivity and reuse. It is often the case that the cost of the MCU is swamped by the cost of developing the firmware. This is why many of them love their Cortex-M processor-based MCUs, even for fairly simple applications. Trace, detailed profiling, very productive debuggers and great code reuse all count for a lot when you're part of a small team under a lot of pressure to get the firmware for your next product complete.
As an example, I had the pleasure of joining Freescale for their FTF event back in June, where I met one developer who was particularly excited about the prospect of no longer having to use PICs in his designs, because he was tired of the unproductive and inefficient world they locked him into.
8-bit is better at handling interrupts. This is another one that surprises me, because no matter how I add things up, the time spent processing the interrupt dominates any calculations that I do. As a result, the debate about five cycles here and ten there to get to the handler disappears into nothing when the few hundred instructions that process the interrupt are so much more efficient on an ARM MCU.

Some conclusions. What interested me in the LinkedIn discussion was that arguments about the processor core and arguments about specific devices were mixed together. This can make it really hard for a user to unpick and understand; the debate was particularly poor in this regard, and I found myself struggling to keep up. Let's try to separate these things out to give a more transparent debate about the pros and cons of the latest MCU technology; I believe end users will thank us for it.
Another topic that also seems to rarely come up and yet can make a big impact is code footprint and flash memory usage. I've lost count of the number of times I've had one of my silicon partners tell me that design wins have been won (and fortunately never lost!) based purely on the flash memory size for an OEM's codebase.
Closing remark. To those who think 8-bit is still the right choice for many embedded system developers: take a close look at the Freescale Kinetis L, STMicroelectronics STM32 F0, NXP LPC11xx, Energy Micro Zero Gecko and other ARM MCUs. I know what I'd rather be using; what about you?
I agree. Perhaps 8-bit may have some advantages, but I think those would be small. One advantage is that there isn't really a 32-bit boundary problem. Another advantage: Memory usage for storing code would usually be lower.
But I think I'd always pick a 32-bit solution if I could. It's far easier to emulate an 8-bit processor on a 32-bit processor than to emulate a 32-bit processor on an 8-bit processor, especially if you need to write 16 bits in one go.
About energy consumption... someone recently succeeded in getting an 8-bit microcontroller to run Linux. I forget how long it took to boot, but it was probably more than 4 hours before he saw the command-line prompt. I know, not a fair example, but it makes my point clearer...
In cases like that, it would be much better to wake up from sleep, get the job done, and issue a sleep instruction.
I think most designers (and certainly most embedded programmers) would prefer to work with 32-bit+ machines (and ARM instruction sets). In most cases the total power consumed over time is lower with the latest generation of small microcontrollers (i.e. M0/M0+). This is especially true if you have a "bursty" application where the processor can often be sleeping, and only wakes up to do some work before returning to sleep mode.

However, I still need to use 8-bit machines in some applications where the instantaneous power available is very small (< 3.0 mA) and the operation is continuous rather than "bursty"; where the processor needs to remain awake and active the majority of the time.
The lowest active power I've been able to get in a Cortex-M0 class processor (doing real work, with peripherals on and running) is about 1.2 mA running at 1.89 MHz, while my older 8-bitter is about half of that for the same functionality. So while I think that 32-bit class machines will replace 8-bit for the majority of applications (as they should, IMHO...), there are still a small number of cases where the older 8-bitters are the right answer...
Just a thought: did you try setting up a timer interrupt to do a slice of the work on each invocation, and then using wfi in your main loop?
Yes, I typically write all interrupt driven code, and my main() is often just a "sleep();" call...
In this particular instance (a 4-20 mA current loop bridge with a HART modem interface) the bridge is constantly updating the line from the (isolated) host interface, and since it is line powered, you only get about 3 mA as a total power budget to play with, most of which is chewed up by the line interface circuitry and the host isolation transceivers. So by the time you get the clock speed low enough to be within the power budget, there is not really much time that the processor can sleep...
Here's a real-world example.
I just found the smallest LZ4 decompression code I could. It's a 6502 version, which uses 132 bytes of code. An optimized version of the same code would use 118 bytes plus some zero-page memory.
I translated this code into Cortex-M0 code, just for fun. I don't know if it will work, but it comes to 44 lines of Cortex-M0 instructions. Quick math tells us that on a Cortex-M0 you can multiply the number of instruction lines by two to get the code size in bytes (if you're writing clean assembly, since most Thumb instructions are 16 bits wide). So we get 88 bytes for an LZ4 decompression routine. This could be made even shorter on a Cortex-M3 or M4.
I don't doubt that if your application is computationally intense, the Cortex (and 32-bitters in general) would win every time compared to 8-bitters. My real-world example (built and measured with an ammeter) shows that for applications doing medium-rate data transfer without much computation, there are still a small number of cases where the older 8-bitters use less instantaneous power.