Need to do some thinking out loud…  This coding stuff is getting a lot more complicated than I thought.

My long enforced layoff from coding perhaps had a silver lining?  Before I left I was churning out code as fast as I could to get the ranging working.  I didn’t manage to get it going, but I left a pile of badly engineered and undocumented code in my wake, which I now find makes little sense.  I decided to start off again by carefully reviewing everything I wrote, and taking the time to make sure I fully understand it all.

With TS06, I’m going to try to be more sophisticated in pursuit of power efficiency.  It would be nice not to have to bother about this aspect yet, but I think I must.   The problem is that I need to adopt a more reasonable real-time architecture to have any hope of improving battery life over TS05.

Old Scheme

TS05 (Atmel) handled everything in the simplest way possible – a giant polling loop.  Two downsides of that were:

  1. It spent a lot of time polling devices that had no data ready.  That was wasting processor nap opportunities.
  2. Each poll involved sending commands between the processor and the device at SPI or I2C speeds.  That seemed trivial at the time, especially when I was finding it such a challenge to get any response from the wide array of devices.  In the case of the SPI, all the devices shared a single bus, and the bus sometimes had to be reconfigured between devices that used incompatible communication protocols.  In the light of all that experience I now know that this is a pretty significant processor overhead.

    Communication was at about 400Kb/s, and so a single average transaction took about 0.5 milliseconds to discover that the peripheral had nothing to say!  That’s not insignificant.

    With about ten devices to poll, a single poll cycle took 5ms, which limited the maximum polling rate to 200Hz.  This polling rate in itself was not a limitation – TS05 polled at about 10Hz, too slow for some devices, and way too fast for others.  The 10Hz rate allowed samples to be collected at a regular rate which, to my mind at the time, made the signal processing easier to fathom.  Stupidly, a lot of the time the processor was polling the clock to see if it was time to collect another round of samples.  How dumb can you get?

    This scheme meant that the processor and all the devices were on and running for 5ms per poll cycle – some 50ms every second at the 10Hz rate.   As things worked out, the processor was almost never sleeping, and so the drain on power was substantial: the huge battery would last for 1-2 days in continuous use.

New Scheme

TS06 (Nordic) needs to take a more modern, sophisticated, and one might say, standard approach.
The modern way to do things is to start with an almost trivial main loop:

for (;;) {
    sleep_until_interrupt();  // wake only when some device raises an interrupt
    scheduler_execute();      // deal with whatever woke us, then loop
}

In this scheme the processor’s prime strategic objective is to turn the power off to as many components as it can, and then go to sleep until something generates an interrupt.

Behind the scenes, when an interrupt occurs, the processor is woken up, and the Touchstone (TS) interrupt handler (IRQ) is automatically invoked by the hardware.

The interrupt handler does whatever it can to gather up and save the raw data pertaining to the interrupt.  For an accelerometer, it might collect the x,y and z acceleration values, along with an accurate time stamp. Then it bundles up the data and posts it on the scheduler event queue: a queue of notifications that something significant has happened. TS maintains this queue behind-the-scenes.  Finally the IRQ returns.  

The main loop routine “sleep-till-interrupt” learns that an interrupt has occurred, and it returns control to the main loop, which immediately invokes the scheduler.

Now the scheduler takes each item off the head of the queue and calls a previously registered event handler to do the, possibly heavy, processing work required to transform the event data into some Touchstone activity.  Normally this would simply involve transforming the data into some calibrated, standard form, and storing it, but it might eventually bubble up through the application layers to cause some alert message to be issued to the user, for example.   Several interrupts may occur in quick succession, and so the scheduler works its way through them all before finally returning to the main loop, where the processor tries to go to sleep again.
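The IRQ-posts-then-scheduler-drains flow above can be sketched as a tiny ring-buffer event queue.  This is a minimal sketch, not Nordic's actual scheduler; the names `scheduler_put` and `scheduler_execute` are invented stand-ins:

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal event queue of the kind the scheduler maintains behind the scenes.
   An IRQ posts an event; the main loop drains the queue and dispatches. */
typedef void (*event_handler_t)(const void *data);

typedef struct {
    event_handler_t handler;
    const void     *data;
} event_t;

#define QUEUE_LEN 8
static event_t queue[QUEUE_LEN];
static volatile size_t head, tail;

/* Called from an IRQ: bundle up the data and post it. */
static bool scheduler_put(event_handler_t handler, const void *data)
{
    size_t next = (tail + 1) % QUEUE_LEN;
    if (next == head) return false;        /* queue full: drop the event */
    queue[tail].handler = handler;
    queue[tail].data    = data;
    tail = next;
    return true;
}

/* Called from the main loop: work through every pending event. */
static void scheduler_execute(void)
{
    while (head != tail) {
        event_t e = queue[head];
        head = (head + 1) % QUEUE_LEN;
        e.handler(e.data);                 /* heavy processing happens here */
    }
}
```

A real version also has to make the head/tail updates safe against an interrupt arriving mid-update, which the sketch glosses over.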

Advantages of this approach

This approach has many advantages.  Here are a few:

  • The processor, and perhaps a bunch of subsystems can be turned off until some interrupt occurs. Depending on what has been left running, the sleep current can be a few uA, or even less.
  • Events can be processed much sooner after they occur (lower latency).
  • Some interrupts can be processed without even waking the processor – button de-bouncing for example.
  • There is no pointless traffic on the SPI and I2C buses, which saves power.
  • Changes in sensor data can be handled at a higher rate.


This approach is rather more difficult to implement, understand, and debug.
One of the challenges is that the whole architecture has to be designed as a state-machine.  In other words, the current state of TS has to be succinctly described, so that it is possible to figure out what the right thing to do is for each possible interrupt.   For example, some events have to be delayed until others have completely finished.
The other is that the standard routines Nordic issues don’t consistently implement this model either!
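The "some events have to be delayed until others have finished" idea is what the state-machine requirement boils down to.  Here is a toy sketch; the states, events and the ranging scenario are invented purely for illustration:

```c
#include <stdbool.h>

/* Toy state machine: a ranging exchange must finish before a new
   measurement request is acted on; meanwhile the request is deferred. */
typedef enum { TS_IDLE, TS_RANGING } ts_state_t;
typedef enum { EV_START_RANGE, EV_RANGE_DONE } ts_event_t;

static ts_state_t state = TS_IDLE;
static bool range_pending = false;        /* a deferred request */

static void ts_handle_event(ts_event_t ev)
{
    switch (state) {
    case TS_IDLE:
        if (ev == EV_START_RANGE)
            state = TS_RANGING;           /* start the exchange */
        break;
    case TS_RANGING:
        if (ev == EV_START_RANGE) {
            range_pending = true;         /* busy: defer until done */
        } else if (ev == EV_RANGE_DONE) {
            state = TS_IDLE;
            if (range_pending) {          /* replay the deferred request */
                range_pending = false;
                ts_handle_event(EV_START_RANGE);
            }
        }
        break;
    }
}
```

The point is that every event handler consults the current state before acting, so "what is the right thing to do for this interrupt?" always has a well-defined answer.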

And in particular

The generic SPI driver issued by Nordic is necessarily pretty complicated.  Obviously I want to try and use their code as much as possible because it has been well tested.  Unfortunately it is rather limited.  In particular it is not written to send events to the scheduler.
I managed to implement the new architecture for the timer, RTC and UART, but their generic SPI drivers are much more of a challenge.  So the question is, should I rewrite it entirely, or should I try to kludge a layer on top of it to handle its shortcomings?
I took the second approach to save time.  I have managed to get an event stream working for the DW-SPI device, and it seems to be doing the right thing.  Obviously the DW is not the only device on the SPI bus, but their driver pretty much assumes that it is.  If all the devices on the bus use exactly the same protocol (CPOL, CPHA, speed, and bit order) then it might be possible to kludge it, but that’s probably wishful thinking in the long run.
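A multi-device version of the kludge layer might look something like this: re-program the SPI peripheral only when the next transfer targets a device with different settings.  This is a sketch; the config struct and `apply_spi_config` are invented stand-ins for the real driver calls:

```c
#include <stdint.h>

/* Per-device SPI protocol settings. */
typedef struct {
    uint8_t  cpol, cpha;      /* clock polarity / phase */
    uint32_t freq_hz;         /* bus speed */
    uint8_t  msb_first;       /* bit order */
} spi_config_t;

static const spi_config_t *current;       /* config now loaded in hardware */
static int reconfig_count;                /* how often we had to switch */

static void apply_spi_config(const spi_config_t *c)
{
    (void)c;
    reconfig_count++;   /* real code would write the SPI registers here */
}

/* Call before every transfer; only touches hardware when needed. */
static void spi_select_device(const spi_config_t *c)
{
    if (c != current) {
        apply_spi_config(c);
        current = c;
    }
}
```

Back-to-back transfers to the same device (the common case for the DW) then cost nothing extra, and a uSD card with different settings just triggers one reconfiguration.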
So the question is, should I just bite the bullet at this stage, while I have my head in the code, do a decent engineering job, and make it handle multiple devices, or should I just forget other devices (like the uSD card) and move on to get the vital stuff done, and then come back.  I’m afraid if I just leave it, then I’ll never have the stomach to come back and fix it.
What to do?

I wondered if we could make our own Pebble.

While the Apple Watch only sports an 18-hour battery life, and Android watches maybe 36 hours, the Pebble claims 7 days.  That has to be down to the display technology.

Maybe we can’t stuff all the TS gear inside the Pebble case, but perhaps we can extract the display and figure out how to drive it?

So all this watch does is:

  1. Show the time
  2. Some other stuff – c.f.

It doesn’t provide an interface to your phone like the real Pebble.

According to this teardown:

Pebble makes use of a new Sharp Memory LCD that puts it in the realm of e-ink and e-paper.
The display is capable of maintaining its current image with very low power draw (less than 15 μW for a static image, according to the manufacturer).

Sharp claims to have a 0.99″ display with 10uW static power, and 45uW dynamic.  The 1.28″ square display is the same overall size, but consumes a lot more power: 60uW static.

0.99″ display
1.28″ display

I’m having a bit of trouble with the buttons.  The button closest to the LEDs creates an interrupt on pin16 and works fine.

The other one seems to cause a trap to 0 when I press it – like a reset.

My definitions file says:
#define BUTTON_0       16
#define BUTTON_1       17
I looked at the schematic to try and confirm it is on 17, but the relationship between the pin and the signal name confuses me.   Could you sort me out please?

David responds
The hinges on the door are so rusty I can hardly open it.

This is from the last schematic I sent to Yali, dated June 29, 2015. It doesn’t match the one you sent. I remember now you found the mistake of BUT_0 being connected to SWDCLK. Your button definitions are correct and match the latest schematic.
[Two schematic images attached.]
To work reliably this setup depends on there being a pull-up enabled on P0.17 and P0.16.
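The pull-up requirement on P0.16/P0.17 comes down to one bit-field in the nRF51's per-pin config register.  A sketch of the PIN_CNF word to write (field values as I understand them from the nRF51 Reference Manual; double-check against your SDK headers):

```c
#include <stdint.h>

/* nRF51 GPIO PIN_CNF fields.  PULL (bits 2..3): Disabled=0, Pulldown=1,
   Pullup=3.  DIR (bit 0): 0=input.  INPUT (bit 1): 0=buffer connected. */
#define PIN_CNF_DIR_INPUT      (0UL << 0)   /* DIR = Input */
#define PIN_CNF_INPUT_CONNECT  (0UL << 1)   /* input buffer connected */
#define PIN_CNF_PULL_PULLUP    (3UL << 2)   /* internal pull-up enabled */

/* The PIN_CNF word for each button pin (P0.16 and P0.17), i.e. roughly
   NRF_GPIO->PIN_CNF[pin] = button_pin_cnf(); */
static uint32_t button_pin_cnf(void)
{
    return PIN_CNF_DIR_INPUT | PIN_CNF_INPUT_CONNECT | PIN_CNF_PULL_PULLUP;
}
```

The SDK's `nrf_gpio_cfg_input(pin, NRF_GPIO_PIN_PULLUP)` should amount to the same thing; without the pull-up the pin floats and a press can look like anything, including a reset.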


The scheduler is used to off-load intensive processing from the interrupt routines.

Consider an example where each of the IMU components (accelerometer(A), compass(C), gyro(G)) generates an interrupt, perhaps at 100 Hz.  It’s not hard to imagine cases where all three interrupts may be triggered by the same physical activity.   Each interrupt may demand some substantial signal processing that can itself take some time.

First come, first served

The simplest programs will process each interrupt to completion.  While, for example, interrupt A is being processed, interrupts from C and G are disabled, and are only re-enabled when A is finished. If handling A takes a long time, say longer than 10ms for 100Hz sampling, then C and G may have had to wait so long that the hardware has already experienced a new interrupt condition, and has updated its internal registers with new data before the processor has had a chance to read the first lot.  This is known as exceeding the “crisis time” of the device.

void Accel_IRQ_Handler(void) {
  xyzAccelData = getAccelDataFromIMU();
  processAccelData(xyzAccelData);  // all the heavy work runs here, blocking C and G
}

Prioritized interrupt handling

A more sophisticated approach is to use interrupt priorities to make sure that the interrupt with the shortest crisis time will preempt interrupts with less critical timing requirements.  There are not usually very many priority levels, and a bunch of them are usually assigned to processor-related interrupts, so there may only be two, or three priority levels anyway.   But let’s assume that we have enough priority levels available.

When an interrupt occurs, the processor saves the current values of its various registers onto the program stack, and transfers control to the appropriate interrupt request handler (IRQ).  This then uses the stack to do some processing and returns.  The processor pops stuff off the stack back into the hardware registers, and continues where it left off, unless…  a higher priority interrupt occurs before the original IRQ has finished.  So more registers get pushed onto the stack.  And if another even higher priority interrupt occurs, then even more get pushed onto the stack.  This means that we have to make the stack at least “three interrupts’ worth of register space” bigger than the maximum it can ever reach during normal background processing.  Otherwise we get stack overflow, which provokes squirrelly behavior that is often very hard to diagnose.  And remember that this whole approach only works if we have enough different levels of priority, and IRQs can complete their tasks within the shortest interrupt crisis time.
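A back-of-envelope check on that stack margin, assuming a Cortex-M0 class part (like the nRF51) where the hardware stacks an 8-word frame per exception, plus an assumed allowance for whatever each handler pushes on top:

```c
#include <stddef.h>

/* Hardware-stacked exception frame on Cortex-M0 (no FPU):
   R0-R3, R12, LR, PC, xPSR = 8 words = 32 bytes. */
#define HW_FRAME_BYTES   (8 * 4)

/* Allowance for what each handler itself pushes (callee-saved registers
   plus locals) - an assumed, deliberately generous figure. */
#define HANDLER_LOCALS   64

/* Extra stack needed to survive this many nested interrupt levels. */
static size_t nested_irq_margin(unsigned levels)
{
    return levels * (HW_FRAME_BYTES + HANDLER_LOCALS);
}
```

So three nested levels costs on the order of 300 bytes, which is very noticeable on a part with 16-32KB of RAM.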

Off-loading interrupt processing

To overcome these issues a real-time system generally attempts to minimize the time spent inside an IRQ, thus allowing all other IRQs to run before their crisis time expires.  The way they do that is to capture and save the interrupt data, and not to think about it too much.  They off-load the processing task to a lower-priority process – normally the main loop.  When all the IRQs have serviced their interrupts, saved their data, and got their pants pulled up, the main loop resumes.  The main loop looks to see if any interrupt routine has indicated that there is some more work to be done – some heavy signal processing, for example.  If there are several tasks, it figures out the order to do them in, and gets on with the job.

It’s worth noting here that there are situations where further signal processing does not make much sense until multiple devices have captured data.  For example, an IMU may prefer to wait until it has data from A, C and G interrupts before it attempts to update its new view of the world.  Updating its view using data from just one device may actually be worse than waiting for three samples that share the same approximate time stamp.
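One way to express that "wait for all three" rule is a bitmask that each sensor's post-processor sets; the fusion step only runs once every bit is present.  A sketch, with all the names invented for illustration:

```c
#include <stdint.h>

enum { GOT_ACCEL = 1u << 0, GOT_COMPASS = 1u << 1, GOT_GYRO = 1u << 2 };
#define GOT_ALL (GOT_ACCEL | GOT_COMPASS | GOT_GYRO)

static uint32_t got;       /* which sensors have reported this cycle */
static int fusion_runs;    /* how many complete A+C+G updates so far */

/* Called by each sensor's post-processor after it has stored its sample. */
static void sensor_ready(uint32_t which)
{
    got |= which;
    if (got == GOT_ALL) {      /* all three samples from roughly one instant */
        fusion_runs++;         /* real code: update the view of the world */
        got = 0;               /* start collecting the next round */
    }
}
```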

A simple way to do this is for each IRQ to set some kind of flag that the main loop can read.  For example, the IRQ for A may look like this:

void Accel_IRQ_Handler(void) {
  xyzAccelData = getAccelDataFromIMU();
  accelWorkToDoFlag = true;  // tell the main loop there is work to do
}

The main loop looks like this:

main() {
  for (;;) {
    sleep(); // processor sleeps till next IRQ has been handled
    if (accelWorkToDoFlag) {
      accelWorkToDoFlag = false;
      processAccelData(xyzAccelData);  // the heavy work, done at low priority
    }
  }
}


The scheduler provides a formalized, more disciplined and flexible way to deal with this situation. The programmer defines an IRQ post processor to do the hard signal processing, for example.

void postProcessor(void *data) {
  // the hard signal processing on the captured sample goes here
}

Essentially the scheduler manages a queue of post-processing tasks submitted by the various IRQs. The scheduler runs in the main loop.  It takes each task description, starting at the head of the queue, and calls the designated post-processor to deal with it.

void Accel_IRQ_Handler(void) {
  xyzAccelData = getAccelDataFromIMU();
  scheduler_enQueue(&xyzAccelData, postProcessor);
}

main() {
  for (;;) {
    sleep(); // processor sleeps till next IRQ has been handled
    scheduler_execute(); // calls postProcessor for each queued item
  }
}

Back from 105 days in the UK.  I didn’t even get the devKit out of the boxes.  I could have left it all in CA for all the difference it would have made.

I am trying to figure out what the hell I was doing when I stopped.  I can’t even remember what the program I was working on was called.

There has been a new release of the GNU Arm Eclipse package which I’m hesitant to install until I have my pants pulled up.  How many unknowns do I want to wrestle with at one time?