Friday, May 22, 2015

Computing fractional multipliers and divisors using continued fractions.

This blog post discusses how to find the best rational approximation, with bounded numerator or denominator, to a preferred frequency ratio. It will be a bit mathematical, but nothing more than standard addition, subtraction, multiplication and division is used. Some example code is given at the end.

Fractional baudrate generation

Some microprocessors use fractional multipliers and divisors to generate accurate baud rates from system clocks that are not an integer multiple of the bitslice clock frequency. The bitslice frequency is the baud rate times an oversampling count.

\[ f_{bitslice} = oversample \cdot baudrate = ratio \cdot f_{sysclock} \tag{1} \]

An example is the USIC in the Infineon XMC1100 and XMC4500 family microprocessors. Here the clock for the bitslices is derived from the system clock, slightly simplified, as 

\[ f_{bitslice} = (DCQT+1) baudrate = \frac 1 {(PDIV+1)} \cdot \frac {STEP} {1024} \cdot f_{sysclock} \]

DCQT gives the oversampling count, typically set to 15. PDIV and STEP are register values controlling the frequency division. On the XMC1100 PDIV and STEP are 10 bits wide, so they take values from 0 to 1023. Rearranging the above, we see that we must find a good approximation

\[ \frac {STEP} {(PDIV+1)} \approx \frac {(DCQT+1) baudrate \cdot 1024} { f_{sysclock} } \tag{2} \]

with values of PDIV and STEP between 0 and 1023.
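
As a small illustration of equation (2), the target ratio can be kept as an exact integer numerator and denominator rather than a floating point number. This is only a sketch under my own assumptions: the names `usic_ratio_t` and `usic_target_ratio` are made up for this post and are not part of any Infineon library.

```c
#include <stdint.h>

/* Hypothetical helper type: the right-hand side of (2) as an exact fraction. */
typedef struct {
    int64_t num;   /* (DCQT+1) * baudrate * 1024 */
    int64_t den;   /* f_sysclock in Hz */
} usic_ratio_t;

/* Build the target value for STEP/(PDIV+1) from equation (2).
   dcqt is the register value, so the oversampling count is dcqt+1. */
usic_ratio_t usic_target_ratio(uint32_t fsysclock, uint32_t baudrate, uint32_t dcqt)
{
    usic_ratio_t r;
    r.num = (int64_t)(dcqt + 1u) * baudrate * 1024;
    r.den = (int64_t)fsysclock;
    return r;
}
```

Keeping the two integers separate means the later steps can work with integer division and remainders only.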

Example: 38400 baud from a 32MHz clock and 16 times oversampling

We enter the values in (2) to find

\( \frac {STEP} {(PDIV+1)} = \frac {16 \cdot 38400 \cdot 1024} { 32000000 } = \frac {629145600} { 32000000 }  = 19.6608 \) 

Working by hand we would now factor out common factors from numerator and denominator, but for computer implementations that is unnecessary extra work. As we shall see, the algorithm works just as well without that step.
Now we carry out the signed integer division with remainder, using the rounded value 20 as quotient. Since 20 is larger than the exact value, the remainder is negative.

\( \frac {16 \cdot 38400 \cdot 1024} { 32000000 } = 20 - \frac {10854400} { 32000000 }  = 20 - \frac {1} {  \frac { 32000000 } {10854400}  } \tag{3} \) 

Rewrite the denominator in the last fraction by carrying out the division with remainder

\(  \frac { 32000000 } {10854400}  = 3 - \frac {563200}{ 10854400 } \approx  2.948 \approx 3  \) 

We enter this value 3 in the previous formula (3) to get the approximation

\[ \frac {16 \cdot 38400 \cdot 1024} { 32000000 } \approx 20 - \frac {1} { 3 } = \frac {59} {3} = 19.6667 \] 

This is good: after one iteration the error is less than 0.008, or about 300 ppm.
We can improve on this by using a more precise value for the denominator:

\(\frac { 32000000 } {10854400} = 3 - \frac {563200}{ 10854400} = 3 - \frac 1 { \frac { 10854400 }{563200}}  = 3 - \frac 1 {19 + \frac {153600 }{563200}}  \approx 3 - \frac 1 { 19 } = \frac {56}{19}\)
Inserted into (3) this gives

\( \frac {16 \cdot 38400 \cdot 1024} { 32000000 } \approx 20 - \frac {1} {  \frac {56}{19} } = 20 - \frac {19} {56} =  \frac {1101} {56} = 19.6607  \)  

This approximation is better, with an error of about 4 ppm, but the STEP value 1101 is too large for the 10 bit range, so we keep the previous approximation of 59/3.
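
To check candidates like these without a calculator, a small diagnostic function is handy. The sketch below is my own illustration, not part of the algorithm itself; the name `approx_error_ppm` is hypothetical, and floating point is used only because this is a check and not part of the integer search.

```c
#include <stdint.h>

/* Relative error of the approximation a/b to the target p/q, in parts
   per million. Positive means a/b is larger than p/q. */
double approx_error_ppm(int64_t a, int64_t b, int64_t p, int64_t q)
{
    double target = (double)p / (double)q;
    double approx = (double)a / (double)b;
    return (approx - target) / target * 1e6;
}
```

Called with 59/3 and 1101/56 against 629145600/32000000 it gives the size of the two errors discussed above, and the sign shows that 59/3 is slightly high while 1101/56 is slightly low.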

Best rational approximation with continued fractions

In order to find a general method for calculating good rational approximations, as in the previous example, we use the theory of convergents of continued fraction expansions and of best rational approximations. [TO DO add links ]   This is a well developed general theory for finding the best rational approximations to a number for a given size of the denominator. There are recursive algorithms for calculating successively better approximations.

So we are given a rational number p/q and we want to find successive approximations \( a_n/b_n \) to p/q with smaller values for \( a_n \) and \(b_n \) than p and q. For baud rate generation we want the best approximation where \( a_n \) and \(b_n \) fit in the respective fractional multiplier and divisor registers.

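We expand p/q as a continued fraction with integer terms \( c_k \) and truncate it after \( c_n \), dropping the remainder term \( 1/r_{n+1} \):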

\[ \frac p q = c_0 + \frac {1}{c_1 + \frac {1}{c_2 + \ddots \frac {1}{c_n + \frac {1} {r_{n+1}}}}} \approx  c_0 + \frac {1}{c_1 + \frac {1}{c_2 + \ddots \frac {1}{c_n }}}  = \frac {a_{n+1}}{b_{n+1}} \]

For the first few values of n we get:

\[ a_{-1} =0,  b_{-1}=1 \]
\[ a_0 =1,  b_0=0 \]
\[  \frac {a_{1}}{b_{1}}  =  c_0 = \frac {0 + 1 \cdot c_0}{1 + 0 \cdot c_0}  = \frac {a_{-1} + a_0 \cdot c_0}{b_{-1} + b_0 \cdot c_0}   \]
\[ a_1 =c_0,  b_1=1 \]
\[  \frac {a_{2}}{b_{2}}  =  c_0 +  \frac {1}{c_1} =  \frac {c_0 \cdot c_1 + 1}{c_1} = \frac {a_{0} + a_1 \cdot c_1}{b_{0} + b_1 \cdot c_1}   \]
\[  \frac {a_{3}}{b_{3}}  =   \frac {a_{0} + a_1  ( c_1+ \frac 1 {c_2})}{b_{0} + b_1 ( c_1+ \frac 1 {c_2})} = \frac {(a_{0} + a_1  c_1 ) c_2+ a_1}{(b_{0} + b_1  c_1 ) {c_2}+  b_1} = \frac {a_1 + a_{2}{c_2}}{b_1+b_{2}{c_2}} \]
We can now see the recursion formula develop. The next step, and then the general case, are
\[  \frac {a_{4}}{b_{4}}  =   \frac {a_{1} + a_2  ( c_2+ \frac 1 {c_3})}{b_{1} + b_2 ( c_2+ \frac 1 {c_3})} = \frac {(a_{1} + a_2  c_2 ) c_3+ a_2}{(b_{1} + b_2  c_2 ) {c_3}+  b_2} = \frac {a_2 + a_{3}{c_3}}{b_2+b_{3}{c_3}} \] 

\[  \frac {a_{n+1}}{b_{n+1}}  =   \frac {a_{n-2} + a_{n-1}  ( c_{n-1}+ \frac 1 {c_n})}{b_{n-2} + b_{n-1} ( c_{n-1}+ \frac 1 {c_{n}})} = \frac {(a_{n-2} + a_{n-1}  c_{n-1} ) c_n+ a_{n-1}}{(b_{n-2} + b_{n-1}  c_{n-1} ) {c_n}+  b_{n-1}} = \frac {a_{n-1} + a_{n}{c_n}}{b_{n-1}+b_{n}{c_n}} \] 
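
The recursion above can be tried out directly when the terms \(c_0, \ldots, c_n\) are already known. The following is a minimal sketch under that assumption; the function name `convergent` is my own choice for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Evaluate a_{n+1}/b_{n+1} from the terms c[0..count-1] using
   a_{k+1} = a_{k-1} + a_k*c_k and the same recursion for b. */
void convergent(const int32_t *c, size_t count, int64_t *a_out, int64_t *b_out)
{
    int64_t a_prev = 0, b_prev = 1;   /* a_{-1}, b_{-1} */
    int64_t a_cur  = 1, b_cur  = 0;   /* a_0,  b_0  */
    for (size_t k = 0; k < count; k++) {
        int64_t a_next = a_prev + a_cur * c[k];
        int64_t b_next = b_prev + b_cur * c[k];
        a_prev = a_cur;  b_prev = b_cur;
        a_cur  = a_next; b_cur  = b_next;
    }
    *a_out = a_cur;
    *b_out = b_cur;
}
```

With the rounded terms 20, -3 from the worked example this returns -59 and -3, which is 59/3 after changing the sign of both. With the round-down terms 19, 1, 1, 1 for the same ratio it returns 59 and 3 directly.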


The algorithm

We start with a rational number r = p/q.  We will generate sequences of five values, \(a_n, b_n, c_n, p_n\) and \(q_n\). The quotients \(a_n/b_n\) will be our successive approximations to r. The values \(r_n\) are defined as \(p_n/q_n\), but they need not be explicitly calculated; the formula for \(r_n\) is used as the template for how \(p_n\) and \(q_n\) are updated.

Startup:
\( r_0 = r  \)
\( p_0 = p,  q_0 = q \)
\( a_{-1} =0,  b_{-1}=1 \)
\( a_0 =1,  b_0=0 \)
 
Loop until \(q_{n+1}\) is 0 or \(a_n\) or \(b_n\) is too large:
\( c_{n} = round(r_{n}) = round ( p_{n}/q_{n} ) \)
\( r_{n+1} = 1/(r_n-c_n) \)    This is calculated in terms of \(p_n\) and \(q_n\):
\( p_{n+1} = q_n \)
\( q_{n+1}=p_{n}-c_n \cdot q_n \)
\( a_{n+1}=a_{n-1}+a_{n} \cdot c_{n} \)
\( b_{n+1}=b_{n-1}+b_{n} \cdot c_{n} \)

The rounding can be done downwards, keeping all values positive, or towards the nearest integer, which improves precision but also introduces negative numbers and extra complexity. Note that the \(c_n\) value is calculated before updating \(p_n\) and \(q_n\); it is not remembered and used in the next iteration but recalculated.

A skeleton C implementation

The following code lacks error handling and only checks the limit for the numerator, but it illustrates the algorithm. It lacks a few comments, but follows the algorithm structure closely.


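#include <stdint.h>

/* Approximate p/q by a1/b1 with |a1| <= alim. q must be nonzero. */
void cfractr(int32_t p, int32_t q, int32_t alim, uint32_t * ares, uint32_t * bres) {
    int32_t ap = 0;   /* a_{n-1} */
    int32_t a1 = 1;   /* a_n */
    int32_t bp = 1;   /* b_{n-1} */
    int32_t b1 = 0;   /* b_n */
    int32_t cn, anext, bnext, pnext;
    while (1) {
        /* Signed rounded rational division */
        if (((q>0)&&(p>0))||((q<0)&&(p<0)))
            cn = (p+q/2)/q;
        else
            cn = (p-q/2)/q;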
        /* Next value for approximation a/b and remainder */
        anext = ap + cn*a1;
        bnext = bp + cn*b1;
        pnext = p-cn*q;
        /* Exact value, remainder is 0, break */
        if (pnext == 0) {
            a1 = anext;
            b1 = bnext;
            break;
        }
        /* Numerator too large, break */
        if ((anext<-alim)||(anext>alim)) {
            break;
        }
        /* Shift one step before next iteration */
        ap = a1;
        bp = b1;
        a1 = anext;
        b1 = bnext;
        p = q;
        q = pnext;
    }
    if (a1<0){a1=-a1;b1=-b1;}
    *ares = a1;
    *bres = b1;
}
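
Once `cfractr` has returned a numerator and denominator, my reading of equation (2) is that STEP is the numerator and PDIV is the denominator minus one. To see what baud rate those register values really give, the simplified clock formula from the start of this post can be evaluated directly. This is a sketch only; `usic_actual_baud` is a name I made up for the illustration.

```c
#include <stdint.h>

/* Baud rate in Hz that results from the given register values, following
   the simplified formula f_sysclock * STEP / (1024 * (PDIV+1) * (DCQT+1)). */
double usic_actual_baud(uint32_t fsysclock, uint32_t step, uint32_t pdiv, uint32_t dcqt)
{
    return (double)fsysclock * step / (1024.0 * (pdiv + 1u) * (dcqt + 1u));
}
```

For the example result 59/3, that is STEP = 59, PDIV = 2 and DCQT = 15 on a 32 MHz clock, which comes out at roughly 38411 baud against the wanted 38400.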



   

Monday, July 8, 2013

Breath control revisited

I have been using my breath control and OpenPipe breakout for two months now and it makes for a really enjoyable instrument. There are some things to develop: first, the hardware pipe and pressure sensor compartment should be redesigned, as the current one is a quick hack. So here come some notes on designing a new mouthpiece. In some weeks' time I hope to be able to add some sound examples and also describe some of the programming.

Designing a mouthpiece

The mouthpiece should allow some air to pass through while playing to make breathing more natural, but also restrict the airflow enough for a clearly measurable pressure to build up. To avoid moisture on the sensor board I place the sensor in a compartment after the exhaust hole, so that the air stream does not pass directly over the sensor board. Closing the exhaust hole and just using the pressure makes the end of notes sound bad, since the pressure doesn't drop cleanly when you stop blowing. The best option is probably to make the size of the exhaust hole adjustable and let the player decide.

The air pressure in a recorder mouthpiece varies between 200 and 1000 Pa depending on the note played, with high notes having more pressure. The difference in pressure between pp and ff (quiet and loud) is about 200 Pa. These numbers can be found in Modeling of Gesture-Sound Relationships in Recorder Playing: A Study of Blowing Pressure, a master's thesis by Leny Vinceslas.
An exhaust hole with 3-4mm diameter gives this kind of pressure on the sensor and feels quite nice to play. I will test more with different sized exhaust holes, how hard to blow and how the pressure varies on the sensor. 

Here is my design sketch for the next version of the breath sensor mouthpiece.  I have found very cheap nylon tubing, used for electrical installation work, that fits snugly around the OpenPipe. I am fairly confident this can be built at home with simple tools; the only remaining part is the silicone rubber film. It can be bought 0.3 mm thick, 50x50 cm, from Germany for 90 euros. That is a bit much money, but it's probably enough for more than 600 such mouthpieces  ( I might find some use for a lot of silicone rubber film :) ).

The sensors

BMP085
Reading both temperature and pressure and calculating the calibrated values takes around 11 ms. This time is mostly spent waiting for the chip to complete a conversion.  With careful programming, other calculations and sampling of the touch sensors can be done during this wait time. A breakout board can be found for around $15.

MPL3115A2
This sensor seems to have as good or better performance than the BMP085 with faster sampling rate. The calibration and temperature compensation is done in the sensor ASIC and the convoluted calculations needed for the BMP085 are not needed. I have ordered a breakout board for testing.

A further enhancement would be to use a very open mouthpiece and sense both pressure in the middle of  the airstream and total flow, this would more correspond to playing a flute. Not sure what sensors to use for this and how to mount them.

Relation between pressure, tone height and volume

Using the data in L. Vinceslas' work I set up a table of the normal pressure used to play the different notes at medium volume.  This value is used as the baseline for the note, corresponding to MIDI volume 64. This means that, like in a real flute or recorder, the pressure must increase as we play higher notes in order to keep a constant volume.

    int volume;
    int midpressure = note_pressure[note-60];


    volume = 64 + ((pressure - midpressure)*psensitivity)/128;
    if (volume < 0) volume = 0;
    if (volume > 127) volume = 127;

This code fragment shows the MIDI volume calculation. The psensitivity value gives the sensitivity to pressure variation around the standard note_pressure from the table. A value of around 15-20 seems to work quite well. In my test sketch I have assigned this value to a CC controller so it can be changed dynamically while playing.
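
Wrapped as a function, the same calculation is easy to try with a few numbers. This is just the fragment above repackaged for experimenting; the function name `midi_volume` is my own, and the pressure values are assumed to be in the same units as the note_pressure table.

```c
/* MIDI volume (0..127) from the measured pressure, the table pressure for
   the fingered note and the sensitivity setting, as in the fragment above. */
int midi_volume(int pressure, int midpressure, int psensitivity)
{
    int volume = 64 + ((pressure - midpressure) * psensitivity) / 128;
    if (volume < 0) volume = 0;
    if (volume > 127) volume = 127;
    return volume;
}
```

With a sensitivity of 16, a pressure 100 units above the table value gives 64 + 1600/128 = 76, and a pressure exactly at the table value gives 64.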

This has been tested, and it is easy to control the expression of the sound dynamically.

Using the pressure to control the octave of the note played

If the pressure rises more than 2/3 of the way from the pressure of the fingered note towards that of the note one octave higher, the scale is shifted one octave up. When the pressure later falls more than 2/3 of the way back down towards the original note, the scale is shifted back.  This code is still in planning.
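
Since this part is not written yet, the following is only one possible reading of the rule, with both thresholds being my interpretation of the 2/3 figures. The function name `octave_up_update` and its parameters are hypothetical.

```c
#include <stdbool.h>

/* One reading of the planned octave switch, with hysteresis.
   base_p: table pressure of the fingered note.
   high_p: table pressure of the note one octave up.
   Returns the new state: true if the scale is shifted one octave up. */
bool octave_up_update(bool octave_up, int pressure, int base_p, int high_p)
{
    int diff = high_p - base_p;
    int up_threshold   = base_p + (2 * diff) / 3;  /* 2/3 of the way up   */
    int down_threshold = high_p - (2 * diff) / 3;  /* 2/3 of the way down */
    if (!octave_up && pressure > up_threshold)
        return true;
    if (octave_up && pressure < down_threshold)
        return false;
    return octave_up;
}
```

The gap between the two thresholds is what should keep the octave from flickering when the pressure hovers near a single switching point.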

Detecting the start of a note

The program recognises the start of a note when the pressure has been more than 50 Pa above ambient for three sample periods (30ms). This is the number of samples needed for the pressure to reach its peak value so that the midi note volume can be calculated. If aftertouch, channel pressure or the expression continuous controller is active then this may be decreased at the risk of losing the initial attack.
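
The detection rule can be sketched as a small counter that is fed one pressure sample per period. This is an illustration of the rule as described, not the code from my sketch; the type `note_detector_t` and the function `note_start_detected` are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NOTE_ON_THRESHOLD_PA 50   /* above ambient, from the text */
#define NOTE_ON_SAMPLES      3    /* three sample periods, about 30 ms */

/* Hypothetical detector state: consecutive samples above the threshold. */
typedef struct {
    int above_count;
} note_detector_t;

/* Feed one pressure sample in Pa. Returns true exactly once, on the sample
   where the pressure has been above threshold for NOTE_ON_SAMPLES samples. */
bool note_start_detected(note_detector_t *d, int32_t pressure, int32_t ambient)
{
    if (pressure - ambient > NOTE_ON_THRESHOLD_PA) {
        if (d->above_count <= NOTE_ON_SAMPLES)
            d->above_count++;            /* saturate just past the trigger */
        return d->above_count == NOTE_ON_SAMPLES;
    }
    d->above_count = 0;                  /* pressure dropped, start over */
    return false;
}
```

Lowering NOTE_ON_SAMPLES corresponds to the faster but riskier detection mentioned above.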

Thursday, July 4, 2013

Adventures with the Terasic DE0 Nano

I have for a long time been fascinated by the idea of programmable logic as a complement to standard MCUs. Ideas like running 32 PWM channels and as many quadrature detectors on one chip for servo control are definitely beyond today's MCUs, powerful as they are.

I have previously played a bit with the Terasic Trac C1 and the Dallas Logic Quickgate EP2C8 Cyclone II boards, trying to learn VHDL and how to build things like an audio synthesizer with them.  So when I saw the Terasic DE0 Nano I simply couldn't resist the urge to buy one. At €74 from Mouser it is not dirt cheap, but for an FPGA board of this kind it is very good value.

Designing FPGA logic is quite different from ordinary C/C++ microprocessor programming. The best book I have found to help me is "RTL Hardware Design Using VHDL" by Pong P. Chu.

So after reviving some old VHDL projects I started to install the Quartus software on my Fedora 18 system. Quartus 13 refused to run without frequent crashes even after I changed and added several system libraries to conform to the ones coded into the Quartus 13 executables. After this I tried installing Quartus Free Web Edition 11, and it seems to run perfectly,  this might be because of the changes done to make Q 13 run, or not, but at the moment it works. Older Quartus versions can be found at  ftp://ftp.altera.com/outgoing/release/.

Most of the getting started manuals for complex systems like this tell you to install some precoded development package and just click menu boxes in a specified sequence, without giving the logic behind it. For me this is not really learning a new tool. So I try to build small things from scratch to see what happens before using the heavyweight preprogrammed IP in the component libraries.



Right now I have a serial port echo running on the DE0 Nano that displays incoming serial bytes on the 8 LEDs and then echoes them back. The small chip is a Teensy 3 that acts as a Serial-USB bridge. It's not very advanced yet, but writing the logic from scratch is fun and rewarding.  Next step is SPI and some PWM.


Thursday, May 9, 2013

OpenPipe and breath control


I have been playing around with the OpenPipe Breakout, the electronic pipe/flute control, for a few weeks now, trying to revive some old and mostly forgotten skills on how to play a flute or Irish tin whistle.

The pipe is connected through an I2C interface to a Maple clone, the Olimexino STM32, and then with MIDI to Garageband on my iMac. It's a fun instrument, but I find it a bit hard to balance, holding it and playing some fast fingering at the same time. Using a thumb for note on/off is also a bit unusual.

So I decided to try and make a breath control so that the pipe can be played almost like a real flute.



The breath control sensor is a BMP085 breakout board. This atmospheric pressure sensor connects to the Maple board over I2C. The mouthpiece is made from two pieces of nylon tubing. A cork from a bottle of good Italian wine holds things in place. The sensor is placed inside the tube and the end is sealed with the cork; a small ventilation hole lets some air pass through the mouthpiece.






The sketch reads the BMP085 and the touch sensor in the OpenPipe Breakout and starts a note if the pressure is more than 50 Pa above ambient. Some early tests show that the basic setup works, but there is a lot more to do before the sound can be controlled by breath like in a real flute.

Selecting a pressure sensor

BMP085 is an absolute pressure sensor accessed using the I2C protocol. No extra components are needed. The drawbacks are that the breath only represents a small fraction of the sensor's range and that the baseline pressure, the ambient pressure, must be calibrated for. Price is ___

The other major type of pressure sensor is a MEMS bridge giving a small voltage representing the difference between measured pressure and ambient. The problem here is that the small sensor output must be amplified before the signal is input to an AD converter. No calibration for changing ambient pressure is needed.

Saturday, April 6, 2013

MIDI USB Class for the Maple board



I got myself an OpenPipe breakout board and want to use a Maple board to connect it to a soft synth on my computer or a hardware synth. For this I want the Maple to implement a MIDI USB class device.

The Maple has as standard a USB serial device that gets set up and loaded as part of building a sketch, and it is then available as the SerialUSB object. The MIDI USB will replace the Serial USB and register the device as a MIDI class compliant device. The Maple bootloader is not affected, but the remote reset into the bootloader is not implemented, so a manual reset is needed to get into the bootloader. I can live with that.

The MIDI USB needs a few things to be set up:

  • USB Setup and handling of Control Requests
  • A MIDI USB device descriptor to present itself to a host computer as a MIDI USB device
  • Bulk IN and OUT endpoints for MIDI USB packets, 32 bit/4 byte blocks of data
  • Code that interprets the MIDI USB packets as standard MIDI events.

Building the MIDI USB class as a variant of the existing USB serial code, the first and third parts are almost identical for MIDI and Serial. They are actually easier for MIDI, since no modem control line handling is necessary and no management endpoint is needed.
The device descriptor is a bit harder, but it is a static data structure, and carefully following the MIDI USB documentation will get you through this.
The USB MIDI packet handling is standard MIDI code and does not depend on the details of the USB transport layer.
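
To show what one of the 4 byte packets looks like, here is a sketch that packs a Note On message. It follows the event packet layout in the USB MIDI class specification and is not taken from the code in my repository; the function name `usb_midi_note_on` is made up for this example.

```c
#include <stdint.h>

/* Pack a Note On into a 4 byte USB-MIDI event packet.
   Byte 0: cable number (high nibble) and Code Index Number (low nibble).
   Bytes 1..3: the ordinary 3 byte MIDI message. */
void usb_midi_note_on(uint8_t cable, uint8_t channel, uint8_t note,
                      uint8_t velocity, uint8_t packet[4])
{
    packet[0] = (uint8_t)(((cable & 0x0F) << 4) | 0x09);  /* CIN 0x9 = Note On */
    packet[1] = (uint8_t)(0x90 | (channel & 0x0F));       /* MIDI status byte  */
    packet[2] = note & 0x7F;
    packet[3] = velocity & 0x7F;
}
```

Middle C at velocity 100 on cable 0, channel 0 becomes the bytes 0x09 0x90 0x3C 0x64. Going the other way, the interpreting code can ignore byte 0 for messages like this and treat bytes 1 to 3 as a normal MIDI event.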

The code has been tested and registers as a MIDI device both under OSX and Android, and seems to be working.

A git repository can be found at    https://github.com/mlu/maple-ide

The MIDI USB is built from the following files:
High level device object, Wirish style, replaces usb_serial.cpp
  • usb_midi.cpp
  • include/wirish/usb_midi.h
Low level USB driver, replaces usb_cdcacm.c
  • stm32f1/usb_midi_device.c
  • include/libmaple/usb_midi_device.h
The process of setting up a sketch to use MIDI instead of Serial is still clumsy and needs some manual editing of the boards.h file.

The development is done on a modified Maple-IDE that uses a current arm toolchain and a libmaple layout that is closer to the present libmaple layout so the files are placed in different locations than the standard Maple-IDE file layout.

UPDATE 2013/0412

The descriptor definitions have been factored out of usb_midi_device and placed into usb_midi_descr.c/h . A working copy of the libmaple git repository with the midi usb files placed in their proper place in the hierarchy can be found at https://github.com/mlu/libmaple .


      

Saturday, September 12, 2009

Using a STM32 based board for Arduino Development

Exploring ways to enhance Arduino while still keeping the ease and experience.

This is work in progress, more details, updates and pictures coming soon. But the basic stuff works as described today.

Using Arduino boards and the Arduino GUI is a simple and fast way to develop embedded applications. There are predefined functions for input and output, serial communications and timing, many example sketches and the user does not have to worry about low level initialisation, interrupt vectors and timer configurations. On the other hand, ATMega 168 chips have limited resources, it is an 8 bit chip and not very fast.
Modern Cortex-M3 chips like the STM32 are fast, have much more RAM and flash and they have powerful and well documented debugging subsystems using JTAG tools, but starting to program these chips can be a huge step.

So here is my project to use a slightly modified, but from the user's point of view standard, Arduino IDE and standard Arduino sketches, and run them on a powerful 32 bit Cortex-M3 (STM32) chip. The process of writing sketches, uploading them and then running them is exactly the same as for ordinary Arduino development. It is just that the processor board we use is not an Arduino. And you can still use it all for your old Arduino boards.

More complicated sketches that use low level access to the ATMega chip must be rewritten for this new platform.

The components we need:
  • Hardware.
  • Modified Arduino IDE.
  • Library code for this chip and board configuration.
  • Toolchain to compile and build code for a Cortex-M3 processor
  • An uploader that is used to program the chip.
The software setup has been developed and tested under Linux.

Hardware

The board I use is an ET-STAMP-STM32, a chip carrier module that brings out all chip I/O lines but not much more. I bought it for $24.90 from Futurlec (ET STM32 Stamp). It is mounted on a breadboard together with a 3.3V power supply, a serial USB adapter, a LED and some extra stuff for experimentation, like a potentiometer connected to an analog input and a push button.


The FTDI USB is only used to supply 5V to the small 3.3V regulator board. It can also be connected to UART2 or UART3.
The board is at the moment running a sketch that reads the analog voltage from the potentiometer and adjusts the LED blink frequency:
volatile unsigned int count=-1;  // wraps around to 0 on the first increment
int ledPin = 44;  // STM32_P103 Board - PC12
int dly;
int analogChn = 10;  // Analog channel 10 is PC0 = pin 32

void setup()
{
  pinMode(ledPin, OUTPUT);      // sets the digital pin as output
  Serial.begin(115200);         // opens serial port at 115200 bps
  Serial.write("\n\n   ***   Hello from Arduino 32   ***\n");
}

void loop()
{
  count++;
  dly = analogRead(analogChn)/2+20;  // delay in ms, set by the potentiometer
  digitalWrite(ledPin, HIGH);   // sets the LED off
  delay(dly);                   // waits dly milliseconds
  digitalWrite(ledPin, LOW);    // sets the LED on
  dly = analogRead(analogChn)/2+20;
  delay(dly);                   // waits dly milliseconds
  if (count%100 == 0)           // report the blink count every 100 blinks
  {
    Serial.print(count);
    Serial.write("\n");
  }
}
So you can see that the sketch is a totally standard Arduino sketch.

Modified Arduino IDE.

The Arduino IDE is modified in order to be able to build code with the ARM toolchain. The modified files and some new files that are added can be found at http://github.com/mlu/arduino-stm32/tree/master . These files must be placed in the source tree for Arduino 0017, and then the IDE must be rebuilt.

Library code for this chip and board configuration.

The special code for this chip and board are found under hardware/stm32 and to use this you just have to select the board "STM32 Arduino32" in the Tools menu in the IDE.

Toolchain to compile and build code for a Cortex-M3 processor

I use the Codesourcery G++ Lite Edition for ARM, EABI version, that can be downloaded from the Codesourcery website (Codesourcery G++ Lite ).
Install this and make sure that the top bin directory containing "arm-none-eabi-gcc" and the rest of the cross compilation binaries are in the path.

Uploader

I have written a small uploader, using similar command line arguments as avrdude, that uploads the compiled and linked sketches to the processor.
The code and also a compiled binary that runs under Fedora 10 are included in the files at github.com/mlu/arduino-stm32.

Almost ready to rock

The first thing to notice is that the pin numbering is different:
Arduino sketch              Board
digital pin number 0..15    pin PA0 .. PA15
analog pin number 0..7      pin PA0  .. PA7
analog pin number 8,9       pin PB0,PB1
analog pin number 10..15    pin PC0  .. PC5
Pins 9 and 10 (PA9 and PA10) are used by USART1 and are connected to the RS232 level shifter for the serial port and should not be used.
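
The analog part of the table can also be written as a small lookup function, which may be easier to read than the ranges. This is only a restatement of the table for illustration; the function `analog_pin_to_port` is hypothetical and not part of the library code.

```c
/* Map an Arduino-style analog pin number to the STM32 port letter and
   bit number, following the table above.
   Returns 0 on success and -1 if the pin number is out of range. */
int analog_pin_to_port(int apin, char *port, int *bit)
{
    if (apin >= 0 && apin <= 7)   { *port = 'A'; *bit = apin;      return 0; }
    if (apin == 8 || apin == 9)   { *port = 'B'; *bit = apin - 8;  return 0; }
    if (apin >= 10 && apin <= 15) { *port = 'C'; *bit = apin - 10; return 0; }
    return -1;
}
```

Analog pin 10 maps to PC0, which matches the potentiometer input used in the sketch earlier in this post.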
 
We must also manually switch the board between bootloader mode and normal run mode with the blue switch, and also do all resets manually with the reset switch.

Testing Blink

Blink uses the LED on pin number 13. This corresponds to pin PA13, so we add a LED and a 220 ohm series resistor to PA13. Connect a serial adapter to the board and select the corresponding serial port in the Arduino IDE.
Load the Example/Digital/Blink sketch and select board type "STM32 Arduino 32". Set the board in bootloader mode by depressing the blue button, the green bootloader LED lights up, and reset the board. Now it should be possible to simply upload the sketch :), go back to normal run mode with the blue button and reboot with the reset button.

Now this is new and quite untested code so many things could go wrong when trying to do this on a computer with a different setup.

More to follow, especially with reader feedback

Saturday, September 5, 2009

Debugging on the Cortex-A8, System Components

Cortex-A8 for dummies, part 1

I have been working on the Cortex-A8 subsystem for OpenOCD for some time this summer. This is great fun, when there is enough time, but also takes a lot of work. It is a complicated system and the documentation is large, spread over several big TRM's and sometimes hard to grasp. So here comes the "Cortex-A8 Debugging for Dummies" version 0.0. This first post looks at the main components involved and the access methods.

You can also take a look at OpenOCD for the BeagleBoard at
http://elinux.org/BeagleBoardOpenOCD

The new members of the ARM family of processor cores, the Cortex-M3 and the Cortex-A8, share some features, but in many fundamental ways they are not similar. They both support the Thumb2 instruction set and they both use the ARM Debug Interface v5 for debugging and direct access to the core debug units and the AHB and APB buses. Cortex-M3 processors also use the NVIC interrupt controller for handling peripheral interrupts. This makes debugging easier and interrupt handling more consistent when using processors from different suppliers. Cortex-A8 processors have implementation defined handling of external interrupts.

Architecturally there are big differences. The Cortex-M3 uses the ARMv7M profile with a very simplified set of processor modes and a reduced set of shadow registers. Cortex-M3 can only run Thumb2 code. This is in contrast to the Cortex-A8 that can run ARM, Thumb and ThumbEE instructions, and where the architecture is a variant of the standard ARM architecture seen in ARM7, ARM9 and ARM11 cores.

System Components

Here is a simplified picture of a Cortex-A8 system, based on the Texas Instruments OMAP3530 Applications Processor used in the BeagleBoard. There are four main components involved in our picture of the system:
  • Debug Access Port, DAP, connects an external debugger through JTAG or SW, serial wire, to our system. The DAP can have one or several Access Ports, AP, that connect to different parts of the system.
  • MPU, Microprocessor Unit Subsystem. Here we find the processor core, core registers, system coprocessor CP15, Memory Management Unit, L1 and L2 Cache.
  • A Core Debug Unit that can pass data and instructions directly to the processor core, and also halt and resume the processor. This is connected to the AP through a local memory bus, an Advanced Peripheral Bus, APB.
  • External high speed bus. This is the L3 bus and it is an implementation of the Advanced High-performance Bus, AHB, specified by ARM. Peripheral components and memory are connected to this bus.

Main Components for Core Debug System with one APB and one AHB MEMAP


In this example we have two access ports. Both of them access system resources through a local memory address space, so called memory mapped access ports or MEMAP. Each access port is marked with the type of bus it is connected to. The APB-AP is connected to the debug resources without going through the system AHB bus. Each MEMAP has an access port number to identify it in the DAP. For the OMAP3530 processor the AHB-AP is number 0 and the APB-AP is number 1.

A debug program like OpenOCD connects through JTAG to the DAP, and all communications are passed through an access port into the system. When using OpenOCD we can select which access port to use with the command:
>dap apsel n
Here n is 0 or 1 for a system with two AP's. After selecting a MEMAP we can access the memory space of the AP with the memory display word, mdw, and memory write word, mww, commands:
>dap apsel 1
>mdw 0x80000000
>dap apsel 0
>mdw 0x80000000
>mww 0x80000000 0x12ab34cd
>mdw 0x80000000
If there is no memory at the specified address or if the access is prohibited by the security manager a "Sticky Error" is generated by the AP and reported by OpenOCD. We can get more information about the type of resources a MEMAP is connected to by using the command
>dap info n
This information is encoded by the AP in a so called ROM Table. The details of how to identify system resources from  a ROM Table will be explained in a later post.

The APB AP, with MEMAP access port number 1, is reported as "MEMTYPE system memory not present. Dedicated debug bus." This indicates that only CoreSight components and other debug resources identified in the ROM Table for this AP are accessible. Memory and memory mapped peripheral control registers are not available.

The ADI v5 Debug Port is described in ARM IHI 0031A (ARM Debug Interface v5).

AHB and APB buses are described in ARM IHI 0011A (AMBA Specification Rev. 2.0).

The Debug Unit, the debug registers, MPU and the cache systems are described in ARM DDI 0344H (Cortex-A8 TRM).

The system overview and the relation between the components can also be seen in the OMAP35x Technical Reference Manual (spruf98b.pdf), Figure 1-1. Interconnect Overview.

Core and Memory Access

The core is accessed through the APB MEMAP using the communications registers in the debug unit DTRRX, DTRTX and ITR.

Access to the full memory address range of the system can be done through the AHB AP to the L3 bus and is controlled by the security and access restrictions that are active in the system. This access always uses physical memory addresses (PA). There is a risk of cache coherency problems when accessing memory this way; for volatile resources like I/O registers the problem should be smaller. This access can be done while the MPU core is running.

When MMU is active all memory addresses in the core like PC, LR and data pointers are virtual addresses and must be translated to physical addresses before accessing them through the AHB AP.

Another way to access memory and memory mapped resources from the debug port is through the MPU using LDR and STR instructions written to the ITR and data in the DCC registers, DTRRX and DTRTX. [Cortex_A8 TRM, sec 12.11.6]. This access will go through the Cache and Memory Management Units of the Cortex_A8 and thus use virtual addressing when this is activated.

Strategies

For configuring systems at start up, such as setting memory controller parameters, clocks and PLL registers, memory access through the AHB access port is good. This should also work well for writing to flash memories.

For debugging code running on the MPU, the APB and access through the MPU core probably should be used, since this method avoids problems with virtual to physical address translations and also helps avoid cache coherency problems.

A debug system should implement both access methods and some method to choose which one to use.


Acknowledgements

Many thanks to Dirk Behme for proof reading and helpful comments.