Tuesday, February 16, 2016

PSoC 5LP DFB assembler

The PSoC 5LP DFB and assembler

The digital filter block (DFB) is one of the most powerful elements on the PSoC 5LP chip. It is available through either the DFB component or the Filter component in PSoC Creator. The Filter component can be configured with cascaded FIR or BiQuad filters. This is a very capable configuration, but the filter parameters cannot be changed at run-time using the component API. The DFB component, on the other hand, is programmed using the supplied assembler, and all parameters stored in DFB RAM can be changed dynamically at run-time. The assembler syntax is documented in the component and in the chip data sheets, and the component contains a simulator for testing user-supplied code. The problem with these tools is that the data-flow architecture and VLIW instruction syntax have a somewhat steep learning curve, and there are not many commented examples available to guide a new user through the learning process.

It is not really very complicated, but it is a pipelined architecture and it's important to know, at every cycle/instruction, which data element is available in which section of the pipeline.

The main elements are
  • a dual input stage where only one input can be connected to the data path at a time, and where an input register can only be read once before it is written again from the main bus. I might be wrong here, but this is my experience from my early learning process.
  • multiplexers controlling what data is routed to the input of the RAM, MAC and the ALU
  • a pair of RAM blocks, they can be read and written independently and connected to 
  • a 24x24 multiply accumulator block MAC using q23 data format followed by
  • a 24 bit arithmetic logic unit ALU and a shifter
  • an output stage 
There is a control store and state handling but that can mostly be left to the assembler in basic applications.

Number format

The numerical format is signed 24 bit, and multiply and accumulate operations return bits 46:23 of the product, that is, the top 24 bits minus one. A natural interpretation of this data is as signed fractional numbers in q23 format, that is, a sign bit followed by a binary point and 23 fractional bits. For the simulator these values are written as hexadecimal values.
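To get values into and out of this format on the host side, a small conversion sketch like the following can help. The function names are my own and not part of any Cypress API, and rounding to nearest with clamping is just one reasonable choice.

```c
#include <stdint.h>
#include <stdio.h>

#define Q23_ONE 8388608.0   /* 2^23 */

/* Convert a real value in [-1, 1) to q23, rounding to nearest and
   clamping to the representable 24 bit range (hypothetical helper). */
int32_t double_to_q23(double x)
{
    double scaled = x * Q23_ONE + (x >= 0.0 ? 0.5 : -0.5);
    if (scaled >= 8388607.0) return 8388607;      /* 0x7FFFFF, largest positive */
    if (scaled <= -8388608.0) return -8388608;    /* 0x800000, most negative */
    return (int32_t)scaled;                       /* truncation completes the rounding */
}

/* Convert a sign-extended q23 value back to a real number */
double q23_to_double(int32_t q)
{
    return (double)q / Q23_ONE;
}

/* Print the 24 bit hexadecimal form, as used for simulator data */
void q23_print_hex(int32_t q)
{
    printf("0x%06lX\n", (unsigned long)((uint32_t)q & 0x00FFFFFFu));
}
```

For example, double_to_q23(0.5) gives 0x400000 and double_to_q23(-0.5) gives the 24 bit pattern 0xC00000.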

First lessons learned

Some of the things that are mentioned in the documentation but, at least to me, are not obvious:
  • Input and Output channel A is numbered 1, and B is numbered 0.
  • Input is the same as Staging register, Output is Holding register.
  • The DFB component code uses three 8 bit byte accesses to the Staging and Holding registers in the LoadInputValue and GetOutputValue functions, but 32 bit access also works well.
  • In test examples for the simulator, use values where the product is not zero in the upper 24 bits, otherwise the MAC product stays zero and it seems nothing is happening.
  • The two input buffers cannot be read at the same time, and each can be read only once before it is reloaded from the exterior bus. This means that if an input value must be used several times, it has to be stored/held somewhere. This can be in one of the RAM blocks or in the ALU. Of course, it's not possible to hold more than one value in the ALU, and only for a few cycles, until some other data must flow through the ALU.
  • All output from the MAC must go through the ALU.
  • Writing a value to a RAM location puts this same value on the output of the RAM during the same cycle.
  • Addressing a specific RAM location in one of the RAM buffers is a bit involved
    • acu(clear, ...)  for location 0
    • acu(incr, ...)    for next location
    • acu(decr, ...) for previous location
    • acu(read, ...) addr(xx)  to read ACU RAM row xx as RAM A register address
    • acu(write, ...) addr(xx) to write current RAM A register address to ACU RAM row xx
  • The saturation logic is for the ALU; the MAC will overflow in the accumulation even with saturation detection enabled.
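The point about choosing simulator test values can be illustrated with a small host-side model of a single multiply. This is only a sketch under my reading of the number format above (24x24 bit product, bits 46:23 kept); the function name is hypothetical and no saturation is modelled.

```c
#include <stdint.h>

/* Model of one q23 multiply: 24x24 -> 48 bit product, keep bits 46:23.
   Assumes >> on a negative int64_t is an arithmetic shift, which is
   implementation-defined in C but the usual behaviour. */
int32_t q23_mul(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(product >> 23);
}
```

With this model, q23_mul(0x000100, 0x000100) is 0, so two small test values make it look as if nothing happens, while q23_mul(0x400000, 0x400000) is 0x200000 (0.5 times 0.5 is 0.25).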

Example: Squaring the input

The steps in this code are:
  • Wait for input
  • Send input buffer A to ALU
  • Route ALU to both MAC ports and clear the accumulator; this places the product in the MAC with no accumulation.
  • Route the result through ALU to the Output register.

The asm code for the DFB block
initial:
acu(clear, clear) dmux(sa,sa) alu(set0) mac(hold)
acu(setmod, setmod) dmux(sa,sa) alu(hold) mac(clra) jmp(eob, waitForNew)

// Wait for data to be written to Staging Register Input 1
waitForNew:
acu(clear,clear) dmux(sa,sa) alu(hold) mac(hold) jmpl(in1,dataRead)

dataRead:
// Read staging register A into ALU
acu(hold, hold) addr(1) dmux(sa,ba) alu(setb) mac(hold)

// Multiply ALU out with ALU out and place in cleared MAC ACC
acu(hold, hold) dmux(sa,sa) alu(setb) mac(clra)            

//Move MAC o/p to ALU
acu(hold, hold) dmux(sm,sm) alu(seta) mac(hold)             

//Wait for ALU output
acu(hold, hold) dmux(sm,sm) alu(hold) mac(hold)

// Write the MAC content to holding register A
acu(hold, hold) addr(1) dmux(sa,sa) alu(hold) mac(hold) write(bus) jmp(eob,waitForNew)

Note: It is possible to send the staging register directly to both ports of the MAC, saving one instruction, but the assembler generates a warning, so I guess more testing is needed.
Use the assembler and simulator in the DFB block to test your code carefully before trying it on the live chip. The code reads from staging register A, so some data should be placed in the simulator bus data area 'Bus1' (far out to the right).


To use this, the following code is put in the program:
float finput = 0.3, foutput;
uint32 input, output;

DFB_1_Start();
DFB_1_SetInterruptMode(DFB_1_HOLDA);
Loop code:
input = finput*(1<<23);            /* float to q23 */
DFB_1_LoadInputValue(1, input);

/* wait until the result is available in holding register A */
while (!(DFB_1_GetInterruptSource() & DFB_1_HOLDA) ) ;

output = DFB_1_GetOutputValue(1);
foutput = ((float)output)/(1<<23); /* q23 to float */

To use 32 bit access we can use the following code to write and read the staging and holding registers:
*( (reg32 *) DFB_1_DFB__STAGEA) = input;
output = *(  (reg32 *) DFB_1_DFB__HOLDA);
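One thing to watch with the raw 32 bit read is negative results. The sketch below assumes that the raw word carries the 24 bit value in its low bits and that the upper byte is not sign-extended by the hardware; I have not verified this, and the helper names are my own.

```c
#include <stdint.h>

/* Sign-extend a 24 bit two's complement value held in the low bits of
   a 32 bit word. The upper byte of the raw word is ignored. */
int32_t sign_extend24(uint32_t raw)
{
    uint32_t v = raw & 0x00FFFFFFu;      /* keep the 24 data bits */
    if (v & 0x00800000u) {               /* bit 23 is the sign bit */
        return (int32_t)v - 0x01000000;  /* subtract 2^24 for negative values */
    }
    return (int32_t)v;
}

/* Convert a raw holding register word to float, q23 to float */
float q23_raw_to_float(uint32_t raw)
{
    return (float)sign_extend24(raw) / 8388608.0f;
}
```

For the squaring example the result is never negative, so the plain division works there, but a raw value such as 0xC00000 should read back as -0.5 and not as 1.5.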

Example: First order LP filter

A basic first order filter calculates the recurrence relation: 
\[ y_{n} = a_1 y_{n-1} + b_0 x_{n} \]
The filter coefficients are stored in the first two locations of RAM-A, and the previous output \( y_{n-1} \) is kept in the ALU hold register between loops.

When new input is available, \( y_{n-1} \) is routed from the ALU output (shift) to MAC input B and multiplied with \( a_1 \) (RAM-A[0]) without accumulation, mac(clra). Next, the new input is routed to MAC input B, multiplied with \( b_0 \) (RAM-A[1]) and added to the previous product, mac(macc). The result is then transferred to the ALU and stored in the holding register.

For the example we use \( a_1 = 0.9 \) and \( b_0 = 0.1 \) for a filter with unity DC gain.

0x733333 // a1 = 0.9
0x0CCCCC // b0 = 0.1
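
As a quick check of the unity DC gain (a worked step, not taken from the DFB documentation): for a constant input \( x \) the output settles where \( y_n = y_{n-1} = y \), so
\[ y = a_1 y + b_0 x \quad\Rightarrow\quad \frac{y}{x} = \frac{b_0}{1 - a_1} = \frac{0.1}{1 - 0.9} = 1 \]
The hexadecimal coefficients are the q23 values
\[ 0.9 \cdot 2^{23} = 7549747.2 \rightarrow \mathtt{0x733333}, \qquad 0.1 \cdot 2^{23} = 838860.8 \rightarrow \mathtt{0x0CCCCC} \]
with the fractional part truncated. Rounding \( b_0 \) to nearest would give 0x0CCCCD instead.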

The asm code for the DFB block:
area data_a
org 0
dw 0x733333 // a1 = 0.9
dw 0x0CCCCC // b0 = 0.1

initial: // Clear ALU and MAC
acu(clear, clear) dmux(sa,sa) alu(set0) mac(hold)
acu(setmod, setmod) dmux(sa,sa) alu(hold) mac(clra) jmp(eob, waitForNew)

// Wait for data to be written to Staging Register Input 1
waitForNew:
acu(clear,clear) dmux(sa,sa) alu(hold) mac(hold) jmpl(in1,dataRead)

dataRead:
// Multiply ALU out with RAMA[0] and place in cleared MAC ACC
acu(hold, hold) dmux(sra,sa) alu(setb) mac(clra)

// Read staging register A to MAC port B and multiply with RAMA[1]
acu(incr, hold) addr(1) dmux(sra,ba) alu(hold) mac(macc)

//Move MAC o/p to ALU
acu(hold, hold) dmux(sm,sm) alu(seta) mac(hold)

//Wait for ALU output
acu(hold, hold) dmux(sm,sm) alu(hold) mac(hold)

// Write the MAC content to holding register A
acu(hold, hold) addr(1) dmux(sa,sa) alu(hold) mac(hold) write(bus) jmp(eob,waitForNew)
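
To have something to compare the simulator output against, the recurrence can be modelled on the host. This is a sketch and not a cycle-accurate model of the DFB: the function names are my own, and I assume that both products are summed at full precision before bits 46:23 are taken.

```c
#include <stdint.h>

#define A1_Q23 0x733333   /* a1 = 0.9, same value as RAM-A[0] */
#define B0_Q23 0x0CCCCC   /* b0 = 0.1, same value as RAM-A[1] */

/* One filter step: y = a1*y_prev + b0*x, all values in q23.
   Summing both products before truncation is an assumption about how
   the MAC accumulates. Also assumes >> on a negative int64_t is an
   arithmetic shift. */
int32_t lp_step(int32_t y_prev, int32_t x)
{
    int64_t acc = (int64_t)A1_Q23 * y_prev;   /* corresponds to mac(clra) */
    acc += (int64_t)B0_Q23 * x;               /* corresponds to mac(macc) */
    return (int32_t)(acc >> 23);              /* keep bits 46:23 */
}

/* Run n steps with a constant input, starting from y = 0 */
int32_t lp_step_response(int32_t x, int n)
{
    int32_t y = 0;
    for (int i = 0; i < n; i++) {
        y = lp_step(y, x);
    }
    return y;
}
```

With a constant input of 0x400000 (0.5), the first output of this model is 419430, and after a few hundred steps it settles a few counts below the input, because the truncated coefficients add up to 0x7FFFFF rather than exactly one.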


To monitor performance, a semaphore can be output during the calculation and cleared in the wait loop. The semaphore is then connected to an output pin and monitored with a logic analyzer or an oscilloscope.


Saturday, February 6, 2016

The Cypress PSoC 5LP

I recently started to work on a project that needs several analog input and output channels connected to some sensors and a PID control loop, and it will probably work better with 5 V than with the currently popular 3.3 V for Cortex-M systems. So I decided to dig out a CY8CKIT-059 PSoC 5LP Prototyping Kit that has been gathering dust, waiting for the right project to come along.

This is actually an amazing chip, even if the processor is a standard Cortex-M3 that runs at up to 80 MHz (there are signs that Cypress is working on a PSoC7 with an M7 core and probably a price to match). The 64K RAM and 256K FLASH sizes are modest, but what makes this chip special are the configurable analog and digital blocks on the chip. There are three ADCs, one with high impedance buffers, two DACs and four opamps, all connected to an analog switching network. The digital side has a number of universal digital blocks to implement your own digital logic or preconfigured communication interfaces, and a digital filter block, DFB. This is a 24 bit datastream co-processor with a multiply-accumulate unit and an ALU. The analog and digital I/O can be run from 1.71 to 5.5 volts.

The vendor-supplied tool chain, PSoC Creator, only runs under Windows, but it is not a bad experience, even though I am always skeptical of systems that generate code where it is hard to know what one can change and how. I often find it less trouble to implement stuff directly from the data sheet than to learn the ins and outs of the library calls. In order to use the extra analog and digital blocks in the chip, I think it is really necessary to use the vendor-supplied toolchain, and without these extras the chip is not very special.

Add the fact that the CY8CKIT-059 PSoC 5LP Prototyping Kit can be bought for $10, and this is definitely a system worth trying out, even if a USB connector made from four strips on top of the circuit board is not the most professional and stable connection method. That can be fixed with an old USB cable and a soldering iron, and the price will still be attractive. The on-board programmer and debugger is also a programmable PSoC 5LP and could be used for a project requiring very few I/O pins.

Getting started is quite easy: install some USB drivers and the PSoC Creator and PSoC Programmer software with example projects, open an example project and hit the debug button. I started with the "CE95277 ADC and UART" project and soon had the board sending ADC samples over USB serial to my PC. Getting used to all the tools and panels takes a few days, but the help functions are easy to access and components have datasheets and code examples that open with a right click.