Saturday, November 19, 2016

PSoC 5LP - DFB assembler and some more ACU techniques

Saving ACU state and using per loop block addressing

Working with the DFB assembler for a while reveals how clever and powerful the architecture is. The ACU registers and the ACU RAM can be used to generate address patterns pointing into the RAMA and RAMB memory blocks.

We can use the ACU and ACU RAM to create several blocks of RAM memory that are used by consecutive loops. A possible configuration, using three blocks with 3 and 4 elements respectively, is shown in the figure below.
  • ACU RAM[0] saves the start position of the current block. It can be read into the ACU registers with acu(read,read) addr(0), instead of using acu(clear,clear), to address the base of the block.
  • ACU RAM[1] saves the size of the blocks. It is read into FREG and used to advance the memory pointers at the end of a loop.
  • ACU RAM[2] saves the last position in the last block. It is used by the ACU modulo arithmetic to loop back to the beginning of the first block.

At the end of a loop, before waiting for new input, the current base position is loaded and incremented with the values in FREG. This new memory base is then saved into ACU RAM[0] and used in the next loop.
When new input data has arrived, the addresses are read from ACU RAM[0] at the start of the next loop. This gives correct RAM addressing even if the DFB has been paused and restarted to change memory parameters between the loops. The code also tests whether the updated base value is 0, indicating the end of a major loop.
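
The base-address bookkeeping described above can be modelled outside the DFB. The following Python sketch is not DFB code; it only mimics the arithmetic. It assumes that the modulo logic wraps a pointer that passes the maximum address back to 0, that is, arithmetic modulo (max + 1). The sizes and limits are the two byte fields of the ACU RAM words 0x0403 and 0x130E from the listing below, and the function names are made up for the illustration.

```python
# Model of the per-loop block base update (illustration only, not DFB assembler).
# Assumption: with modulo arithmetic, base + size wraps modulo (max_addr + 1).

def next_base(base, size, max_addr):
    """Return the base address of the next block, wrapping past max_addr to 0."""
    return (base + size) % (max_addr + 1)

def major_loop(size, max_addr):
    """List the block bases visited until the base returns to 0."""
    bases = [0]
    base = next_base(0, size, max_addr)
    while base != 0:          # base == 0 marks the end of a major loop
        bases.append(base)
        base = next_base(base, size, max_addr)
    return bases

# High-byte fields of 0x0403 / 0x130E: block size 4, last address 0x13
print(major_loop(4, 0x13))    # [0, 4, 8, 12, 16]
# Low-byte fields: block size 3, last address 0x0E
print(major_loop(3, 0x0E))    # [0, 3, 6, 9, 12]
```

Under that assumption both pointers return to 0 after the same number of loops, so a single test on the updated base is enough to detect the end of a major loop.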

The following assembler code shows the control flow. The value of the current RAMB base address is written to holding register A, but no other useful work is done. The process input section only reads and discards the input from staging register A. The acu magic is marked in orange.

// ACU and block addressing
// Every loop uses a separate memory block in RAMA and RAMB
// The base address of this block is updated and saved at the end of the loop, before waiting for input data
// This base address is read at the start of the loop, in case there was a Pause/Resume event while waiting for input
// At any point in the loop the base address can be reloaded 
// with "acu(read, read) addr(0)"
area acu
org 0
dw 0x0000    // Memory location to save block base address
dw 0x0403    // Size of block, equals the increase of block base address for each loop
dw 0x130E    // REGM values, maximal RAMA and RAMB addresses before wraparound

area data_a
org 0

area data_b
org 0
dw 0x0000
dw 0x0001
dw 0x0002
dw 0x0003
dw 0x0004
dw 0x0005
dw 0x0006
dw 0x0007
dw 0x0008
dw 0x0009
dw 0x000A
dw 0x000B
dw 0x000C
dw 0x000D
dw 0x000E
dw 0x000F

acu(clear, clear) dmux(sa,sa) alu(set0) mac(hold)
acu(loadf, loadf) addr(1) dmux(sa,sa) alu(set0) mac(hold)
acu(loadm, loadm) addr(2) dmux(sa,sa) alu(set0) mac(clra) jmp(eob,wait_input)

loop_start:
acu(read, read) addr(0) dmux(sa,sa) alu(hold) mac(hold) jmp(eob,process_input)

process_input:
// Only outputs current RAMB base for testing
acu(hold, hold) dmux(sa, sa) alu(clearsem, 001) mac(hold)
acu(hold, hold) addr(1) dmux(sa ,ba) alu(setb) mac(hold)
// Set ALU to RAMB[ACUB]
acu(hold, hold) dmux(sa, sra) alu(setb) mac(hold)
acu(hold, hold) dmux(sa, sa) alu(hold) mac(hold)
acu(hold, hold) addr(1) dmux(sa, sa) alu(hold) mac(hold) write(bus) jmp(eob,loop_end)

loop_end:
// Move acu registers to point to next memory block, and save in ACU RAM[0]
acu(read, read) addr(0) dmux(sa,sa) alu(hold) mac(hold) write(da)
acu(addf, addf) dmux(sa,sa) alu(hold) mac(hold) jmp(acubeq, major_loop_end)

acu(write, write) addr(0) dmux(sm,sm) alu(hold) mac(hold) jmp(eob,wait_input)

major_loop_end:
// Set alu 1 to show that we have detected end of the major loop,
// used in component simulator for testing
acu(hold, hold) dmux(sa, sa) alu(set1) mac(hold)
acu(write, write) addr(0) dmux(sa,sa) alu(hold) mac(hold) jmp(eob,wait_input)

wait_input:
acu(hold, hold) dmux(sa,sa) alu(setsem, 001) mac(hold) jmpl(in1,loop_start)

Monday, November 7, 2016

PSoC 5LP DFB assembler - ACU as a loop counter

This time we will use the address calculation unit (ACU) as a loop counter. We create a PSoC Creator project to test our ideas. The DFB assembler code is checked in the component simulator until everything seems to work as planned, and then we place the code onto a real chip under control of the debugger.

Test project

The test project is set up to transfer ADC readings to the DFB using DMA. A new output value from the DFB is signaled by raising an interrupt. The interrupt flag can be read from the main loop or handled by an interrupt handler. In the test code the interrupt status is checked in the main loop and DFB output data is saved in an SRAM array.

Example - Mean value of N samples

This example calculates the mean value of successive blocks of 10 samples.
Samples are read from staging register A, multiplied by 0.1 (1/N) and accumulated in the MAC.
The address calculation register ACUB is used as the loop counter. Modulo arithmetic for ACUB is enabled, and the final loop counter value N (10) is loaded into the MREG register from ACU RAM[0] during setup. When the loop counter reaches 10, the accumulated value is written to output register A and the accumulator is cleared before waiting for the next input value.
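
To see where the coefficient in the listing below comes from, here is a small Python model of the arithmetic (not DFB code). It assumes that the DFB's 24-bit data words are signed fractions with 23 fractional bits and that the constant was obtained by truncation. The helper names are invented for this sketch.

```python
# Fixed-point model of the block mean (illustration only, not DFB assembler).
# Assumption: 24-bit signed fractional data with 23 fractional bits.

FRAC_BITS = 23
N = 10  # samples per block

def to_q23(x):
    """Convert a float in [-1, 1) to a fixed-point integer by truncation."""
    return int(x * (1 << FRAC_BITS))

def block_means(samples, coeff):
    """Accumulate sample * coeff and emit one value per N samples."""
    out, acc, count = [], 0, 0
    for s in samples:
        acc += s * coeff          # multiply-accumulate on the full product
        count += 1                # plays the role of the ACUB loop counter
        if count == N:            # counter reached N: write output, then clear
            out.append(acc >> FRAC_BITS)
            acc, count = 0, 0
    return out

coeff = to_q23(0.1)
print(hex(coeff))                 # 0xccccc, i.e. the word 0x0CCCCC

samples = [to_q23(0.5)] * N       # ten samples of 0.5
mean = block_means(samples, coeff)[0]
print(mean / (1 << FRAC_BITS))    # slightly below 0.5
```

The result comes out slightly below 0.5 because the truncated constant 0x0CCCCC is a little smaller than 0.1.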

// Calculates average of 10 inputs
area acu
org 0
dw 0x000A

area data_a
org 0
dw 0x0CCCCC // RAMA[0] = 0.1

acu(clear, clear) dmux(sa,sa) alu(set0) mac(hold)
acu(hold, loadm) addr(0) dmux(sa,sa) alu(hold) mac(clra)
acu(setmod, setmod) dmux(sa,sa) alu(hold) mac(hold) jmp(eob,wait_input)

process_input:
// Read staging register A to MAC port B and multiply-accumulate with RAMA[0]
acu(hold, incr) addr(1) dmux(sra,ba) alu(clearsem, 001) mac(macc)
// Move MAC o/p to ALU
// When ACUB is 10 then block is complete and output is written to holding register
acu(hold,hold) dmux(sm,sm) alu(seta) mac(hold) jmp(acubeq,write_output)

wait_input:
// Use semaphore0 to signal that DFB is waiting for input
acu(hold,hold) dmux(sa,sa) alu(setsem, 001) mac(hold) jmpl(in1,process_input)

write_output:
// Wait for ALU output
acu(hold, hold) dmux(sm,sm) alu(hold) mac(hold)
// Write the MAC content to holding register A and clear the MAC
acu(clear, clear) addr(1) dmux(sa,sa) alu(set0) mac(hold) write(bus)
acu(clear, clear) dmux(sa,sa) alu(hold) mac(clra) jmp(eob,wait_input)

There are a few notes:
  1. The acubeq condition is true BOTH when ACUBREG is equal to MREG and when it is 0, so we catch the end of the loop when ACUBREG equals MREG and reset it in code.
  2. Placing the wait_input state after the process input eliminates one jump 
  3. The jump when waiting for input is a loop jump, jmpl(in1,process_input). If the condition is not true, it transfers control back to the beginning of the block, which here is this single instruction.
  4. The jump when checking the loop count, jmp(acubeq,write_output), is a simple jmp that falls through to the next block (wait_input) if the condition is not set.
  5. Each sample is processed in less than 4 cycles, including the write of the output. If the DFB runs at 48 MHz, this corresponds to at least 12 MSamples/s.
  6. Mux settings are only important in two instructions in this example: 
      1. When reading from RAMA to channel A and the input register A to channel B, passing these values to the MAC input: dmux(sra,ba). The mux3 settings are not used.
      2. When outputting accumulated value from MAC to ALU input A. dmux(sm,sm) , only mux3a setting is used.
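
Notes 1 and 5 can be checked with a few lines of Python. This is only a model of the behaviour as described in the notes, not DFB code, and the function name is chosen for the illustration.

```python
# Model of the acubeq test and the throughput estimate (illustration only).

def acubeq(acub, mreg):
    """Condition as described in note 1: true when ACUB equals MREG or is 0."""
    return acub == mreg or acub == 0

MREG = 10
# ACUB is incremented before it is tested, so the loop sees 1..10.
hits = [acub for acub in range(1, MREG + 1) if acubeq(acub, MREG)]
print(hits)                                  # [10]

clock_hz = 48_000_000
cycles_per_sample = 4                        # upper bound from note 5
print(clock_hz / cycles_per_sample / 1e6)    # 12.0
```

Because the counter is incremented before the test and cleared in code after the output is written, the zero case of acubeq never triggers inside the loop.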

The example project can be found at: