> UARTs are not used for SPI
Actually, you can configure the UART module on the ATmega to be used as an SPI master.
Well, I wasn’t right to call it a UART in the first place: it is indeed a USART. So a lowly ATmega328P can have two SPI master peripherals, one with the “proper” SPI module and one with the USART (which is, surprisingly, faster in some situations).
> the isr is taking so much time that midi events are dropped when there is too much incoming midi data causing notes to be locked when pitch bending for example.
Two solutions to that:
- Use a non-blocking ISR (further interrupts are still allowed while the handler is executing) so that the UART RX interrupt handler can be triggered even while the “audio” timer interrupt runs. Such a handler is declared with ISR(TIMER1_OVF_vect, ISR_NOBLOCK).
- Do not use the Arduino serial thing. Just poll the UART from your audio timer interrupt (I’m now using this in the Shruthi code, it’s cheaper), put the byte in a ring buffer, and pick data from the ring buffer in your main loop. If there’s already a timer interrupt happening, and it runs faster than the MIDI data rate, why not poll from there instead of using a dedicated RX interrupt?
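Here is a minimal sketch of the second option, to make the idea concrete. It is not the Shruthi code: the names MidiRingBuffer, midi_buffer_put, midi_buffer_get and midi_in, and the buffer size, are made up for illustration. The part inside #ifdef __AVR__ assumes an ATmega328P (USART0 registers UCSR0A and UDR0) and assumes the audio interrupt is the Timer1 overflow.

```c
#include <stdint.h>

/* Hypothetical size, chosen for illustration; must be a power of two. */
#define MIDI_BUFFER_SIZE 64u

typedef struct {
  volatile uint8_t data[MIDI_BUFFER_SIZE];
  volatile uint8_t head;  /* written only by the producer (timer ISR) */
  volatile uint8_t tail;  /* written only by the consumer (main loop) */
} MidiRingBuffer;

static MidiRingBuffer midi_in;

/* Producer side. Returns 1 on success, 0 if the buffer is full
   (the byte is dropped in that case). */
static uint8_t midi_buffer_put(MidiRingBuffer* rb, uint8_t byte) {
  uint8_t next = (uint8_t)((rb->head + 1u) & (MIDI_BUFFER_SIZE - 1u));
  if (next == rb->tail) {
    return 0;
  }
  rb->data[rb->head] = byte;
  rb->head = next;
  return 1;
}

/* Consumer side. Returns 1 and stores a byte in *byte, or 0 if empty. */
static uint8_t midi_buffer_get(MidiRingBuffer* rb, uint8_t* byte) {
  if (rb->head == rb->tail) {
    return 0;
  }
  *byte = rb->data[rb->tail];
  rb->tail = (uint8_t)((rb->tail + 1u) & (MIDI_BUFFER_SIZE - 1u));
  return 1;
}

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>

/* Assumption: ATmega328P, audio rendering done in the Timer1 overflow ISR. */
ISR(TIMER1_OVF_vect) {
  /* Poll the USART: RXC0 is set when an unread byte is waiting in UDR0. */
  if (UCSR0A & (1 << RXC0)) {
    midi_buffer_put(&midi_in, UDR0);
  }
  /* ... audio rendering goes here ... */
}
#endif
```

The main loop would then call midi_buffer_get(&midi_in, &byte) until it returns 0 and feed each byte to the MIDI parser. Since each index is a single byte written by one side only, no interrupt masking is needed around the reads and writes in this sketch.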
> When I say push/pop I am referring to a write/read from the top of the stack.
> When I say register (in regards to code) I am referring to a register in the cpu, not a ram or stack location.
Yes, that’s what I am referring to.
I’ve written a few compilers/code generators in the past (university and commercial products), so I’m a bit obsessed with how things are turned into code. The AVR behaves like many load/store architectures: variables are loaded into registers from memory, processed in registers, and stored back in memory. If your code does something like this:
// do something with A
// do something with B
// do something with C
// compute A + B + C
There are two options for the compiler: keep A, B and C loaded in different registers to save a load at the 4th step, or reload them from memory at the 4th step. The first option is often the one picked by the compiler because memory loads and stores are slow, but the downside is that A will “occupy” a register until step 4. So this code has a “register depth” of 3, and that’s the “lifetime” I mentioned: there’s a significant chunk of code during which A has to be kept “alive” in a register.
This variant:
// do something with A
// acc += A
// do something with B
// acc += B
// do something with C
// acc += C
With this code, the compiler can do a much better job: one register is used to hold acc, and one register is used to hold A, then reused for B, then reused for C. Each variable has a shorter “lifetime” because it only needs to be held in a register for a small number of instructions.
And that’s pretty much why the reordered version I gave you is likely to perform better.
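To put the two shapes side by side in actual C, here is a small sketch. It is not your ISR: the arrays phase and increment, the function names and the “take the high byte” step are invented for illustration, and a given gcc version and optimization level may well schedule either version differently from what the source order suggests.

```c
#include <stdint.h>

#define NUM_VOICES 3

/* Hypothetical state, standing in for the phase accumulators and
   their per-sample increments. */
static uint16_t phase[NUM_VOICES];
static uint16_t increment[NUM_VOICES];

/* Shape 1: update A, B and C, then compute the sum at the very end.
   a, b and c are all live until the return statement. */
static uint16_t render_sum_at_end(void) {
  uint16_t a = (uint16_t)(phase[0] + increment[0]);
  phase[0] = a;
  uint16_t b = (uint16_t)(phase[1] + increment[1]);
  phase[1] = b;
  uint16_t c = (uint16_t)(phase[2] + increment[2]);
  phase[2] = c;
  return (uint16_t)((a >> 8) + (b >> 8) + (c >> 8));
}

/* Shape 2: fold each value into acc as soon as it is computed.
   Only acc and one temporary are live at any point. */
static uint16_t render_accumulate(void) {
  uint16_t acc = 0;
  uint16_t p;

  p = (uint16_t)(phase[0] + increment[0]);
  phase[0] = p;
  acc = (uint16_t)(acc + (p >> 8));

  p = (uint16_t)(phase[1] + increment[1]);
  phase[1] = p;
  acc = (uint16_t)(acc + (p >> 8));

  p = (uint16_t)(phase[2] + increment[2]);
  phase[2] = p;
  acc = (uint16_t)(acc + (p >> 8));

  return acc;
}
```

Both functions compute the same value. The only difference is how long each intermediate result has to stay around, which is what the register allocator cares about. Comparing the avr-objdump listings of the two is the way to check what your compiler actually does with them.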
> From my perspective gcc shouldn’t generate even a single push/pop (unless the obvious entry/exit and possibly for operations that are not supported by hardware, 16-bit add perhaps?).
All the phase accumulator increments in your ISR are translated into the following code:
// update pcm values for all 16 polyphonys
// ppcm[ 0 ][ 0 ] += pfreq[ 0 ][ 0 ];
lds r18, 0x08CC
lds r19, 0x08CD
lds r24, 0x08A9
lds r25, 0x08AA
add r18, r24
adc r19, r25
std Y+4, r19 ; 0x04
std Y+3, r18 ; 0x03
sts 0x08AA, r19
sts 0x08A9, r18
// ppcm[ 0 ][ 1 ] += pfreq[ 0 ][ 1 ];
lds r30, 0x08CE
lds r31, 0x08CF
lds r24, 0x08AB
lds r25, 0x08AC
add r30, r24
adc r31, r25
sts 0x08AC, r31
sts 0x08AB, r30
// ppcm[ 0 ][ 2 ] += pfreq[ 0 ][ 2 ];
lds r22, 0x08D0
lds r23, 0x08D1
lds r24, 0x08AD
lds r25, 0x08AE
add r22, r24
adc r23, r25
std Y+2, r23 ; 0x02
std Y+1, r22 ; 0x01
sts 0x08AE, r23
sts 0x08AD, r22
…
// ppcm[ 2 ][ 0 ] += pfreq[ 2 ][ 0 ];
lds r6, 0x08DC
lds r7, 0x08DD
lds r24, 0x08B9
lds r25, 0x08BA
add r6, r24
adc r7, r25
sts 0x08BA, r7
sts 0x08B9, r6
You see the pattern? gcc is using a different register pair for each increment (r18:r19, r30:r31, r22:r23, r6:r7…), because this allows each ppcm to be accessed later for the “big sum” (pout) without a reload from memory. The downside is that your ISR uses pretty much all the registers.
And thus, it is no surprise that your ISR starts with:
push r1
push r0
in r0, 0x3f ; 63
push r0
eor r1, r1
push r2
push r3
push r4
push r5
push r6
push r7
push r8
push r9
push r10
push r11
push r12
push r13
push r14
push r15
push r16
push r17
push r18
push r19
push r20
push r21
push r22
push r23
push r24
push r25
push r26
push r27
push r30
push r31
push r29
push r28