How to compute the RAM still available

Hi, I’d need to know how much RAM is still available before bumping into the stack.
I.e. given the last shruthi1.size:
text data bss dec hex filename
61386 546 3374 65306 ff1a build/shruthi1/shruthi1.elf
and bss being the RAM statically allocated, we would have 4096 - 3374 = 722 bytes free for the stack.
(Edit: should I include the data segment? In that case this question is pointless, because the stack has only 4096 - 3374 - 546 = 176 bytes to grow.)
Is there a “known” safety margin for this, or should we assume that when the Shruthi gets stuck it means it bumped into the stack?
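
The corrected arithmetic, as a minimal sketch. The 4096-byte RAM size and the section sizes are the figures quoted above; the helper name is only for illustration and does not come from the Shruthi sources.

```cpp
#include <cassert>

// Bytes left for the stack once the statically allocated RAM
// (.data + .bss) is subtracted from the total RAM size.
// FreeRamForStack is a hypothetical helper name.
int FreeRamForStack(int ram_size, int data_size, int bss_size) {
  assert(data_size + bss_size <= ram_size);
  return ram_size - (data_size + bss_size);
}

// Figures taken from the size output quoted above.
const int kRamSize = 4096;
const int kDataSize = 546;
const int kBssSize = 3374;
```

FreeRamForStack(kRamSize, kDataSize, kBssSize) gives 176 bytes; passing 0 for the data segment reproduces the 722 figure.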

Thanks in advance

Yes, I took into account the fact that there’s a read sync buffer for osc2 and a write sync buffer for osc1.

For the UPDATE_PHASE_MORE_REGISTERS case:

Previous code: 6 + 2 + 6 + 2 = 16
New code: 4 + 6 + 11 + 4 = 25

UPDATE_PHASE_MORE_REGISTERS loads the sync buffer pointers into registers (2 × 16-bit pointers). This means that 4 registers won’t be available for the synthesis code. For simple synthesis algorithms that do not need many registers, that’s fine. However, if the synthesis code actually needs many registers, there will be spilling to the stack, and the code becomes less efficient than if the sync buffer pointers were read directly from memory. I have compared the generated code for each variant and each oscillator algorithm and picked the shortest one.

Olivier, I have a doubt about the cycle count.
Did you already take into account the fact that in the new code sync is on for only one input/output oscillator at a time?
If that wasn’t the case, the penalty would occur only in the more registers macro, though it is not a light one.
I’m wondering why you use the more registers macro only in some cases and not in all.

Old code:
UPDATE_PHASE

Read sync buffer: 21 cycles
Write sync buffer: 18 cycles
21 + 18 = 39; ×2 = 78

UPDATE_PHASE_MORE_REGISTERS

Read sync buffer: 6 cycles
Write sync buffer: 2 cycles
6 + 2 = 8; ×2 = 16

Whether sync is enabled has no impact.

New code:

UPDATE_PHASE

Read sync buffer: 12 cycles when sync is off / 25 cycles when sync is on
Write sync buffer: 12 cycles when sync is off / 22 cycles when sync is on
12 + 12 = 24, 25 + 12 = 37 or 12 + 22 = 34; ×2 = 48, 74 or 68

UPDATE_PHASE_MORE_REGISTERS

Read sync buffer: 4 cycles when sync is off / 11 cycles when sync is on
Write sync buffer: 4 cycles when sync is off / 6 cycles when sync is on
4 + 6 = 10 or 11 + 4 = 15; ×2 = 20 or 30
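
To compare the totals, here is a small sketch that sums the figures above over two oscillators. It assumes that with sync on, osc1 pays the “sync on” cost only on its write and osc2 only on its read, which is how the 4 + 6 + 11 + 4 sum elsewhere in the thread is built. The struct and function names are made up for the example.

```cpp
#include <cassert>

// Cycle counts quoted in the thread for one sync buffer access.
struct SyncCost {
  int read_off, read_on;    // read sync buffer: sync off / sync on
  int write_off, write_on;  // write sync buffer: sync off / sync on
};

// Old code: whether sync is enabled has no impact, so off == on.
const SyncCost kOldUpdatePhase = {21, 21, 18, 18};
const SyncCost kOldMoreRegisters = {6, 6, 2, 2};
// New code.
const SyncCost kNewUpdatePhase = {12, 25, 12, 22};
const SyncCost kNewMoreRegisters = {4, 11, 4, 6};

// Total over two oscillators with sync enabled. Assumption: osc1 only
// writes sync flags and osc2 only reads them.
int SyncOnTotal(const SyncCost& c) {
  // In the quoted figures, enabling sync never makes an access cheaper.
  assert(c.read_on >= c.read_off && c.write_on >= c.write_off);
  int osc1 = c.read_off + c.write_on;
  int osc2 = c.read_on + c.write_off;
  return osc1 + osc2;
}

// Total over two oscillators with sync disabled.
int SyncOffTotal(const SyncCost& c) {
  return 2 * (c.read_off + c.write_off);
}
```

Under that assumption, SyncOnTotal gives 78 (old) against 71 (new) for UPDATE_PHASE, and 16 (old) against 25 (new) for UPDATE_PHASE_MORE_REGISTERS.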

Yes, randomly degrading is an option, and I had seen the PRNG state. I’ll keep experimenting and see if I can make something useful; otherwise I’ll move on to another oscillator or hack.
You can give any suggestions you want, they are welcome!

woah. that needs to go on a t shirt

BTW, are you aware of the noise sync trick on the Shruthi? If you set the second oscillator to noise, and the mix mode to sync, the PRNG of the noise oscillator is reseeded with the same value every time the master oscillator completes a cycle, so you actually get the oscillator to spit out a cyclic loop of random samples. Send that into the analog low-pass with a decaying envelope and you’re close to Karplus-Strong territory indeed…
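
A rough sketch of the reseeding idea, for illustration only. The LFSR below is a stand-in and not the Shruthi’s actual PRNG; the seed value, the 16-bit master phase and all names are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in noise source: a 16-bit Galois LFSR. NOT the Shruthi's PRNG,
// just something small and deterministic for the sketch.
struct Lfsr {
  std::uint16_t state;
  std::uint8_t Next() {
    std::uint16_t lsb = state & 1u;
    state >>= 1;
    if (lsb) state ^= 0xB400u;
    return static_cast<std::uint8_t>(state & 0xFFu);
  }
};

const std::uint16_t kNoiseSeed = 0xACE1u;  // arbitrary non-zero seed

// Noise oscillator hard-synced to a master phase accumulator.
struct SyncedNoise {
  std::uint16_t master_phase;
  std::uint16_t master_increment;
  Lfsr noise;

  explicit SyncedNoise(std::uint16_t increment)
      : master_phase(0), master_increment(increment) {
    assert(increment != 0);
    noise.state = kNoiseSeed;
  }

  std::uint8_t Render() {
    std::uint16_t previous = master_phase;
    master_phase = static_cast<std::uint16_t>(master_phase + master_increment);
    // The accumulator wrapped: the master completed a cycle, so reseed.
    if (master_phase < previous) {
      noise.state = kNoiseSeed;
    }
    return noise.Next();
  }
};
```

After the first wrap of the master phase, the output repeats the same short run of pseudo-random samples once per master cycle.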

I don’t want to give you too many ideas, but you can decide to “degrade” a sample in the table or not by comparing a random number with a threshold. With the threshold at 0, it’s classic KS (always degraded), with the threshold very high, it’s a loop of noise all the time. In between you have a kind of filter decay time parameter…
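
The threshold idea could look roughly like this. It is a minimal sketch: the averaging with the next sample is the usual Karplus-Strong smoothing, and the function name, the 8-bit table and the 0..256 threshold range are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Degrades table[index] (averages it with its right-hand neighbour,
// wrapping around) only when the random byte is at or above the threshold.
//   threshold = 0   -> always degrade (classic Karplus-Strong)
//   threshold = 256 -> never degrade (the table loops unchanged)
// Returns true when the sample was degraded.
bool MaybeDegrade(std::uint8_t* table, std::size_t size, std::size_t index,
                  std::uint8_t random_byte, std::uint16_t threshold) {
  assert(size > 0 && index < size);
  if (random_byte < threshold) return false;
  std::size_t next = (index + 1) % size;
  table[index] = static_cast<std::uint8_t>((table[index] + table[next]) >> 1);
  return true;
}
```

Values of the threshold between the two extremes set how often a sample gets smoothed, which is what acts as the decay time control.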

Nice trick, actually at the moment I’m feeding it with an existing ROM wavetable chosen by parameter; this seems to give more character to the sound than noise.
Another reason I’m trying to use a longer buffer is that even with a minimum damp factor the sound decays too rapidly on a short buffer, and with 8 bits there is not enough headroom to soften things (not to mention aliasing, let’s pretend it’s a feature…) unless I make it a 512/16-bit buffer, but that would probably overrun the available time. I have to try.

Another trick for Karplus-Strong: you don’t have to fill the whole buffer with noise when the note is triggered. You can fill it with noise along the way during the first “cycle” through the wavetable.
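
A sketch of that lazy fill, under a few assumptions: a tiny 8-sample table stepped one sample at a time, a stand-in LFSR as the noise source, and made-up names throughout. Trigger() only resets two counters; the noise is written sample by sample during the first pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t kStringSize = 8;  // tiny, for illustration only

struct LazyFillString {
  std::uint8_t table[kStringSize];
  std::size_t position;
  std::size_t to_fill;  // samples still to be overwritten with noise
  std::uint16_t lfsr;   // state of a stand-in 16-bit Galois LFSR

  LazyFillString() : position(0), to_fill(0), lfsr(0xACE1u) {
    for (std::size_t i = 0; i < kStringSize; ++i) table[i] = 0;
  }

  // Cheap note trigger: no buffer fill happens here.
  void Trigger() {
    position = 0;
    to_fill = kStringSize;
  }

  std::uint8_t NextNoise() {
    std::uint16_t lsb = lfsr & 1u;
    lfsr >>= 1;
    if (lsb) lfsr ^= 0xB400u;
    return static_cast<std::uint8_t>(lfsr & 0xFFu);
  }

  std::uint8_t Render() {
    assert(position < kStringSize);
    bool filling = to_fill > 0;
    if (filling) {
      // First pass after the trigger: write fresh noise instead of degrading.
      table[position] = NextNoise();
      --to_fill;
    }
    std::uint8_t out = table[position];
    if (!filling) {
      // Later passes: degrade the sample that has just been played.
      std::size_t next = (position + 1) % kStringSize;
      table[position] = static_cast<std::uint8_t>((out + table[next]) >> 1);
    }
    position = (position + 1) % kStringSize;
    return out;
  }
};
```

Skipping the degradation during the first pass also avoids averaging with neighbours that have not been filled yet.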

We are already scraping the stack then…
Ehm, about the user wavetable, that was already taken :)
I did several tests, and changed my mind several times in the process.
My first approach was indeed the degrading wavetable, but I wasn’t very happy with the sonic result.
On the other hand, it is true that rendering low notes takes more time in the burst phase, and in any case, with 1k or so, frequency must be adjusted by interpolation anyway, because of the scarce resolution along the keyboard.
I’ll take note of your suggestions and think it over; if I manage to get some decent results I’ll post them on GitHub.

Well, it seems there is a penalty with sync enabled, especially when used with the more registers macro, so I don’t want you to take the risk of breaking anything in order to save RAM.
The reason for the RAM hunt is that I’m trying to implement a Karplus-Strong algorithm, where you need a longer buffer to reach low notes, so I’ll have to find some workaround.
Anyway, thanks for the test and the time you dedicated to this.

One important thing: I was wrong about the RAM count. You indeed have to add data and bss so we’re very close to the stack top. Sorry for that.

You could do your Karplus-Strong thing in the 1024-byte buffer allocated to the user wavetable, though. That’s more than enough. I think that the naive implementation in which you change the buffer length as you move in frequency is the “wrong” one: it takes more and more space for low notes, and it does not allow pitch bending. I have implemented it in a past project using a fixed-size buffer of 512 samples (it’s more than enough). You have to think of Karplus-Strong as a wavetable oscillator in which you “degrade” the samples from the table once the phase accumulator has gone through them. Nothing prevents your phase accumulator from moving 0.25 samples at a time using interpolation so that you can hit very low notes, or 1.01 samples at a time for fine pitch bending. I think this is how it is done in the Csound version too.
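
To make the “wavetable oscillator that degrades what it has played” idea concrete, here is a minimal sketch. It is neither the code from the past project mentioned above nor the Csound implementation; the 32-bit phase layout (top 9 bits as table index, next 8 bits as interpolation fraction), the 8-bit samples and all names are assumptions.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kTableBits = 9;                  // 512 samples
const std::uint32_t kTableSize = 1u << kTableBits;
const std::uint32_t kIndexShift = 32 - kTableBits;   // index = top 9 bits
const std::uint32_t kOneSample = 1u << kIndexShift;  // increment for 1.0 sample

struct StringOscillator {
  std::uint8_t table[kTableSize];
  std::uint32_t phase;

  StringOscillator() : phase(0) {
    for (std::uint32_t i = 0; i < kTableSize; ++i) table[i] = 128;
  }

  // Renders one sample; increment is the phase step in units of
  // 1 / kOneSample of a table sample, so it may be fractional.
  std::uint8_t Render(std::uint32_t increment) {
    assert(increment != 0);
    std::uint32_t index = phase >> kIndexShift;
    std::uint32_t next = (index + 1) & (kTableSize - 1);
    std::uint32_t frac = (phase >> (kIndexShift - 8)) & 0xFFu;

    // Linear interpolation between the two neighbouring samples.
    std::uint32_t a = table[index];
    std::uint32_t b = table[next];
    std::uint8_t out =
        static_cast<std::uint8_t>((a * (256 - frac) + b * frac) >> 8);

    // Advance, then degrade every sample the phase has just moved past.
    phase += increment;
    std::uint32_t new_index = phase >> kIndexShift;
    while (index != new_index) {
      std::uint32_t n = (index + 1) & (kTableSize - 1);
      table[index] = static_cast<std::uint8_t>((table[index] + table[n]) >> 1);
      index = n;
    }
    return out;
  }
};
```

Pitch is set purely by the increment: kOneSample / 4 advances 0.25 samples per call, and kOneSample + kOneSample / 100 advances roughly 1.01 samples, while the table size never changes.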

So I’ve counted the cycles… Ignoring the 3 cycles actually required to reset the phase accumulator when the sync is enabled and there’s a flag set in the sync buffer.

Old code:

UPDATE_PHASE

Read sync buffer: 21 cycles
Write sync buffer: 18 cycles

UPDATE_PHASE_MORE_REGISTERS

Read sync buffer: 6 cycles
Write sync buffer: 2 cycles

Whether sync is enabled has no impact.

New code:

UPDATE_PHASE

Read sync buffer: 12 cycles when sync is off / 25 cycles when sync is on
Write sync buffer: 12 cycles when sync is off / 22 cycles when sync is on

UPDATE_PHASE_MORE_REGISTERS

Read sync buffer: 4 cycles when sync is off / 11 cycles when sync is on
Write sync buffer: 4 cycles when sync is off / 6 cycles when sync is on

So the worst case (sync enabled) performance is actually worse, and I’m a bit reluctant to implement this, especially this close to a firmware release… Maybe you have a secret motive in this campaign to free RAM that would make me reconsider this?

I’ll try the osc sync thing.

I have modified the code to have the fn table in flash, saving the 100 bytes of RAM.

Good catch!

I don’t mind using the trick, I’m just wondering how it would improve execution time since it’s adding more branching. One advantage is that it would save 80 bytes of RAM, though RAM is not a problem for me at the moment…

I looked at the lss; to be honest I’m no expert at navigating that mangled mess, so I’m not sure, but we have the following scenarios:

  • no sync and sync, previous method
    osc1 -> 2 test memory via pointer with increment
    osc2 -> 2 test memory via pointer with increment
  • no sync, method proposed
    2 tests for value of the pointer in and out
  • sync, method proposed
    osc1 -> 1 test pointer value false for input, 1 test pointer value true for output, one more test memory via pointer with increment
    osc2 -> 1 test pointer value true for input, 1 test pointer false for output, one more test memory via pointer with increment

I think in the worst case (sync) they should be equivalent; if you have time, take a look at the lss.

Shouldn’t it be 120 bytes of RAM? sync_state_, no_sync_, dummy_sync_state_

I have reproduced your suggestion in the code and confirmed it works for a few “torture patches”. I’ll count cycles and see if there are no corner cases.

In the fight with the Vanilla editor, I stripped a few lines from the source after RenderBandlimitedPwm and there was a missing underscore, but it should have been obvious from the context.
And the scenario analysis counts one test too many per oscillator for the previous method.
Sorry, but today I’m a bit messy…