Sinc interpolation


#1

Hi,

While reading this paper about band-limited impulse train synthesis, I stumbled across sinc interpolation, which apparently can generate resampled versions of a discrete signal without aliasing.

This might be a really handy tool for wavetable oscillators. The obvious problem is that it contains a sum from minus infinity to plus infinity, which we cannot calculate in a finite timeframe. Given that the influence of the terms far from the current position decreases rapidly, one could calculate only a window of that sum and still get a good approximation that would be practically aliasing-free.
However, that is still a lot of computation, and I wonder whether something like this has ever been used in a synthesizer.
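
Just to make the idea concrete, here is a rough, untested sketch (my own, names made up) of that windowed sum: it evaluates one stored cycle at an arbitrary fractional position using a finite number of sinc taps under a Hann window. (To actually remove harmonics, the sinc would additionally have to be stretched so that its cutoff lands below the new limit.)

```cpp
#include <cmath>
#include <vector>

// Rough sketch of truncated ("windowed") sinc interpolation: evaluate the
// cycle x[0..N-1] at an arbitrary position t (in samples), summing only
// 2 * kHalfWidth taps around t instead of the infinite series.
float SincInterpolate(const std::vector<float>& x, float t) {
  const int kHalfWidth = 16;  // wider window -> better approximation
  const float kPi = 3.14159265358979f;
  const int n = static_cast<int>(x.size());

  int center = static_cast<int>(std::floor(t));
  float result = 0.0f;
  for (int k = center - kHalfWidth + 1; k <= center + kHalfWidth; ++k) {
    float d = t - static_cast<float>(k);
    // sinc(d), with the removable singularity at d == 0 handled explicitly.
    float sinc = d == 0.0f ? 1.0f : std::sin(kPi * d) / (kPi * d);
    // Hann window over the truncated support, to soften the truncation.
    float window = 0.5f + 0.5f * std::cos(kPi * d / kHalfWidth);
    // Wrap the index so the single cycle is treated as periodic.
    int index = ((k % n) + n) % n;
    result += x[index] * sinc * window;
  }
  return result;
}
```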

Then I had a second thought: apparently you can record any sample from any source, extract one oscillator cycle of a low-frequency note and use that as a wavetable. Then you can use sinc interpolation to pre-compute more waves with fewer harmonics. Let’s say we have a base wavetable that is aliasing-free up to C0; we then sinc-interpolate a new wavetable that is aliasing-free up to C1, then one for C2, and so on.

In the final oscillator algorithm, it’s just a matter of picking the right wavetable (so for E0 you would use the wavetable for C1) and playing it back at the right speed (which will basically be less than or equal to one wavetable step per sample => aliasing-free). Obviously that means that when going from C1 to C#1 there will be a change in the underlying wavetable (because C1 can use the wave that is valid up to C1, whereas C#1 has to use the next wave, which is valid up to C2). An interpolation could make this change less abrupt and less audible.
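
Something like this is what I have in mind for the playback side (just a rough, untested sketch with made-up names - one pre-computed band-limited cycle per octave, crossfaded between the two neighbouring tables):

```cpp
#include <cstddef>

// Rough sketch: one band-limited single cycle per octave, crossfaded.
// tables[o] points to the cycle that is aliasing-free up to octave o's C.
float RenderSample(const float* const* tables, size_t num_octaves,
                   float note,     // pitch in semitones above C0
                   float phase) {  // playback phase in [0, 1)
  const size_t kTableSize = 2048;  // assumed length of each cycle

  float octave = note / 12.0f;
  size_t low = static_cast<size_t>(octave);
  if (low >= num_octaves - 1) {
    low = num_octaves - 2;  // clamp to the last pair of tables
  }
  float crossfade = octave - static_cast<float>(low);

  size_t index = static_cast<size_t>(phase * static_cast<float>(kTableSize));
  float a = tables[low][index];      // valid up to the lower C
  float b = tables[low + 1][index];  // valid up to the next C
  return a + (b - a) * crossfade;    // fade between them across the octave
}
```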


#2

Sinc interpolation = filtering higher harmonics with an ideal low-pass filter (0 dB in the pass-band, -∞ dB outside of it, infinitely steep slope).

On pretty much all my products (Shruthi, Ambika, Tides, Yarns, Warps, some Braids algorithms…), the basic waveforms like sawtooth, square, etc. are generated exactly by the technique you describe. A band-limited single-cycle waveform is pre-computed for every 8th or 16th note in the scale (not for every 12th note, i.e. per octave, because powers of two make it easier to extract the index/fraction without a division), and crossfading is used for notes in between.
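
Roughly like this (simplified sketch, not the actual firmware code): with one table every 16 semitones and the pitch in 7.8 fixed point, the table index and crossfade fraction come out of a shift and a mask - no division.

```cpp
#include <cstdint>

// Sketch: pitch is in 7.8 fixed point (semitones * 256). With one
// band-limited table every 16 semitones, the span between two tables is
// 16 * 256 = 4096 = 1 << 12.
void TableIndexAndFraction(uint16_t pitch,
                           uint16_t* index,       // which pre-computed table
                           uint16_t* fraction) {  // crossfade amount, 0..4095
  *index = pitch >> 12;
  *fraction = pitch & 0x0fff;
}
```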

There are drawbacks to this method:

  • Pre-computing all the band-limited variants requires 7 or 8 times as much data as storing just one cycle and processing it on the fly. It’s OK for something like a sawtooth or square oscillator, but eats too much storage for big wavetables.
  • It doesn’t handle well the case of audio-rate FM, hardsync, waveshaping…
  • Say you have a single-cycle, band-limited table for C1 and C2. What do you use for F#1? If you use a 50%/50% mix you’ll still have some residual aliasing, because the wavetable for C1 cannot be pitched higher than C1. This leaves you with two options: either you accept this faint aliasing, or you use, for F#1, a crossfade of the wavetables pre-computed for C2 and C3. The latter method guarantees aliasing will never show up, but you lose the top octave of the spectrum (not an issue if you oversample by 2x, as is the case for Braids).

Another nice method that works for arbitrary wavetables is to store the derivative of the waveform in the wavetable and run the playback through an integrator. You get free 6 dB/octave attenuation of higher (aliased) harmonics (DPW), at the cost of some extra processing and integrator hassles (compromises between DC offset, loss of low frequencies, and accuracy issues in fixed point).
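
A simplified sketch of that idea (not actual product code): the table stores the sample-to-sample difference of one cycle, and a leaky integrator after the naive lookup reconstructs the waveform, with the integrator providing the 6 dB/octave roll-off mentioned above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a "store the derivative, integrate after playback" oscillator.
// The table holds the sample-to-sample difference of one cycle of the
// original waveform.
class DifferentiatedWavetableOscillator {
 public:
  explicit DifferentiatedWavetableOscillator(std::vector<float> derivative)
      : table_(std::move(derivative)), phase_(0.0f), integrator_(0.0f) {}

  float Process(float frequency) {  // frequency in cycles per sample
    phase_ += frequency;
    if (phase_ >= 1.0f) phase_ -= 1.0f;

    // Naive (truncating) lookup of the stored derivative: this is where the
    // aliasing is created...
    size_t index =
        static_cast<size_t>(phase_ * static_cast<float>(table_.size()));
    float derivative = table_[index];

    // ...and the integrator attenuates the high-frequency (aliased) content
    // by ~6 dB/octave. The leak keeps DC under control, at the price of a
    // small loss of very low frequencies (the compromise mentioned above).
    const float kLeak = 0.999f;
    float gain = frequency * static_cast<float>(table_.size());  // restores amplitude
    integrator_ = kLeak * integrator_ + derivative * gain;
    return integrator_;
  }

 private:
  std::vector<float> table_;
  float phase_;
  float integrator_;
};
```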

For “classic” waveforms there is a trove of other techniques.


#3

Thank you for those very interesting papers and links.
You pointed out some very neat tricks. I like the idea of using power-of-two multiples as the “switching points” between two pre-computed waves to avoid an expensive division.

How can we deal with audio-rate FM in general? Even if we have two sine waves modulating each other, there will be aliasing once the modulation gets wild enough, right? Maybe that’s not a problem, as hard FM (or PM) tends to sound harsh and noisy by definition.

I’ll be taking a look at the Braids code now to figure out how you dealt with FM there.

Regarding the mix problem you described: a while back I coded a wavetable oscillator that works this way (interpolating so that the inappropriate wavetable is completely faded out). Yes, losing the top end is a problem, but it’s not critical IMHO: 1) at 44.1 kHz (or higher) you won’t be hearing much of that anyway, and 2) usually there’s a low-pass filter involved somewhere in the patch as well.
I was surprised to find that when sweeping through the frequency range, my implementation faded out the higher harmonics as intended, yet the fading was not audible at all. I expected it to cause some weird sounds, because I’m basically fading out one complete octave at once (rather than removing only the topmost partials above the Nyquist frequency), which produces a spectrum that differs a lot from a real sawtooth/square/whatever. But then again, those high partials are so low in level that you probably don’t hear much of them anyway.


#4

Am I thinking correctly that the only fool-proof and general way of handling hardsync/audio-rate FM/waveshaping etc. is to compute all of that at a higher sample rate and downsample it afterwards (oversampling)?


#5

> Even if we have two sine waves modulating each other, there will be aliasing once the modulation gets wild enough, right?

Yes. One way of handling this (used in Braids, Tides, Elements’ FM easter egg) is to dynamically limit the range of the modulation amount as a function of the main carrier frequency and modulator/carrier ratio. You can estimate a polynomial (from aliasing measurements on a large dataset) that gives you how far you can modulate without generating too much aliasing - but simple rules like dividing the FM amount by a function of the carrier frequency work well. All these methods are only possible when all the modulations/non-linear distortions are computed “in the box”.
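
As a rough illustration of such a simple rule (made-up constants, not the rule actually used in those modules), one could cap the modulation index from a Carson-rule-style estimate of where the highest significant sideband lands:

```cpp
#include <algorithm>

// Sketch: limit the FM index so that the strongest sidebands stay below a
// chosen fraction of the sample rate. All constants are arbitrary.
float LimitFmAmount(float fm_amount,          // requested modulation index
                    float carrier_frequency,  // as a fraction of the sample rate
                    float ratio) {            // modulator / carrier ratio
  // Carson-like estimate: significant sidebands extend to roughly
  // carrier * (1 + ratio * (index + 1)); keep that below 0.4 * sample rate.
  float max_index =
      (0.4f / carrier_frequency - 1.0f) / std::max(ratio, 0.01f) - 1.0f;
  return std::min(fm_amount, std::max(max_index, 0.0f));
}
```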

As soon as you have an external FM input, the only options (to my knowledge) are to low-pass filter the FM input signal, or to oversample. For example, Warps works at 576 kHz, with resampling filters starting to roll off at 18 kHz. The FPGA on the Shapeshifter works at MHz frequencies.


#6

I have taken some time this week to read the latest papers and write proof of concept code for a bunch of ideas, and I have been very impressed by polyBlep. It replaces the sinc table by a small polynomial segment (think resampling by Lagrange interpolation instead of sinc). Some of its advantages over minBlep:

  • the compensation for a discontinuity is very localized in time (2 samples), so there is no need to keep track of several overlapping BLEPs - unless your discontinuities occur at a rate faster than sr / 4. At high frequencies, the signal degrades to a triangle (rather than to a sine wave).
  • because the compensation is well-localized in time, I expect it to handle FM better.
  • you don’t need a big precomputed table!
  • it doesn’t seem to perform worse than a truncated minBlep.
  • the compensation signal doesn’t “wiggle”, so you don’t need extra headroom for the wiggles (the ringing you get at the edges with minBlep, and the reason why some of Braids’ oscillators are not as loud as others).
  • computationally, I don’t expect it to be faster on Braids’ STM32F1, but it is faster than minBlep on the STM32F4 (float implementation) and very, very probably on x86.
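
To give an idea of how small the polynomial segment is (generic sketch, not my exact code) - the entire residual around a step boils down to two one-sample branches, one for the sample just before the step and one for the sample just after it:

```cpp
// Generic 2-point polyBLEP residual. t is the time from the discontinuity to
// the first sample after it, as a fraction of the sample period (0 <= t < 1).
// For a step of height h, add h * BlepBefore(t) to the sample just before the
// step and h * BlepAfter(t) to the sample just after it.
inline float BlepBefore(float t) {
  return 0.5f * t * t;
}

inline float BlepAfter(float t) {
  t = 1.0f - t;
  return -0.5f * t * t;
}
```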

It is low on my priority list, but I might rewrite some of Braids’ “analog” oscillators to use this. The benefit would be mostly code size savings.


#7

That sounds interesting, especially if it is more efficient on an STM32F4, as might be found in, say, a “HyperBraids”…


#8

Haha, you know we’re all waiting for your poly-synth!

Seriously though: While reading through your Braids code, I stumbled across the BLEP stuff and started reading some papers. I haven’t found enough time to really do the math, but it seems pretty cool to me. I’d really like to fully understand the math behind it but that’ll have to wait until my exams are over.
As far as I understand, it creates anti-aliased waveforms with discontinuities (such as square, saw, etc.) by adding pre-computed band-limited step corrections (integrated sinc impulses) to the naive waveform.
Correct me if I’m wrong, but apparently polyBLEP replaces the sinc of the original minBLEP with a simple polynomial that resembles a masked sinc - which (I guess) costs less computation time and memory, but gives more audible aliasing.

I get your positive points about polyBLEP, except for the FM part. If polyBLEP is just an optimized version (= approximation) of minBLEP, why should it be better?

PS: I like this comparison


#9

> apparently polyBLEP replaces the sinc of the original minBLEP with a simple polynomial that resembles a masked sinc

That’s correct. polyBLEP is to minBLEP what spline/polynomial interpolators are to exact sinc resampling.

In theory, minBLEP is exact and polyBLEP is an approximation. In practice, minBLEP is an approximation too because:

  • You need to interpolate the sinc waveform from a wavetable, rather than compute it exactly.
  • The support of the sinc function is infinite, so you need to truncate it at some point, and truncation creates discontinuities. Or you’ll need to add an additional window on top of the minBLEP and compromise.
  • An implementation with limited computational resources might be able to keep track of only a few minBLEPs at a time (Braids keeps track of only 2!).

Regarding FM, I might be wrong, but here is my intuition: with minBLEP a transition at a particular sample will trigger the generation of a rather large number of compensation samples (say 20) to be added to the following samples. If the frequency changes over these next samples, the compensation samples will be incorrect - so you’ll get 20 samples during which the compensation will not be exactly aligned with what it is supposed to compensate. With polyBLEP, a transition will impact only the current and next sample.


#10

Hmm, I always thought that the BLEPs have the same size and shape no matter what the frequency of the waveform is. That makes sense to me, because all they depend on is the current sample rate - which is fixed.

Yes, minBLEP is an approximation, too. But IMHO polyBLEP is an approximation of minBLEP, which itself is an approximation, so sound-wise there should be a quality loss. At least according to this thread, minBLEP sounds less dull than polyBLEP.
But I’m talking about something I can’t really compare. You have seemingly done some testing on this and you seem to like the result - so I guess the quality loss is not very audible.

Oh, and one more thing: AFAIK a big part of the improvement of polyBLEP is that you don’t need to keep track of more than one discontinuity correction. That holds true over the whole frequency range if the correction is just 2 samples wide. But what would you do with PWM? It’s pretty easy to make a pulse short enough for two corrections to overlap. Maybe that’s more of an academic question, as most people won’t use super-short pulses or super-high-pitched PWM.


#11

> I always thought that the BLEPs are the same size and shape no matter what the frequency of the waveform is.

The (min)BLEPs won’t have the same shape when you add them to the waveform - it depends on the transition time. If the transition occurs at a fractional sample position of 0.1, you play your minBLEP function evaluated at times 0.1, 1.1, 2.1, 3.1 and so on. If the transition occurs at a fractional sample position of 0.3, you play it at 0.3, 1.3, 2.3, 3.3. So you need to be able to evaluate the minBLEP function at arbitrary times. This minBLEP evaluation is done by a combination of oversampling and linear interpolation (you compute the minBLEP for t = 0, 1/256, 2/256, 3/256… in a big table, then interpolate for other times). Just like the high-quality sinc filter in SRC, which is ridiculously oversampled and linearly interpolated in between. That’s why it eats so much ROM/RAM.
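
In code it looks roughly like this (generic sketch, not taken from any particular product) - a table oversampled 256x, read at an arbitrary time with linear interpolation between entries:

```cpp
#include <vector>

// Sketch: evaluate a minBLEP at an arbitrary time (in samples) from a table
// oversampled by a factor of 256, with linear interpolation between entries.
float EvaluateMinBlep(const std::vector<float>& table, float t) {
  const int kOversampling = 256;
  float position = t * static_cast<float>(kOversampling);
  int integral = static_cast<int>(position);
  if (integral >= static_cast<int>(table.size()) - 1) {
    return table.back();  // past the end of the table: the BLEP has settled
  }
  float fractional = position - static_cast<float>(integral);
  return table[integral] +
         (table[integral + 1] - table[integral]) * fractional;
}

// A transition at fractional sample position f is rendered by adding
// EvaluateMinBlep(table, f), EvaluateMinBlep(table, 1.0f + f),
// EvaluateMinBlep(table, 2.0f + f), ... to the following output samples.
```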

The truncation compromise happens at design time, when you create your minBLEP table. If you decide to go with a short minBLEP table (by tapering the coefficients), you’ll save memory and computational resources, but the performance won’t be as good.

> But IMHO polyBLEP is an approximation of minBLEP

The point is that minBLEP, with all the corner-cutting required to make it work on a modest platform, doesn’t perform better than vanilla polyBLEP. “Practical, corner-cut minBLEP” and “polyBLEP” are both approximations with similar performance.

> improvement of polyBLEP is that you don’t need to keep track of more than one discontinuity-correction

Even better: you do not need to keep track of any discontinuity correction at all! And that doesn’t mean overlapping discontinuities can’t be handled: just yesterday I coded a case where two discontinuities occur within the same sample (hardsync + waveform edge occurring within the same sample, but at different times) - it wasn’t easy to code and I’m not 100% sure I got it right, but it’s possible.

The difference in implementation with minBLEP is in what happens at a discontinuity.

With minBLEP, when a discontinuity occurs, you will have to modify a bunch of forthcoming samples. The common way of implementing this is to schedule the playback in a “minBLEP player” - a data structure keeping track of all running minBLEPs - and to ask the minBLEP player for the sum of all pending minBLEPs at each sample. So you effectively have to do some bookkeeping. To avoid this, you could decide to pre-compute the corrections in one go and add them to a buffer, but then some samples/blocks will be more expensive to compute than others (I know at least one commercial product which uses this approach rather than keeping track of all minBLEPs - but it’s not a good idea for an embedded system in which you want the run-time per sample, or at least per block, to not deviate much from its mean value).

With polyBLEP, when a discontinuity occurs, the only samples that need a correction are the current sample and the next sample. So you directly add the correction to the current sample, add a correction to a “next sample” variable that will be applied to the next sample, and you’re done - you can forget that the transition ever occurred.
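
A bare-bones sketch of that scheme for a sawtooth (generic code, not lifted from Braids): the output is delayed by one sample so that the sample just before the transition can still receive its half of the correction, and the “next sample” variable carries the other half over to the following sample.

```cpp
#include <cstddef>

// Sketch: polyBLEP sawtooth using the "current sample / next sample" scheme.
// One sample of latency; no list of pending corrections to maintain.
class PolyBlepSaw {
 public:
  PolyBlepSaw() : phase_(0.0f), next_sample_(0.0f) {}

  void Render(float frequency,  // phase increment per sample (f0 / sample rate)
              float* out, size_t size) {
    float next_sample = next_sample_;
    while (size--) {
      float this_sample = next_sample;
      next_sample = 0.0f;

      phase_ += frequency;
      if (phase_ >= 1.0f) {
        phase_ -= 1.0f;
        // t: fraction of the sample period elapsed since the wrap.
        float t = phase_ / frequency;
        // The naive saw (phase in [0, 1)) drops by 1 at the wrap: spread the
        // two-sample polynomial residual over the sample just before the wrap
        // (this_sample) and the sample just after it (next_sample).
        this_sample -= 0.5f * t * t;
        next_sample += 0.5f * (1.0f - t) * (1.0f - t);
      }

      next_sample += phase_;               // naive value of the new sample
      *out++ = 2.0f * this_sample - 1.0f;  // rescale [0, 1) to [-1, +1)
    }
    next_sample_ = next_sample;
  }

 private:
  float phase_;
  float next_sample_;
};
```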


#12

You can access some of the code I’ve been playing with here in typical Mutable Instruments “bells-and-whistles-enabled-by-template-parameters” style.


#13

Hmm, thanks for all the nice input :slight_smile: I’ll take a look at the repo shortly.

You are right: the transition time matters for the shape of the (min)BLEP, but the frequency does not. You said:
[…] If the frequency changes over these next samples, the compensation samples will be incorrect […]
But I think that’s not correct, given the fact that only the exact position of the BLEP is important and not the frequency of the waveform for which they are generated.

Oh wait…
Did you mean that when you start to add the correction samples for a transition that’s 20 samples ahead, the FM could mean the transition actually ends up 21 samples ahead - which yields a different transition point than the original prediction without FM?
Ah, now it’s starting to make sense to me!

One more thing…
Just thinking out loud: according to the BLEP idea, a band-limited waveform is basically a naive waveform with some corrections before, at, and after the discontinuity - this holds true in both the time-discrete and the time-continuous domain. Now what confuses me is this: imagine a square oscillator with hard-sync. It is at -1 V until the hard-sync event arrives, then it switches to +1 V. The digital (min)BLEP representation of this would try to estimate the exact time of the sync event and start deviating from the -1 V (by adding the BLEP) before the event has even happened. Simply speaking: how can the analog sync oscillator produce (approximately) the same waveform as the digital BLEP-based algorithm? There’s no way the analog circuit can “know” when the sync event will occur, so there’s no way it can start deviating from the -1 V in the same way the BLEP algorithm would.

Where’s the error in my thinking?


#14

> Where’s the error in my thinking?

Two errors I think:

  • The minBLEP is causal, so it is mixed with the samples that follow the transition.
  • We’re comparing waveforms while we should be comparing spectra.

For example, a truncated Fourier series for a square wave has ringing at the edges that the analog waveform won’t have. But within the frequency band of interest, both signals can be considered similar.