LPC Playback Formant-Shifting


#43

For a smoother, less bit-crushed-sounding formant shift, would some kind of granular pitch-shift, combined with an opposing shift to the chirp pitch be the way to go, do you think?


#44

No, I’d directly mess with the coefficients to shift the filter responses up and down.


#45

I’m vague (at best) on the way the LPC filter works, but it was my understanding that the 10 coefficients represented gains at fixed frequencies (a bit like a vocoder), so manipulating the coefficient values wouldn’t have the effect of shifting the formants up and down, in the same way that altering the filter rendering sample-rate does.

I guess you could tilt the values of the bands one way or the other, to emphasise low or high bands, but I don’t think that would have the same effect. I imagine it sounding more like a tilt EQ.

Maybe if I could get an LPC10 encoder working in realtime, I could manipulate some of the parameters of the encoding, to shift formants when the encoded signal is decoded again.

That’s quite a different project, though, and wouldn’t easily be applicable to playback of pre-encoded LPC data.


#46

Wrong!

LPC coefficients are the coefficients of the polynomial in the denominator of the filter’s transfer function. The more coefficients there are, the peakier the filter can be, but a specific coefficient cannot easily be mapped to a specific frequency band.

It’s possible to modify the coefficients to get the same spectrum shifted up or down.
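
For reference, a sketch of the transfer function being described. It assumes the common convention in which the denominator is written with plus signs (some texts flip the sign of each a_k); G is a gain term, and 10 is the filter order used in this thread:

```latex
H(z) = \frac{G}{A(z)}, \qquad A(z) = 1 + \sum_{k=1}^{10} a_k z^{-k}
```

The peaks of the response sit near the angles of those roots of A(z) that lie close to the unit circle, and every root depends on all of the a_k at once.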


#47

When you say ‘peakier’, do you mean there can be more peaks in the spectrum of the filtered source?
Do the 10 coefficients represent 10 peaks in the filter’s response, or is it not as simple as that?

Ah, OK.

OK, but would it be possible to apply the same transformation to all possible filter coefficient states, and achieve the same formant-shifting effect?

If so, how might one go about finding the necessary transformation?

I have a Teenage Engineering PO-35, which does realtime LPC playback. It’s not clear if it does realtime encoding of recorded audio sample data at runtime, or pre-encodes the sample to an LPC data-stream as an offline (but very fast) process.

It definitely seems to be able to do much cleaner-sounding formant-shifting, and on an inferior MCU, too (Cortex M3, I believe).

I did try contacting the firmware developer to see if he could give me some clues, but he never got back to me, and the source isn’t publicly available, sadly.


#48

Not necessarily… Think of it this way: you have 10 “peak tickets” (poles). You can use 2 “peak tickets” to make a bump that corresponds to a 12dB band-pass. If you want to make a sharper bump, you can spend more tickets on this bump. So you can have 5 peaks, or 1 very narrow peak, or 3 peaks + 1 sharper one.

The roots of the polynomial represented by these 10 coefficients work in pairs to give you the cutoff and resonance of those elementary band-pass filters.
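
As a rough illustration of how one pair of roots maps to a frequency and a bandwidth, here is a minimal sketch for a single two-pole section. The names PolePair, MakePolePair, PolePairFrequency and ShiftPolePair are hypothetical, the denominator is assumed to be written as 1 + a1 z^-1 + a2 z^-2, and the radius-to-bandwidth relation is the usual approximation for poles close to the unit circle.

```cpp
#include <cmath>

// Hypothetical helper type: denominator 1 + a1*z^-1 + a2*z^-2 of one
// two-pole section (one "pair of tickets").
struct PolePair {
  double a1;
  double a2;
};

const double kPi = 3.14159265358979323846;

// Builds the section from a centre frequency and a bandwidth, both in Hz.
// The pole angle sets the frequency, the pole radius sets the sharpness.
PolePair MakePolePair(double freq_hz, double bandwidth_hz, double sample_rate) {
  double theta = 2.0 * kPi * freq_hz / sample_rate;
  double r = std::exp(-kPi * bandwidth_hz / sample_rate);
  return PolePair{-2.0 * r * std::cos(theta), r * r};
}

// Recovers the pole-angle frequency from the two coefficients.
// Assumes a complex-conjugate pair, i.e. a1 * a1 < 4 * a2.
double PolePairFrequency(const PolePair& p, double sample_rate) {
  double r = std::sqrt(p.a2);
  double theta = std::acos(-p.a1 / (2.0 * r));
  return theta * sample_rate / (2.0 * kPi);
}

// Moves the peak by `ratio` while keeping the pole radius unchanged.
// The shifted frequency must stay below sample_rate / 2.
PolePair ShiftPolePair(const PolePair& p, double ratio, double sample_rate) {
  double r = std::sqrt(p.a2);
  double freq = PolePairFrequency(p, sample_rate) * ratio;
  double theta = 2.0 * kPi * freq / sample_rate;
  return PolePair{-2.0 * r * std::cos(theta), p.a2};
}
```

Doing the same for a 10th-order filter would mean factoring the polynomial into its roots first, which this sketch does not attempt.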

Note that we’re talking about the LPC coefficients here. The coefficients you’re probably manipulating on your side in your LPC code are lattice filter coefficients (or reflection coefficients)… but there’s a simple formula to convert back and forth.

Any textbook about AR models and their applications to speech processing will cover this. My university textbook has the formulas. They are not pretty looking.


#49

Ah, so it’s really nothing like a vocoder, then, because the 5 peaks can be moved around and combined arbitrarily.

So, the LPC coefficients aren’t encoded into the LPC bit-stream at all, and there’s no transformation that can be done on the lattice-filter coefficients (which are what IS encoded into the bit-stream) that would produce a formant-shifting effect, I guess.


#50

No… The lattice filter coefficients and the LPC coefficients can be transformed from one into the other.


#51

Ok… I’ll look into that. If even you say the equations ‘don’t look pretty’, though, I don’t think I have much hope… :wink:

I also suspect that formant-shifting LPC output isn’t something many people will have wanted to do, so there probably won’t be any freely-available code I can study. I certainly haven’t found any, so far…

Thank you very much for your patient explanations. They’re much appreciated.

If I may, I have some (hopefully simpler) questions related to the blep setup you mentioned at the top of the thread.

My current LPC filter is based on the one used in the MAME tms5110 emulation.

It’s fixed-point, and output is 14-bit signed integer.

Because the filter result is fixed-point, I’d like also to implement your variable sample-rate and blep setup in fixed point.

I’m unclear how big to make the bleps, though.

Here’s your code (to save you having to scroll all the way up):

Plaits variable sample-rate render loop:

while (size--) {
    float this_sample;
    this_sample = next_sample_;
    next_sample_ = 0.0f;
    clock_phase_ += rate;
    if (clock_phase_ >= 1.0f) {
      clock_phase_ -= 1.0f;
      float reset_time = clock_phase_ / rate;
      float new_sample = synth_.RenderSample();
      float discontinuity = new_sample - sample_;
      this_sample += discontinuity * ThisBlepSample(reset_time);
      next_sample_ += discontinuity * NextBlepSample(reset_time);
      sample_ = new_sample;
    }
    next_sample_ += sample_;
    *output++ = this_sample * gain;
   }

It looks like reset_time in your code above goes from 0 to the sample-rate divisor so, in my case, where I’m shifting ± 12 semitones, that will be 3 at the highest sample-rate, and 12 at the lowest. This means the bleps are larger for lower sample-rates, I guess. Not sure why.

If I were to convert this setup to fixed-point, I’m confused about how I would go about choosing the appropriate numbers for the phase-increment and phase-increment wraparound values, in order to get the bleps the right size to do their thing.

I’m also beginning to think that maybe the blep functions you pointed me to, from the Warps firmware (below), are predicated on the value of t always being in the 0 - 1 range, since, looking at the equivalent fixed-point blep functions from Braids, it looks like the NextBlepSample() function produces a value in the same 16-bit range as the ThisBlepSample(), but inverted.

This wouldn’t be the case with the floating-point version, in your variable-sample-rate Plaits LPC setup, since reset_time (I think) goes as high as 24.0f at the minimum sample-rate.

Warps blep functions:

static inline float ThisBlepSample(float t) {
    return 0.5f * t * t;
  }
  static inline float NextBlepSample(float t) {
    t = 1.0f - t;
    return -0.5f * t * t;
  }

Braids blep functions:

inline int32_t ThisBlepSample(uint32_t t) {
    if (t > 65535) {
        t = 65535;
    }
    return t * t >> 18;
  }
  inline int32_t NextBlepSample(uint32_t t) {
      if (t > 65535) {
          t = 65535;
      }
      t = 65535 - t;
      return -static_cast<int32_t>(t * t >> 18);
  }

Any pointers would be much appreciated, and apologies for the super-long post.


#52

reset_time is always between 0.0 and 1.0 because after a phase reset, clock_phase_ is always less than rate.
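
The same bound carries over to fixed point. Below is a minimal sketch, assuming a 16-bit fractional phase in which 65536 stands for one full period and a rate between 1 and 65536. VariableRateClock and Tick are hypothetical names; the two blep helpers are the Braids ones quoted earlier in the thread, which map a 16-bit t to a result of roughly 15 bits (16383 standing for 0.5).

```cpp
#include <cstdint>

// Braids-style blep helpers: t is 0..65535 (about 0.0..1.0).
inline int32_t ThisBlepSample(uint32_t t) {
  if (t > 65535) t = 65535;
  return t * t >> 18;
}
inline int32_t NextBlepSample(uint32_t t) {
  if (t > 65535) t = 65535;
  t = 65535 - t;
  return -static_cast<int32_t>(t * t >> 18);
}

// Hypothetical state for a fixed-point variable-rate clock.
struct VariableRateClock {
  uint32_t phase = 0;  // 16 fractional bits; always below 65536 between calls
};

// Advances the clock by rate_q16 (assumed to be in 1..65536). Returns true
// when a new source sample is due and writes the fractional reset time,
// scaled to 0..65535, ready to pass to the blep helpers.
inline bool Tick(VariableRateClock* clock, uint32_t rate_q16,
                 uint32_t* reset_time_q16) {
  clock->phase += rate_q16;
  if (clock->phase < 65536) return false;
  clock->phase -= 65536;  // Now 0 <= phase < rate_q16.
  // phase is at most 65535 here, so the shift fits in 32 bits, and
  // phase < rate_q16 keeps the quotient below 65536.
  *reset_time_q16 = (clock->phase << 16) / rate_q16;
  return true;
}
```

Because the remaining phase is always smaller than the rate, the quotient never reaches 65536 whatever the rate is, so the blep size does not depend on the sample-rate divisor.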


#53

Ah… because clock_phase is reset to 0 before the divide… stupid me…


#54

Sooo… I just need to know how big I need reset_time to get, in order for the fixed-point blep function to produce bleps of the correct amplitude.

Can I ask what the maximum value produced by your floating-point lattice filter function is?


#55

Check the Braids code.

No idea… The filters can get quite resonant! I clip anything to [-1.0, 1.0] anyway.


#56

I’ve tried this:

 for(uint8_t j = 0; j < BUFSIZE; j++){
	// Grab excitation signal
	m_excitation_data = inlet_EXCITATION[j];
	
	int32_t this_sample;
	this_sample = next_sample_;
	next_sample_ = 0;
	clock_phase += sample_rate;
	if (clock_phase >= 1.0f) {
		clock_phase -= 1.0f;
		uint32_t reset_time = (uint32_t)((clock_phase / sample_rate) * 65535);
		int32_t new_sample = lpc_lattice_filter();
		int32_t discontinuity = new_sample - sample_;
		this_sample  += discontinuity * ThisBlepSample(reset_time) >> 15;
		next_sample_ += discontinuity * NextBlepSample(reset_time) >> 15;
		sample_ = new_sample;
	};
	next_sample_ += sample_;
	// Voice output sample
	if(param_bleps)
		outlet_wave[j] = this_sample << 13;
 	else
 		outlet_wave[j] = sample_ << 13;
 };

With the blep functions from Braids:

inline int32_t ThisBlepSample(uint32_t t) {
	if (t > 65535) {
		t = 65535;
	};
	return t * t >> 18;
};
  
inline int32_t NextBlepSample(uint32_t t) {
	if (t > 65535) {
		t = 65535;
	};
	t = 65535 - t;
	return -static_cast<int32_t>(t * t >> 18);
};

I left-shifted the output of the filter by 2, to get it to 16 bits.

Unfortunately, I can’t hear any appreciable difference with the bleps option on or off, so I guess something is wrong somewhere…


#57

One thought… since you’re not trying to emulate an underclocked Speak & Spell, why not use linear interpolation?

This would look like this:

current_sample_ = 0.0f;
next_sample_ = 0.0f;
while (size--) {
  clock_phase_ += rate;
  if (clock_phase_ >= 1.0f) {
    clock_phase_ -= 1.0f;
    current_sample_ = next_sample_;
    next_sample_ = synth_.RenderSample();
  }
  *output++ = current_sample_ + (next_sample_ - current_sample_) * clock_phase_;
}

Simpler because you don’t have to deal with sharp transitions.


#58

Ah, good thought! Thank you, will try that!

Any idea why my bleps weren’t working, though?


#59

You output samples << 13 but you use something not shifted to compute the discontinuity?


#60

Ah, OK. I up-shifted the LPC filter output to 16 bits, but maybe it should be more.

In Braids, the final sample data seems to be 16-bit.

In Axoloti-land, audio signals are 27-bit (but shifting up 11, it was still a bit quiet).

The weird thing is, I hear absolutely no difference, with bleps turned on- no shift in level, or any discernible change in tone at all.

I’ll try the lerp, anyway, and see how it sounds.


#61

My attempt at a fixed-point version.

for(uint8_t j = 0; j < BUFSIZE; j++) {
	// Grab excitation signal
	excitation = inlet_EXCITATION[j];

	// Increment clock-phase
	clock_phase += sample_rate;

	// Variable sample-rate mechanism
	if (clock_phase >= 65535) {
		clock_phase -= 65535;
		current_sample_ = next_sample_;
		next_sample_ = lpc_lattice_filter(excitation);
	};

	// Voice output sample (with linear interpolation)
	outlet_wave[j] = Xfade16(current_sample_, next_sample_, clock_phase) << 13;
};

And the Xfade16() function:

int32_t Xfade16(int16_t a, int16_t b, uint16_t x) {
	int32_t ccompl = (1 << 15) - x;
	int32_t result = (int32_t)b * x;
	result += (int32_t)a * ccompl;
	return result >> 16;
};

Seems to work. Sort of. I get lots of annoying clicks, though.


#62

Resetting phase to exactly 0 helps.
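
One more thing that may be worth checking in the Xfade16() above: the complement is taken from 1 << 15 while x spans 16 bits and the result is shifted down by 16, so the two weights do not sum to full scale. A minimal sketch of a crossfade whose weights always sum to 65536 follows. Lerp16 is a hypothetical name, and x is assumed to be a 16-bit fraction.

```cpp
#include <cstdint>

// Linear interpolation from a (at x = 0) towards b (as x approaches 65536).
// Written as a + (b - a) * x / 65536, which is the same as weighting a by
// (65536 - x) and b by x. The product can exceed 32 bits, so it is computed
// in 64 bits; division truncates towards zero.
inline int16_t Lerp16(int16_t a, int16_t b, uint16_t x) {
  int64_t diff = static_cast<int64_t>(b) - a;
  return static_cast<int16_t>(a + diff * x / 65536);
}
```

The result always lies between a and b, so it can replace the crossfade call without changing the output scaling.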