Friday, 1 July 2016

MUSINGS: Digital Interpolation Filters and Ringing (plus other Nyquist discussions and "proof" of High-Resolution Audio audibility)


A couple weeks ago, Whackamus posed this interesting comment and question which I thought would be a good topic to discuss and explore in greater detail and with some examples/samples:
"I've been reading your blog for years. Or for almost four years, at any rate. I have to thank you for doing what you do. I've likewise always wanted to ask you a question, too, but I don't know how the bleep to contact you. In any case, since I've been fretting over it afresh, I thought I'd just post it here. If you ever do decide to get to/address it, that'd be great. If not -- hey, no sweat. :)

In any case, I read the following (tonight) on the Stereophile forums:

"I personally think that MQA has some noble goals, in terms of getting as close to the original master as possible, but I think that is far less important than the elimination of the damaging pre-ringing distortion. This has been the bane of digital playback for 30 years, and over-sampling and various filter techniques have tried to deal with it, with limited success."

I won't say that I've never heard ringing -- because I probably have -- but I will say that I've never explicitly said: "Aha! Eureka! Thar be ringing!" Because -- outside of maybe a blurring during transients? -- I have no idea what it sounds like. But my question is less about MY having heard ringing than about the AUDIBILITY of ringing -- pre, post, or otherwise. In a quality DAC (which I've got to assume most of the folks posting on Stereophile.com have access to), how audible are ringing effects? Or, rather, how COMMON are they? I kind of imagine that the Meitners, Lavrys, Levinsons, Stuarts, etc. of the audio world take great care to minimize (pre-/post-)ringing effects and to eliminate ringing in the audible realm. I likewise imagine that both such things are doable, inasmuch as most of us have been enjoying digital audio for decades now. But the Stereophile poster makes it seem as if ringing is the apodeictic bane of digital audio. What am I missing?"


Beautiful question! Like other audiophiles, I've seen the "dreaded ringing" (like the "dreaded jitter") held up over the years as a nemesis which must be slaughtered! Typically, we see images like this in magazines which are of course extremely frightening to look at:

Terrible! That nice little clean digital "impulse" with defined onset and offset has become mangled into this "time-smeared" mess with all kinds of "unnatural" ringing. Most horribly of course is that "pre-ringing" before the main waveform itself (what kind of Hellspawn is an "echo" before the sound itself???!!!). Isn't it unbelievable how awful digital audio is?!

Before freaking out, let's think this through.

Since 2013, I had been exploring this phenomenon and trying to figure out for myself just how much of a problem this is from the perspective of magnitude of audible effect. Folks might want to have a look at previous articles on this:
MEASUREMENTS: Digital Filters and Impulse Response... (TEAC UD-501)
MEASUREMENTS: "Pulse Response" - 5kHz & 10kHz.

Consider for a moment what an "impulse" is in the digital world. It's a sharp transition or transient where from a baseline of 0, it instantaneously goes up to full amplitude. Numerically it looks like this (+32767 being the largest signed number for 16-bits, and -32768 the smallest):

...0, 0, 0, 0, +32767, 0, 0, 0, 0...

I think it's useful to see it as a number sequence of discrete sample points rather than some kind of waveform image as a start. When we look at images of this data with an audio editor where the "points" are conveniently connected for us, we are actually seeing the calculated interpolation as applied by the software. The line drawing we see in the editor is simply the result of whatever interpolation function the software applies.

When I measure an "impulse response", basically what I'm asking the DAC to reproduce (typically with a 16/44.1 signal), is that sudden sharp transition of exactly one sample in duration, asking the device to interpolate all the individual samples around that discontinuity with the filter function programmed into it. For a typical 8x oversampling DAC, that 44.1kHz is upsampled to 44.1 x 8 = 352.8kHz; that is, 8 output samples are calculated for every input sample. Realize that a "Dirac impulse" (the idealized single point spike) is not inherent in natural sounds. We do not get instantaneous transitions like this that suddenly start and stop the air waves in real life physical systems. Nor would single impulses like this sound any good anyhow! We can model it in a computer of course, just as in electrical systems we can draw true square waves even though in nature, ideal square waves of vertical slope do not exist either.

Suppose we start with the most basic DAC, one that does NOT offer an antialiasing filter. A system where that single impulse point is held over the sample duration. This results in a square waveform representing that impulse across the time of the single sample as shown in the image above. When we do this, our digital data gets converted to an analogue electrical output with all the aliasing of square waves - remember, an ideal square wave is a composite of all the odd-order harmonics ad infinitum. Instead of smooth sine waves, we see these blocky "digital" tracings and if we are to pass the "impulse" data through like this, the result is literally an unmodified square wave. This is what's called a "zero order hold" model of signal reconstruction; more commonly known in the audiophile world as the "non-oversampling" DAC, "NOS" DAC, and people like Audio Note might call it "1X oversampling".
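For anyone who wants to see the zero-order hold idea in plain numbers, here's a minimal Python sketch. The function name `zero_order_hold` and the choice of an 8x time grid are just mine for illustration; this is not how any particular NOS DAC is implemented, only a model of "hold each sample for its full duration".

```python
def zero_order_hold(samples, factor):
    """Repeat every sample `factor` times: the 'staircase' a NOS DAC outputs."""
    held = []
    for s in samples:
        held.extend([s] * factor)
    return held

# The single-sample 16-bit impulse from above, padded with zeros.
impulse = [0, 0, 0, 0, 32767, 0, 0, 0, 0]

# View the 44.1kHz data on a 352.8kHz (8x) time grid with no interpolation.
staircase = zero_order_hold(impulse, 8)

print(len(staircase))     # 9 samples x 8 = 72 points
print(staircase[30:42])   # the edges of the rectangular pulse
```

The impulse simply becomes a rectangular block of eight identical values with vertical edges on both sides. Nothing has been interpolated, which is exactly why the output looks "clean" in the time domain.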

When you see people show images of the squarish digital waveform like this image of a blocky sine wave:
Image from this Kickstarter project.
That's literally what NOS DAC output looks like. And these days, few DACs show disregard for aliasing artifacts like this anymore thankfully!

By turning off the digital filter on my TEAC UD-501 DAC, I can listen to this, measure it and demonstrate the effect of the lack of filtering.


Notice the "jaggy" unfiltered 1kHz sine wave at 16/44, with a rather "nice" looking impulse response measured without significant ringing (since this is an actual recording using a 24/192 ADC, note the "Gibbs Phenomenon" with the impulse waveform - see below).

But look at the "Digital Filter Composite" (again, thanks to Jürgen Reis for suggesting the use of this measurement method):


We see a terribly "dirty" result when examined in the frequency domain. Tons of noise beyond Nyquist (22.05kHz), plus the 19 and 20kHz sine waves are echoed across the spectrum. As much as some would want us to believe that time domain qualities are extremely important down to the ringing, remember that for human hearing, the frequency domain is no doubt essential to get right (the cochlea performs a type of FFT processing, and similarly this is how cochlear implants function to artificially aid in hearing when the natural cochlea fails).

Remember, digital audio is by definition bandwidth limited. That is, when we sample using a CD samplerate of 44.1kHz, reconstruction of the waveform is accurate based on the Nyquist-Shannon theorem up to Fs/2, or the "Nyquist frequency" of 22.05kHz for the CD. When we reconstruct the output and do not bandwidth limit the signal, as in the case of these NOS DACs, notice all the harmonics and distortion products seeping through beyond 22.05kHz. The analog to this in the world of video and digital photography would be Moiré patterns either in the fine details or in the color banding of the image. We clearly recognize this as unwanted "detail" which was not found in the original image we captured.
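To put some rough numbers on those "echoes", here's a small Python sketch of where the copies of a tone land and how little an ideal zero-order hold attenuates them. The copies ("images" in DSP terms) sit at k*Fs - f and k*Fs + f, and the hold itself has a sin(x)/x shaped frequency response. The function names and the 96kHz display limit are my own choices, and a real NOS DAC will also have some analogue filtering that this idealized model ignores.

```python
import math

FS = 44100.0  # CD sample rate in Hz

def zoh_gain_db(freq_hz, fs=FS):
    """Magnitude of an ideal zero-order hold at freq_hz, in dB relative to DC."""
    x = math.pi * freq_hz / fs
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))

def image_frequencies(tone_hz, fs=FS, limit_hz=96000.0):
    """Copies of a sampled tone appear at k*fs - f and k*fs + f."""
    images = []
    k = 1
    while k * fs - tone_hz <= limit_hz:
        images.append(k * fs - tone_hz)
        if k * fs + tone_hz <= limit_hz:
            images.append(k * fs + tone_hz)
        k += 1
    return images

for tone in (19000.0, 20000.0):
    print(f"{tone / 1000:.1f} kHz tone: {zoh_gain_db(tone):.2f} dB")
    for img in image_frequencies(tone):
        print(f"   image at {img / 1000:.1f} kHz: {zoh_gain_db(img):.2f} dB")
```

With this model the 20kHz tone is already down about 3.2dB from the hold alone, and its first image at 24.1kHz comes out only about 1.6dB below the tone itself. That is the sort of thing the "dirty" composite plot is showing.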

So, how do we remove all that extra high frequency distortion? We use a filter of course! And in modern DACs this is typically done with a digital oversampling process that interpolates the data so it doesn't look like these nasty square waveforms any more, but rather something approximating the sinusoidal physical air waves that we eventually hear, while suppressing frequencies not represented in the original digital signal as best we can.

Enter the Whittaker-Shannon interpolation formula - commonly known as the sinc filter. This is the mathematically "ideal" impulse response for a brick-wall low-pass filter. Behold... "Ringing":

A filter that respects the bandwidth-limited nature of the sampling theorem must, when faced with an input as extreme as the unnatural "impulse", interpolate the signal with minimal seepage beyond the Nyquist frequency. The ringing is the result. You will see this phenomenon wherever there are sudden transients containing constituent frequencies above Nyquist. For example square waves will show the "Gibbs Phenomenon" during the transitions:
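Here's a minimal Python sketch of the Whittaker-Shannon sum applied to a single-sample impulse, evaluated between the original sample points. The helper names `sinc` and `reconstruct` are mine, and this is the textbook ideal formula rather than the finite, windowed filter a real DAC chip would run.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Whittaker-Shannon sum: value at time t, measured in sample periods."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# Single full-scale impulse at sample index 8, normalized to 1.0.
impulse = [0.0] * 17
impulse[8] = 1.0

# Evaluate on a finer grid (every half sample here, taken from an 8x grid).
OVERSAMPLE = 8
for i in range(4 * OVERSAMPLE, 12 * OVERSAMPLE + 1, 4):
    t = i / OVERSAMPLE
    print(f"t = {t:5.2f} samples  value = {reconstruct(impulse, t):+.4f}")
```

Notice that the curve passes through zero at every original sample instant and only wiggles in between, alternating in sign from one sample interval to the next. One full cycle of the wiggle therefore takes two samples, which is exactly the Nyquist frequency. The first lobe before the peak, 1.5 samples early, is about -0.21 of the impulse height, and it scales directly with the size of the impulse.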

Despite ringing in the time domain, when we examine the frequency domain, things look much nicer! Here then again is my TEAC UD-501, but with a sharp/steep 8X oversampling antialiasing filter turned on:



As you can see, sine waves are smoothed out and the frequency-domain FFT composite demonstrates the benefit of the filter - good suppression of high frequency aliasing; a relatively sharp "cliff" around 22.05kHz, and clean 19 & 20kHz signals with no high amplitude harmonics and intermodulation products. IMO, this is a much better result than a NOS DAC.

Which brings us to the main issue. Whereas frequency domain aliasing distortion and intermodulation distortion clearly can be audible (for an example of this, go download Monty's "Intermod Tests" and have a listen), just how audible is the impulse ringing which is unavoidable for a steep low-pass filter? Specifically, how audible is the pre-ringing (because post-ringing will likely be masked naturally by reverb trails)?

IMO, the audibility is minimal if at all. Here's why:
1. The ringing is typically at Nyquist. For CD samplerate, this is 22.05kHz folks. What human can hear a low amplitude pre-ringing coming about a millisecond before an impulse at this frequency? Remember that the amplitude of the ringing is correlated to the amplitude of the "impulse". When you see measurements of the impulse response, typically this is at 100% amplitude (like that +32767 above) so the ringing you see is really a "worst case scenario", not representative of actual music.

2. Microphones and ADCs are bandwidth limited devices. Most microphones have little frequency response above 20kHz anyway as discussed recently. Remember, as I noted above, square waves and certainly single sample impulse signals are not natural sonic phenomena. Furthermore, the analogue signal from the microphone will typically be filtered by the ADC's low-pass filter as well which we never talk or obsess about in the audiophile world!

You can in fact take some music you have and upsample it from 44.1kHz to 176.4kHz with a steep upsampler that demonstrates strong ringing with an impulse response. Have a look in an audio editor with the "Spectral Frequency Display" and see if you notice much ringing being added around the Nyquist frequency. I have done this many times and cannot recall ever having seen any strong ringing other than with artificial test signals.

3. Empirical evidence is lacking. Talk is cheap and testimony is legion, including from the fellow quoted above by Whackamus, from Bob Stuart, and from audiophile folk heroes like John Swenson. There seems to be this belief out there that digital filters somehow play a huge role in the sound and that somehow they need to be specially tuned by the "gurus". I suppose promoting this point of view allows manufacturers to differentiate themselves with their version of digital filtering and allows talk of fancy terminology like an FPGA programmed to perform the signal processing. Furthermore, these claims seem to be gobbled up by the mainstream audiophile media as some kind of massive step forward in digital audio design!

Seriously folks, many audiophiles feel that NOS DACs sound great to them, yet most digital audio is designed with relatively steep filters with ringing and generally people don't complain. So how much difference is there really? I have never seen a purely subjective reviewer come out and say "Aha! I know this device used a steep filter and I hear ringing!" without them knowing what the impulse response for the device looked like a priori. The difference is clearly not very obvious.

You might recall that we looked at one part of the audibility question last year on this blog with a little blind test:
INTERNET BLIND TEST: Linear vs. Minimum Phase Upsampling Filters
Using naturally recorded music starting at 24/44, a comparison was made between two upsampling filters (interpolation to 176.4kHz) with impulse responses looking like this:

Guess what, as a group, there was no evidence in the blind test results that the 45 audiophiles who tried this test actually had a significant subjective preference for one or the other filter setting. You would think that the linear phase filter with the long pre-echo would be less desirable if the effects were all that big. (See the results beginning here: The Linear vs. Minimum Phase Upsampling Filters Test [Part I]: RESULTS.)

[Please folks, let's not bring up Meridian's AES 2014 paper: The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System which confounds all kinds of things like sub-optimal dithering and as far as I can tell, didn't convincingly prove what the title claims.]

Having said this, am I saying then that filtering settings are not important? Well, I guess that depends on how one defines "important". I do want the low-pass filtering because I believe clean frequency domain performance is important - NOS would not be my preference. A flat frequency response to 20kHz, reasonable suppression of aliasing, and maybe modest suppression of impulse response ringing IMO is good enough. Therefore I suspect the majority of typical settings used by DAC manufacturers would be fine if not indistinguishable.

Whether one hears it or not, as I suggested above, I think there's nothing wrong with achieving modest suppression of the ringing, especially the pre-ringing... It's a "perfectionist audio" argument rather than empirical claims of audibility I believe. What could be done? Here are a few options.

1. Go high-res. With 88.2kHz samplerate, Nyquist would be 44.1kHz, and ringing at that frequency would be way beyond the hearing ability of humans. Basically we've bought even more insurance in the event that in some situations the 22.05kHz ringing from a steep "brick wall" filter may seep into the audible range. Furthermore, it's unlikely many speakers would be able to reproduce this frequency without significant attenuation. Whether one uses a sharp digital filter or a weak one or even none at all will make little difference. Of course, not all albums currently are available in high-res (and sadly very few are deserving to be called high-resolution recordings). Note that this does not include albums that are just upsampled which applies the ringing of the algorithm used and may in fact be worse than your DAC's interpolation.

2. Use a minimum phase filter setting. Technically this isn't reducing ringing, just removing the pre-ringing component. Over the years, we've seen minimum phase settings be used in all kinds of devices from the iPhone 4/6, to the Samsung Galaxy Note 5, and even motherboards like the Gigabyte GA-Z170X-Gaming 7 a couple weeks back. Obviously even inexpensive devices can be programmed to do this. I've been using iZotope RX 5 these days as an easy tool to experiment and listen to different settings. Changing the "Pre-ringing" setting to 0 will result in a minimum phase filter.

iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with linear phase interpolation.
iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with minimum phase interpolation, same steepness.
Notice that the pre-ringing energy has been transferred to the post-ringing amplitude and duration when using the minimum phase setting. Another compromise is that there is a phase shift in the frequency domain when using minimum phase settings (not shown, but you can see a graph of this in my previous post). Finally, we can appreciate also that more energy has been transferred to the post-ringing "side lobe", and the amplitude of the initial "main lobe" isn't as strong for the same filter steepness setting. I have not heard this talked about much; sure, perhaps masking with removal of the pre-ringing is a good thing, but there is more smearing of the energy across time with a strict minimum phase setting.

For the sake of completeness, there are "intermediate phase" settings you can use for filter design. We actually have seen this type of setting used over the years in my hardware tests like the old WD TV Live! This can be demonstrated by using an intermediate setting in iZotope with the "Pre-ringing" set to 0.5:

Notice at this setting, we see very mild pre-ringing with most of the energy transferred to the post-ringing like with minimum phase though the amount of post-ringing energy isn't as strong if we were to quantify it.

3. Use a slow roll-off setting. Many DACs including my TEAC UD-501 have a slow roll-off filter setting these days. One can easily do this in iZotope RX 5 by changing the "Filter steepness" setting:
iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with steepness setting of "200". Lots of ringing.

iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with steepness setting of "10". Ringing obviously attenuated.
As you can see, a weaker, more gentle filter will allow some aliasing to pass through. However, clearly the ringing is less intense. This "Fourier transformation" correlation is important to keep in mind: lower ringing in the time domain implies a less steep filter and likely more aliasing artifacts in the frequency domain. This is why when you see a "nice" looking impulse response like that of the PonoPlayer or emm Labs DAC2X, the first thing you should also wonder about is "does this device have a weak antialiasing filter?"
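Going back to point 2 for a moment, here's a rough Python sketch of what "moving the pre-ringing to the post-ringing side" means numerically. It builds a small linear phase low-pass filter and converts it to minimum phase with the standard homomorphic (cepstral) method. To be clear about the assumptions: the 31-tap Hann-windowed sinc, the 0.25 cycles/sample cutoff, the 512-point transform and all the function names are my own picks for illustration, and I have no idea what iZotope actually does internally.

```python
import cmath
import math

def lowpass_taps(num_taps, cutoff):
    """Hann-windowed sinc low-pass (linear phase); cutoff in cycles/sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (num_taps + 1))
        taps.append(ideal * window)
    return taps

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; fine for a few hundred points."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * i / n) for i in range(n)]
    out = [sum(x[m] * twiddle[(k * m) % n] for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def minimum_phase(taps, n_fft=512, floor=1e-8):
    """Homomorphic (cepstral) conversion: same magnitude, minimum phase."""
    padded = list(taps) + [0.0] * (n_fft - len(taps))
    log_mag = [math.log(max(abs(v), floor)) for v in dft(padded)]
    cepstrum = [v.real for v in dft(log_mag, inverse=True)]
    # Fold the anti-causal half of the cepstrum onto the causal half.
    folded = [0.0] * n_fft
    folded[0] = cepstrum[0]
    folded[n_fft // 2] = cepstrum[n_fft // 2]
    for i in range(1, n_fft // 2):
        folded[i] = 2.0 * cepstrum[i]
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spectrum, inverse=True)][:len(taps)]

def peak_index(h):
    """Index of the largest-magnitude tap."""
    return max(range(len(h)), key=lambda i: abs(h[i]))

def early_energy(h, count):
    """Fraction of the total energy carried by the first `count` taps."""
    return sum(v * v for v in h[:count]) / sum(v * v for v in h)

linear = lowpass_taps(31, 0.25)
minimum = minimum_phase(linear)
for name, h in (("linear phase", linear), ("minimum phase", minimum)):
    print(f"{name:14s} peak at tap {peak_index(h):2d}, "
          f"{100 * early_energy(h, 8):5.1f}% of energy in first 8 taps")
```

The linear phase version peaks in the middle with symmetrical ringing on both sides, while the minimum phase version packs its energy up front and lets everything else trail behind the peak. The magnitude response is (approximately) the same for both, so nothing has been "removed", only rearranged in time.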

Like many things in nature, the act of "beautifying" one characteristic will result in less ideal performance in another domain. It would be great if we could have a nice and clean sharp low-pass filter but this would be at the expense of time domain ringing and potential temporal smear demonstrated by the impulse response. Conversely, reduction of ringing in the time domain means the strength of the low-pass filter will be reduced and the ability to suppress aliasing will weaken.
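This trade-off is easy to put into numbers. Below is a small Python sketch that designs three low-pass filters of different lengths at a 176.4kHz output rate and reports how well each one passes 20kHz, how well it rejects 24.1kHz (the first image of a 20kHz tone), and how long the impulse response stays above 1% of its peak. I'm using tap count as a stand-in for "steepness"; that mapping, the Hann window and the 1% threshold are my assumptions and not iZotope's actual parameters.

```python
import cmath
import math

FS_OUT = 176400.0          # 4x oversampled output rate in Hz
CUTOFF = 22050.0 / FS_OUT  # corner at the original Nyquist, in cycles/sample

def lowpass_taps(num_taps, cutoff=CUTOFF):
    """Hann-windowed sinc low-pass (linear phase); cutoff in cycles/sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (num_taps + 1))
        taps.append(ideal * window)
    return taps

def gain_db(taps, freq_hz):
    """Magnitude response at freq_hz, in dB relative to the DC gain."""
    w = 2 * math.pi * freq_hz / FS_OUT
    response = sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps))
    return 20 * math.log10(max(abs(response), 1e-12) / abs(sum(taps)))

def ringing_ms(taps, threshold=0.01):
    """Time span over which taps exceed `threshold` of the peak tap, in ms."""
    peak = max(abs(h) for h in taps)
    active = [n for n, h in enumerate(taps) if abs(h) > threshold * peak]
    return 1000.0 * (active[-1] - active[0]) / FS_OUT

for num_taps in (15, 63, 255):
    taps = lowpass_taps(num_taps)
    print(f"{num_taps:3d} taps: 20kHz {gain_db(taps, 20000):7.2f} dB, "
          f"24.1kHz {gain_db(taps, 24100):7.2f} dB, "
          f"ringing span {ringing_ms(taps):.3f} ms")
```

The short filter barely rings but it also rolls off the top octave and lets the image through nearly untouched. The long filter does the opposite: flat to 20kHz, strong rejection just above Nyquist, and an impulse response that carries on for several times longer.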

Of course, there's nothing to stop us from combining points 2 and 3. For example, we can model what I found with the PonoPlayer with these settings:

Using my Focusrite Forte ADC, here are the actual measured impulse response and "digital filter composites" from the PonoPlayer compared to test tones played back using my TEAC UD-501 with 16/44 files upsampled to 24/176.4 using the filter settings in iZotope above:



Pretty close, right? In fact, I should have used an even weaker filter setting in iZotope to approximate the PonoPlayer. I think a steepness factor of 1.2 would be very close. It's of course unlikely that the "filter composite" image would look exactly the same... These are quite different DACs after all, with different analogue electronics, and the 64-bit iZotope RX calculations would likely differ from the mathematical precision in the PonoPlayer hardware.

There is an important point here though. If you know one of the transform pairs, like what the impulse response looks like, you'll be able to predict the frequency domain result. As you can see, it looks like Ayre used a very gentle minimum phase filter setting that allows significant amounts of frequencies >22.05kHz to pass through when playing 44.1kHz music. The designers obviously felt that this was a desirable balance for this device and the target audience.

Conclusions:
Go experiment. Have a listen to a NOS DAC or if your DAC allows the filter to be turned off, give that a try. Go try listening to various filter settings with SoX or even easier, iZotope RX with all these parameters to play with. Try some unsighted listening and see if you can consistently tell a difference. Try different types of music. For example, an aggressive, over-compressed "loud" mastering, with clipping may excite more ringing and aliasing distortions (but then this kind of music is inherently distorted anyway).

No matter how much we obsess over the design of these filters, realize that there are a multitude of other extremely important factors in ultimate sound quality. No matter how picky we become as consumers, there's nothing we can do about the production side. For example, what do we know about the quality of the ADC used to convert the original performance and the nature of the low pass filtering used (see this article on the use of analogue vs. digital filters before an ADC)? Even more importantly, the quality of the mastering job. We have already seen examples of suboptimal studio mixes, pseudo 24-bit audio, and music resellers providing nothing more than Loudness Wars "hi-res" files. Unless the DAC digital filter settings are truly atrocious, do we honestly think it would make much difference given all the factors outside of our control?

Let me know about your experiences when experimenting with digital filters. Do you think the difference in magnitude is worth exploring further? Also, let me know if you come across conclusions from actual listening tests where these filter settings were assessed in a controlled fashion.

Realize that back in 2006, before ringing was brought to the spotlight with Meridian and their "apodizing" filter setting or Ayre and their whitepaper around 2009, Stereophile had an interesting article on this already. Despite the main writer wringing his hands about the importance of these filters, notice that the editors admitted to not being able to hear much difference. I concur. Certainly if I were a manufacturer looking to squeeze everything out of a design, I might want to customize the filtering to taste based on the hardware and target audience. But as consumers listening to all sorts of music with variable quality out of our control, I'd be pretty happy with a typical linear antialiasing filter of moderate steepness.

For those who want to read more, consider this article in Secrets of Home Theater and High Fidelity:
Up-sampling, Aliasing, Filtering, and Ringing: A Clarification of Terminology

Notice the article above is focused on ringing in video (specifically 4K video and quality of upsampling like 1080P to 4K). Digital signal processing concepts of course apply to video as well as audio. One big difference with audio is that time only goes in one direction... You can get away with more post-ringing whereas in video, around sharp transitions, pre- and post- effects may both be very noticeable in the image.

For those who remember their maths, here's a YouTube video discussing "impulse response", "convolution", "Laplace transform", etc... Have fun!

 

Addendum:
A great resource to check out:
Infinite Wave SRC Comparisons
Nice interactive website to look at the various sample rate converters on the market. You can easily flip between frequency sweeps to look at aliasing, cleanliness of test signal, transition bands, and impulse response ringing.

-------------

To end off this post, let's talk about a couple of items in the blogosphere lately.

First, I find it rather odd that a digital audio site would post an article like this ("Sampling: What Nyquist Didn't Say, And What To Do About It"). As a general practical article on the limits of the sampling theorem, pragmatic questions including whether one needs a filter in some instances, and how to select them in real-life engineering applications (eg. digital sampling of EKGs...), this is a great article. But what does this tell us about practical implications in audio and how is this applicable to audibility of high-fidelity playback? Sure, filters need to be selected for the application and obviously for different purposes, one can and should understand the waveform being sampled. Furthermore, sampling rate obviously needs to be commensurate with the frequency of the event being recorded. But CD sampling rate was decreed as 44.1kHz, we generally know that humans can't hear above 20kHz (the 22.05kHz Nyquist frequency sits about 10% above that 20kHz audibility threshold), digital audio has had at least 3 decades to refine the sound quality including filters, and as discussed above, there are some reasonable compromises to keep in mind which can be understood without a PhD in theoretical physics. Without some useful conclusions in articles like this about high-fidelity audio when posted on an audio site targeted at non-technical audiences, a typical audiophile probably leaves scratching his/her head with more questions than answers thinking there's something terribly complex and mystical in all this. IMO, this is not the case and it does the hobby a disservice to promote unnecessary uncertainties typical of FUD.

Second is of course the recent brouhaha around the audibility of high resolution audio (Reiss' "A Meta-Analysis of High Resolution Audio Perception Evaluation" in the AES). That's nice. Does it mean that suddenly Neil Young's interviews with musicians in a car and seeing them "blown away" from the sound is now true? Should we now storm HDTracks/Pono/etc. to re-buy all our favourite albums in hi-res now that it's "official"? Should we now demand audio streaming sites to carry hi-res material and greatly anticipate Tidal's MQA stream?

Of course not! Mark Waldrep (aka Dr. AIX) has already reminded us that the vast majority of what's being peddled as "hi-res" isn't higher-than-CD resolution anyway. Remember folks, this paper is a meta-analytic compilation of 18 other research papers, most of which used experimental audio signals recorded in true high-resolution. We don't know how many of these are using actual music to test. Also have a look at Table 1 and see just how disparate the methodologies are and ponder as to whether many of these methods have bearing on listening and enjoying music! Even including papers where training was used, the composite score of "% correct" identification as summarized by the typical meta-analytic "forest plot" in Figure 2 was 52.3% out of 12,645 total trials (range of 50.6-54.0%)! (And this forest plot did not include the Meyer & Moran 2007 results which were summarized elsewhere in the paper.)

Seriously folks, if we're trying to decide whether a high-res album sounds different from a CD 16/44 (of the same mastering of course), it should not need a meta-analysis. As a consumer, I can go on HDTracks this morning and see that a 24/192 version of Eric Clapton's recent album I Still Do costs US$27.98. And the CD on Amazon is US$10.90. It looks like both the CD and download are from the same DR11 master. The question for me in considering the purchase is not whether they may sound different, but rather does this difference justify paying more than 2.5 times the price!? In this context, does a 52.3% accuracy rate in a research setting sound like a valuable proposition to grab the high-resolution version?

You know guys, the fact that we're even going through the contortions of complex statistical analysis after >15 years since the release of SACD and DVD-A clearly indicates that those who claim to hear "obvious" differences are plainly wrong. When a meta-analysis is used in science to gather data far and wide to find and declare statistical significance of this kind of tiny magnitude, it just means that the "signal to noise" ratio is poor and that the magnitude of the effect is obviously academic. The author stated just as much: "In summary, these results imply that, though the effect is perhaps small and difficult to detect, the perceived fidelity of an audio recording and playback chain is affected by operating beyond conventional consumer oriented levels." Notice the careful wording... In no way does it imply that these "small" and "difficult to detect" differences are necessarily "better" as audiophiles always desire to promote. I like this wording and think Dr. Reiss did a fantastic job putting this together. By the way, these results are of no surprise as we've been talking about this for years!
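To show what "statistically significant but tiny" looks like, here's a back-of-the-envelope Python sketch using only the numbers quoted above. It treats all 12,645 trials as one big pooled coin-flip experiment, which is a naive simplification of my own and not the weighting the meta-analysis itself used, so take the exact z-score with a grain of salt.

```python
import math

trials = 12645      # total trials quoted from the forest plot
p_hat = 0.523       # pooled proportion correct
p_null = 0.5        # pure guessing

# Standard error of a proportion under the guessing hypothesis.
std_err = math.sqrt(p_null * (1 - p_null) / trials)
z_score = (p_hat - p_null) / std_err

# The same effect expressed the way a listener would experience it.
extra_correct_per_100 = (p_hat - p_null) * 100

# Price comparison for the Clapton example.
price_ratio = 27.98 / 10.90

print(f"z-score vs. guessing: {z_score:.2f}")
print(f"extra correct answers per 100 trials: {extra_correct_per_100:.1f}")
print(f"hi-res price / CD price: {price_ratio:.2f}")
```

With this many trials, even a couple of extra correct answers per hundred lands several standard errors away from chance. That is the whole point: a huge sample makes a small effect detectable, it does not make it big.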

To me, if I were an investor in companies primarily targeting the "hi-res audio" segment after all these years, these results are actually not to be welcomed. High time to take more chips off the table hopefully with a profit because it's clear that when the market stabilizes, hype subsides, and value is priced in, the markups will have to be minimal.  Of course this doesn't mean music should not be produced in the best resolution possible (especially classical, jazz, and other acoustic genres). Just that the lack of value as currently priced is actually painfully clear.

----------

Have a great week everyone... Happy Canada Day. Happy Independence Day to the American friends!

It's summer and time to get into the great outdoors with the family. I've got some camping, trips to the tropics coming up, and planning to hit a few beaches along the way. Might not get a chance to post as much :-).

As always... Enjoy the music!

51 comments:

  1. Rob Watts, the designer of Chord DACs like Hugo and Mojo, has written on Head-Fi that an impulse input into a DAC is an "illegal" signal as it is not bandwidth limited. So, does it mean that to apply Shannon-Nyquist theorem, both input and output have to be bandwidth limited (which in case of RedBook would be 22KHz)?

    Replies
    1. Hello Gurpreet,

      I believe you must be referring to this thread:
      http://www.head-fi.org/t/800264/watts-up/135

      I'm not sure if the word "illegal" is the proper description. However it is indeed true that an impulse is not "representative" of actual audio that would be produced by a proper ADC of a filtered analogue source.

      And that's perhaps something many audiophiles do not realize, thinking that the ringing when they see these impulse responses is somehow a common phenomenon from the DAC output. Ringing is not an issue when fed proper bandwidth limited audio as noted in the text.

      The value of an impulse response measure is that it gives us a peek at the function of the filter. And we can calculate the filter's output by convolving the input signal with this impulse response. The idea that an audiophile can look at the impulse response, become anxious about the ringing, and somehow automatically associate this with sonically "bad" results would be inappropriate.

  2. Interesting read. I seem to remember that higher resolution allows for less steep, more benign and hence less ringing filters? I believe these effects must be tiny in comparison with the speaker-in-a-room reproduction, and speaker element ringing, room reflections, standing waves etc. Interesting the comparison with video/visual material. MadVR in the moving image domain does massive oversampling afaik, and applies various types of filters, which can be turned on or off to see the effects of various combinations. HQPlayer seems to do something similar in the audio domain.

    As for Izotope software it would be fun if it had a randomizer for different filters to allow some blind play back. Would be even more fun if I as a consumer could get the music files and open them up, and play with the settings to essentially remaster aspects of the sound on the fly, like it's done in the Izotope demo videos. Yeah, I know, that would be the last thing the record companies, and probably the artists would want. It would be fun, though.

    Replies
    1. Hi Gadget,
      Thanks for the note. Yes. That's pretty well the gist of it when it comes to hi-res and filters. Recording and playing back in high-res means there's no need to filter out stuff just beyond the usual 20kHz hearing limit (of course most of us would be very lucky to have any hearing beyond 18kHz as adults). No need for a strong "brick wall" squarish cut off at Nyquist. No need for a steep filter with a long filter length and the concomitant prolonged ringing in the time domain.
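As a rough illustration of how the width of the transition band drives filter length, here is a small sketch using fred harris's well-known rule of thumb for FIR length (taps ≈ fs / transition width × attenuation in dB / 22). The 100 dB stopband target and the band edges are assumptions I picked for the example, and real DACs run their filters at oversampled rates, so treat the numbers as relative rather than absolute:

```python
def estimate_fir_taps(sample_rate_hz, passband_edge_hz, stopband_edge_hz, atten_db=100.0):
    """Rule-of-thumb FIR length: (fs / transition width) * (attenuation_dB / 22)."""
    transition_hz = stopband_edge_hz - passband_edge_hz
    return round(sample_rate_hz / transition_hz * atten_db / 22.0)

# 44.1 kHz: everything between 20 kHz and Nyquist (22.05 kHz) is the transition band.
cd_taps = estimate_fir_taps(44_100, 20_000, 22_050)
# 96 kHz: the filter can roll off lazily from 20 kHz all the way to 48 kHz.
hires_taps = estimate_fir_taps(96_000, 20_000, 48_000)

for name, fs, taps in [("44.1 kHz", 44_100, cd_taps), ("96 kHz", 96_000, hires_taps)]:
    print(f"{name}: ~{taps} taps, ~{1000 * taps / fs:.2f} ms of filter")
```

A shorter filter in the time domain means less room for pre- and post-ringing, which is the practical upshot of the wider transition band.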

      Yup, in the video world, software like MadVR and TVs with interpolation DSP like upsampling to 120/240Hz refresh rate do a ton of processing. And certainly one can still see artifacts even in some of the best algorithms.

      HQPlayer is certainly interesting software. Objectively, by looking at the "digital filter composite" graphs, I can see what a good, high precision filter looks like compared to one where the mathematics gets overloaded by intersample peaks. I'm not sure I'm sold on huge audible benefits however...

      Well, if you want to try your hand at remastering, you could go get some free unmastered multitrack recordings here :-):
      http://www.cambridge-mt.com/ms-mtk.htm

  3. Nice post as always.

    About the hi-res thing, I don't get it: if it is so much better than 44/16, why do they have to "preach" so hard? Isn't the difference obvious?
    The way some sites put it, it seems that anyone can hear the difference, and it's clear by now that only a very few trained ears can distinguish 44/16 from 96/24+ (by the way, I think we knew this already...).

    So... nothing really new in this "meta" paper.

    Best regards!

    Replies
    1. Hello VK.

      Indeed, this is why I think it's *fantastic* that these sites seem to promote and accept the findings of the meta-analysis!

      Other than providing just a headline that it's now "proven" that high-res is audible, the typical news soundbite of the day, the data itself is very clear to those who are thoughtful enough to examine the meaning of what this is saying.

      High resolution is essentially irrelevant for consumer audio. This meta-analysis IMO represents potentially the "end of the debate". And the conclusion is clearly not rosy for the hi-res audio industry. Mindless websites might claim some kind of silly victory to the battle because it shows "statistical significance". But the war is lost.

    2. Hello,

      yes 96 kHz or more is essentially irrelevant for consumer audio. But the industry is missing an opportunity to distribute music in e.g. a 24/48 container (FLAC downloads, DVD with FLAC files, or DVD-A), which does not live on the edge like 16/44.1 does (while still being OK for consumer audio).

      In this sense the war is not lost, if they concentrate on this instead of dreaming about 192 kHz or DSD for home use.

  4. On the meta-analysis paper: The Theiss/Hawksford study should have been eliminated, and the inclusion of it puts the rest of the sources in the study in question. I'm just going to repeat something from Reddit's /r/audiophile :

    First off, listening level isn't controlled for, so there is no guarantee that the difference isn't merely in identifying the differing noise floor.

    The bit and sample rate conversion is not controlled for. There are pretty huge variations in the performance of sample rate converters, as evidenced by these measurements [1] and without having characterized the performance of one, the paper is pretty much only testing the performance of the sample rate converter itself.

    Going further, the Hawksford study is not controlled for audible intermodulation artifacts - something I hardly even think people thought of in 1997. (This is also a general criticism of the entire meta analysis - systems trying to reproduce ultrasonics, without being capable can demonstrably yield audible, and measurable artifacts, and any study that doesn't control for it is basically completely invalid)

    (Yes, one can argue that explicit mention of listening levels shouldn't _need_ to be included. But listening level and intermodulation artifacts are two known error sources in ABX comparisons, and given that this study has results very far off from all of the other studies, and is the one study that pulls the numbers into "significance" territory, it should be viewed much more rigorously.)

    Replies
    1. Great comment Arve.

      I must admit that I have not looked at this Theiss/Hawksford study, which is of course rather old at this point (1997). Indeed, the forest plot in the meta-analysis shows a huge confidence interval for it. At least the weight was low at 0.71 for the final composite score.

    2. The larger point here is that it puts into question all of the _other_ studies that were included.

      Not being an AES member, I've only read the aforementioned paper and Meyer/Moran. Neither study controls for IMD, but Meyer controls for SPL and reaches the opposite conclusion.

      Given that those two are probably the most prominent studies, I don't think going into the rest is all that worthwhile, but it points to the fact that we need a much more stringent protocol for these studies, one that excludes technical error sources like this. (A study would also need to be a bit more explicit about dithering and filtering.)

  5. Only audiophiles could beat themselves up this way! The standard linear phase filter is the 'correct' filter. Any filter that moves the ringing to the post- side is changing the signal phase. Only if the ringing was at non-ultrasonic frequencies would any of this be an issue. It isn't, so it isn't.

    Linear phase speaker crossover filters are more interesting because, by listening to a single driver, the pre-ringing is audible, apparently. I say, "apparently", because a single driver already sounds mighty odd, anyway - the pre-ringing is just a small effect that, maybe, some people might enjoy listening for. The slopes used in speaker crossovers are very shallow in comparison to a "brick wall" so the ringing is low. But of course the whole idea is that, combined with the theoretical ringing of the neighbouring drivers which is in the opposite phase, the ringing sums to zero.
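The "sums to zero" idea can be sketched numerically. This is only an idealised illustration that assumes the two drivers add perfectly: the 201-tap Blackman-windowed low-pass and the 0.05 crossover point are arbitrary choices of mine, and the high-pass is built as the exact complement (a pure delay minus the low-pass).

```python
import numpy as np

num_taps = 201                        # odd, so the filter has a centre tap
n = np.arange(num_taps) - num_taps // 2
cutoff = 0.05                         # crossover frequency as a fraction of the sample rate

# Linear-phase low-pass branch (woofer feed).
lowpass = 2 * cutoff * np.sinc(2 * cutoff * n) * np.blackman(num_taps)
lowpass /= lowpass.sum()

# Complementary high-pass branch (tweeter feed): pure delay minus the low-pass.
delta = np.zeros(num_taps)
delta[num_taps // 2] = 1.0
highpass = delta - lowpass

pre_ring = np.abs(lowpass[: num_taps // 2]).max()
print("low-pass branch alone has energy before the main peak:", pre_ring > 0)
print("both branches summed give a pure delayed impulse:", np.allclose(lowpass + highpass, delta))
```

Each branch on its own has energy ahead of its main peak, but the two sets of ripples are equal and opposite, so the summed response is just a delayed impulse. Real drivers only approximate this sum, and less so off-axis.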

    A person could worry about these things, or they could simply accept that by embarking on the path of serious investigation and listening tests, they are going to ruin their enjoyment of their system, lose money, and possibly spoil their enjoyment of music altogether. Instead of buying several DACs, they could just buy some better speakers - which could indeed be audibly different, as opposed to the DACs which won't be.

  6. Reposting under the correct account - Personally, when I need it I use SoX with a 95% passband, aliasing ON and linear phase. I know that the default is aliasing off, but I prefer less ringing and some aliasing. Standard 44.1 kHz I do not resample (upsample); I leave that to the DAC if it needs it. Maybe my preference for that setting is caused by the lower ringing made possible through aliasing. As is written here http://src.infinitewave.ca/help.html "... It should be noted that the "ringing" of filters during SRC is mostly concentrated near Nyquist frequency because this range contains variations of the frequency response (here it is usually a range of 20-24 kHz). Even though it is in the ultrasonic range, there is some evidence that excessive ringing of an SRC filter negatively affects the overall sound, smearing the stereo image and reducing the clarity of bass ... " So some caution about ringing is good, I think.

  7. And one more addition - on cheaper onboard DACs like the Realtek ALC892 it seems to me beneficial (subjectively) to enable SW upsampling of the CD rate 44.1 to 96 kHz - probably because filtering is better at this rate. Maybe this can be tested in the future here also - I do not have the necessary tools ready.

  8. P.S. I wonder how many people were as confused as I was when they set their cheap onboard sound card to e.g. 96 kHz, obtained a different (perceived as better) sound, and attributed that to the sampling rate itself - and thus started to prefer 96 kHz recordings just because of that experience. In my experience that is not an attribute of the recording but an attribute of some (cheaper but otherwise good) DACs that have worse filtering at 44.1 kHz, whereas the situation at 96 kHz is much better because even a worse filter performs well at this rate, since it operates far from audible frequencies.

  9. Congrats for another excellent article, Archimago!

    I have, over time, and under the impression of the kind of misconceptions you are addressing, come to an even more pointed way of describing this impulse reconstruction stuff:

    Firstly, I now believe it is necessary to clearly refute the diagram that shows the supposedly original impulse as rectangular, as plain wrong. You do that in spirit, and with the correct arguments, but I think the point needs to be made as forcefully as possible, because it is the origin of the misconception. The digital data is merely a stream of numbers. A stream of numbers, where a single nonzero number is embedded in a stream of zeros, can be regarded as representing a pulse of some sort, but certainly not a rectangular one. Drawing it in this rectangular form is tacitly introducing a zero-order-hold function, and such a function does not produce a valid rendition of the signal. Full stop.

    This means, that if you want to represent the original data stream graphically, you would have to put dots on the diagram, not lines. It would be OK to draw lines from each dot to the horizontal axis, but that's as far as you can take it while still being valid. Lines to connect the dots with each other would already be an attempt at reconstruction, which goes beyond merely showing the "original" data.

    Secondly, every attempt at reconstructing a waveform from the data stream, which is what a DAC is supposed to do, is basically asking the question: What is the analog waveform, which would have produced this stream of data when fed into an (ideal) ADC. This is what reconstruction means: You want to reconstruct an analog waveform that you imagine has been present at an ADC's input. The digital data stream is, in this sense, not the original, it is an intermediate representation. You can of course produce a data stream artificially, with no ADC involved anywhere, but this data stream only gets its meaning when you associate it with an analog waveform that would have led to this data stream when fed to an ADC. It is this waveform which you are seeking to reconstruct.

    Now, we know that for this to work, the waveform must have been bandwidth limited to half the sampling frequency. Otherwise the signal representation would become ambiguous. In other words, the waveform that you imagine was fed to the ADC to produce our data stream, must have been bandwidth limited. Hence it can't have been a rectangular pulse, because that's not bandwidth limited. Try to come up with a waveform that is properly bandwidth limited, yet would produce the stream of numbers shown when fed to an ADC. This is the crucial question here: Which analog waveform would go through all the dots formed by the data stream, and at the same time be properly bandwidth limited? This is the correct reconstruction, and hence the waveform the DAC must produce.

    This is a difficult answer to figure out for many, since the waveform is shown in the time domain, while the bandwidth condition says something in the frequency domain, and you have to make the proper connection between the two. If you manage to work this out, you find that the only way to solve this is via a waveform that shows this apparent ringing. In other words, the DAC that rings is right, it comes at least close to the correct answer. If it produced anything that more resembled a rectangular pulse, it would be violating the preconditions of the whole system.

    It turns out there never was a rectangular pulse, it had been a chimera all along, a misinterpretation of what the data stream actually means.
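For anyone who wants to check this argument numerically, here is a minimal sketch of the ideal (Whittaker-Shannon) reconstruction in Python with NumPy. It assumes the stream 0, 0, 0, 32767, 0, 0, 0 with every sample outside that window also zero, and evaluates the unique band-limited waveform both on the dots and halfway between them:

```python
import numpy as np

samples = np.array([0, 0, 0, 32767, 0, 0, 0], dtype=float)
sample_times = np.arange(len(samples))

def reconstruct(t):
    """Whittaker-Shannon interpolation: one sinc pulse per sample, summed."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return (samples * np.sinc(t[:, None] - sample_times[None, :])).sum(axis=1)

at_samples = reconstruct(sample_times)          # on the dots
between = reconstruct(sample_times[:-1] + 0.5)  # halfway between the dots

print("waveform passes through every dot:", np.allclose(at_samples, samples))
print("values halfway between the dots:", np.round(between))
print("swings negative between dots (the 'ringing'):", between.min() < 0)
```

The waveform hits every sample value exactly, yet alternates in sign between the zero samples. Those ripples are the band-limited waveform itself, with nothing added on top of a rectangle.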

    Replies
    1. Thanks for the detailed comment Pelmazo. Absolutely agree, especially on the point that the digital data is just a stream of numbers. This was of course my intent in starting with the impulse response listed as 0, 0, 0, +32767, 0, 0...

      We are such visual creatures!

  10. Great Summary

    Hi Archimago.

    I have just found the time to read this blog post, and congratulations, this is a nice summary of digital filters. Nice work.

    @Honza: If you up-sample from 44k1 to 96k, you need a digital filter set at 44k1 within this process, no matter what you are doing.

    Juergen

    Replies
    1. Yes, that is true, but that filtering is done by the resampler used, not by the DAC, which is then fed 96k. Still, on some cheaper DACs like the ALC892 the result is (subjectively) better.

  11. I am a big fan. But, I think there is still considerable confusion, misunderstanding, etc. on the hi rez issue. Your comments at the end of your post on the Reiss paper perhaps amplify that.

    It is helpful to first go through the Mark Waldrep comments you also cited. We have to take him with a grain of salt. But, I think it is clear that in today's marketplace, there is a lot of "fake" hi rez out there. Hi rez really has to start with the original recording. Analog or RBCD masters, though upsampled to hi rez specs, are fakes.

    To really get the advantages of hi rez, one must have it through the entire chain, from recording through playback. I believe many audiophiles, and even "scientific" testers, like Meyer&Moran, have missed this. They used quite a few analog remasterings in their testing. Many audiophiles, sometimes angrily, insist it is all BS, because there is no difference to their ears in listening to RBCD vs. what they understood to be "hi rez". Many download websites have unfortunately delivered fakes rather than the real deal.

    But, it is merely the GIGO concept. Years ago, a friend wanted to have all his videos transferred from VHS tape to DVD because, as we know, DVD had much better picture quality. I, with hesitance, had to tell him the quality would be no better.

    The point is many negative audiophile conclusions may have been reached based on an inferior sample of recordings where the hi rez was not truly hi rez. Pop music engineering practices are typically troublesome, because usually we have no idea exactly what went on in creating the master from which the playable recording was derived. I do not think a hi rez remastering of some oldie rock classic from analog tape, for example, will be at all revealing of what hi rez can do. Ditto for an RBCD resolution digital master.

    Me? I am predominantly a classical music listener to natively recorded hi rez in Mch, which is a more recent development over the last 15 years or so. To me the advantages are clear, though perhaps subtle. It is an improvement, not a gee whiz breakthrough. I say that based on my own testing of hi rez vs. RBCD in stereo. But, it is fairly clear consistently in comparisons of the RBCD layer vs. the DSD stereo layer, both from the same hi rez stereo master on a hybrid SACD. Volume matching can be a bit tricky for that test, however.

    But, though the question has largely been ignored due to the niche status of hi rez, I find considerable comfort in the Reiss paper that I am not just deluding myself about the potential sonic advantages of hi rez. Many other test subjects, particularly ones trained in what to listen for, in careful testing seem to hear the difference, too, depending on the quality of the experiment.

    But, it is all perceptual. We know in full detail what the measurements say about RBCD vs. hi rez. You gotta go with what you think sounds best. Papers are useful, but your own listening comparisons with reasonable controls are best.

    Replies
    1. Your argument is just as tenuous here as it is on ASR. When hi-rez first appeared audiophiles and hi-end press *rarely if ever* claimed that a hi-rez-only chain was necessary to hear the supposedly obvious benefits. Plenty of 'hi rez remasterings from some oldie rock classic' were praised to the skies as proof that hi rez magic is real, and stunning. The new more stringent criteria only became 'crucial' after Meyer and Moran made hi rez advocates look silly. After that, even *SACD* has become suspect, which is frankly a hilarious example of goalpost-moving. And too, you're according to yourself the status of exemplary/trained listener, as audiophiles are wont to do, when Reiss's meta-analysis, at best, indicates that some sort of formal listener training is needed to really make this 'call'. Note too that Reiss can't tell from his meta-analysis *what* was being heard -- was it really the *intended* difference, or was it IM distortion, or what? Bottom line is, the rather minor 'effect' being claimed here is no reason for particular listeners, either you, me, or Neil Young, to assume they are not deluding themselves when they perceive a sonic benefit *and conclude that it's due to hi rez*. The overwhelming likelihood is, still, that what you are 'hearing' is what you want to hear.

    2. @Fitz Have you tried listening to Monty's Intermod Tests as suggested by Archimago? What did you hear?

      Have you tried measuring the upper frequency response limit of your speakers, assuming that is what you are listening to? Most folks are surprised to find out that few speakers measure to 20 kHz, let alone exceed it. Forget the manufacturers “specs.” And even if there is output past 20 kHz., it is usually well down in level compared to 1 kHz for example. Note: requires a measurement microphone that is capable of measuring ultrasonic, along with the rest of the measurement chain.

      When was the last time anyone had their ears tested by an audiologist for high frequency hearing limit? Anyone, in the world, tested over 20 kHz? :-)

      A fun experiment I performed over 4 years ago was comparing 16/44 versus 24/192 on one of Barry Diament’s hi-rez recordings that has real measured ultrasonic output. Bottom line, I could measure a difference, but could not hear it. One can download and listen to the difference files from the link above. Very educational to hear first-hand.

      Having experimented extensively with linear phase FIR filters over the past 5 years, I have yet to hear any (pre)ringing artifacts from literally hundreds of custom designed filters.

      After spending 10 years as a professional recording/mixing engineer, with a bit of industry knowledge, I conclude that the only reason there is hi-rez is for the record industry/publishers to re-sell music catalogs. Not to be too cynical, but I don’t see how this has any sonic value for the consumer. For me, the quality of the musical performance, recording, mixing, and mastering far exceeds any so-called audible benefits of hi-rez recordings, regardless of the container format it is delivered in. I see MQA in the same category, i.e. another commerce platform with little to no sonic benefit to the end consumer. Wrt MQA’s deblurring filter, that is another topic for another day. Enjoy the music!


    3. I agree with those opinions on hi-res like 192 kHz or more. Still, it seems obvious that 16/44.1 is enough but on the edge, and the industry could offer a 24/48 container for download/distribution in addition to the CD, which seems to be a pretty standardized physical distribution medium. I think that it would put an end to "vinyl revival", "hi-res crawling", "new encodings" and other endless discussions which are interesting but of little value in the longer perspective.

    4. I don't think that Meyer/Moran missed this point at all. I think they are being unfairly criticised by Waldrep and others. They used the material that was presented to them as exemplary for high resolution audio by those who claimed to hear the difference. I don't think they could have done otherwise, because that would have drawn even more criticism.

      It follows that it wasn't they who got it wrong, it was the audiophiles who claimed to hear the difference who got it wrong. Meyer/Moran wanted to show just that, and they succeeded. I'm confident that the result would be the same today, no matter whether you restrict the choice of material to "genuine" high res or whether you include examples of "faked" high res that audiophiles still claim to hear differences with.

      The fact that "fake" high res material is being offered for sale without people noticing should be a warning sign to everybody. If it were obvious to tell fake from real here, the bluff wouldn't work. Ironically, people have started to resort to using measurement results, predominantly spectrograms, to help them distinguish between fake and real. That's a tacit admission that the ear won't do, right?

      Yet, it doesn't help. If spectrograms are being used to check whether some material is really high res, the next trick for the fakers is obvious: Add some distortion to fill in some higher frequency components. That looks convincing enough in a spectrogram, yet it actually makes the sound worse rather than better.

      Mitchco is right IMHO regarding the motivation of the industry. If they wanted to deliver quality, they could do that within 44.1/16 no problem. There's no lack of examples, so it should be clear that lack of quality has nothing to do with the alleged restrictions of 44.1 kHz and/or 16 bit. And if that is so, HRA will not help.

    5. The problem is that a lot of common DACs do not play 44.1 kHz as well as they could. I am not saying it is impossible in theory, but they are not doing it. 48 kHz provides more filtering headroom and is usually in sync with the DAC main clock. 24/16 bit is another issue, but as I wrote, even if it is certainly possible to deliver quality at 16/44.1, it would be beneficial to also offer customers the 24/48 kHz container. It features seamless conversion from 24/96, if that is used for recording/mixing, without dither, and many DACs (sound cards) actually support 48 kHz better than 44.1. And the objection about high frequencies disappears at 22-23 kHz, which can be comfortably filtered at a 48 kHz SR, whereas the 19-20 kHz filtering of common DACs at 44.1 kHz always raises questions about the missing frequencies (although inaudible for most of us).

    6. P.S. the issue with high frequencies (16-22 kHz or so) is not that we can hear them alone but they can "color" other sounds in the mix. We have to remember that music contains a lot of frequencies "at once".

    7. Are you talking about intermodulation? Wouldn't this be a reason for omitting the high frequencies?

    8. "The problem is that a lot of common DACs do not play 44.1 kHz as well as they could." Says who? Archimago on this very blog reports common products that do remarkably well.

    9. For example, the very common Realtek DACs play the 44.1 rate OK, but their clock is rated at 24.0 MHz, which results in a 48 kHz rate of operation. 44.1 is supported by adding zero samples into the stream (sequence 12-11-11-12-11-11-12-11-11-12-11-11-11, 147/160 ratio). Also the filter starts at 0.441*SR = approx. 19.5 kHz. That does not harm common playback, but the performance at 48/96 kHz is better.
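The arithmetic in this comment is easy to check. The sketch below uses only the numbers quoted above; reading the 12-11-11-... sequence as "runs of source samples, each followed by one inserted sample" is my own assumption about what the sequence means, not something taken from the datasheet.

```python
from fractions import Fraction

# 44.1 kHz source played on a converter running at 48 kHz.
ratio = Fraction(44_100, 48_000)
print("44100 / 48000 reduces to", ratio)

# Quoted sequence, read here as run lengths of source samples (assumption),
# with one extra sample inserted after each run.
pattern = [12, 11, 11, 12, 11, 11, 12, 11, 11, 12, 11, 11, 11]
source_samples = sum(pattern)
inserted = len(pattern)
print(source_samples, "source samples +", inserted, "inserted =", source_samples + inserted, "output samples")

# Quoted filter corner of 0.441 x sample rate, at a 44.1 kHz sample rate.
filter_start_hz = 0.441 * 44_100
print(f"filter starts at about {filter_start_hz / 1000:.2f} kHz")
```

Under that reading the sequence is self-consistent with the 147/160 ratio: the runs add up to 147 source samples, and one insertion per run brings the total to 160.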

    10. datasheet here http://www.hardwaresecrets.com/datasheets/ALC892-CG_DataSheet_1.3.pdf

    11. The claims that Redbook's not good enough -- claims being made for decades now -- *are not* traditionally coming from people comparing PC motherboard DACs to hi-rez players. That's a comparatively recent iteration of the old claim. Name me some common mass-market CD player models whose DACs 'do not play 44.1 kHz as well as they could'.

      Nor have you demonstrated that a 'performance' issue like this -- a DAC that downconverts from 48kHz -- is really 'the problem' in practice -- i.e., that it makes a notable audible difference -- even if it were proved to be 'common'.

      Yours is the sort of 'perfectionist audio' critique Archimago refers to. The idea that these 'performance issues', rather than simple human psychology, are behind the widespread belief that 'Redbook sounds worse' is silly.

    12. If those DACs like the Realtek ones filter 44.1 kHz starting in the 19-20 kHz range with a slow roll-off, then I think that they could play it better, and they do at 48 kHz, where the roll-off starts at about 22 kHz. Also, it is better when the native clock rate is used (although common CD players probably have a 44.1 master clock since they do not have to support anything else).

    13. And moreover, I am not perfectionist in the sense of striving for 192 kHz, DSD or other overkills for consumer use. But I think that e.g. 24/48 distribution format could be a welcome addition to CD (redbook) physical medium.

    14. Great discussion guys!

      When looking at research, I think it's always good considering the question that they're trying to answer. I agree that the negative criticisms of Meyer & Moran are reasonable in that they did not analyze each DVD-A/SACD used to ensure an actual hi-res recording. But at the same time, that paper showed us that in the "naturalistic" setting with actual touted "hi-res" music, there was no clear benefit. That happened in 2007, before the days of modern downloaded hi-res. (Remember the list of suspicious SACD's: http://archimago.blogspot.com/2013/07/list-suspected-44-or-48khz-pcm.html)

      Yet the music industry did not care. They released the same old analogue remasters and same "standard resolution" recordings in big bit buckets, even to audacious 24/192 sizes. Companies like HDTracks on many releases didn't even bother if it was just an upsample. Many already knew about the fact that "Loudness Wars" dynamic compression destroys the benefit of the extra 8-bits dynamic range. Yet the "system" including audiophile magazines as a whole did not champion better standards for high-resolution releases (except for 'Hi-Fi News & Record Review' I think) and in fact parroted the nonsensical drivel from folks like Neil Young. All for the opportunity to sell yet another "better" version as Mitch said.

      And here we are. Hardware capable of excellent resolution without too much cost. Plenty of storage space for cheap. Slight supposed audible difference (not necessarily benefit) even in the lab as per Reiss.

      In my mind, if the music industry wants to come clean from this, they will need to accept that *there is no real benefit to hi-res*. Certainly *no* benefit for the vast majority of rock and pop recordings which is where most of the money is made. People will eventually *not* pay money for 24-bits and >44kHz. Already most people will pick 320kbps MP3 over lossless based on those Bandcamp numbers from AudioStream (no wonder!). IMO, it will end up where the hi-res "markup" will be minimal, and at some form of default high resolution (24/44, 24/48, 24/96...) depending on the original recording. Except for perfectionist audiophiles interested in high quality reproduction of acoustic recordings like classical, I think in a few years people just won't care what the exact bit-depth and samplerate will be, like most probably don't care if it's 256kbps AAC or 320kbps MP3.

      I honestly hope that the *mainstream press* do their job and advocate for the consumer instead of perpetuating the hype which is at this point as clear as day for many of us. Their legitimacy as having journalistic independence is very much at stake (not that this is anything new in the last few decades!).

    15. The negative criticism of M&M struck me as very much post-hoc goalpost shifting, because certainly the many rave reviews of the *sound* of DSD and hi rez PCM, specifically touting the formats as being responsible for the better sound, did not depend on the source of the recording. I was gobsmacked to see Reiss (2016) make uncritical reference to the even newer audiophile idea that SACD *itself* isn't hi rez because it "obscures frequency components above 20 kHz" (p 367).

  12. True, Reiss does write "In summary, these results imply that, though the effect is perhaps small and difficult to detect, the perceived fidelity of an audio recording and playback chain is affected by operating beyond conventional consumer oriented levels." But in speaking to the press, he said "Audio purists and industry should welcome these findings -- our study finds high resolution audio has a small but important advantage in its quality of reproduction over standard audio content." Which to my reading is a claim that goes well beyond what his data supports, and hints at a bias of his own. https://www.sciencedaily.com/releases/2016/06/160627214255.htm

    Replies
    1. I don't think Reiss actually spoke to the press. He prepared a press release, which was being taken more or less verbatim by numerous science-related media. The press release was issued by his university, and by the AES, in very similar form. It is actually fairly rare for the AES to issue press releases on appearance of a research article, but in this case it may have helped that the author is the AES "Vice-Chair Publications".

      I agree that the press release and the paper proffer quite different messages, even though they come from the same horse's mouth. In a press release, you don't have peer review, hence you needn't pay attention to the reservations the reviewers might have. That might at least partly explain the discrepancy.

      At any rate this is more than a hint of Reiss' bias. He knows what the "industry" wants to hear, and is willing to provide it, even when that means putting a spin on his own research.

    2. Yes, of course, we know that audiophiles have considerable bias in sighted listening. But, as is clear, they also have distinct biases in how they read the results of research papers and press releases. Forgive my own personal bias, but I am not seeing a gigantic inconsistency between the data presented in the paper, the conclusions drawn in the paper and the press release.

      Archimago is quite right that hi rez has been overhyped by the industry and audiophiles alike. But, so have a lot of other things, like vinyl and countless others. What else is new? And, if there ever were any perceptual studies, between vinyl and CD say, I can almost guarantee that those experiments would produce very noisy datasets requiring careful statistical analysis to reveal the "truth". That is the very nature of perceptual studies on human subjects, who vary from one another quite dramatically. And, people would nit pick and argue their interpretations of those studies as vindications of their own beliefs until the cows come home. As with this study, the arguments could go on forever.

      I realize this is uncomfortable for objectivist, measurement focused folks. I am on your side. But, it is a different ballgame when we involve the opinions or reactions of human beings. Because we are all different, it becomes more like the behavioral or social sciences, medicine, etc. So, searching for killer, absolute truth, as in measurements say, is replaced by statistical probabilities simply because people's reactions vary for reasons known or totally unknown. But, it is still science, just a very different form of science. And, you seldom get to the simple clear answer you often get with electrical measurements.

      If you need a human perceptual study to confirm, in clear and unambiguous terms and beyond any doubt, the universal audibility of any particular audio benefit, I think you may be waiting for a very long time. Had you needed that to switch from vinyl to CD, you would still be waiting.

    3. Pelmazo, the press release *is quoting Reiss*: "Dr Joshua Reiss from QMUL's Centre for Digital Music in the School of Electronic Engineering and Computer Science said: 'Audio purists and industry should welcome these findings -- our study finds high resolution audio has a small but important advantage in its quality of reproduction over standard audio content.'"

    4. Love the perspective Fitzcaraldo2015... A true realist :-).

      I share that general sentiment as well. Research scientists don't get too many opportunities to have their 15 minutes of fame, so being quoted in various news sources and having one's name out there certainly promotes the career, the university, and the AES as contributing to academic progress... Good for him, and for the hours of labour he spent producing the research!

      But the data is out. And those who search will see the meaning in the text and the conclusions. Once the hype fades, the data IMO will speak for itself and I think audiophiles will look back and realize that the numbers and conclusion did not exactly help the "cause" for high-resolution. How this works through the human psyche over time will be interesting and worth observing as a psychological case study in the "madness of crowds" :-).

  13. Before the introduction of the CD, there were a number of perceptual studies. Hence I don't think you are right here. The discussion about which sample rate to choose, and which bit depth to implement, went on for several years at the advent of digital audio, and of course perceptual studies were done to help and underpin the decision.

    You can't, of course, expect that the results would be completely unanimous, hence at some point you have to jump and decide on the basis of the available information. And at that point in time, it seemed to be quite clear from the perceptual studies that were available that a bandwidth up to 20 kHz was already generous and offered considerable margin. Similarly, 16 bits were also regarded as ample, which is perhaps illustrated by the fact that the designers of the initial PCM processors, which allowed digital audio recording on analog consumer-grade VTR boxes, considered 13 bits to be sufficient for a consumer-grade format. It didn't prevent those processors from being hailed as a great step forward in fidelity, several years prior to the introduction of the CD.
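
    To put rough numbers on the word lengths mentioned here, the following is a minimal sketch of the textbook signal-to-noise figure for an ideal quantizer driven by a full-scale sine wave. The function name is made up for illustration, and the formula assumes an ideal converter without dither or noise shaping; it is not a figure taken from any of the studies referred to above.

```python
import math


def quantization_snr_db(bits: int) -> float:
    """Theoretical SNR (dB) of an ideal N-bit quantizer for a full-scale sine wave."""
    # The ratio of the sine's RMS level to the quantization-noise RMS level is
    # 2**bits * sqrt(1.5), which in decibels is the familiar ~6.02 * bits + 1.76.
    return 20 * math.log10(2 ** bits * math.sqrt(1.5))


# Word lengths mentioned in this thread: 13-bit PCM adaptors, 14-bit and
# 16-bit converters, and 24-bit "hi-res" containers.
for bits in (13, 14, 16, 24):
    print(f"{bits:2d} bits: {quantization_snr_db(bits):6.1f} dB")
```

    Each extra bit adds about 6 dB, so on this idealized measure the step from 13 to 16 bits is worth roughly 18 dB.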

    35 years on from there, we still have no convincing and clear evidence that the choice of wordlength and sampling rate made back then was inappropriate, and needs to be increased to provide appreciably better fidelity for the consumer.

    Replies
    1. Yes, it is true that in principle 16/44.1 is OK, also perceptually. But we have to remember that the Nyquist theorem is a theoretical principle that assumes ideal filtering, devices, etc. Yes, it is possible to construct a playback device that works nearly perfectly with 44.1, and some of the better CD players are an example of that. But in reality, filtering at 48 or 96 kHz often works better. So the primary question is not whether 16/44.1 is OK but at which sampling rate it should optimally be played. I think that is the crux of many discussions about audio and also the "ammunition" for audiophiles whose approach is overkill. One can hope that the industry will standardize on a container like 24/48 as an addition to (not a replacement for) the CD. That way we would be above the requirements in all directions, and this could be the ultimate format for audio, without the constant craving for "more".
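
      One way to see why a real filter has an easier job at 48 or 96 kHz is to compare how much room lies between the top of the audio band and the Nyquist frequency. This is only a sketch: the 20 kHz passband edge is an assumption, the function name is hypothetical, and real DAC filters differ in how they use the available room.

```python
AUDIO_BAND_EDGE_HZ = 20_000  # assumed upper edge of the audible band


def transition_band_hz(sample_rate_hz: float,
                       passband_edge_hz: float = AUDIO_BAND_EDGE_HZ) -> float:
    """Room between the passband edge and the Nyquist frequency (sample_rate / 2)."""
    return sample_rate_hz / 2 - passband_edge_hz


for fs in (44_100, 48_000, 96_000):
    width = transition_band_hz(fs)
    print(f"{fs / 1000:5.1f} kHz: Nyquist at {fs / 2000:5.2f} kHz, "
          f"transition band {width / 1000:5.2f} kHz")
```

      With these assumptions the filter has 2.05 kHz to work with at 44.1 kHz, 4 kHz at 48 kHz and 28 kHz at 96 kHz, so it can roll off far more gently at the higher rates.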

  14. My previous comment was supposed to be an answer to Fitzcaraldo215, but the indentation somehow didn't happen. Sorry for that.

  15. And one more thing about perceptual testing. While it is very important for evaluation, we have to think carefully about it. Imagine, e.g., that the CD standard had been set at 15/42 kHz (not impossible). Do you think that perceptual tests would have led us to adopt, e.g., a 24/48 container when it became generally available? I do not think so. We would have a lot of studies telling us that 15/42 cannot be distinguished from 24/48 with statistical significance, similar to those that tell us the same about 16/44.1 vs 24/48; only in that case the margin is smaller. So, perceptual tests are important, but we also have to evaluate the technical parameters of a recording: whether it captures additional audio information or not, where "audio" means what can be heard by humans under imaginable and reasonable circumstances (e.g. 20 Hz-20 kHz and appropriate time slices), and whether it provides an undistorted record of what was originally performed.

    Replies
    1. Right Honza.

      The choice of 44.1kHz by Sony (mainly) likely had to do with the specifications for analogue video as digital transport back in the day; a sample rate that could be handled by the tapes and was reasonably divisible for both NTSC/PAL with different active lines and field rates (as per the wiki: https://en.wikipedia.org/wiki/44,100_Hz). It is a good thing that Sony won out and that we ended up with 16/44 rather than 14-bits, which I think Philips was aiming for.

      Probably good that DAT went its own way and just decided to adopt a more "rounded number" of 48kHz to remind us all that we're not locked into a base 44.1kHz idiosyncratic number.

      As with anything in nature, human hearing has a normal distribution so it's totally expected that any number we set with Nyquist up around 20kHz will lead to the consequences of it being "perfectly" good enough for human high-fidelity reproduction including high frequency tones anticipated at the volume levels used to listen to music.

    2. The choice of 44.1 kHz precedes the CD by several years. It indeed was motivated by the attempt to record digital audio on analog video tapes. Some may remember that such systems appeared on the market in the second half of the 1970s. Sony was at the forefront, but not the only manufacturer involved. Systems for both the professional market and the high-end consumer market became available, and the first digital recordings that were to end up being distributed on CD were made with such systems. See https://en.wikipedia.org/wiki/PCM_adaptor

      Sony, for example, introduced their PCM-1 in 1977, a consumer-type PCM adapter for use with their Betamax recorders. It didn't use 16 bit wordlength, because the converter technology was not yet up to snuff in those years, and because the dynamic range afforded by 16-bit wasn't deemed necessary at that time. The decision to support 16 bits came only later during the development of the CD, and Sony pushed for 16 bits because they had by then developed a 16-bit converter. Philips only had a 14-bit converter at the time and feared they would suffer from a marketing disadvantage, but their engineers decided to use oversampling to effectively arrive at an equivalent solution. Had they not been able to convince their marketing, who knows whether we would have got 16-bit wordlength on the CD.

      Realizing that standardisation of sampling rates would be necessary to enable digital interfacing, engineers from various companies met in the context of the AES conventions to try to come up with a common rate starting in 1977. The choice of 44.1 kHz is for example discussed in several papers in the JAES of April 1978. To be precise, it was 44.05594 kHz, because it was derived from the NTSC video clock. When using PAL, the resulting number was 44.1 kHz, which was close enough so that 44.1 kHz was standardised as a sampling rate that could be used with both types of video recorder, within an acceptable tolerance.
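
      The arithmetic behind these two numbers can be sketched as follows. The line counts and the three samples per line are not stated in this comment; they are the figures commonly cited for PCM adaptors (for instance in the Wikipedia article on 44,100 Hz linked earlier in this thread), and the function name is made up for illustration.

```python
def pcm_adaptor_rate_hz(active_lines_per_field: int,
                        fields_per_second: float,
                        samples_per_line: int = 3) -> float:
    """Audio sample rate when samples are stored on the active lines of a video signal."""
    return active_lines_per_field * fields_per_second * samples_per_line


# Assumed, commonly cited figures: 294 usable lines per field for PAL and
# 245 for NTSC, with 3 audio samples stored on each line.
pal = pcm_adaptor_rate_hz(294, 50)                 # 50 fields/s
ntsc_mono = pcm_adaptor_rate_hz(245, 60)           # nominal 60 fields/s
ntsc_color = pcm_adaptor_rate_hz(245, 60 / 1.001)  # colour NTSC, ~59.94 fields/s

print(f"PAL:         {pal:.2f} Hz")
print(f"NTSC (mono): {ntsc_mono:.2f} Hz")
print(f"NTSC colour: {ntsc_color:.2f} Hz")
print(f"Difference:  {(pal - ntsc_color) / pal * 100:.3f} %")
```

      Under these assumptions the two rates differ by about 0.1 %, which is the tolerance that allowed 44.1 kHz to serve both video standards.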

      It wouldn't matter much today, as we have good sampling rate converters both as computer algorithms and as chips, but remember that a digital sampling rate converter at the beginning of the 1980s was a large and expensive box. So large and expensive that most such conversions were done in practice by converting to analog and back.

      Hence, when the CD became reality, the sampling rate to use was already established by those PCM adapters. It was clear that a mastering system was needed to produce the material that was to be pressed onto the CD. The obvious choice was to use the PCM adapter and video recorder combination that was already there, instead of coming up with yet another system, and this meant that the sampling rates had to match.

    3. Yes, OK. Good to have the 48 kHz standard, since at this sampling rate we are really very close to, if not at, the peak of what can be digitally recorded properly. As I wrote, I would be happy if records were also (not only) sold in this container (24/48), of course with a minimal price increase. This could put an end to the endless hi-res crawling on one side and the CD defending on the other. Both efforts make little sense: hi-res at, e.g., 192 kHz or 32 bits does not bring anything to the end user, and the CD is a very good physical format for distribution and should not be replaced, but it lives on the edge of what is possible.

  16. Always amazing when 70+ year old hearing males using speakers with response that plummets above 20k and would explode way before >16bit dynamic range, can clearly hear the benefits of Hi-Re$ higher sample rates and greater word length. Must be that dreaded "time smear" that Reiss mistakenly called "unknown reasons" from the data mining.
    Now if they could just say exactly what specific track and what specifically to "listen" for, to hear these elusive benefits. But alas, they never do. Like the Hypersonic effect, you just feel better, for "unknown reasons".
    Well, like the magic cable guys would say, you must "experience it for yourself" (translation, buy it).
    Eyes open so can hear that time smear now.

  17. Hi
    This is probably off topic but related? I'm trying to understand some of the concepts discussed here and finding your blog very helpful.
    I have a question (if you have the time or inclination!) -
    I digitize my vinyl and use Sound Studio (a cheap but excellent app) on the Mac. Unlike any other software I know, it lets me record at up to 2.88 MHz (!) from a 192 kHz/32-bit feed. This is resampled to 960 kHz and saved. I then use iZotope RX to resample to 48 kHz/32-bit, as it will accept up to c. 960 kHz.
    The result is astonishing to my ears - better than a straight 192 kHz recording by far - it has a similar clarity to SACD (JRiver) conversions I've done to 384 kHz -> 48 kHz PCM.
    Am I doing something wrong?! I see I'm basically upsampling on the fly.
    I'd be interested in your thoughts

    Replies
    1. Hi Danzy,
      Interesting comment. Not sure I can answer fully since I don't have Sound Studio. What ADC are you using? A samplerate of 2.8MHz or so suggests you're recording at DSD64 format rather than PCM; basically 2.8MHz x 1bit.

      As you've already seen, even a pro package like iZotope doesn't like >960kHz samplerates in PCM, so 2.8MHz would be very odd for 16/24/32-bit PCM.

      Similar clarity to SACD you say? Maybe it is recording DSD like an SACD? :-)
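
      As a quick sanity check on the rates involved, here is a small sketch. It takes the 2.88 MHz figure from the comment above at face value and relies on the standard definition of DSD rates as multiples of 44.1 kHz; the function name is made up for illustration.

```python
CD_RATE_HZ = 44_100


def dsd_rate_hz(multiple: int) -> int:
    """DSD sample rate, defined as a multiple of the 44.1 kHz CD rate."""
    return multiple * CD_RATE_HZ


print(f"DSD64:  {dsd_rate_hz(64) / 1e6:.4f} MHz")
print(f"DSD256: {dsd_rate_hz(256) / 1e6:.4f} MHz")

reported_hz = 2_880_000  # rate quoted in the comment above
print(f"2.88 MHz / 44.1 kHz = {reported_hz / CD_RATE_HZ:.3f}")
print(f"2.88 MHz / 48 kHz   = {reported_hz / 48_000:.0f}")
print(f"2.88 MHz / 192 kHz  = {reported_hz / 192_000:.0f}")
```

      DSD64 runs at 2.8224 MHz, whereas 2.88 MHz is a whole multiple of 48 kHz and 192 kHz but not of 44.1 kHz, so the quoted figure would also fit an oversampled rate in the 48 kHz family.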

    2. Strangely it is PCM not DSD - Sound Studio (£30) is the only software I know that will go that high!
      However, I've now discovered that EasyDSD will convert a 192k PCM file to DSD256, and then using Ponophile (from the same company) I can convert it back to 384k, which is finally resampled to 48k in iZotope RX. It is even better - the 2.88 MHz PCM upsampling gave a kind of ringing(?) despite other benefits, but this DSD conversion fixes that!
      Perhaps I should just get the new Korg DSD recorder aimed at Vinyl archiving...
      Thanks for your interest.

  18. This comment has been removed by the author.
