Saturday, 2 February 2013

High Bitrate MP3 Internet Blind Test: Part 3 - DISCUSSION

Previous - Part II: Results

DISCUSSION:
So, what does this all mean?

Firstly, it's important to keep in mind the limitations of this survey. Because it was an attempt to gather testers from around the world, there are numerous uncontrolled variables, including the respondents' varying degrees of technical savvy and their competence in maximizing the sound quality of their gear. Having said this, looking at the responses I got, I believe most respondents did give the test a fair trial, and where equipment was listed, it's clear that the cohort doing this test is beyond the average consumer of audio electronics in terms of quality of hardware. For the most part, even among those describing the equipment used as <$100, the models chosen are generally highly regarded within the price bracket.

Despite the lack of control over equipment or listening methodology, this test is 'naturalistic' and captures the preference of the "audiophile" in his/her own room, with his/her own equipment. Even if the music is unfamiliar, the listener is familiar with the sound of the gear and the room, which one would expect to help with sound quality evaluation. Furthermore, plenty of time was afforded, so there should have been no stress: this was not a time-limited task, nor were the respondents forced to choose one or the other (as I said in the instructions, I was also interested in those who did not think they could hear a difference).

As I noted on the PROCEDURE page, the MP3 encoding is somewhat unorthodox in that the parameters used were chosen to mask certain anomalies easily detected in MP3 files encoded with standard settings. Nonetheless, I believe the resulting quality still reflects approximately the same lossy characteristic as a direct 320kbps encode. In fact, one might even suspect that these test files could actually be worse (from an accuracy perspective in comparison to the lossless source) because the audio was run through the psychoacoustic process twice (once at 400kbps, second time 350kbps), and because, in retaining the full 16/44 audio spectrum, significant portions of the bitrate were devoted to encoding inaudible frequencies rather than to representing the audible ones more accurately.

Reading the comments on the various message boards, I believe that I have been successful in maintaining the anonymity of the MP3 files. There was one board where someone commented on how the frequency spectrum appears unusual but was not able to identify which was MP3 sourced.

As for the test itself (a blind AB comparison) and the survey question "which Set sounds inferior", the respondent has to make 2 choices:
1. Is there a difference between the two Sets of audio? If not, the respondent can vote "no difference".
2. If there were a perceived difference, which is "inferior"?

For question 2 above, intellectually we can imagine that "lossy" compression implies the music has been altered such that the loss is somehow bad or a degradation in quality. Likewise, the general consensus in media (as per my links in Part 0) suggests MP3 should be "bad sounding". But isn't it also possible that running music through a psychoacoustic model may "clean up" the sound by retaining a focus on the most relevant signals? One might imagine that this could come across as a less noisy background or reduced ultrasonic intermodulation distortion, since high frequencies are often filtered out. An alternative design like the ABX paradigm would have separated these two concurrent decisions, but ensuring the integrity of a blind test would have been impossible.

Even based on the result from this admittedly small survey of 151 respondents, there was a significant preference for the sound of the MP3 Set (i.e. most thought the lossless Set sounded "inferior"). The fact that a significant result was achieved suggests that high bitrate MP3 is NOT strictly "transparent", since transparency would imply exactly the same sound and presumably a random, insignificant result. The fascinating suggestion from this dataset therefore is that in a blind test, most listeners would actually consider the MP3 tracks as sounding better! This pattern of preference surprisingly appeared EVEN STRONGER in those using more expensive equipment to evaluate. Furthermore, respondents who thought there was a greater difference in the more "noisy" and distorted track 'Keine Zeit' also showed an even stronger preference for the MP3 encoded version (some were very vocal in noting how "obvious" this was), even though, from an objective perspective, this was the most difficult track for MP3 encoding.
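For readers who want to check this kind of claim against the counts in Part II, here is a minimal sketch of one way to do it: an exact two-sided binomial test, which asks how likely a split at least as lopsided as the observed one would be if listeners were effectively flipping a coin. This is an assumption about a reasonable test, not necessarily the analysis used for the results. The function name and the example counts below are hypothetical placeholders, not the survey data, and "no difference" votes are assumed to be excluded before counting.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    k: number of voters who picked one particular Set as "inferior"
    n: number of voters who expressed a preference at all
    p: chance of picking that Set under the null hypothesis (0.5 = pure guessing)
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Probability of every possible outcome 0..n under the null hypothesis.
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum the outcomes that are no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # HYPOTHETICAL counts for illustration only; substitute the real
    # numbers from the results page.
    hypothetical_voters = 100
    hypothetical_lossless_inferior = 65
    p_value = binomial_two_sided_p(hypothetical_lossless_inferior, hypothetical_voters)
    print(f"two-sided p-value: {p_value:.4f}")
```

A small p-value would mean the split is unlikely to come from guessing alone. It would not say why listeners leaned one way, which is the question the rest of this discussion is about.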

As with any survey / study based on group results, even though the consensus points to one conclusion, this does not necessarily apply to everyone. To be clear, there were a few respondents who appeared very sure of their perception in the survey and proved to have been correct.

Going into this endeavor, I expressed that my reason for doing this test was to find out whether MP3 encoding resulted in significant deterioration in sound quality. From what I can tell with 151 responses from around the world, a majority did not find a significant deterioration, and surprisingly most thought it sounded superior! Let me know if you've seen any other tests showing such a bias.

Thanks again to all the respondents in contributing their time! :-)

Continue to - Part IV: Subjective Descriptions

4 comments:

  1. The following was a thoughtful response and criticism which I wanted to include on this page:
    ------------------------
    Originally Posted by azinck3
    Very interesting results. Thank you for the time and care you spent doing this.

    I am not a researcher, just an amateur like yourself, but I have a few methodological concerns:

    1) You bundled the files into two complete bundles (all mp3s were in group A, all lossless in B). In doing so it becomes much more difficult to draw any conclusions about the detectability/preference of mp3 encoding as it pertains to any one pair of files. It also allows people to focus intently only on the files they care to listen to, or those which they came across first, and then to make conclusions about the whole group. Additionally, it prevents you from changing the order of the files (selection of group A by most respondents may have been primacy effect, to some degree). Asking about confidence on a per-file basis as you did would seem to mitigate some of these concerns but I don't think it fully addresses the potential problems.

    2) There's no control group. This could have helped identify any possible primacy effect, too (if you'd had each individual file pairing done independently it would have been possible to have two identical files as one of the pairs).

    3) Your main question "which set sounded inferior" had 3 answers: A, B, or "no audible difference". This, paired with the question about confidence does a decent job of answering the question "which sounds better" but I wonder if it does a good enough job of answering the question "is mp3 distinguishable from lossless". There could be a subset of people who had a hard time developing an opinion on which one sounded inferior, but an easy time distinguishing between the files. These people would not have wanted to answer "no audible difference" so may have taken a guess for the question "which set sounded inferior", but for "how difficult was it to come to your conclusion" they might have said "easy" (since it was easy for them to distinguish the files). Maybe I'm splitting hairs here; these are just some thoughts that came to mind while reading the results.

    4) I also wonder about the decision to use such an unorthodox mp3 encoding technique. I understand your rationale for doing so, but in the end it seems that your conclusion necessarily becomes "people tend to prefer this unusual method of audio processing over the original files". If you'd used a more typical encoding method then your conclusions could be more useful by applying more broadly to mp3s in the "real" world.


    These all sound like harsh criticisms. They're not. Your survey is, to me, exactly the kind of stuff audio publications should be doing. You clearly put a lot of thought into this; it was a great read!

    1. Thanks azinck,
      I agree with your points and appreciate you putting them down. In preparation for the test, I wanted to make sure that it was first and foremost "doable" in the sense of being simple enough to perform and not onerous for those wanting to partake. I was already a bit concerned about the 75MB file size for example.

      1. Out of simplicity I bundled the songs together. I agree that people would likely only pick the ones they can bear to listen to! This was why 3 songs were provided spanning a few genres... I was also worried about confusion and error if I were to mix-and-match especially "out in the wild" where the respondents can become confused if they had to respond with something like "I liked Church_A, Time_B, KeineZeit_A".

      2. Don't know how I could have done this unless I provided at least 2 ZIP files, 1 being just MP3 A&B or lossless A&B, to gauge the serial position effect (primacy effect vs. recency effect). Again, we'd be looking at more complexity and the survey would have to allow people to identify which test they downloaded...

      3. & 4. Good points and I think the most important criticisms of the methodology. Interestingly, nobody said the test was "easy" but voted for "no difference" (good that didn't happen! :-). Yes the unorthodox MP3 encoding was what bothered me the most but I considered it a "necessary evil"! However, if it's that easy to create a preference for MP3, that's meaningful as well!

  2. I think there was an interesting point that I heard years back - some music professor somewhere playing back CDs and MP3s and asking for the preference afterwards. Initially, the preference for CD-played music was very clear, then as time passed, the MP3s became more preferred, presumably because each incoming class was more and more accustomed to MP3-style distortions....

    It would be interesting if you had an age-range question in your survey. I myself played violin for 12 years, and since I stopped and began to predominantly listen to high-bitrate MP3 sources, I commonly find myself surprised when I listen to live music all over again...
