Saturday, 2 February 2013

High Bitrate MP3 Internet Blind Test: Part 2 - RESULTS

Previous - Part 1: Procedure

RESULTS:
The final tally for respondents is 151. Here's the updated map of where the responses came from:
As I mentioned in "Part 0", the majority of responses were from North America (64), followed by Europe (47), Asia (33), Australia & New Zealand (4), and finally South America (3). It looks like the freeonlinesurveys.com map may not be completely accurate, since at least one person indicated they were in Russia, which was not highlighted on the map!
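For anyone who wants the regional split as percentages, here is a minimal sketch using only the counts quoted above. The dictionary name and layout are my own choice for illustration, not something exported from the survey site.

```python
# Regional response counts as quoted in the text above.
# `region_counts` is a hypothetical name used only for this sketch.
region_counts = {
    "North America": 64,
    "Europe": 47,
    "Asia": 33,
    "Australia & New Zealand": 4,
    "South America": 3,
}

# The regional counts should add up to the final tally of respondents.
total = sum(region_counts.values())

# Share of all responses per region, in percent.
shares = {region: 100 * count / total for region, count in region_counts.items()}

print(f"total respondents: {total}")
for region, share in shares.items():
    print(f"{region}: {share:.1f}%")
```

The regional counts sum to the 151 respondents reported, so no response is missing from the breakdown.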


First, let's have a look at the demographic that responded to this test in terms of equipment used:


As you can see, the price range (respondents were asked to specify in $USD) of the audio gear and system setup varied greatly. A large proportion of respondents used headphones for the test (23%), which seems reasonable given the computer-audio nature of the test; I suspect many of us consider the headphones plugged into the computer/DAC to be superior to whatever speakers may be on the desk. 25% responded to the optional field and actually listed the gear used (thanks!). Scanning the responses, I see a huge range of headphones across the price ranges (Beyerdynamic DT990, 880 & DT770 seem popular, a few AudioTechnica M50s & AD700, Sony V6, Creative Aurvana, Senn HD800/650/600/570, Bose QuietComforts, AKG K701, Shure SRH440, Hifiman Re-0, Ultimate Ears Triple-Fi 10, Superlux 668B, Fostex). Likewise, there was a full range of speakers (Martin Logans, PBN Montana, Decware MG944, a couple of Magnepans, B&W 802Ds, KEF iQ1, Sapphire ST2). It's notable that some folks used a combination of headphones and speakers. Some DIY guys also got involved with their own DACs - one respondent specified a homemade Sabre DAC. Network streamers were mainly Squeezebox Touch models sent to outboard DACs; one person listed the Naim NDX. As for DACs, I see everything from a dCS setup to DragonFly to Mytek to Xonar Essence ST / D2 / Ones to Meridians... Looking at the detailed responses, I think I can honestly say that respondents took the test seriously, with some describing their test procedure and running the foobar2000 ABX tester themselves.

As for which song was felt to be easiest to differentiate between MP3 and lossless:
"Time" was the winner followed by "Church". Interesting given that from Part 1, we can say with some objectivity that it's actually "Keine Zeit" which shows the greatest variance in comparison to the original lossless audio. For most of us, familiarity is important and I think for the demographic, "Time" and "Church" would likely be most accessible (like I said, I had complaints about putting the metal track in the test!). Some respondents would have preferred a classical track as well. I agree this also would have been revealing but it's always a compromise trying to keep the test simple and download size reasonable.

Perhaps not unexpectedly, most respondents had to work hard or felt it was impossible to tell the difference between the Sets (50.7% in total for these 2 groups). Interestingly, about 1/5 (21%) thought the test was "easy" - we'll see later whether this confidence leads to accurate identification!

Finally, what you've all been waiting for:

WOW! Remember that Set B was the MP3, yet of those who picked A or B, most thought A sounded inferior! Looking at just the ones who selected A or B, and assuming a 50% chance of success on a "guess", only 45 respondents out of 123 got the answer correct. That result is statistically significant, with a probability <1% of arising by chance.
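For readers who want to check the significance claim, here is a minimal sketch of an exact binomial test using only the Python standard library. It assumes each of the 123 respondents is an independent coin flip under the null hypothesis; the function and variable names are mine for illustration, and this is not necessarily the calculation I ran at the time.

```python
from math import comb


def binomial_tail_at_most(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_chose = 123   # respondents who picked A or B (from the text)
n_correct = 45  # respondents who correctly picked Set B, the MP3 (from the text)

# Chance of seeing 45 or fewer correct answers if everyone were guessing.
one_sided = binomial_tail_at_most(n_correct, n_chose)

# With p = 0.5 the distribution is symmetric, so doubling the tail
# gives the two-sided p-value (capped at 1).
two_sided = min(1.0, 2 * one_sided)

print(f"one-sided p = {one_sided:.4f}")
print(f"two-sided p = {two_sided:.4f}")
```

Both the one-sided and two-sided values come out below 0.01, in line with the "<1%" figure above. The independence assumption is a simplification, since this was an uncontrolled internet test.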

Let's have a look at those who were confident and said this test was easy:

As you can see, despite the confidence, most of the respondents thought that Set A (the original lossless audio) sounded worse than Set B (MP3).

How about those with more expensive equipment vs. less expensive?


For those who used equipment costing $6000 and above, we see a similar skew toward picking Set A as the inferior one, but look at what happened to the proportion for those using less expensive equipment. Those using <$500 actually showed a more balanced split between A and B, so it seems that the participants with more expensive equipment preferred the lossy tracks.


Looking at the larger groups, it was interesting to see that those who used speakers (either floorstanders or bookshelves) seem to prefer Set A more than headphone users (likely not significant, but an interesting observation):



As for the songs themselves, "Keine Zeit", where the lossy file measured with the most variance compared to the original lossless file (i.e. the song most difficult to encode, resulting in the most error), was the one where most preferred the sound of the MP3!
In contrast, the other 2 songs were slightly more balanced. Note though that since songs were grouped as "sets", these results are obviously not independent of each other.


Surprised by the results? I sure was!

Continue to - Part III: Discussion

14 comments:

  1. Can you help me recover my own results? I lost mine in a HDD crash. I used an LJM DAC. Thanks!

  2. Sure wwenze - I believe you responded with:
    "Musiland 01 USD -> LJM CS4398..." right?

    Not sure how I can PM you... I'm looking around and don't see any obvious way to do it.

    1. Send to wwenze@lycos.com ? Thanks.

    2. Sorry for the delay... Just E-mailed you...

  3. Fascinating! Thanks for doing this. I too am surprised that lossy was preferred to lossless. But while 151 may seem a small sample, I suspect you are on to something here and that the "differences" most people hear are psych-influenced. T

  4. The problem is that folk who are only exposed to lossy have that as an audio frame of reference. I would have liked to see the results from a group of audiophiles who listen to live natural music - classical or acoustic - and listen to recordings of acoustic instruments in natural space. Perhaps some David Grisman.

    It's hard to believe that MP3s were preferred to lossless files. I'd hypothesize that the brightness and glare of MP3s are preferable to most people - hence the atrocious production on discs like Springsteen's Wrecking Ball. The compressed, bright sound is made for MP3s, not audiophile equipment which should, and will, reveal all of its deficiencies.

    1. Yes, selecting for audiophiles with specific interest in acoustic instruments and experience with live natural music would be interesting.

      Alas, this is done through the internet so many variables just cannot be controlled for. Validation with further tests would be important.

  5. I didn't see the poll until it was closed, but I chose A for lossless. Honestly, I wonder if people chose A because they thought they were picking which sounded BETTER; this is the most common way tests are done, and this is how I wrote down my results despite reading the "rules".

    1. You'd also be surprised how many people don't understand the word inferior >.>

  6. Were the tracks volume normalized? People are very prone to mistakenly believe that louder sounds "better".

    1. Yes, it says so in the article.

  7. This comment has been removed by the author.

  8. I'm sorry, as much as I'd like to take any conclusion from this study, it has one MAJOR flaw: the way the question was framed. You asked people to rate which one was INFERIOR. However, people are used to marking answers in terms of what they think is BEST. So I'm sure a significant portion of the users completely skipped over the INFERIOR part of the question and just answered based on which sounds better, or even if they read it as inferior at first, they forgot it later.
    This effect would have been easy to verify if you had repeated the question, this time asking which one sounds SUPERIOR. Obviously, the person would have to mark the opposite answer of what they did for the INFERIOR question. If they didn't, you could safely disregard their answers on the basis of the user not giving the due care to their answers and/or not understanding the vocabulary.
