Comments to Ubuntu 10.04 Reads File Sizes Differently

I stumbled over Ubuntu 10.04 Reads File Sizes Differently and I have to correct some statements.

First I want to ask you, my blog reader, to read the units policy. Then think about it and read it again.

Now my criticism of the blog post:

  • We didn’t change the units policy. There was no such policy; we created one.
  • KB does not exist (in the SI or IEC standard). It’s either kB (meaning 1000 bytes) or KiB (meaning 1024 bytes). Did the author read the policy?

Now my clarifications to the commenter:

  • This policy was not Canonical’s decision. You have to blame me for creating the draft of the policy and the Technical Board for approving it.
  • This policy has nothing to do with Apple. I have never used a Mac and I don’t care what kind of byte prefixes Apple uses.
  • This policy is not connected to the decision to change the window buttons position of the default theme. This was done by different people. These two things are absolutely independent.

Correcting all applications to comply with the units policy is a goal for lucid+1 (Ubuntu 10.10). We are too late in the release cycle to make the change in lucid (Ubuntu 10.04). My current plan is to create a library for reading byte quantities from users and displaying them to users. The user can then configure this library to display the units in base-2 (KiB), base-10 (kB), or the historical totally fucked-up format (KB).
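To make the three display modes concrete, here is a minimal sketch of what the output side of such a library could look like. It is only an illustration: the names `PREFIXES` and `format_bytes`, the `style` values, and the rounding behaviour are all assumptions, not the API of any existing or planned library.

```python
# Hypothetical sketch: names and behaviour are assumptions, not a real library API.
PREFIXES = {
    # style: (divisor, unit symbols)
    "binary": (1024, ["B", "KiB", "MiB", "GiB", "TiB"]),   # IEC prefixes
    "decimal": (1000, ["B", "kB", "MB", "GB", "TB"]),      # SI prefixes
    "legacy": (1024, ["B", "KB", "MB", "GB", "TB"]),       # base-2 values, SI-looking labels
}


def format_bytes(num_bytes, style="decimal", digits=1):
    """Return a human-readable size string for a non-negative byte count."""
    base, units = PREFIXES[style]
    value = float(num_bytes)
    index = 0
    # Step up one prefix at a time until the value fits or we run out of units.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    if index == 0:
        return f"{num_bytes} B"  # plain bytes are shown without decimals
    return f"{value:.{digits}f} {units[index]}"


if __name__ == "__main__":
    for style in ("binary", "decimal", "legacy"):
        print(style, format_bytes(1_536_000, style))
```

The same byte count produces a different number in the decimal style than in the other two, and the legacy style shows the binary number under a decimal-looking label, which is exactly the ambiguity the policy wants to get rid of.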

Edit: My clarifications to the commenter also apply to the commenter on “Ubuntu implements units policy, will switch to base-10 units in future release”.

37 thoughts on “Comments to Ubuntu 10.04 Reads File Sizes Differently”

  1. If you can show me a microbyte, then I’ll agree that kilobytes are at all related to SI units.

    • My hobby is collecting microbytes (μB). I now have 62,500 microbytes. Together they can store half a bit. It’s zero today. 😉

    • Fractional bits are actually very common in information theory. If I tell you the value of my six-sided-die roll, that gives you log2(6) = 2.585 bits of information. So that would be 2585 millibits or 2647 mibibits :). AFAIK a byte is just 8 bits.

  2. Well, one bit holds the same information as 0.125 bytes. That is, 1 bit is 125000 microbytes.
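
The microbyte and die-roll figures quoted in this thread can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `microbytes_to_bits` is made up for the illustration, and it assumes the usual 8 bits per byte.

```python
import math

BITS_PER_BYTE = 8  # assumption stated in the thread: a byte is 8 bits


def microbytes_to_bits(microbytes):
    """Convert a count of microbytes (1e-6 byte each) to bits."""
    return microbytes * BITS_PER_BYTE / 1_000_000


# Information carried by one roll of a fair six-sided die, in bits.
die_bits = math.log2(6)

if __name__ == "__main__":
    print(microbytes_to_bits(62_500))    # the hobby collection
    print(microbytes_to_bits(125_000))   # one full bit
    print(round(die_bits * 1000), "millibits")
    print(round(die_bits * 1024), "mibibits")
```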

  3. Hello, I wrote the post 🙂

    The KB vs. kB thing is a typo, and one that, for the vast majority of our readership, is likely superfluous.

    Although I did read the units policy (and should have paid more attention regarding the typo above; I will freely admit the policy was a bit dense to make sense of), the aim of the post wasn’t to delve into the semantic debates or the reasoning leading to it, but just to present to the reader the “change” they will notice upon using Lucid: that something has changed; the way file sizes are displayed is different. Hence the use of the term ‘change’. It wasn’t meant in any negative sense. The term ‘has changed’ was also used in some of the bug reports we referenced, so I feel it’s more a pedantic issue than a misleading one.

    OMG! is aimed at end-users rather than more technically minded readers (though we have plenty of those too), and as such we always straddle a fine line when simplifying or boiling changes down into easy-to-understand, digestible chunks.

    As for reader comments – those I have no control over!

  4. Ok, so it’s your fault, not Canonical’s. Fair enough. Still a really stupid decision.

    These days it seems Ubuntu is dying a death of a thousand paper cuts (quite ironic), brought on by a bunch of hubris circle jerks employed by Canonical.

  5. ScottK:
    None of the ×10^n prefixes where n is negative are used, because that’d be a bit silly. Are you maybe mixing up kilo- and milli-?

  6. @Mackenzie: Which is why pretending that the kilobyte is an SI unit is ridiculous. If people are so anxious for consistency they should have made a byte 10 bits while they were at it.

  7. Nice job. I have seen that in some programs they define MB and so on … using standards makes it better…

  8. Thanks for posting this, and for writing this policy. At first, I was quite afraid of where it would lead, but I was soon convinced of the utility of it (as long as it only targets UI and not CLI).

    I actually just asked two people who are average computer users how many kilobytes you can put in a megabyte. One answered 1000, and the other said she didn’t care. These are people who use computers daily, so that just convinces me that it won’t hurt most users.

    Additionally, I added a comment on the wiki page about command-line tools, as I would love to have proper CLI options to specify the output metric (without changing the default of each command, quite obviously).

  9. I like the kB/MB convention better because it gives humans a better estimate of how many bytes are actually in a file. The error when truncating to KiB/MiB/GiB/TiB grows larger as the units grow. You won’t have this with kB/MB/GB/TB.

    1 TB = 1 000 000 000 000 B. Blam. EASY.
    1 TiB = 1 099 511 627 776 B?!

    It’s just easier, at a glance, to compare sizes that cross the unit barriers, e.g.:

    4 517 781 kB? 4.518 GB. BLAM. No calculations needed.

    4 517 781 KiB? Uhh.. let’s see here let me just get out gcalctool… Right, “4517781/(1024^2)” aaand we have 4.308 GiB. 😐 Highly dissatisfying.

    Notice how 4.5 compares to 4.3? That “error”, as I incorrectly refer to it, will just keep growing the bigger the difference between the units you convert between. Although I have to admit there is a charm in counting using base-2, when 1 TiB is expressible as just 2^40. Then again 1 TB is just 10¹².
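
The figures in this comment are easy to reproduce. The snippet below is a small illustrative check, not part of the original comment; the helper names are invented for it. It prints how much larger each binary unit is than its decimal counterpart, then redoes the 4 517 781 conversion both ways.

```python
# Illustrative check of the numbers in the comment above; helper names are made up.
UNITS = [("kB", "KiB"), ("MB", "MiB"), ("GB", "GiB"), ("TB", "TiB")]


def prefix_gap(power):
    """Relative excess of 1024**power over 1000**power."""
    return 1024**power / 1000**power - 1


def kb_to_gb(kb):
    """Decimal conversion: just move the decimal point six places."""
    return kb / 1000**2


def kib_to_gib(kib):
    """Binary conversion: divide by 1024 twice."""
    return kib / 1024**2


if __name__ == "__main__":
    for power, (decimal_unit, binary_unit) in enumerate(UNITS, start=1):
        gap = prefix_gap(power)
        print(f"1 {binary_unit} is {gap:.1%} larger than 1 {decimal_unit}")
    print(f"4 517 781 kB  = {kb_to_gb(4_517_781):.3f} GB")
    print(f"4 517 781 KiB = {kib_to_gib(4_517_781):.3f} GiB")
```

The gap starts at 2.4 % for kB versus KiB and reaches roughly 10 % at TB versus TiB, which is the growing difference the comment describes.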

  10. Also, on the UnitsPolicy page, we can see this:

    “Correct basis
    Use base-10 for:

    * network bandwidth (for example, 6 MBit/s or 50 kB/s)
    * disk sizes (for example, 500 GB hard drive or 4.7 GB DVD)”

    I thought MBit/s is usually written Mbit/s? Or Mbps, sometimes. Since we’re picking apart kB versus KB, we should talk about Bit and bit also, I reckon. 🙂

    • The MBit/s was a typo. I corrected it to Mbit/s. Thanks for pointing that out.

      A bit is either called bit or b. Transfer rate can be displayed in bit/s, b/s, bps, or B/s. It’s a good idea to add a recommended way to the policy. IMHO we should use B/s in most cases. If you need bits, use bit/s. It’s clearer than bps and cannot cause confusion like b/s.

  11. I thought it was pretty simple, really:

    KiB is unambiguous. It means 2^10 bytes.

    kB/KB/kilobyte are ALL ambiguous; they have all been widely used to mean both 2^10 bytes and 10^3 bytes.

    Certainly, bits and bytes are not SI units—they don’t belong there, either; SI is for units whose definitions are defined and continually refined in terms of things we can observe in the natural world.

    But people _do_ commonly use the SI prefixes for non-SI units, even in the sciences; it surely can’t be unspeakably evil to do so.

    Now we need to decide on a convention for how to treat kB. We have KiB for 2^10 bytes. We don’t need another one for that. Using kB to mean 10^3 bytes keeps the prefix consistent with what “k” means almost everywhere else, and gives us a unit to neatly describe a whole bunch of bit quantities in common use that already do use multiples of 10.

    So since we have to somewhat arbitrarily assign a meaning to kB (for this particular context, given that it is in general meaningless), why not use the one that is the most useful?

    (Aside: the concepts of _portions of bits_, e.g. the microbyte mentioned above, are not generally referred to in these terms, but are most certainly used in information theory.)

    (Aside: we should be using KiB wherever possible, since it’s the only option with a universally unambiguous meaning. If we’re not, it’s only because people will say “huh?” when they see it, even if they wouldn’t notice the difference between the different definitions of “kB”. At the very least, an option to make all units displayed with binary prefixes is a must; it’s the only future; everything else is icky and ambiguous!)

  12. Hey, just wanted to say I support your effort to try to clean things up and bring some form of standards to the desktop. I realize there are a lot of people complaining about every little improvement that’s made, but in the long run the changes are well worth it.

  13. Every change brings some degree of dissatisfaction and need for adaptation. What must be done is to make a decision as to whether the short-term pain outweighs the long-term gain.

    Having been bugged by the “prefix confusion” for close to twenty years, I am very much in favour of the recent trend towards use of distinct names for the binary and decimal prefixes. My regret is obviously that this change did not come sooner, which would both have made life simpler at an earlier time and the transition easier.

    Where to best use what kind of unit is another matter—but a very secondary one.

    As for any relation to SI: It is not in any way relevant. What matters is that certain prefixes have certain established meanings. Notably, most people (at least in the western world) will have a clear understanding of kilometer = 1000 meters and kilogram = 1000 grams. Using them in alternate meanings is asking for (and, in this case, receiving) trouble.

  14. I appreciate your work towards a more standardized way of using units of measurement in Ubuntu.
    Old habits persist for a long time. I remember the introduction of “MHz” in American journals and the fuss of irrational comments it caused at the time.

  15. You know, having half of one thing and half of another is worse than just having the wrong thing everywhere.

    I really like that it will be easier to understand once it is done but I simply cannot believe that you would put something half-baked into an LTS.

  16. The problem is not choosing between Kio and Ko; it is that the operating system should use the right unit at the right moment.

    For example, your RAM has to be sized and displayed in Kio because RAM is manufactured in powers of two.
    Your hard drive, on the other hand, is not.

    You can use either Kio or Ko, but you have to know what you are talking about.

  17. Pingback: BlogoFlux – Latest news on Gadgets, Internet, Applications & Hardware » Blog Archive » Ubuntu implements units policy, will switch to base-10 units in future release

  18. RE: bps vs. bit/s vs. whatever else, you left out baud. 🙂

    And KB does exist. It is from the JEDEC standard. However, since nobody makes terabyte semiconductor chips yet, the JEDEC standard only specifies KB, MB, and GB for sizes of 2^n in. When was the last time you bought RAM that was base 10? Bytes are not base 10, they are base 2. How about a giant class action suit against storage manufacturers instead? Why aren’t they changing to display the correct values on their packaging and devices, and in advertisements? How does perpetuating the lies help users?

  19. I can’t stand the Language Police telling everyone to stop using ‘kilobyte’ one way and to start using it another way. It doesn’t work and it just creates confusion and havoc. If they wanted to measure bytes in base 10 (in a horribly inconsistent manner), they should have created a new unit for it, instead of pretending the standard unit was morally wrong because it’s “inconsistent”. Despite what is done in France, language cannot be dictated.

    Anyway…what I would like to see is fully-decimal measurements everywhere. That means using bits, not bytes. To human beings, the quantity of ‘8 bits’ is meaningless and confusing. A bit, on the other hand, makes sense: it’s the smallest unit of information. Wouldn’t life be much simpler if information were always measured in base 10 fully?

    • You may, as I read you, be missing the point: It is not a question of forcing a change of consistent use. The main problem is not that e.g. the “k” prefix has another meaning than elsewhere, but that its use even wrt computers was/is highly inconsistent and confusing. This change is better likened to e.g. enforcing a consistent terminology with regard to various kinds of gallons.

      (As an aside: There are large practical advantages to having numbers divisible by factors of 2, and having a unit like the byte, or otherwise counting many things by 2s, still makes sense.)

      • Human beings are taught to think in base 10. The confusion over the meaning of ‘kilobyte’ didn’t occur until computing became mainstream and some standards organization thought that they had some sort of Divine Right to change the meaning from the accepted usage in the field of computing. So, yes, it is about forcing a change.

        And, now, because of that arrogance, people are confused, and consistency is badly needed. The only way to do that is to avoid ambiguity. This means using either base 10 wholeheartedly or base 2 wholeheartedly. The Frankenstein mixture of the two that I have never seen outside of hard drive marketing should just go away forever.

        As I said, humans think in base 10. A fully base-10 system would be far less confusing for people (just like the metric system is far better than the English system). If the binary system makes more sense somewhere and it even makes sense to expose this fact to normal human beings, use the binary system, not the evil hybrid. I suspect that most people (though not I) would simply not care about the fact that RAM is designed in such a way that it comes only in powers of 2.

  20. @David

    I do not quite follow you, and in as far as I do, I largely disagree:

    I do not know how old the confusion is, but it must be at least several decades, and the actions of the standards organization are meant to remove this already existing confusion—it is not creating a new confusion.

    This is what a standards organization does: It suggests standards to remove confusion, make interoperation easier, whatnot. The rest of the world can choose to adopt these standards or not. There is no element of force in the sense of e.g. “Use this standard or we hit you on the head with a keyboard.” (though there may be in the less drastic sense of “driving for a change”).

    There is nothing wrong with using several bases as long as they are clearly distinguishable (the problem is just the lack of distinction). If one is inferior, people (outside the US, ahem) will eventually drop it. Notably, the difference between these prefixes is mostly uninteresting to the man on the street, who can work by a good heuristic of 1 k ~ 1 Ki, etc.

    Further, even if one type of unit is eventually used exclusively with the public, there is no reason for the specialists to forego other units. Consider e.g. measures like the Planck length, the AU, or the parsec among physicists.

    • SI and other standards organizations not only defined the ‘kibi-’ prefix, but they define the meaning of ‘kilo-’ in the context of bytes. This does not remove any confusion at all, but forces the issue. The far more common usage of ‘kilobyte’ has been (and still continues to be) ‘1024 bytes’, while these organizations felt it their duty to declare that, from now on, ‘kilobyte’ should mean ‘1000 bytes’. That’s enforcing a change in common usage—Language Police.

      Their “standards” are what has made this confusion so bad today, since people thought they should start following them, leading to an even worse situation where the common usage is not as reliable. In the end, it will never ever be the case that everyone uses ‘kilobyte’ to mean ‘1000 bytes’. The ambiguity will always be there, thanks to “standards”.

      Bytes, as opposed to bits, are useful in binary contexts, such as RAM and hard-drive platters. In those contexts, the fully-binary system makes sense, while the hybrid system these organizations are trying to enforce makes no sense whatsoever.

      Can you think of one context where using decimal values of bytes (rather than binary values, or decimal values of bits) makes sense, where it conveys some meaningful information? I cannot, because it makes no sense, and it never did. That’s why the only “decades-old” confusion you think existed was for users who saw these technical terms and got confused. And users have a right to get confused—not because of the ambiguous ‘kilo-’ but because we’re talking about binary values. That’s why I think it makes more sense to go all-decimal for users and just measure bits.

      Anyway, personally, I would love to see sizes displayed as, for example, ‘3.14 Mb (383.5 KiB)’. No ambiguous terms, normal metric values given primacy, so they can be read by actual human beings and not just by us geeks. And it would be a great way to transition everyone to a more humane system.

  21. Amazing how so many seem to miss the point. This process shouldn’t be about whether your favourite program uses KB or KiB as a unit, nor about how many bytes 12091832 GB means at the moment. For a regular user, the most important point of this policy is that (in future) when a program shows a value of 53274 KiB or 1328 kB, the user will finally know exactly how many bytes it actually is.

  22. Pingback: OurLife » Ubuntu units policy

  23. Pingback: EnNegrita » Ubuntu 10.10

  24. Pingback: Ubuntu 10.10 medirá la información en kibibytes (cuando sea necesario) | AlfaLibre

  25. Pingback: Ubuntu 10.10 medirá la información en kibibytes (cuando sea necesario) « Swichers Linux

  26. Pingback: Ubuntu 10.10 medirá la información en kibibytes (cuando sea necesario) | Todos Geek
