• Sotuanduso@lemm.ee
    English
    arrow-up
    16
    ·
    1 year ago

    One letter per bit? You’d need some crazy effective compression algorithm for that, because a bit is 1 or 0. Did you mean byte?

    • AdrianTheFrog@lemmy.world
      English
      arrow-up
      15
      ·
      edit-2
      1 year ago

      English text in UTF-8 or ASCII is normally already 1 character per byte. With great file compression, you could probably reach 2 characters per byte, or one every 4 bits. One character per bit is probably impossible. Maybe with some sort of AI file compression, using an AI’s knowledge of the English language to predict the message.

      Edit: Wow, apparently that already exists, and it can achieve an even higher compression ratio, almost 10:1! (measured on 1 GB of UTF-8 (8-bit) text from Wikipedia) bellard.org/nncp/
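To put "characters per byte" into numbers, here is a minimal sketch that measures bits per character. It uses Python's built-in zlib as a stand-in, not NNCP, and the sample text and the `bits_per_character` helper are made up for illustration. The sample is one sentence repeated, so it compresses far better than real prose would.

```python
import zlib

def bits_per_character(text: str, level: int = 9) -> float:
    """Compressed size in bits divided by the number of characters."""
    packed = zlib.compress(text.encode("utf-8"), level)
    return len(packed) * 8 / len(text)

# Hypothetical sample: one sentence repeated, so it is unusually compressible.
sample = (
    "One letter per bit would need a remarkably effective compression "
    "algorithm, because a bit is either 1 or 0. "
) * 40

# Plain ASCII text stored as UTF-8 costs exactly 8 bits per character.
raw_bits = len(sample.encode("utf-8")) * 8 / len(sample)
print(f"uncompressed: {raw_bits:.2f} bits/char")
print(f"zlib level 9: {bits_per_character(sample):.2f} bits/char")
```

Anything below 1.0 bits/char would be "one letter per bit" territory. A 10:1 ratio on 8-bit text works out to 0.8 bits per character.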

      If an average book has 70k 5-character words, this could compress it to around 303 kilobits (about 38 kB), meaning you could fit over 1.6 million books in 64 GB.

      You can get a 2 TB SSD for around $70. With this compression scheme you could fit 52 million books on it.
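The storage arithmetic above can be checked with a rough sketch. The assumptions are mine, not measured: 1 byte per character, spaces ignored, decimal units (1 GB = 10**9 bytes), and a ratio of 9.24:1 picked to match the "almost 10:1" figure. The function names are hypothetical.

```python
# Back-of-the-envelope check of the book-storage figures.
WORDS_PER_BOOK = 70_000
CHARS_PER_WORD = 5
COMPRESSION_RATIO = 9.24  # assumed value for "almost 10:1"

def compressed_book_bytes(words=WORDS_PER_BOOK, chars=CHARS_PER_WORD,
                          ratio=COMPRESSION_RATIO):
    """Estimated size of one compressed book in bytes (1 byte per character)."""
    return words * chars / ratio

def books_that_fit(capacity_bytes):
    """How many compressed books fit in the given capacity."""
    return int(capacity_bytes // compressed_book_bytes())

book = compressed_book_bytes()
print(f"per book: {book * 8 / 1000:.0f} kilobits ({book / 1000:.1f} kB)")
print(f"64 GB drive: {books_that_fit(64e9) / 1e6:.2f} million books")
print(f"2 TB drive:  {books_that_fit(2e12) / 1e6:.1f} million books")
```

With these assumptions the results land close to the figures quoted above. A different ratio or average book length shifts them proportionally.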

      I’m not sure if I’ve interpreted the speed data right, but it looks like it would take around a minute to decode each book on a 3090. It would take about a year to encode all of the books on the 2 TB SSD if you used 50 A100s (~$9000 each). You could also use 100 3090s to achieve around the same speed (~$1000 each).
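A quick sketch of the time estimate, for anyone who wants to plug in other numbers. It assumes encoding a book takes about as long as decoding it (roughly one minute on a 3090, per the estimate above) and that the work splits perfectly across GPUs. Both are assumptions, and the function names are hypothetical.

```python
BOOKS = 52_000_000
MINUTES_PER_BOOK = 1.0          # assumed: encode time is about equal to decode time
GPUS = 100                      # the 100 x 3090 setup
MINUTES_PER_YEAR = 60 * 24 * 365

def wall_clock_years(books=BOOKS, minutes_per_book=MINUTES_PER_BOOK, gpus=GPUS):
    """Years of wall-clock time if the books are spread evenly over the GPUs."""
    return books * minutes_per_book / gpus / MINUTES_PER_YEAR

def hardware_cost(gpus=GPUS, price_each=1000):
    """Total GPU cost in dollars at the quoted per-card price."""
    return gpus * price_each

print(f"{wall_clock_years():.2f} years on {GPUS} GPUs")
print(f"${hardware_cost():,} of graphics cards")
```

On a single 3090 the same job would take roughly a century, which is why the GPU count dominates the budget.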

      52 million books is around the number of books written in the past 20 years, worldwide. All stored for $70 (+$100k of graphics cards)

      • Sotuanduso@lemm.ee
        English
        arrow-up
        11
        ·
        1 year ago

        There’s something comical about the low low price of $70 (+$100k of graphics cards) still leaving out the year of time it will take.

        • Cicraft@lemmy.world
          arrow-up
          1
          ·
          1 year ago

          Well, I guess you could sacrifice a portion for an index system and just decode the one you’re trying to read.
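The index idea could look something like this minimal sketch. It uses zlib as a stand-in compressor (not NNCP), and `build_archive` and `read_book` are hypothetical names. Each book is compressed on its own and the index records where it lives, so reading one book never touches the rest.

```python
import zlib

def build_archive(books):
    """Compress each book separately; return (blob, index).

    index maps title -> (offset, length) of that book's compressed bytes.
    """
    index = {}
    chunks = []
    offset = 0
    for title, text in books.items():
        packed = zlib.compress(text.encode("utf-8"))
        index[title] = (offset, len(packed))
        chunks.append(packed)
        offset += len(packed)
    return b"".join(chunks), index

def read_book(blob, index, title):
    """Decode only the requested book, using its slice of the blob."""
    offset, length = index[title]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8")

# Made-up example data.
library = {
    "Book A": "It was a dark and stormy night. " * 20,
    "Book B": "Call me whatever you like. " * 20,
}
blob, index = build_archive(library)
print(read_book(blob, index, "Book B")[:27])
```

The trade-off is that compressing books independently gives up whatever ratio a compressor would gain from context shared across books, in exchange for random access.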