DNA: Storing the Internet in 1 Kilogram

DNA holds the information required to build a human from scratch. So why can't it also hold our images, PDFs, and Netflix downloads?

Toni Witt

September 5, 2021

Using DNA to store information isn't a new idea.

The cells in our bodies can each store about 750 megabytes, which is one or two episodes of your favorite Netflix show, and the average human has about 37 trillion cells. That means our bodies are lugging around nearly 28,000 exabytes of data. To give some scope, the networking hardware company Cisco estimated that only 2000 exabytes were transferred through the internet globally in all of 2019.
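That back-of-envelope figure is easy to check. Here is a rough sketch using the numbers quoted above and decimal units (1 MB = 10^6 bytes, 1 EB = 10^18 bytes, which is an assumption on my part); the variable names are only illustrative.

```python
# Rough check of the estimate above. Decimal units are an assumption.
BYTES_PER_MB = 10**6
BYTES_PER_EB = 10**18

bytes_per_cell = 750 * BYTES_PER_MB   # ~750 MB of DNA data per cell
cells_per_human = 37 * 10**12         # ~37 trillion cells

total_exabytes = bytes_per_cell * cells_per_human // BYTES_PER_EB
internet_2019_exabytes = 2000         # Cisco estimate quoted above

print(total_exabytes)                           # 27750
print(total_exabytes / internet_2019_exabytes)  # 13.875
```

In other words, by this estimate one body carries roughly 14 times the internet traffic of 2019.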

Using DNA for our personal data storage is not a new idea, either. Using molecules to store information was proposed in the 1960s, and researchers worldwide have been working on it ever since. So how come we're still stuck with hard drives, USB sticks, and discs?

Quick rundown of how it works:

Your data is a sequence of 1s and 0s. Convert that sequence into a sequence of base pairs, the fundamental units of information in DNA, using a key. Then create that sequence of base pairs using the DNA-synthesizing machines you find in bio labs. You now have a vial full of pure synthetic DNA in liquid form (it can also be stored in living bacteria). Using more fancy lab equipment, you can read the DNA the same way 23andMe reads your DNA when you send them your saliva. If whoever reads it also knows the key used to convert the 1s and 0s into base pairs, they can use it to extract the original information.
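To make the "key" idea concrete, here is a minimal sketch of the conversion step. The key itself (two bits per base, 00 = A, 01 = C, 10 = G, 11 = T) is a made-up example, not the scheme any particular lab uses; real encodings tend to add constraints, such as avoiding long runs of the same base, that this toy version ignores.

```python
# Hypothetical key: each pair of bits maps to one base (2 bits per base).
KEY = {"00": "A", "01": "C", "10": "G", "11": "T"}
REVERSE_KEY = {base: bits for bits, base in KEY.items()}

def encode(data: bytes) -> str:
    """Turn bytes into a string of bases using KEY."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(KEY[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(REVERSE_KEY[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

The string that `encode` returns is what you would hand to the synthesizer, and `decode` is what the reader runs on the sequencer's output. Without the key, the strand is just a jumble of letters.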

The Upsides

Storage Density

In 1956, IBM released the 305 RAMAC, the first computer to ship with a hard disk drive. It cost thousands of dollars, weighed over a ton, and was the size of a few refrigerators pushed together. But it only held around 5 megabytes, maybe 10-15 standard quality images. Nowadays you can buy a portable hard drive for under a hundred dollars that can easily store 1 million megabytes, or a terabyte.

Like Moore's law for transistors and computing power, this difference is a beautiful example of exponential growth. But can data storage continue this trend, compressing already micro-scale systems by another factor of 10 or 100?

To do so, we might have to move away from traditional media like the magnetic platters in a hard drive to genetic material, much as computing is exploring a move from silicon chips to quantum-based chips. The exact amount of data stored in the world isn't known, but most estimates agree that DNA is dense enough that, in theory, you could hold enough pure synthetic DNA to contain all of the world's data in the palm of one hand. Forget enormous data centers costing billions of dollars to build and maintain. This is promising for archival storage, or wherever enormous volumes of data are needed in small spaces like vehicles, spacecraft, or even robots.

Nature Vol.537 p24 2016

It makes a lot of sense that the storage density of information in DNA is so good. It's why enormous trees can grow from a single seed, why an entire human can start developing from seemingly nothing.

Low maintenance and reliable

Unlike current data centers, which consume a significant amount of electricity, DNA requires little energy and effort to store. It's also extremely stable over long periods of time. That's why we can extract pieces of genetic material from ancient fossils and determine whether they have living relatives today, what they might have looked like, and even whether they were lactose intolerant. Compare this to the standard procedure in modern data centers of replacing hard drives every 3 years because they're not expected to last longer than that, a procedure which is expensive and bad for the environment.

Easy to copy

I wasn't around for floppy disks but I've heard the stories. Nowadays hardly any laptop has a floppy disk drive, and the medium has become obsolete. One great thing about using DNA for data storage is that the basic tools and methods for working with DNA are being improved constantly anyway; as long as there is life as we know it, DNA will need to be handled. A side effect of researching genetics for other applications is that DNA data storage will keep improving too.

It has already done so for replicating DNA, starting with the Human Genome Project in the 1990s and early 2000s. We can copy large volumes of DNA very cheaply, which is ideal for keeping long-term archives safe or for running more experiments.

The Downsides

Slow read and write rates

It doesn't take much to use a hard drive or USB stick; you just stick it in (hence the name). In hard drives, a tiny read/write head on a moving arm reads the magnetic state of tiny sections on a stack of discs. These discs spin extremely fast and the arm scans the surface just as fast, producing the humming and clicking sound characteristic of desktop computers. But with DNA, you need expensive lab equipment both to synthesize it and to extract information from it by sequencing. The process still involves trained scientists and takes time, unlike the fractions of a second that hard drives need to handle entire video files. Not to mention that most people don't have a genetics lab at home, and that it's difficult to pick out specific pieces of information from large data sets in DNA. So don't expect DNA data storage to replace short-term memory, USB sticks, or even personal hard drives any time soon.

That doesn't mean DNA can't be used for long term archival storage, though.

The cost

As mentioned: fancy lab equipment + trained scientists + time = expensive. Although the Human Genome Project led to a 2-million-fold decrease in DNA sequencing costs, the cost of synthesizing DNA is still too high (over $2,000-3,000 per megabyte). In an experiment by the European Bioinformatics Institute, 98% of the budget went to synthesis and only about 2% to sequencing. To be competitive with other long-term storage techniques like magnetic tape, these costs need to be reduced by many orders of magnitude. But if synthesis costs fall the way Moore's law describes for transistors, that might happen someday.
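To get a feel for the scale, here is a quick sketch that scales the per-megabyte synthesis price quoted above up to a single terabyte, the size of that sub-$100 portable drive. It assumes 1 TB = 1,000,000 MB and that the price scales linearly, which is a simplification; `synthesis_cost_per_tb` is just an illustrative helper.

```python
MB_PER_TB = 1_000_000  # decimal units, as used earlier in the article

def synthesis_cost_per_tb(usd_per_mb: int) -> int:
    """Scale a per-megabyte synthesis price up to one terabyte (linear assumption)."""
    return usd_per_mb * MB_PER_TB

for usd_per_mb in (2000, 3000):
    print(f"${usd_per_mb:,}/MB -> ${synthesis_cost_per_tb(usd_per_mb):,} per TB")
# $2,000/MB -> $2,000,000,000 per TB
# $3,000/MB -> $3,000,000,000 per TB
```

Billions of dollars for what a hundred-dollar drive holds: that gap is what "many orders of magnitude" means here.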

Failure in the process

Sometimes errors slip in when creating DNA or reading its base pairs, which is unacceptable when storing important information long term. Luckily, this is a relatively approachable problem because you can use error-correcting algorithms.

For example, say you were trying to memorize three numbers: 8, 3, and 11. You memorize each individually. But you can also memorize the sum (22). If you forget one of the numbers, you can use the sum and the other two numbers you still know to determine the last.
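The memorization trick above translates almost directly into code. This is a toy sketch of the idea, not what real DNA storage systems run (they use stronger codes, such as Reed-Solomon, that can fix more than one error); the function names are made up for illustration.

```python
def add_checksum(numbers):
    """Return the numbers together with their sum as a check value."""
    return list(numbers), sum(numbers)

def recover_missing(numbers, checksum):
    """Fill in a single missing value (marked None) using the checksum."""
    missing = [i for i, n in enumerate(numbers) if n is None]
    if len(missing) != 1:
        raise ValueError("this scheme can only recover exactly one missing value")
    known_total = sum(n for n in numbers if n is not None)
    repaired = list(numbers)
    repaired[missing[0]] = checksum - known_total
    return repaired

stored, checksum = add_checksum([8, 3, 11])  # checksum is 22
damaged = [8, None, 11]                      # the middle number was lost
print(recover_missing(damaged, checksum))    # [8, 3, 11]
```

The price is one extra number to store. Lose two numbers at once, though, and the sum alone can't save you, which is why practical schemes add more redundancy.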

Plus, as long as we inhabit biological bodies we will try to drive down the cost of DNA synthesis and sequencing, and to increase their speed and accuracy, because these are things that help us in countless ways. Unlike relying on old operating systems or forgotten social media platforms, working with DNA won't become obsolete anytime soon.

What is DNA of Things (DoT)?

The idea started in 2019 with researchers in Israel and at ETH Zurich, who encoded information into DNA, embedded this DNA into tiny silica beads, and in turn embedded the beads into the filament used to 3D print a bunny. This embedding is only possible due to DNA's extremely high information density. The encoded information was the blueprint for the bunny itself, so the only thing you need to create another rabbit is an existing one.

Courtesy of ETH Zurich

This mimics biological life by creating a self-contained object that carries its own instruction manual for replication. This new application might be good for tracking medicines, marking paints and the exact shades used, tagging smart materials used in construction, building self-replicating devices or robots, and of course helping spies who need to transfer sensitive information without drawing attention (the next iteration of the James Bond glasses, perhaps?).

But what if you store data in a living mammal? Read the next post here. Cheers!


  • Thumbnail image from Raymond Gosling/King's College London. Cover image: "2010 DNA Distribution" by igemhq.
  • Koch, J., Gantenbein, S., Masania, K. et al. A DNA-of-things storage architecture to create materials with embedded memory. Nat Biotechnol 38, 39–43 (2020). https://doi.org/10.1038/s41587-019-0356-z
  • Extance, A. Digital DNA. Nature 537, 22–24 (2016). www.nature.com/articles/537022a.pdf
  • “How Big Is the Internet? Hint: Probably a Lot Bigger than You Think.” Starry Blog, 5 Mar. 2020, starry.com/blog/inside-the-internet/how-big-is-the-internet.
  • “Why DNA Data Storage Is the Future.” Savjee.be, savjee.be/videos/simply-explained/why-dna-data-storage-is-the-future/.