Here we go…
We’ve discussed a lot of similarities between the natural world and computing, but today we’re going to hit a little closer to home. We’ll tackle the startling similarities between the code that runs your computer and the code that describes you. That’s right, we’re going to talk about DNA. There are some interesting similarities between DNA and code, but we’ll discuss some differences first before I get too deep into it.
DNA and code share one key feature: they both encode information. Where code (usually) comes down to a binary sequence, genetic material is written in nucleic acids, long chains of building blocks called nucleotides. In DNA there are 4 possible nucleotides that can sit at each spot in the sequence: A, T, C, and G. (RNA, the working copy a cell makes from its DNA, uses U in place of T.) A single strand of DNA (one half of the double helix after the cell’s proteins unzip it) can be written out as a string of those letters, something like ATGGCCTTAGCA.
Since there are 4 possible values for each spot in the sequence, the number of possibilities that a string of nucleotides can represent grows much faster than for a binary string. While a binary string can represent 2^n unique states (where n is the number of digits), a DNA strand can represent 4^n unique states! Put another way, each nucleotide carries as much information as 2 bits. That might not seem like the biggest difference, but it grows fast. 10 binary digits can take 1,024 different values, but 10 nucleotides can have 1,048,576 different arrangements! And the difference only grows as you add more digits. BUT THAT’S NOT EVEN ALL!
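To make the arithmetic concrete, here is a minimal Python sketch of the comparison. The function names `count_states` and `bits_needed` are made up for this post, and the code only counts arrangements; it says nothing about how a real cell stores anything.

```python
def count_states(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of `length` symbols over an alphabet."""
    return alphabet_size ** length


def bits_needed(nucleotides: int) -> int:
    """Each nucleotide is one of 4 values, which is exactly 2 bits."""
    return 2 * nucleotides


if __name__ == "__main__":
    # Compare binary strings with nucleotide strings of the same length.
    for n in (1, 5, 10, 20):
        print(n, count_states(2, n), count_states(4, n))
```

Since 4^n is the same as 2^(2n), a strand of n nucleotides holds exactly as many arrangements as a binary string twice as long.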
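A full double-stranded DNA molecule is made of base pairs, where A always bonds to T and C always bonds to G. That means the second strand is completely determined by the first, so it doesn’t add any new states; it works more like a built-in backup copy. What is different is that the two strands are read in opposite directions, and genes can sit on either one, so the same stretch of DNA can mean different things depending on which strand is being read. Wouldn’t it be cool if code could be read backwards too?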
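The pairing rule is simple enough to sketch in a few lines of Python. This is only an illustration of A pairing with T and C pairing with G; the names `complement` and `reverse_complement` are my own, and the second function reverses the result because the two strands are read in opposite directions.

```python
# Base-pairing rule: A bonds to T, C bonds to G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for every position in the strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand as it would be read in its own direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Applying `reverse_complement` twice gives back the original strand, which is another way of seeing that the second strand carries no extra information.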
How it’s Read
To read a string of binary digits, the processor fetches values from memory one after another, with hardware sensing the stored state of each bit (how that works differs between storage media). Based on what it reads, it can jump to a different area of memory and continue from there (a branch operation), or make decisions based on the data. (Is this value bigger or smaller than a constant? Than some other variable?)
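Here is a toy Python sketch of that read-and-branch loop. The three instruction names (`inc`, `jump_if_less`, `halt`) are invented for this example and don’t correspond to any real processor; the point is only that reading is sequential until the data says otherwise.

```python
def run(program, memory):
    """Step through instructions in order, branching when the data says so."""
    pc = 0  # program counter: index of the instruction being read
    while pc < len(program):
        op, *args = program[pc]
        if op == "inc":
            memory[args[0]] += 1
            pc += 1
        elif op == "jump_if_less":
            name, limit, target = args
            # Branch: compare a stored value with a constant, then jump or fall through.
            pc = target if memory[name] < limit else pc + 1
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


# Hypothetical program: keep incrementing x until it reaches 3.
PROGRAM = [("inc", "x"), ("jump_if_less", "x", 3, 0), ("halt",)]

if __name__ == "__main__":
    print(run(PROGRAM, {"x": 0}))
```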
DNA is read by first unzipping a stretch of the double helix and building a complementary single-stranded RNA copy of one side (transcription). The cell’s ribosomes then pass over that RNA three letters at a time; each triplet is called a codon and specifies one amino acid of the protein being built. To copy the DNA, the whole molecule is split and each half is paired up with its exact complement (A <==> T, G <==> C) to reform the base pairs, resulting in two copies of the DNA. Before an RNA copy is used, the stretches that aren’t needed can be snipped out, and the cell processes what’s left to produce the desired traits. This is also how cells become all of the different kinds: every cell carries the same DNA, but each type only reads the parts that apply to it and leaves the rest switched off.
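The reading step can be sketched in Python too. This is a heavily simplified illustration: `CODON_TABLE` holds only five of the 64 real codons, `transcribe` assumes it is handed the coding strand (so the RNA is the same letters with U in place of T), and `translate` starts at the first letter instead of hunting for a start signal the way a ribosome does.

```python
# A small excerpt of the standard genetic code (the full table has 64 codons).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": "STOP",
}


def transcribe(coding_strand: str) -> str:
    """Make the RNA copy of a DNA coding strand: same letters, U instead of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> list:
    """Read the RNA three letters at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE.get(codon, "?")  # "?" marks codons not in the excerpt
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCTAA")
    print(mrna, translate(mrna))
```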
With both DNA and machine code, the fact of the matter is that not all of it counts all of the time. In a code file, a programmer often adds comments to make the code more human-readable so that other programmers may better understand it, and understand it faster. Here’s an example from some PHP I wrote:
//Create image from data, grab its width and height
$srcimg = imagecreatefromstring($content);
$swidth = imagesx($srcimg);
$sheight = imagesy($srcimg);
//Create a destination for our thumbnail to be stored with appropriate
//scaling and width 100
$thumbWidth = 100;
$thumbHeight = (int) round(($thumbWidth / $swidth) * $sheight);
$thumbnail = imagecreatetruecolor($thumbWidth,$thumbHeight);
The lines that begin with “//” are comments; they aren’t code. Even if you can’t read all the other lines, these ones probably make sense to you, and you should be able to get some idea of what’s going on from them. (I was manipulating an image on a web page.)
Just like code has comments, DNA has stretches that the cell’s machinery skips over. While in some cases the DNA has no known use at all (so-called junk DNA), a lot of it comes down to the cell only needing part of the information. Because each cell carries a copy of the entire genome with it, it relies on signals in the sequence to tell which parts actually matter to it. Unlike a compiler ignoring comments, a cell that reaches a bit of data it doesn’t need actually cuts it out, though the cut is made in the RNA copy, and the original DNA stays intact.
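To put the two side by side, here is a rough Python sketch. `strip_comments` drops “//” lines the way the PHP example’s comments would be ignored, and `splice` cuts unwanted ranges out of an RNA copy. Both names are made up, and passing the cut positions in by hand is a simplification; a real cell finds them from signals in the sequence itself.

```python
def strip_comments(lines):
    """Keep only the lines that are code, dropping '//' comment lines."""
    return [line for line in lines if not line.lstrip().startswith("//")]


def splice(transcript: str, cut_ranges) -> str:
    """Remove each (start, end) range from the transcript and join what is left.

    Ranges are half-open index pairs and are assumed not to overlap.
    """
    kept = []
    position = 0
    for start, end in sorted(cut_ranges):
        kept.append(transcript[position:start])
        position = end
    kept.append(transcript[position:])
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical transcript: two useful pieces with an unwanted stretch between.
    first, unwanted, second = "AUG", "GUAAGUUUCAG", "GGC"
    transcript = first + unwanted + second
    cut = (len(first), len(first) + len(unwanted))
    print(splice(transcript, [cut]))
```

The difference from comments shows up in the return values: `strip_comments` leaves the original list alone, while `splice` produces a new, shorter molecule that is what actually gets read.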
When it Goes Wrong
Sometimes, the data that defines a system becomes corrupted. In the world of the digital, this can be from noise, failure during transmission, or the (very) occasional computational error. In the world of biology, errors are called mutations, and they can range from positive to horrifying.
When DNA becomes corrupted, some of the possible outcomes include cancer and tumors. They occur when a cell that has a harmful mutation is not stopped from dividing. The error spreads until either the harm interrupts the necessary processes of life, or the cells with the mutation all die.
In particular, the situation is similar to a very peculiar kind of computer issue. On a *nix system, a process can clone itself (the original becomes the parent, the copy a child process) using a system call named fork(). In a way, this is much like a cell’s ability to divide into two identical cells, and when it goes wrong the computer can lose control much like a body can. When fork() is called in a runaway loop, either through a programming mistake or intentionally by a malicious user (a so-called fork bomb), the copies keep copying themselves until the OS runs out of resources such as process slots and memory, and the system grinds to a halt or crashes. Ouch.
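To get a feel for how quickly that gets out of hand, here is a small Python simulation. It never calls fork(); it only counts what would happen if every process split in two each round. The name `rounds_until_exhausted` and the limits used below are made-up numbers for illustration, not the limits of any particular OS.

```python
def rounds_until_exhausted(process_limit: int) -> int:
    """Count doubling rounds until the process count exceeds the limit."""
    processes = 1
    rounds = 0
    while processes <= process_limit:
        processes *= 2  # every process forks once, so the total doubles
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Hypothetical limits, just to show how few rounds it takes.
    for limit in (1000, 32768, 1000000):
        print(limit, rounds_until_exhausted(limit))
```

Even a limit of a million processes is blown through in about twenty rounds, which is why runaway division is so hard to stop once it starts.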
When it’s Meant to go Wrong – Viruses
This is a really interesting connection to me because the terms match. Both are also horrifying.
Just look at this. Isn’t it creepy? Its technical name is bacteriophage, a virus that preys on bacteria.
Seriously, though. Imagine thousands of these guys latching onto a cell, hijacking it to build more of themselves, and then splitting it open to release the new copies! (‘Giving birth’ would only be an approximate term, since technically viruses are not living creatures.) The viruses that infect your own cells work along the same lines. Similarly, computer viruses tend to attack the normal functions of otherwise good code by exploiting some weakness the developer didn’t think of. The virus then usually has some code that uses your infected computer to infect other computers. Creepy!
I think that’ll be all for now. brb, time to go update my anti-virus…
This post was inspired in part by Bert Hubert’s magnificent page on the same topic. Please check it out! It’s a lot more comprehensive (but a bit more technically demanding, and written for programmers).