// new media class

Alexander Hahn — Bioinformatics: Nature’s superiority over binary computing

All data inside a computer is stored in binary form. A binary digit has two possible states, on or off, usually written as 1 or 0. Each digit is called a bit, and eight of them make a byte. Genetics has a similar coding system. DNA consists of four different nucleobases: adenine [A], cytosine [C], thymine [T] and guanine [G]. These are all molecules and can be interpreted as four different states, so each base carries the equivalent of two bits. Paired up along two strands, they form the shape of a double helix and are the basic information level of life.
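To make the comparison concrete, here is a minimal sketch of how a DNA string could be stored in bytes at two bits per base. The bit pattern assigned to each base is an arbitrary assumption for illustration, and the function names are hypothetical; nature does not use this mapping.

```python
# Hypothetical mapping: which bit pattern stands for which base is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_bases(sequence: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (zero-padded at the end)."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # pad an incomplete final chunk with zeros
        packed.append(byte)
    return bytes(packed)


def unpack_bases(packed: bytes, length: int) -> str:
    """Reverse pack_bases; `length` is the original number of bases."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


print(pack_bases("ACGT").hex())  # prints 1b
```

Four bases fit into a single byte, which is why the original length has to be passed to `unpack_bases` when the sequence does not fill the last byte completely.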

While modern computers were developed within the past 80 years, the DNA code is around four billion years old. This longer development time has made it superior to electronic computing in some areas. For example, DNA strands are able to repair themselves. Even though there are four different bases, they only combine in a very limited way: A always pairs with T, and C always pairs with G. This means that whenever one base is damaged, the DNA can be repaired, because the opposite strand holds a backup. It only gets problematic if both sides are broken (Hubert, 2021).
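The backup idea can be sketched in a few lines. This is a simplified model, not how repair enzymes actually work: damaged positions are marked with a hypothetical `?`, and the two strands are compared position by position, ignoring the fact that real strands run in opposite directions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def repair_strand(damaged: str, partner: str) -> str:
    """Restore positions marked '?' in `damaged` from the paired strand."""
    if len(damaged) != len(partner):
        raise ValueError("both strands must have the same length")
    repaired = []
    for base, opposite in zip(damaged, partner):
        if base != "?":
            repaired.append(base)                  # nothing to fix here
        elif opposite != "?":
            repaired.append(COMPLEMENT[opposite])  # rebuild from the backup
        else:
            repaired.append("?")                   # both sides broken: information lost
    return "".join(repaired)


print(repair_strand("AC?T", "TGCA"))  # prints ACGT
```

The last branch is the problematic case from the paragraph above: when both bases of a pair are gone, there is nothing left to copy from.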

In contrast, an ordinary computer cannot repair itself. There have been several cases of random bit flips, where cosmic rays hit semiconductors and caused errors that were at first impossible to explain. An election in Schaerbeek in Belgium is a well-known example. There, a candidate got 4,096 votes too many, because one bit inside a voting computer was flipped, most likely by a cosmic ray (Johnston, 2017). The number gives it away: 4,096 is exactly 2 to the power of 12, the value of a single bit. Computing systems are very fragile in the face of errors. Often one single wrong line of code is enough to cause a crash. Nature is far more tolerant, because it is much more focused on repairing such errors.
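The effect of such a bit flip is easy to reproduce. In the following sketch the vote count of 514 is a made-up number for illustration; only the difference of 4,096 comes from the reported case.

```python
def flip_bit(value: int, position: int) -> int:
    """Return `value` with the bit at `position` inverted (position 0 is the lowest bit)."""
    return value ^ (1 << position)


true_count = 514                      # hypothetical vote count
corrupted = flip_bit(true_count, 12)  # bit 12 has the value 2**12 = 4096

print(corrupted - true_count)  # prints 4096
```

Flipping the same bit a second time restores the original value, but a plain counter has no second copy to compare against, so it cannot notice that anything went wrong.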

"Natural organisms are constructed to make errors as [...] harmless as possible. Artificial automata are designed to make errors as [...] disastrous as possible. [...] We are [...] much more "scared" by occurrence of an isolated error and by the malfunction [...] behind it. Our behavior is clearly that of overcaution, generated by ignorance." - John von Neumann (1948)

DNA is also in many respects more advanced than our electronic disks when it comes to data storage. In fact, the information density of DNA is so high that you could put all the information in the world into a DNA solution of only nine liters (Hubert, 2021). That and its longevity are the reasons why Microsoft and the University of Washington are researching how to build a fully automated DNA data storage system.

“Under the right conditions, DNA can last much longer than current archival storage technologies that degrade in a matter of decades. Some DNA has managed to persist in less than ideal storage conditions for tens of thousands of years in mammoth tusks and bones of early humans, and it should have relevance as long as people are alive.” - Christopher N. Takahashi (2019)

Nature should be a role model for today’s IT industry not only in terms of data density but also in compression. If you convert the ACTG bases of the human genome into bits and bytes, the size is only about 750 MB (Hubert, 2021). The minimum requirements of Adobe Photoshop 2022 are four gigabytes. Just think about what capabilities Photoshop has compared to a human being.
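The 750 MB figure can be checked with a short calculation. It assumes a genome of roughly three billion bases, a commonly quoted round number that is not stated in the text above, and two bits per base.

```python
BASES_IN_GENOME = 3_000_000_000  # assumption: rounded size of the human genome
BITS_PER_BASE = 2                # four possible bases fit into two bits


def genome_size_megabytes(bases: int = BASES_IN_GENOME,
                          bits_per_base: int = BITS_PER_BASE) -> float:
    """Size of the genome in megabytes (1 MB = 1,000,000 bytes)."""
    total_bytes = bases * bits_per_base / 8
    return total_bytes / 1_000_000


print(genome_size_megabytes())  # prints 750.0
```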

The tricky part about bioinformatics is that DNA is water-based and not as deterministic as electronic computing. It is also slower and its molecules jiggle around. Current hard disks have a linear read speed of hundreds of megabytes per second, while DNA is read at only about 15 bytes per second. The speed differences in copying are also big: several megabytes per second versus 250 bytes per second. But at the same time DNA needs only 1 picowatt, whereas the hard disk uses 10 watts (Hubert, 2021).
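To get a feeling for these speeds, the following sketch estimates how long a single reader would need to work through 750 MB from start to end. The disk speed of 100 MB per second is an assumption at the low end of "hundreds of megabytes"; the other numbers are the ones quoted from Hubert (2021).

```python
GENOME_BYTES = 750_000_000           # 750 MB, the genome size quoted in this essay
DISK_BYTES_PER_SECOND = 100_000_000  # assumption: low end of "hundreds of megabytes"
DNA_BYTES_PER_SECOND = 15            # read speed of DNA quoted from Hubert (2021)
SECONDS_PER_DAY = 86_400


def seconds_to_read(size_bytes: float, bytes_per_second: float) -> float:
    """Time for one sequential pass over the data."""
    return size_bytes / bytes_per_second


disk_seconds = seconds_to_read(GENOME_BYTES, DISK_BYTES_PER_SECOND)
dna_days = seconds_to_read(GENOME_BYTES, DNA_BYTES_PER_SECOND) / SECONDS_PER_DAY

print(disk_seconds)     # prints 7.5
print(round(dna_days))  # prints 579
```

Under these assumptions the hard disk finishes in seconds, while a single pass at 15 bytes per second would take more than a year and a half.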

Another important part of understanding bioinformatics is the way algorithms are implemented in proteins. For a better understanding: from DNA the cell copies a strand of RNA, which is translated into a chain of amino acids that folds into a protein. Those proteins are the parts in microbiology that execute the functions. The factory that produces them inside a cell is called a ribosome, and it is itself built largely out of RNA. Just like in computer code, typical operations such as Boolean logic or if-else conditionals are also present in those proteins (Hubert, 2021).

This can be seen very clearly in E. coli bacteria. In order to survive they need to eat glucose, but if that is not available they can also eat lactose. Breaking lactose down takes more steps and is less efficient, so they prefer glucose. A coder would describe that as “if glucose is available, eat it, else eat lactose.” In the cell, a lack of glucose causes a buildup of the molecule cAMP, which is basically a hunger signal. cAMP attaches to a protein called CAP, which then binds to a particular DNA sequence and promotes the reading of the lactose genes. A second protein, the repressor, either binds to lactose or to another specific DNA sequence, where it blocks the lactose genes (Kovach, 2014).
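Written as code, the decision of the bacterium could look like the sketch below. It is a strongly simplified Boolean model with hypothetical function names: in a real cell the signals are concentrations that rise and fall gradually, not clean true or false values.

```python
def lactose_genes_active(glucose_available: bool, lactose_available: bool) -> bool:
    """Boolean model of the two switches that control the lactose genes."""
    hunger_signal = not glucose_available   # cAMP builds up, CAP promotes the genes
    repressor_released = lactose_available  # repressor binds lactose and lets go of the DNA
    return hunger_signal and repressor_released


def choose_food(glucose_available: bool, lactose_available: bool) -> str:
    """The if-else a coder would write for E. coli."""
    if glucose_available:
        return "glucose"
    elif lactose_genes_active(glucose_available, lactose_available):
        return "lactose"
    return "nothing"


print(choose_food(glucose_available=False, lactose_available=True))  # prints lactose
```

The lactose genes are only switched on when both conditions hold at once, which is a logical AND built out of two proteins.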

DNA is a universal code, so every microorganism, plant, animal and human being uses exactly the same basic building blocks in its DNA. This means that in theory you could copy DNA sequences from one species to another and thereby exchange and combine different survival techniques. This is called interspecies gene transfer. For a long time scientists thought it was impossible, because cells and organs vary too much between species. But one of the most fascinating phenomena in the world of plants actually originated from such a gene transfer: photosynthesis. Scientists found out that the parts of the cell that are responsible for photosynthesis in plants originally come from cyanobacteria. The two organisms once merged into a symbiosis, and over millions of years the cyanobacterial genetic code was transferred to the host. As a result, the Arabidopsis plant carries thousands of genes inherited from cyanobacteria (Martin, 2002). In 2018 scientists reached the stage where they could synthetically take this step the other way round. They took the genes for making chlorophylls, the natural pigments in plants that are essential for photosynthesis, and put them into E. coli bacteria. By expressing 12 genes, they turned the cells green (Chen, 2018).

At the Massachusetts Institute of Technology scientists even went one step further. They developed a programming language that lets coders design complex DNA-encoded circuits that give new functions to cells (Brophy et al., 2014).

“You could be completely naive as to how any of it works. That’s what’s really different about this. You could be a student in high school and go onto the Web-based server and type out the program you want, and it spits back the DNA sequence.” - Christopher A. Voigt (2015)

The research on the connections between various genes and their proteins at the cellular level is called the first wave of synthetic biology. This includes switches, biological clocks, cascades, pulse generators, time-delayed circuits, oscillators, spatial patterning and logic formulas as well as the regulation of gene expression, protein function, metabolism and cell-to-cell communication (Purnick, 2009). In recent years the second wave has started, which focuses on whole synthetic ecosystems. Here everything needs to be thought of in a circular way, and the challenges include cell death, crosstalk, random mutations and intercellular conditions.

The recent success of mRNA-based COVID-19 vaccines showed the great possibilities of synthetic biology. After Moderna received the genome sequence of the virus that causes COVID-19, they needed only two days to crack its code, because it differed from other coronaviruses by only 12 additional nucleobases: CCU CGG CGG GCA (Webb, 2022, pos. 177). The long waiting time occurred only because of the duration of the human trials and governmental approvals. The code of the mRNA sequence stayed exactly the same and didn’t need any correction during the whole procedure.
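Reading those twelve letters works like decoding a lookup table: every group of three bases, a codon, stands for one amino acid. The sketch below contains only the three entries of the standard genetic code that are needed for this example, not the full table of 64 codons, and the function name is hypothetical.

```python
# Excerpt of the standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "CCU": "Pro",  # proline
    "CGG": "Arg",  # arginine
    "GCA": "Ala",  # alanine
}


def translate(rna: str) -> list[str]:
    """Split an RNA string into codons and look up each amino acid."""
    sequence = rna.replace(" ", "")
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    return [CODON_TABLE[codon] for codon in codons]


print(translate("CCU CGG CGG GCA"))  # prints ['Pro', 'Arg', 'Arg', 'Ala']
```

Twelve bases therefore describe just four amino acids, which shows how small the difference is in terms of raw code.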

Another emerging technique is CRISPR. It was discovered in bacteria, which use it to defend themselves against invading viruses and foreign nucleic acids (Brokowski et al., 2018). Like a pair of molecular scissors, CRISPR-Cas9 can precisely cut and paste DNA sequences. Gene editing used to be a technology that was only available to huge companies with high-tech laboratories. Now you can get a CRISPR kit from The Odin for only $170 (Zayner, 2022), and with the help of some YouTube tutorials you can change the DNA of your favorite bacteria at home. This democratization of GMO research promises an enormous range of extraordinary new inventions but also possibly catastrophic risks. When Bill Gates and Steve Jobs were developing software in their garages, it wasn’t too bad if they introduced bugs. Those could always be fixed afterwards, and once a computer became outdated, its code would be gone anyway. In contrast, biological organisms copy their genetic code to their descendants, so a change to a specific genome might have implications for organisms millions of years in the future.

Also, the editing usually takes place when the organism is only one cell in size. Bacteria and archaea are single-celled prokaryotes, so gene editing is rather easy with them. Plants, animals and humans, however, are multicellular eukaryotes that consist of an enormous number of cells, so here the genetic modification can in practice only be done at the beginning of a new generation. In the case of humans, that is when the egg cell and the sperm merge. Our bodies have nearly 40 trillion cells (Zimmer, 2013) and most of them include a copy of our DNA. Changing all of them would be far too complicated and take way too much time.

Another powerful CRISPR tool is the gene drive. When an organism carries a mutation on one of its two gene copies, there is usually a 50% chance that it passes that gene on to each offspring. Because an edited gene therefore does not spread by itself, researchers would need huge numbers of edited organisms in order to change the genomes of a whole species. With gene drives the probability is close to 100%, so if you add a mutation to every thousandth individual, it takes only about twelve generations until the whole population of a species carries the changed gene. The basic idea is to insert a whole factory of CRISPR scissors into the genome. In nature this phenomenon has been observed in worms, yeast, fish, insects and rodents. But because resistance against specific gene drives often builds up, scientists are nowadays focusing on combining several gene drives with each other. However, gene drives only work for sexually reproducing species with short generation times, which includes a lot of insects but excludes slowly reproducing mammals.
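The twelve generations can be reproduced with a very simple population model. It rests on several idealized assumptions that are not part of the sources: mating is random, the drive converts every carrier perfectly, it causes no disadvantage, no resistance appears, and "every thousandth individual" is read as a starting frequency of 0.001 for the edited gene. Under these assumptions an offspring only stays unedited if both parents pass on an unedited copy, so the share of unedited copies is squared in every generation.

```python
def drive_frequency(initial_frequency: float, generations: int) -> float:
    """Share of edited gene copies after a number of generations (idealized drive)."""
    wild_type = 1.0 - initial_frequency
    for _ in range(generations):
        wild_type = wild_type ** 2  # unedited only if both inherited copies are unedited
    return 1.0 - wild_type


for generation in (0, 4, 8, 10, 12):
    share = drive_frequency(0.001, generation)
    print(f"generation {generation:2d}: {share:.1%}")
```

In this model the edited gene is still rare after eight generations, then rises steeply and reaches about 98% in generation twelve. An ordinary mutation without a drive would, under the same assumptions, simply stay at its starting frequency.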

Currently this technology is mostly used on insects. In the fight against malaria, scientists developed an edited female mosquito with a deformed mouth so that it can’t bite and spread the disease. Its reproductive organs are also changed in such a way that it cannot lay eggs anymore. The scientists claim that by releasing only 1% edited mosquitoes in Africa, they could get rid of malaria within a single year. This would mean 400,000 fewer human deaths per year, a number that is expected to increase with rising temperatures due to climate change (Wikipedia, no date).

Another current field of discussion is invasive animals. In many areas humans have transported foreign species into faraway regions where they lack natural enemies and therefore reproduce in such enormous numbers that they threaten the local ecosystem. One example is rats in New Zealand, which have already pushed a fourth of all native birds into extinction (Yong, 2017).

At first sight both examples offer very strong advantages at a rather low and realistic price. But there are many concerns about side effects. In the case of mosquitoes these could be disastrous, because their natural predators, for example spiders, would lack food. Ideally the mosquitoes would be made genetically immune to malaria so that the whole species does not have to be eradicated, but that is currently not the case. With invasive animals it is more tricky, as they don’t have natural predators in their new habitats, so you might easily say that they do more damage there than good. But what happens if an edited rat is transported by accident to Australia or the Asian continent and all rats go extinct there too? How locally can the power of gene drives be contained in a globalized world? Especially in the long term, over dozens of generations?

Brokowski, Carolyn et al. (2018), CRISPR Ethics: Moral Considerations for Applications of a Powerful Tool. Available at: https://www.bu.edu/khc/files/2018/10/CRISPR-Ethics-reading.pdf (Accessed: 23 June 2022).

Brophy, Jennifer et al. (2014), Principles of genetic circuit design. Available at: https://pubmed.ncbi.nlm.nih.gov/24781324/ (Accessed: 16 June 2022).

Chen, Guangyu E. et al. (2018), Complete enzyme set for chlorophyll biosynthesis in Escherichia coli. Available at: https://www.science.org/doi/pdf/10.1126/sciadv.aaq1407 (Accessed: 02 June 2022).

Hubert, Bert (2021), DNA seen through the eyes of a coder. Available at: https://berthub.eu/articles/posts/amazing-dna/ (Accessed: 02 June 2022).

Johnston, Ian (2017), Cosmic particles can change elections and cause planes to fall through the sky, scientists warn. Available at: https://www.independent.co.uk/news/science/subatomic-particles-cosmic-rays-computers-change-elections-planes-autopilot-a7584616.html (Accessed: 02 June 2022).

Kovach, T. K. (2014), Jacob-Monod: The lac operon. In Gene control. Available at: https://www.khanacademy.org/test-prep/mcat/biomolecules/gene-control/v/jacob-monod-the-lac-operon (Accessed: 02 June 2022).

Martin, William et al. (2002), Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Available at: https://www.pnas.org/doi/10.1073/pnas.182432999 (Accessed: 02 June 2022).

Purnick, Priscilla E. M. et al. (2009), The second wave of synthetic biology: from modules to systems. Available at: https://www.nature.com/articles/nrm2698 (Accessed: 05 June 2022).

Webb, Amy (2022), The Genesis Machine: Our Quest to Rewrite Life in the Age of Synthetic Biology. New York: Public Affairs.

Wikipedia (no date), Gene drive. Available at: https://en.wikipedia.org/wiki/Gene_drive (Accessed: 16 June 2022).

Yong, Ed (2017), New Zealand’s war on rats could change the world. Available at: https://www.theatlantic.com/science/archive/2017/11/new-zealand-predator-free-2050-rats-gene-drive-ruh-roh/546011/ (Accessed: 09 June 2022).

Zayner, Josiah (2022), DIY Bacterial Gene Engineering CRISPR Kit. Available at: https://www.the-odin.com/diy-crispr-kit/ (Accessed: 05 June 2022).

Zimmer, Carl (2013), How Many Cells Are In Your Body? Available at: https://www.nationalgeographic.com/science/article/how-many-cells-are-in-your-body (Accessed: 05 June 2022).