Chapter Seven: CRISPR
What if you found a strand of DNA that was surrounded by a lighted marquee, flashing with signs that say ‘look at me!! look at me!!’? What would you think of this? In the late 1980s and the 1990s, several researchers working independently found some DNA sequences that seemed to them almost as if they were intentionally highlighted. These sequences were highly unusual.
They had a long sequence of the standard letters of DNA, but these letters were ‘interspaced.’ In other words, they were broken up by gaps at regular intervals. (In the actual DNA, the gaps are not blank; they are short stretches of other letters, called ‘spacers,’ that sit between the repeated runs.) The great majority of DNA runs letter after letter with no regular pattern of this kind. These unusual sequences had gaps at even intervals. This set these sequences apart.
It would be as if I put a hyphen between every letter in a sequence of words of the book. For example, I could spell out the title of this book as t-h-e-m-e-a-n-i-n-g-o-f-l-i-f-e. This stands out. This ‘interspaced’ pattern made researchers look at certain sequences of DNA. When they looked closely, they found something even more strange. These sequences were ‘repeated palindromes.’ A ‘repeated palindrome’ is a set of letters that goes forward and then repeats, only going backward. Here is what one would look like, using the title of this book as a sample: The palindrome for themeaningoflife is efilfogninaemeht (the same letters, backward). The palindromic repeat would be themeaningoflifeefilfogninaemeht. Adding the spacing, you get t-h-e-m-e-a-n-i-n-g-o-f-l-i-f-e-e-f-i-l-f-o-g-n-i-n-a-e-m-e-h-t.
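The construction above can be checked with a few lines of Python. This is only a sketch of the word game, not a DNA tool; the function name and the hyphen used as a spacer are my own illustrative choices.

```python
def interspaced_palindromic_repeat(text: str, spacer: str = "-") -> str:
    """Return the text followed by its mirror image, with a spacer between letters."""
    mirrored = text + text[::-1]   # forward, then the same letters backward
    return spacer.join(mirrored)   # put the spacer between every pair of letters


title = "themeaningoflife"
backward = title[::-1]

print(backward)                               # the title read backward
print(title + backward)                       # the palindromic repeat
print(interspaced_palindromic_repeat(title))  # the same thing with spacing added
```

Because the second half is produced by reversing the first, the result always reads the same in both directions.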
If you step back from the page and look at the text from a distance, as researchers would when looking at the letters in a DNA sequence, you can see how this catches the eye. You would be even more likely to notice it if you had looked at millions of pages that had only rare spaces, at irregular intervals, with no palindromes, and then suddenly saw several full pages with nothing but clusters of interspaced letters. Then, if you looked at them closely enough to see that they were clusters of interspaced repeated palindromes (something you would never expect to see occur by chance even once, let alone over and over), you would realize there is something special about these sections of DNA. They aren’t random. There is some reason behind them.
In 2000 a team at the University of Alicante in Spain led by Francisco Mojica showed that these DNA sections belong to a single family found in many different microbes, and in 2002 the sections were given the name ‘Clustered Regularly Interspaced Short Palindromic Repeats,’ or CRISPR. Researchers had good reason to look closely at the CRISPR sections. They stood out. When researchers first saw these things, they wondered if they meant something. It was worth a look.
They eventually found that the sequences were not random letters. The stretches of DNA sitting between the repeated palindromes, called ‘spacers,’ each matched a section of DNA from a virus or other invader that has the potential to harm the microbe. Each spacer was the name of an enemy of the DNA. They found that when an invader carrying a matching sequence gets into the cell, the CRISPR system makes a copy of the spacer to use as a guide, and proteins that work with CRISPR (called ‘Cas’ proteins) follow the guide to the matching section of the invader’s DNA and cut it. Normally, this kills the invader; it can’t continue to function with its DNA cut apart. The enemy dies. The CRISPR system is an attack mechanism. But it isn’t just any attack mechanism. It is adaptive.
This means that if the microbe meets a new enemy, it can add a new spacer to its CRISPR sequence to match the name of this new enemy. Scientists discovered that they could write guide sequences of their own, quickly and easily, and so point the cutting proteins at any section of DNA they chose. They also found that if they supplied a replacement section of DNA along with the guide, the cell’s own repair machinery could copy the replacement into the cut. This means CRISPR can do more than just attack the DNA of an enemy. It can be used to repair DNA. It can also be used to modify DNA. This ‘CRISPR’ was an amazing discovery.
The best analogy people have been able to come up with for the way scientists use CRISPR is the ‘find and replace’ function in a word processor. Say you write a long text and you put in a phrase several times that you later find you don’t like. You want to replace it with a different phrase. You can go to the ‘find and replace’ function in the word processor, type in the phrase you want removed, type in the new phrase you want, and press enter. The computer will find every instance of that phrase in the text and replace it with the desired phrase. This is what CRISPR lets scientists do. It cuts a gene sequence that is undesired so that it can be replaced with something that is desired.
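The word processor analogy can be written out as a short Python sketch. It illustrates the analogy only; it does not model the chemistry of the cut or the repair. The sequences and the function name are made up for the example.

```python
def find_and_replace(sequence: str, target: str, replacement: str) -> str:
    """Replace every occurrence of target in sequence, as a word processor would."""
    return sequence.replace(target, replacement)


# A made-up stretch of DNA letters with the unwanted phrase appearing twice.
genome = "AAGCTTGGATCCAAGCTT"
edited = find_and_replace(genome, target="AAGCTT", replacement="GAATTC")

print(genome)
print(edited)
```

Both copies of the unwanted phrase are swapped out in one pass, and the letters between them are left alone.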
What Does It Matter?
Imagine you are on a world with highly advanced sciences. Your people are going to send ‘life’ to other worlds. Your people haven’t found a way to overcome the limits of the speed of light and the laws of inertia: you can’t send anything faster than the speed of light, and you will need immense amounts of energy to send anything at even very high speeds.
The less matter you need to send, the less energy you need to get it heading to the other world. Since the distances between stars are vast, and you will want to send life to many different star systems, you will want a ‘life package’ that is as small and light as possible.
You have put together a little package. It contains the DNA for the simplest life form, the prokaryote, which will be needed to generate the oxygen for the more complex life forms. It contains the DNA for the more complex life forms, the eukaryotes. It contains the ‘operating system’ (the ‘software’) that will turn both of these DNA molecules into living things, tell them how to get energy, and tell them how to reproduce.
The prokaryotes in the package, the cyanobacteria, will reproduce asexually. Each parent will basically split into two new organisms that are identical to the original. They will change very little from one generation to the next, because each one is a copy of its parent. The cyanobacteria’s main function will be to create the conditions needed for the complex life forms to exist. These complex life forms need atmospheric oxygen to metabolize sugar (turn it into the ATP that supplies the energy needed to operate the life forms). Oxygen is highly reactive and bonds with almost anything. If there is silicon on the planet where you are sending the life, or iron, or aluminum, you will not find any free oxygen: it will be bound with the silicon, iron, or aluminum.
However, if the atmosphere of the planet has carbon dioxide, which is a very stable molecule and reacts with very little, you can build up all of the oxygen you need. The prokaryotes will use photosynthesis, which takes in carbon dioxide and water, keeps the carbon, and releases oxygen. The carbon will become the skeleton of the sugars, proteins, and fats of their bodies; it will be an integral part of the prokaryotes. When they die, their carbon-filled bodies will sink to the bottom of any bodies of water where they live and get covered by silt. Over the course of time, the bodies of the prokaryotes will fossilize and become coal, oil, or natural gas, depending on the conditions where they end up. The oxygen released by photosynthesis will go into the atmosphere and build up there.
After enough time, the atmosphere will have enough oxygen to support the more complex life forms. Some kind of trigger will take them out of hibernation, and they will begin to reproduce. The more complex organisms, called ‘eukaryotes,’ will reproduce sexually.
This will lead to huge variations in the genetic codes of the offspring. The offspring that have variations that give them advantages will have greater abilities to get food and live long enough to have offspring of their own. Their DNA lines will continue while the DNA lines of less capable offspring will die out. As time passes, the branches of the tree of life will split, with the intermediate sections disappearing to lead to more and more species, each of which has a different set of advantages and needs. Throughout all of this, the capabilities of the most capable beings will increase.
As you design this DNA, you think that, one day, beings with the ability to think on a conscious level will evolve on the worlds you are seeding. These beings will eventually gain the ability to read the letters in the DNA. DNA overlays three codes, one on top of the other. I have gone over the codes in other parts of this book, but I want to repeat the discussions here for a kind of refresher:
The First Code
The first code is the reproduction code. There are four letters in the reproduction code: A, T, G, and C. The letters stand for four chemical bases: adenine (A), thymine (T), guanine (G), and cytosine (C). Each link of the DNA (meaning the stretch from each ‘spine’ to the middle of the ‘rung’ of the ladder) is one of these four letters. The letters form into ‘base pairs,’ with one of the letters (say ‘A’) being on one of the spines of the double helix and the other (say ‘T’) being on the opposite spine. Each base pair makes up one of the ‘rungs’ of the ‘ladder.’ The letters don’t bond in a random way; each can only bond with its ‘complement.’ A and T are complements, and G and C are complements.
A always bonds with T.
T always bonds with A.
G always bonds with C.
C always bonds with G.
If you get the sequence: ATGC on one spine, you will have the sequence TACG on the other spine. There is one and only one match for each of the four letters. This system allows the DNA to make perfect copies of itself. Here is how this works:
The DNA molecule splits down the middle. This leaves two half ladders with sequences of letters. A special and very complex protein called a ‘DNA polymerase’ acts like a worker. It goes down the letters of one spine, one at a time, and matches each letter with its complement. After this is done, the ‘rungs’ on the ladder are back as they were in the original molecule. Other proteins rebuild the spine to create a new DNA molecule that is identical to the original. Other polymerases and proteins do these same things to the other ‘half ladder,’ creating a second copy that is also identical to the original.
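The pairing rule and the copy process can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it treats each spine as a plain string of letters, ignores the fact that the two spines run in opposite chemical directions, and uses function names of my own invention.

```python
# The pairing rule: each letter has exactly one complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(spine: str) -> str:
    """Match each letter on one spine with its complement, one letter at a time."""
    return "".join(COMPLEMENT[letter] for letter in spine)


def replicate(spine_one: str, spine_two: str):
    """Split the ladder down the middle and rebuild the missing half of each side."""
    copy_a = (spine_one, complement_strand(spine_one))
    copy_b = (complement_strand(spine_two), spine_two)
    return copy_a, copy_b


original = ("ATGC", complement_strand("ATGC"))
copy_a, copy_b = replicate(*original)

print(original)
print(copy_a)
print(copy_b)
```

Because each letter has one and only one match, each half ladder carries everything needed to rebuild the other half, and both copies come out the same as the original.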
The total length of the DNA in humans is about 3 billion links. The polymerases work very fast, and many of them work on the same molecule at once. If together they match 2.5 million letters per second, rebuilding 2.5 million ‘rungs’ of the ladder each second, they can complete the rebuilding of the entire 3 billion link chain (meaning to reproduce DNA, starting with one molecule and ending with 2 molecules) in about 20 minutes.
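The arithmetic behind the 20 minute figure is simple division. The sketch below only checks the numbers given in the text; the variable names are mine, and the rate is the combined figure quoted above, not a measured value.

```python
GENOME_LINKS = 3_000_000_000      # links in the human DNA chain, from the text
LETTERS_PER_SECOND = 2_500_000    # combined matching rate quoted in the text

seconds = GENOME_LINKS / LETTERS_PER_SECOND
minutes = seconds / 60

print(f"{seconds:.0f} seconds, or {minutes:.0f} minutes")
```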
Since the polymerases are working at this fantastic speed, they sometimes make mistakes. These mistakes are exceedingly rare, but they do happen. After the polymerases are through, other protein ‘workers’ come along that we may think of as ‘building inspectors.’ These inspectors look for mistakes in matching the letters. If they find mistakes, they bring in other worker proteins to cut out the wrong letter and replace it with the correct letter. After this process is finished, the reproduction is almost perfect, with all but a handful of the 3 billion letters in the exact right place. The two new DNA molecules are, for practical purposes, the same as the original molecule. Wherever the letters match, the atoms in the copy sit in the same positions in relation to one another as they do in the original. The copy mechanism is amazingly precise. The first code in DNA is what makes this precise copy process work.
The Second Code
Every three letters form a ‘codon’ or ‘triplet.’ With four possible letters in each of the three positions, there are 4 × 4 × 4 = 64 possible triplets. You could think of these triplets as letters in an alphabet. If you read down from the top of the DNA, the first triplet is the first letter. If you go down to the next triplet, you get the second letter of the coded message. You can go down through all the triplets, one at a time, and get letter after letter. There are a total of one billion triplets in the 3 billion links, meaning there are 1 billion letters in this second coded message.
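The counting in this paragraph can be checked with a short Python sketch. It lists every possible triplet and shows how a run of letters is read three at a time. The sample sequence and the function name are made up for the example.

```python
from itertools import product

BASES = "ATGC"

# Every possible triplet: one of four letters in each of three positions.
all_triplets = ["".join(letters) for letters in product(BASES, repeat=3)]


def split_into_triplets(sequence: str) -> list:
    """Read a sequence three letters at a time, dropping any incomplete tail."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


print(len(all_triplets))                  # number of possible triplets
print(split_into_triplets("ATGGCCTAA"))   # a made-up nine-letter sample
print(3_000_000_000 // 3)                 # triplets in a 3 billion link chain
```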
There are 64 triplets possible, meaning there are 64 ‘characters’ available in the alphabet of this second code. But these 64 triplets spell only 21 distinct characters. In the 1960s, researchers building on the work of the discoverers of the first code (Watson, Crick, and Wilkins) found that the second code matches each of the triplets to one of exactly 20 amino acids OR to a character that they call an ‘end of chain’ character, which tells the ribosomes that use the code to make proteins that the chain for that particular protein is finished. This means that the second code only contains 21 characters, not the full 64 that are available for use. Because there are 64 triplets to spell 21 characters, most of the characters can be spelled by more than one triplet. You can find the combinations listed in the chart below.
[Chart: the genetic code]
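The same combinations can be laid out in a short Python sketch. It builds the standard genetic code from a compact 64-letter string, using the one-letter abbreviations biologists use for the 20 amino acids and an asterisk for the ‘end of chain’ character. The variable names are my own, and the triplets are written in DNA letters (T rather than the U used in RNA).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One character per triplet, in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

triplets = ["".join(letters) for letters in product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(triplets, AMINO_ACIDS))

# How many different triplets spell each of the characters?
spellings = Counter(GENETIC_CODE.values())

print(len(GENETIC_CODE), "triplets spell", len(spellings), "characters")
for character, count in sorted(spellings.items()):
    print(character, count)
```

Most characters have two, four, or six spellings. Only two of them, M and W, have a single spelling each.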
You could think of this second coded message as a kind of ‘materials list’ for making all of the things that DNA makes.
The Third Code
Scientists figured out the 21-character code, called ‘the genetic code,’ in the 1960s, about a decade after they figured out the 4-letter code in 1953. Since most of the 21 characters in the genetic code can be spelled more than one way, a third message can be overlaid over the first two. The choice of which spelling to use at each position is free to carry information of its own. The size of the alphabet sets how much information each position can hold. A position that can be any one of 21 characters holds about 4.4 bits of information; a position that can be any one of 64 characters holds 6 bits. A message of 1 billion characters (the length of the message in human DNA) can therefore hold up to about 4.4 billion bits in the 21-character alphabet and up to 6 billion bits in the 64-character alphabet. You could encode all the text in a large paper-printed encyclopedia in a 1 billion-character message written in the 21-character alphabet. The difference between the two, about 1.6 billion bits or roughly 200 megabytes, is the room left over for the third code. Counted another way, the number of different 1 billion-character messages you can write in the 64-character alphabet is larger than the number you can write in the 21-character alphabet by a factor of about 1 followed by 484 million zeros. This number is immense, far too great for the human mind to comprehend. (By comparison, there are estimated to be between 10^78 and 10^82 atoms in the universe.) If the DNA was created by intelligent beings, and they have encoded a message in DNA, this message may possibly be very long and complex. It could run to the length of hundreds of books.
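The capacity comparison can be worked through in a few lines of Python. This is a rough upper bound under stated assumptions: it compares a 64-character alphabet with the 21 characters of the genetic code (20 amino acids plus the ‘end of chain’ character), treats every position as free to take any character, and uses variable names of my own.

```python
import math

MESSAGE_LENGTH = 1_000_000_000   # triplets in human DNA, from the text
FULL_ALPHABET = 64               # possible triplets
GENETIC_ALPHABET = 21            # 20 amino acids plus the end-of-chain character

# Information per position, in bits, is the base-2 logarithm of the alphabet size.
bits_full = math.log2(FULL_ALPHABET)
bits_genetic = math.log2(GENETIC_ALPHABET)

# Room left over for a third message, summed over the whole chain.
extra_bits = (bits_full - bits_genetic) * MESSAGE_LENGTH
extra_megabytes = extra_bits / 8 / 1_000_000

# How many times more messages the larger alphabet allows, as a power of ten.
log10_ratio = MESSAGE_LENGTH * math.log10(FULL_ALPHABET / GENETIC_ALPHABET)

print(f"{bits_full:.2f} bits versus {bits_genetic:.2f} bits per position")
print(f"about {extra_bits:.3g} extra bits, or {extra_megabytes:.0f} megabytes")
print(f"ratio of possible messages: about 10 to the power {log10_ratio:.3g}")
```

The real capacity would be somewhat lower, since characters with only one spelling leave no choice at their positions.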
What if you were putting together this package and you wanted the beings that would ultimately evolve to notice something about the DNA, and you wanted to give them a starting place to look for the message? You might highlight certain sequences in some way. The CRISPR sequences stand out so strongly that people saw them at a very early stage in our understanding of DNA technology. The first human genome was not sequenced until 2003. In 1987, when the repeats were first noticed, we only had the most primitive sequencing equipment, but the CRISPR stood out so much that, even with primitive equipment, it was obvious. Why CRISPR? Of all of the DNA sequences that could be highlighted, why choose this one? For that, we have to go on to the next chapter.