This overview of modern genetics is excerpted from the online book: The Evolution of Aging http://www.azinet.com/aging/Aging_Book.html.
Our knowledge of the mechanics of genetics has increased enormously since the time of Darwin or even the time of Medawar. This chapter is intended to provide only a brief summary of the aspects of modern genetics that are relevant to discriminating between various theories of evolution and theories of aging. (See a genetics textbook such as the excellent GENES VIII by Benjamin Lewin for more on genetics.) Since evolution involves the modification and propagation of heritable information that directs the characteristics of organisms, an understanding of the mechanics of genetics is critical to understanding evolution.
Early scientists thought that sexual reproduction involved transmission of a miniature microscopic animal. The animal merely subsequently grew larger. But what was the source of the miniature animal? Another theory had it that the miniature animals were nested such that the outermost animal grew larger and then transmitted the remaining nested microscopic animals during reproduction. This scheme would apparently be limited in the total number of consecutive reproductions and also did not explain why animals shared characteristics of both parents.
By Darwin’s time, it was apparent that what was transmitted in reproduction was primarily information that enabled the descendant organism to construct itself according to a plan that was provided jointly by its parents. The information was somehow stored in the organism during its life and then transmitted to descendants during reproduction. Although early scientists thought that events that happened during the life of an organism could affect and modify the stored information in a structured way, this was subsequently disproved.
So sexual reproduction involves the copying and transmission of genetic information as well as the structured merging of information from two parents. Growth of an organism involves reading and interpreting the information and constructing an organism according to the plan conveyed by the transmitted information. Finally, the design of all organisms provides some mechanism for storing the genetic information so that it is available for subsequent reproduction.
Darwin’s theory tells us that species evolved from other species, so it is obvious that some mechanism must exist for modifying genetic information. Darwin’s theory says that species could build upon and extend the characteristics of ancestor species, so it is clear that the modifications are progressive and cumulative. We now know that evolution on Earth has been progressing for about four billion years. The mechanism that is being used by nature to copy and store genetic information is apparently capable of such high fidelity that such a progression is possible.
Analog and Digital Data
Here we need to take a small detour to discuss the two ways in which information can be stored and transmitted, namely analog form and digital form. These two modes for transmission, storage, and copying of information have very different properties.
In Edison’s phonograph (1877), a diaphragm converted the pressure of sound in the air into the displacement of a needle that then made tracks on a wax or tinfoil cylinder. The displacement and path of the resulting track were continuously variable in response to the sound pressure. The phonograph was an instrument for storing and reproducing information in analog form. The information was both accepted and returned as a serial sequential stream. The information stored in such a recording could be copied to make thousands of duplicate recordings that could be transmitted far and wide. AM and FM radio, “analog” television, audio cassette tapes, and VHS video tapes are examples of current analog data systems.
In contrast, Morse’s telegraph (1844) represented a serial digital communications system. Instead of being continuously variable, the signal sent down a telegraph wire was binary and had only two states, “mark” and “space”, known in communications terms as symbols. The operator converted written characters into a “code” consisting of long or short marks separated by spaces. Longer spaces denoted the beginnings and ends of characters. Yet longer spaces denoted the beginnings and ends of words. Currently, the Internet, CDs, DVDs, space communications systems, and digital television are all examples of digital communications systems.
One of the problems with analog communications is noise. Since the signal is continuously variable, any disturbance introduces an error or discrepancy from the original signal that cannot be removed. This is an especially severe problem when consecutive copies of information are made.
Edison could make thousands of copies of an original because each copy was a copy of the original, that is, there was only one “generation”. If we needed to make a copy of a copy of a copy of a copy, the noise buildup would be very severe. Each generation adds more noise.
In digital systems, noise is not as much of a problem. Because the telegraph had only two symbols, disturbing noise could not cause an error unless it was so great as to cause “mark” to be confused with “space”. For the same reason digital data can be regenerated and noise removed. Copies of copies are not as much of a problem with digital data. A copy of a CD, or DVD is usually exactly as good as the original. Copies of copies can be made indefinitely.
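The contrast between analog and digital copying can be sketched with a small simulation. This is only an illustration under assumed conditions: the noise is Gaussian with a made-up standard deviation of 0.05, the digital levels are 0 and 1, and regeneration is a simple threshold at 0.5. None of these numbers come from the text.

```python
import random

def copy_analog(signal, noise_sd, rng):
    """Copy a continuous signal; every sample picks up a little noise."""
    return [x + rng.gauss(0.0, noise_sd) for x in signal]

def copy_digital(signal, noise_sd, rng):
    """Copy a two-level signal, then regenerate it by thresholding at 0.5."""
    noisy = [x + rng.gauss(0.0, noise_sd) for x in signal]
    return [1.0 if x > 0.5 else 0.0 for x in noisy]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

rng = random.Random(0)
original = [float(rng.randint(0, 1)) for _ in range(1000)]

analog = digital = original
for generation in range(50):          # a copy of a copy of a copy ...
    analog = copy_analog(analog, 0.05, rng)
    digital = copy_digital(digital, 0.05, rng)

print("analog error after 50 generations: ", rms_error(original, analog))
print("digital error after 50 generations:", rms_error(original, digital))
```

In the analog chain the noise from every generation is retained, so the error keeps growing. In the digital chain each copy is regenerated, so the noise is discarded unless it is large enough to confuse the two symbols.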
Some digital systems have more than two symbols. For example, English, a serial digital communications system, has 27 primary symbols (counting space).
Any digital system can ultimately be reduced to binary digits or bits. That is, we could convert the 27 possible English symbols to 27 possible combinations of five binary digits.
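As a rough sketch of that reduction, the snippet below assigns each of the 27 symbols a five-bit code. The particular ordering (A = 00000, B = 00001, and so on, with space last) is an arbitrary choice made for illustration.

```python
import math
import string

ALPHABET = string.ascii_uppercase + " "       # 26 letters plus space = 27 symbols
BITS = math.ceil(math.log2(len(ALPHABET)))    # smallest number of bits that can label 27 symbols

def to_bits(text):
    """Encode text as a string of 0s and 1s, BITS bits per symbol."""
    return "".join(format(ALPHABET.index(ch), f"0{BITS}b") for ch in text.upper())

def from_bits(bits):
    """Decode a bit string produced by to_bits."""
    return "".join(ALPHABET[int(bits[i:i + BITS], 2)] for i in range(0, len(bits), BITS))

print(BITS)
print(to_bits("AB"))
print(from_bits(to_bits("HELLO WORLD")))
```

Five bits give 32 combinations, so five of them go unused in this scheme.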
Here is an illustrative example of a digital communications system:
Suppose we had some automated weather stations and wanted to send a digital message from each station to our central location several times per hour. Suppose further that our system works by sequentially sending any of four possible symbols denoted A, B, C, and D. We could devise a message format or code as follows:
Symbols would be sent in order reading from left to right.
First, the station sends a three symbol synchronization pattern sss. This is a known fixed pattern that allows the receiver to determine the meanings of subsequent symbols. We could choose the value AAA for the synchronization pattern. (In English, synchronization is performed by spaces and punctuation characters.) We can follow this with three symbols (www) denoting the weather station sending the message.
Next are three symbols giving the wind velocity. Since there are four possible symbols (A,B,C,D), three symbols together have a total of 64 possible values. We could convert the analog wind velocity to a number between 0 and 63 and then represent it with three symbols. AAA would correspond to 0, AAB correspond to 1, and DDD correspond to 63. Next come two symbols denoting wind direction. Two symbols have 16 possible values. AA could correspond to N, AB to NNE, and so forth. Note that information is being lost in the conversion between continuously variable analog form and digital form. Although the actual wind direction might be anywhere between say North and Northeast, the “analog to digital converter” is forced to pick one of the allowed values. Presumably, if the actual direction is closer to North than Northeast or Northwest it picks North. Next, we have a single symbol denoting the sign of the temperature, (D denotes positive) followed by four symbols denoting temperature. Since four symbols are used, the temperature can have 256 possible values allowing temperature to be conveyed more precisely. We then add three symbols each for humidity and air pressure. The system does not care if there are extra junk symbols preceding or following the message as long as they do not duplicate a synchronization pattern. This is because the format or rules for transmitting and receiving the data call for looking for the synchronization pattern and then interpreting only the specified symbols based on their distance from the synchronization pattern. Notice that all the messages have the same organization and format. The information is represented by the specific digital content, which varies from message to message.
One difficulty is apparent. If the temperature or some other parameter had the value 0 (corresponding to symbols AAA) then the receiver might synchronize at the wrong place in the symbol sequence causing all the data to be misinterpreted. We could eliminate this problem by forbidding the value 0 in any of the data sequences and digitizing all the temperatures and other parameters to values starting at 1 instead of 0.
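The message format described above can be sketched in a few lines. The field names, the sample reading, and the choice to treat each field as a plain base-4 number are assumptions made for illustration. The sketch does not enforce the rule of forbidding the value 0; the sample values simply avoid it.

```python
SYMBOLS = "ABCD"
SYNC = "AAA"
# (field name, number of symbols), in transmission order after the sync pattern
FIELDS = [("station", 3), ("wind_speed", 3), ("wind_dir", 2),
          ("temp_sign", 1), ("temp", 4), ("humidity", 3), ("pressure", 3)]

def to_symbols(value, width):
    """Write a non-negative integer as `width` base-4 symbols, most significant first."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} symbols")
    out = []
    for _ in range(width):
        value, digit = divmod(value, 4)
        out.append(SYMBOLS[digit])
    return "".join(reversed(out))

def from_symbols(symbols):
    """Convert a run of base-4 symbols back to an integer."""
    value = 0
    for s in symbols:
        value = value * 4 + SYMBOLS.index(s)
    return value

def encode(reading):
    """Build one 22-symbol message: sync pattern followed by every field."""
    return SYNC + "".join(to_symbols(reading[name], width) for name, width in FIELDS)

def decode(stream):
    """Find the first sync pattern, then read each field by its offset from it."""
    pos = stream.index(SYNC) + len(SYNC)
    reading = {}
    for name, width in FIELDS:
        reading[name] = from_symbols(stream[pos:pos + width])
        pos += width
    return reading

# Hypothetical reading; temp_sign 3 is the symbol D, which denotes positive.
sample = {"station": 5, "wind_speed": 63, "wind_dir": 1,
          "temp_sign": 3, "temp": 72, "humidity": 40, "pressure": 50}
message = encode(sample)
print(message, len(message))
print(decode("DCB" + message + "CD"))   # junk before and after is ignored
```

The receiver never looks at the meaning of a symbol in isolation. Everything depends on the symbol's distance from the synchronization pattern.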
This simple system illustrates some of the properties of digital communications systems.
Because there are only a finite number of symbols, all the data in a digital system is ultimately limited in “precision”. Nothing is continuously or indefinitely variable. The degree of variability is determined by the number of different symbols possible (in this case four) and the number of symbols used to convey a particular parameter. The code or format used must be known and understood in advance at both the transmitting and receiving ends of the communication.
The consequences of an error in a digital code vary enormously depending on where in the format the error occurs. A single symbol error in the synchronization pattern would cause the entire message to be missed. A single symbol error in the “most significant” (leftmost) symbol of the temperature is 64 times larger than a corresponding error in the least significant symbol. An error resulting in insertion of an extra letter or deletion of a letter in a message would result in misinterpretation of all the subsequent data in the message. Insertion or deletion of letters between messages would have no effect unless it created a new synchronization pattern.
In an analog system, errors (noise) tend to cause minor deviations from the true value of a communicated parameter but all communications have errors. In a digital system, error-free communication is much more likely, but errors occasionally still happen. The consequences of a digital error tend to be more severe. In our example message format there are 22 symbols. An error in which one of the symbols was replaced by an incorrect symbol (a substitution error) would cause a major change in reported value unless it occurred in the least significant symbol of a parameter. There are only 5 least significant symbols in our code so more than 75 percent of the possible errors would cause major, even catastrophic, effect. An error in which an additional symbol was inserted or an existing symbol was deleted would be catastrophic in nearly all cases because the subsequent symbols would be misinterpreted.
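A short sketch makes the difference between these error types concrete. It uses three made-up fields (temperature, humidity, and pressure with widths of 4, 3, and 3 symbols) and treats each as a base-4 number; the sample values are arbitrary.

```python
SYMBOLS = "ABCD"

def value_of(symbols):
    """Interpret a run of symbols as a base-4 number, most significant first."""
    value = 0
    for s in symbols:
        value = value * 4 + SYMBOLS.index(s)
    return value

def read_fields(stream, widths):
    """Read consecutive fixed-width fields from the start of the stream."""
    fields, pos = [], 0
    for width in widths:
        fields.append(value_of(stream[pos:pos + width]))
        pos += width
    return fields

WIDTHS = [4, 3, 3]            # temperature, humidity, pressure
stream = "BACACCADACDD"       # BACA CCA DAC, followed by two junk symbols

print(read_fields(stream, WIDTHS))                  # the correct values
print(read_fields("C" + stream[1:], WIDTHS))        # substitution in the leftmost temperature symbol
print(read_fields(stream[:3] + "B" + stream[4:], WIDTHS))  # substitution in the rightmost one
print(read_fields(stream[1:], WIDTHS))              # deletion of one symbol shifts every field
```

The substitution in the least significant symbol changes the temperature by 1, the one in the most significant symbol changes it by 64, and the deletion corrupts every value that follows it.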
In modern digital communications systems various methods have been developed to detect and even correct errors. One obvious technique is redundancy. We could send or store the same information three times and compare the data on the receiving or retrieving end. If any of the copies did not agree with the other two we would know it contained an error and discard it. Many more sophisticated ways of ensuring error free transmission and storage of data are in current use.
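The triple-copy scheme might look like the following minimal sketch. The function name and the decision to return None when no two copies agree are choices made here for illustration.

```python
from collections import Counter

def recover(copies):
    """Return the message that at least two of the copies agree on, else None."""
    message, count = Counter(copies).most_common(1)[0]
    return message if count >= 2 else None

print(recover(["AAB", "AAB", "ACB"]))   # one bad copy is outvoted and discarded
print(recover(["AAB", "ACB", "ADB"]))   # no two copies agree, so nothing can be trusted
```

The price of this protection is that three times as many symbols must be sent or stored.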
The reason for this detour is that the “genetic communications system” is in fact a serial digital system and bears an eerie resemblance to modern digital data systems. The genetic system has four symbols, synchronization patterns, formats, redundancy, error detection and many other properties of digital systems. This has significant consequences for evolution theory and aging theory as will be explained in detail.
In Darwin’s time, many thought (despite some rather obvious discrepancies) that inheritance was an analog, continuously variable, averaging process. It was thought that characteristics of progeny tended to average out the characteristics of their parents. Darwin’s world was an analog world. Darwin had no reason to consider the digital concepts discussed above.
Gregor Mendel (1822 – 1884) was an Augustinian monk who conducted very extensive crossbreeding experiments with peas and other plants. Mendel’s paper Experiments in Plant Hybridization (1865) was not widely noted until much later and was unknown to Darwin. Mendel determined that some inherited characteristics were discrete or binary. That is, there was a minimum unit of inheritance such that some characteristics were either inherited by a given individual, or not, with no averaging or intermediate possibility. Inheritance of traits was not continuously variable. Mendel also noticed that some inherited characteristics were latent. Progeny could exhibit characteristics that were not displayed by either of their parents but were displayed by grandparents or other ancestors.
Watson and Crick in 1953 published their famous paper A Structure for Deoxyribose Nucleic Acid describing the basic mechanism (the “double helix”) whereby genetic information is recorded, copied, and transmitted in all living organisms. They shared the Nobel Prize in Medicine for 1962 with co-discoverer Wilkins.
Serial Digital Genetic Codes
As determined by Watson and Crick, and extended by many subsequent investigators, the system used by nature to store, copy, and transmit genetic information is a digital system. Genetic information is conveyed by the sequence in which the organic compounds adenine, guanine, cytosine, and thymine are strung together to make long molecules of DNA. These sequences then ultimately determine all the inherited characteristics of the organism. In computer parlance, this would be a serial digital code. Because of the digital nature of the genetic code, some parts of genetic sequences have been faithfully reproduced (i.e., consecutively copied) for billions of years. As we have previously seen, an analog system would never be capable of accommodating the very large number of consecutive duplications involved in the evolution of life on Earth.
Since there are four possible nucleotides (A, G, C, and T for adenine, guanine, cytosine, and thymine), each nucleotide (also referred to as a base, or base pair) corresponds to two bits of information. We could translate A to 00, G to 01, T to 10, and C to 11 and then represent any amount of genetic code as a binary number sequence. AGTTC would then be 0001101011. The nucleotides are the symbols of the genetic code.
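The translation just described is easy to sketch. The mapping is the one given in the text; the function names are made up for this example.

```python
BASE_TO_BITS = {"A": "00", "G": "01", "T": "10", "C": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def dna_to_bits(sequence):
    """Translate a nucleotide sequence into a binary string, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in sequence)

def bits_to_dna(bits):
    """Translate a binary string (even length) back into nucleotides."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(dna_to_bits("AGTTC"))
print(bits_to_dna("0001101011"))
```

Any other one-to-one assignment of the four two-bit patterns to the four bases would serve equally well.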
The Human Genome Project in 2001 released a preliminary report describing the genetic content (or genome) for humans and determined that the human genome contains about 3.3 billion bases of information. By early 2003 the sequence had been 99.9 percent determined. In computer terms this is about 6.6 billion bits or 825 megabytes of data, small enough to fit on your laptop computer’s hard disk. Approximately half of the genome consists of repeat sequences that are highly repetitive and therefore, according to information theory, contain very little information. Some of the repeats are tandem repeats that consist of sequential repetitions of a simple sequence (e.g., ATATATATAT…AT). A large amount of the remaining code has no obvious function. Although the sequence has been determined, the actual specific functions of most of the genetic code remain unknown.
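The size estimate follows from two bits per base and eight bits per byte. The short calculation below assumes a megabyte of one million bytes.

```python
BITS_PER_BASE = 2      # four possible nucleotides
BITS_PER_BYTE = 8

def genome_size(bases):
    """Return (bits, bytes) needed to store a genome of the given length."""
    bits = bases * BITS_PER_BASE
    return bits, bits // BITS_PER_BYTE

bits, size_bytes = genome_size(3_300_000_000)
print(bits, "bits")
print(size_bytes / 1_000_000, "megabytes")
```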
Animal genetic code is transmitted in the form of contiguous, sequential DNA molecules called chromosomes. Humans have 23 pairs of chromosomes. Mice have 20 pairs. Dogs have 39 pairs. Some plants have more than 100 chromosomes. A small percentage of human DNA is transmitted by mitochondria, small objects in the female egg cell that are duplicated in subsequent cells.
Chromosomes have special sequences on either end called telomeres (in humans, repeats of the sequence TTAGGG). Another special sequence more centrally located (position varies depending on the chromosome) is called the centromere. When a cell divides to form a second cell, the genetic information content is duplicated in a process called mitosis. The telomeres, centromeres, and other structural aspects of chromosomes are known to be essential to the proper duplication of one (and only one) complete set of chromosomes during cell division.
Almost every non-sex cell has two copies of the genome, (two sets of chromosomes) one inherited from each parent. Sperm and egg cells only have one copy.
Mice have a genome of about 3 billion bases (only 10 percent less than humans).
Yeast has a genome of about 12 million bases, 6000 genes on 16 chromosomes.
The bacterium E. coli has a genome of 5.6 million bases.
A major genetic curiosity, the microscopic amoeba (Amoeba dubia) has 670 billion bases in its genome!
To further illustrate the information content, the upper and lower case characters in the English alphabet (52 alphabetic characters, 10 numerical digits, and space) could be represented in binary form using 6 bits per character. The phrase “Four score and seven” would correspond to a binary string of 120 bits and could be expressed in genetic code using 60 bases. (This book contains about 300,000 characters equivalent to 900,000 bases of genetic code.) The probability of duplicating “Four score and seven” by random combination of bits (as might be done if you had “enough monkeys and typewriters”) is one in 2^120, or about one in 10^36. It would take a very, very large number of monkeys and typewriters a very long time to randomly duplicate even this very short phrase! A single error might result in “Four scBre and seven”. Several errors could look like “Fouw scory and 8even”.
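The arithmetic behind these figures can be checked directly. The sketch assumes the 20-character phrase “Four score and seven”, 6 bits per character, and 2 bits per base.

```python
import math

PHRASE = "Four score and seven"
BITS_PER_CHAR = 6      # enough for 52 letters, 10 digits, and space (63 symbols)
BITS_PER_BASE = 2

bits = len(PHRASE) * BITS_PER_CHAR
bases = bits // BITS_PER_BASE
random_odds = 2 ** bits            # number of equally likely 120-bit strings

print(len(PHRASE), "characters,", bits, "bits,", bases, "bases")
print("one chance in about 10 to the power", round(math.log10(random_odds)))
```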
The reason for this diversion is that geneticists can trace descendency at the species as well as the individual level. If you and any other living thing share a significant sequence of code that is approximately the same then you and the other organism must have had a common ancestor because the chances of a random duplication are impossibly low. Not only can they determine if you are related to your alleged children, they can determine if mice and men had a common ancestor (yes, of course) and can even determine from the number of errors that have crept into the genetic message approximately how long ago humans and mice had a common ancestor (about 50 million years ago). (It is possible for DNA in an organism to be, in effect, “cross-contaminated” with DNA from another organism but this method is considered minor relative to direct inheritance of DNA sequences.)
Errors and Mutations
Errors in copying genetic data are the source of the genetic changes that drive evolution. Some errors, such as in a sequence which controls basic cell design, or oxygen transport, or other crucial process, are almost always immediately fatal and so are immediately “selected out” and do not propagate into the genetic code of descendant organisms. This sort of sequence tends to be “well conserved” over billions of years. Humans share some sequences with yeast that both humans and yeast must have received from a common ancestor. Other sequences that control “how much” (how long a claw, how much fur, etc.) are the source of the variation that drives natural selection. An error in such a sequence could well cause only a slight variation of a parameter and only very mildly affect fitness. Finally, some sequences (as much as 95 percent of the human genome) have no apparent biological purpose. Changes in such a sequence generally have no effect on the organism and are not selected against at all, thus freely propagating to future generations.
In modern electronic data systems, it is not unusual for errors to occur more or less frequently depending on the pattern of the data. Errors in both electronic and genetic systems can be caused by substitution of an incorrect letter in a sequence and can also be caused by deletion of a letter or insertion of an extra letter.
In the genetic code, which is all about pattern and sequence, it is not surprising that the chance for an error is also pattern sensitive. For example, humans have a genetic structure called a variable number tandem repeat (VNTR). Copying errors (insertion/deletion errors) which change the length of these repeats are thought to occur virtually every generation. (These are the sequences whose lengths are compared in some types of genetic fingerprinting.) Another illustration of pattern sensitivity is the restriction enzymes. There are many different enzymes which can cause strands of DNA to be physically broken at points where a particular sequence exists. For example, the enzyme SgfI causes breaks where the pattern GCGATCGC is encountered.
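Pattern-triggered breaking can be imitated with a simple string search. In this sketch the test sequence is made up, and the cut is placed at a hypothetical `offset` from the start of each match (0 by default), since the exact cut position within the pattern is not given here.

```python
def find_sites(dna, pattern="GCGATCGC"):
    """Return the 0-based start index of every match, including overlapping ones."""
    sites = []
    start = dna.find(pattern)
    while start != -1:
        sites.append(start)
        start = dna.find(pattern, start + 1)
    return sites

def fragments(dna, pattern="GCGATCGC", offset=0):
    """Split the strand at each recognition site (cut placed `offset` bases into the site)."""
    pieces, previous = [], 0
    for site in find_sites(dna, pattern):
        pieces.append(dna[previous:site + offset])
        previous = site + offset
    pieces.append(dna[previous:])
    return pieces

strand = "TTGCGATCGCAAAGCGATCGCTT"      # made-up sequence containing two sites
print(find_sites(strand))
print(fragments(strand))
```

A single substitution anywhere inside a recognition site removes that break point, so the pattern of fragment lengths changes.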
In the genetic code, sequences are occasionally duplicated. Genes in the duplicated sections can have subsequent copying errors that sometimes result in new, useful genes. Presumably, this is the mechanism whereby a more complex and longer genome can evolve from a simpler one. In human genetic code there is a specific pattern of about 300 bases called the Alu element. Alu is repeated about 1 million times in the human genome and is thought to have a significant role in affecting duplications, which in turn have a significant role in genetic diseases as well as in implementing evolution of the genetic code. Alu elements represent about ten percent of human genetic code, have no known biological function, and are often considered part of “junk” DNA.
Genes perform the actual control of physiological functions. Each chromosome can have thousands of genes. The human genome contains approximately 30,000 genes but the actual number is still unknown.
The structure of the sequence of information representing a gene as seen reading sequentially along a chromosome typically includes regulatory regions at the beginning or end of the gene sequence that determine when and where the gene is activated.
A gene is often thousands of bases in length. The coding region determines which protein will be produced by the gene, that is, the sequence of amino acid molecules which will be constructed to produce a particular protein molecule (often referred to as the gene product). The properties of a protein are determined not only by the number and type of the amino acid molecules used in its construction but also by the particular sequence in which the amino acids are assembled. The long protein molecules tend to “fold up” in very complex ways depending on the particular sequence. This folding and consequent shape of the molecule affects its properties. There are therefore an essentially infinite number of possible different proteins. The largest known human protein has 27,000 amino acids corresponding to 81,000 DNA nucleotides.
A particular three-letter sequence, ATG, is the synchronization pattern denoting the start of a coding sequence; other three-letter sequences (known in genetics parlance as codons) denote particular amino acids to be sequenced into a protein and the end of a coding sequence. Since there are 20 possible amino acids and 64 possible codons, some errors in the third symbol of a codon have no effect. For example, CTA, CTG, CTT, and CTC all code for leucine. This is a form of redundancy.
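The reading of a coding sequence can be sketched as follows. The codon table here is deliberately partial (a handful of the 64 standard codons, enough for the example), and the test sequences are invented. A real coding region would need the full table.

```python
# Partial codon table: a few entries of the standard genetic code.
# None marks a stop codon.
CODONS = {
    "ATG": "Met",                                            # also the start signal
    "GTA": "Val", "GTC": "Val", "GTG": "Val", "GTT": "Val",
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGT": "Gly",
    "AAA": "Lys", "AAG": "Lys",
    "TAA": None, "TAG": None, "TGA": None,
}

def translate(dna):
    """Find the first ATG, then read codons three bases at a time until a stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODONS[dna[i:i + 3]]   # KeyError for codons missing from this partial table
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein

print(translate("CCATGGTAGGTAAATAACC"))
print(translate("CCATGGTCGGTAAATAACC"))   # third base of the second codon changed: same protein
```

Because all four GT-codons stand for the same amino acid, the substitution in the second sequence is silent.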
The regulatory regions determine when, where, and how much product will be produced. Some products are only produced in the liver; some are produced only at certain times in an animal’s life, and so on. The regulation involves the detection of chemical signals which can either enhance or inhibit the gene’s expression. Although some genes produce proteins used in the construction of tissue, many, probably a majority, produce products that act as signals to activate or inhibit other genes thus allowing the construction of a very complex regulatory logic framework.
If the regulatory region determines that a gene is activated, the cell starts making copies of the genetic information in the coding region in the form of small RNA molecules with sequences corresponding to the coding region. These messenger RNAs are used as templates by the cell machinery that produces the proper protein molecules. (Sometimes the RNA molecule itself is the gene product and performs some biological function such as acting as a signal to other genes.) The genetic code snippets will preferentially adhere to a complementary string of code. “Gene chips” carrying hundreds of samples of potential snippet complements can be used to test for the presence of specific RNAs in a sample. Using such gene chips, researchers can detect the presence of various different RNAs in various tissues and thereby determine which genes were activated. In connection with anti-aging research, detecting the differences in gene activity between a calorie-restricted animal and an unrestricted one, or between a progeria victim and an unaffected individual (see next chapter), could produce valuable clues regarding aging mechanisms.
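The idea of a snippet adhering to its complement can be sketched with DNA letters, using the standard pairing of A with T and G with C. The function names and the sample sequences are invented, and real hybridization tolerates mismatches and depends on conditions that this exact-match sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    """Return the strand that would pair with `sequence`, read in its own direction."""
    return sequence.translate(COMPLEMENT)[::-1]

def binds(probe, sample):
    """True if the probe's complement appears somewhere in the sample sequence."""
    return reverse_complement(probe) in sample

print(reverse_complement("ATGC"))
print(binds("GCAT", "TTATGCTT"))
print(binds("GGGG", "TTATGCTT"))
```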
We can think of a specific “gene” as a message defining a product that accomplishes a particular biological function. Since all multi-cell organisms have a common basic cell design and function it should be no surprise that there are genes that are common to all such organisms. As organisms become more similar they share more commonality. It is estimated that 99 percent of mouse genes have an equivalent human gene that produces a very similar product.
The organization of the genes in the genome tends to be very different between even similar species. Mice have a different number of chromosomes from humans and the equivalent genes are generally in a different order on different chromosomes. Some genes are organized in groups or clusters that are conserved between mice and humans.
Coding regions in the genes of more complex organisms have introns. Introns are portions of the coding regions that are spliced out and deleted from the code during the creation and processing of an RNA. The deletion is caused by patterns at the beginning and end of the intron that match in a particular way. Since the introns are deleted, they have no known biological effect and are often considered “junk” DNA. The remaining (functional) portions of a coding region which are expressed in the RNA and subsequent protein are called exons. Exons are thought to represent only about one percent of human DNA. Introns in humans represent about five percent of DNA. Human genes have an average of five introns and a maximum of 178 introns.
Genes are autonomous data units. They contain their own synchronization patterns and operate somewhat independently. Junk DNA can therefore exist between genes without disturbing their operation. The position (or locus) of a gene within a chromosome or on a particular chromosome generally does not appear to affect the functional operation of the gene. (In communications parlance such an autonomous unit would be referred to as a packet.) (Some specific genes must be located on the sex chromosomes in order to accomplish sex differences between organisms.) If we inject a small loose string of DNA containing a single gene into a cell, the cell will happily produce the gene’s protein product. This approach is used in some forms of gene therapy. However, the loose strand of DNA would not be duplicated during cell division because such duplication requires the gene to be part of a chromosome. Methods for inserting new genes into chromosomes have been developed and are used in genetic engineering. Such a gene would be propagated during cell division and even possibly during reproduction of the organism. While junk DNA and gene location do not affect the functioning of genes they may well have significant evolutionary effects to be described.
All normal humans are thought to have the same genes, specifying the same or nearly the same products, in the same order, on the same chromosomes. Genetic differences between humans are expressed in the exact digital content of their genes, generally minor differences such as single nucleotide substitutions.
Mendelian genetics considers that some genes in a particular species can have two different specific contents or alleles such that two different results occur. Often one allele is represented by a gene that is disabled and therefore produces no functional product, while the other allele is represented by the functioning gene, a binary situation. In practice, some genes can have more than one functioning state and a single gene can therefore have more than two alleles.
A single substitution difference in a coding region exon (for example an A could be replaced with a T) could cause a different protein or RNA product to be produced, which in turn could have a significant effect but could also have a mild or negligible effect. An error in the regulatory region or an error that deletes the start codon or adds a stop codon could cause the gene to become disabled and produce no product. Other errors could have more minor effects such as changing the amount of product produced. Many of the more than 1000 known human genetic diseases as well as most of the normal variations between individuals are caused by such single letter differences in the genome.
In many cases of genetic disease, if one parent’s gene is disabled, the other parent’s corresponding gene provides enough product so that significant symptoms are avoided. The child and the first parent are carriers. If the genes received from both parents are defective, then the child has the recessive genetic disease. If one gene does not provide enough product to avoid symptoms, or if an incorrect and deleterious product is produced, then a defect in either parent’s gene can cause disease symptoms in a dominant genetic disease or other trait.
Many human genes appear to be duplicated, another form of redundancy.
So, by far the most likely possibility in a mutation is a single letter error. It would appear to be ridiculously unlikely that an entire functioning gene could be produced by a random mutation. The significance of this is covered in the section on aging genes.
About half of the human genome consists of repeats of very short (2 – 5 bases) or relatively short (<300 bases) sequences. Since these repeats and other “junk” DNA are between genes or in introns they have no apparent effect on an organism’s function. However, they do have an apparent evolutionary effect in that they influence mechanisms that cause segments of code to be duplicated, copied to another part of the genome, or deleted. Introns appear to have a similar evolutionary effect. The sections of expressed genetic code (exons) between introns appear to correspond to “building blocks” or “modules” that have been used by nature to produce a family of different proteins each of which consists of one or more common modules added to a unique sequence. Although the content and length of introns in a particular gene tends to vary between species, the exons and the number of introns tend to be conserved.
As mentioned, sex cells have only one copy of the chromosomes so that when a sperm and egg cell are united the resulting cell and subsequent cells have a normal complement of two sets of chromosomes. In order to do this, half of the genetic material is not used during the creation of a sperm or egg cell. This process, called meiosis, and other aspects of sexual reproduction are extremely complicated as will be summarized below. The reason for this detour is to demonstrate the enormous difficulty nature has endured in order to produce the maximum possible variation in organisms. These extremely complex evolved mechanisms further validate Darwin’s theory of natural selection by means of natural variation and also lend credibility to adaptive theories of aging as will be explained in Chapter 7.
In the process of meiosis, one chromosome from each set of two coming from the two parents is randomly selected for transmission in the sperm or egg cell. Humans have 23 different chromosomes (designated as numbers 1 through 22 in order of decreasing length and either “X” or “Y”). Note that the complex selecting mechanism has to guarantee that exactly one of each set of two chromosomes will be transmitted and that we do not possess three of chromosome 1 and none of chromosome 2, etc.
A random sort of human chromosomes would result in 2^23 or 8,388,608 different possible combinations. Each parent performs such a random shuffle of the chromosomes received from their parents in producing the sperm and egg cells. This would appear to guarantee plenty of variation.
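The arithmetic above can be checked with a short sketch. It assumes only independent assortment of the 23 chromosome pairs (no crossover); the function and variable names are illustrative and do not come from the original text.

```python
# Count the chromosome combinations possible from independent assortment alone.
# Assumption: each of the 23 pairs contributes one of its two chromosomes,
# chosen independently, and crossover is ignored.

CHROMOSOME_PAIRS = 23


def gamete_combinations(pairs: int) -> int:
    """Each pair offers two choices, so the choices multiply: 2 ** pairs."""
    return 2 ** pairs


per_parent = gamete_combinations(CHROMOSOME_PAIRS)
# A child receives one gamete from each parent, so the counts multiply again.
per_couple = per_parent ** 2

print(per_parent)  # 8388608
print(per_couple)  # 70368744177664
```

Even before crossover is considered, a single couple could in principle produce about 70 trillion distinct chromosome combinations.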
However, it was eventually determined through inheritance studies that reality was actually yet more complicated. If only the chromosomes were shuffled, then inheritance of a gene on a chromosome would be tied to inheritance of another gene on the same chromosome. If you inherited one gene from one parent, you would have to also inherit the other gene from that same parent. (This would make it impossible for nature to sort out the beneficial or adverse effects of different mutations on the same chromosome and therefore drastically limit the process of evolution.) At the same time if the two genes were on different chromosomes, inheriting one would be completely independent of and not affect the chance of inheriting the other because of the random chromosome shuffle.
(The plant traits that Mendel used in his experiments happened to be on different chromosomes. (Plants tend to have many chromosomes.) If this had not been the case he would probably still be trying to make sense of the inheritance patterns as explained below!)
Geneticists discovered that if traits were controlled by genes on different chromosomes the inheritance pattern was, as predicted, completely independent. However, if genes were on the same chromosome the inheritance of the respective traits ranged from almost independent (inheritance of one trait was random relative to the other) to nearly totally dependent (inheritance of one trait almost always meant inheritance of the other). They deduced that during construction of sex cells (meiosis) one or more contiguous segments of the chromosome received from one parent is exchanged (crossed over) with the corresponding chromosome received from the other parent to make a new chromosome that is a composite of the two. The length and position of the swapped segment is almost random. As a result, the probability that two genes on a single chromosome are separated, rather than inherited together from one parent, increases with the physical distance between the genes on the chromosome. If two genes are physically close, they are almost always inherited together; if physically distant, their inheritance is almost independent. Using this genetic distance principle and mind-numbingly tedious inheritance studies, geneticists have been able to determine the approximate physical chromosome location of many genetic disease genes. (This approach is very difficult if the disease or trait is the result of two or more genes.)
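The relationship between distance and linked inheritance can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of real meiosis: it assumes exactly one crossover per chromosome at a uniformly random position, and it represents gene positions as fractions of chromosome length. All names and values are hypothetical.

```python
import random


def separation_rate(pos_a: float, pos_b: float,
                    trials: int = 100_000, seed: int = 1) -> float:
    """Estimate how often a single random crossover separates two genes.

    Positions are fractions (0.0 to 1.0) of the chromosome length.
    Assumption: one crossover per meiosis at a uniformly random point,
    so the genes are separated only when the cut falls between them.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    low, high = sorted((pos_a, pos_b))
    separated = 0
    for _ in range(trials):
        cut = rng.random()
        if low < cut <= high:
            separated += 1
    return separated / trials


# Two genes close together are rarely separated.
print(separation_rate(0.50, 0.51))
# Two genes far apart are separated most of the time in this toy model.
print(separation_rate(0.10, 0.90))
```

In this simplified model the separation rate is approximately equal to the distance between the two positions, which is the intuition behind using inheritance studies to estimate where genes lie on a chromosome.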
In a further complexity, the swapping process apparently only exchanges sequences of code that are nearly identical at least near the ends of the cut sections. This helps ensure that genes are only exchanged with corresponding genes and the progeny does not end up with two of some genes and none of some other genes or inherit partial genes or genes with insertion or deletion errors. However, an error can occur if the end of a cut occurs at a place where two identical code sequences (such as two alu elements) occur in a relatively short stretch of code. The cut might exclude or duplicate the section of code between the identical sections. It is thought that some genetic diseases are in fact caused by these kinds of errors in the crossover process. This unequal crossover mechanism is significant in supporting genome evolution.
Occasionally, humans inherit three copies of chromosome 21 instead of the normal two copies. As a result, 50 percent more of some gene products is made than normal. This in turn results in a genetic disease characterized by intellectual disability and physiological abnormalities known as Down syndrome. Other chromosome abnormalities include inheriting fewer or more than two of any chromosome, swapping of genetic material between different chromosomes, or losing parts of chromosomes. Most such abnormalities cause fetal death or degradation so severe that propagation in a wild population would be impossible. However, the fact that Down syndrome is not immediately fatal despite duplication of hundreds of genes illustrates that duplication of some genes might happen without severe adverse consequences. Duplication of genes is part of the process whereby organisms evolve more complexity.
Animals have special X and Y chromosomes to help manage sexual reproduction. Female humans have two X chromosomes (which are paired and swapped during meiosis like other chromosomes). Males have an X and a Y chromosome. Therefore, progeny always inherit an X chromosome from their female parent and have a 50 percent chance of inheriting an X chromosome from their male parent, thus resulting in a 50 percent chance of being either male or female. In humans, the X chromosome is larger and has more genes than the Y chromosome. The gene that triggers “maleness” is on the Y chromosome.
One aspect of this arrangement puzzling to geneticists was how females avoid having something like Down syndrome. Since females have two copies of chromosome X and males only have one copy, females would appear to have 100 percent more of some gene products than males (or males have 50 percent less than females). (I know that at this point, some men and women readers will be saying, “that explains a lot” about women or men respectively.)
Eventually, it was determined that, in females, one (and only one) of their two X chromosomes is randomly “inactivated” such that, functionally, females only have one X chromosome. X inactivation is another in a long list of evolved complexities associated with sexual reproduction.
At least in higher animals, a process similar to X inactivation inactivates certain genes as development proceeds. As stem cells differentiate into more specialized cells, some genes are marked as inactivated (genetic imprinting) which partly enables the capability for structural and functional differences in different body cells. The inactivation state of genes is copied during mitosis. So although almost all your cells have all your genes, in most cells some genes are inactivated. This inactivation is removed during meiosis and also when cloning animals from differentiated cells such as skin cells.
Polymorphism and Variation
A polymorphism is a situation in which a characteristic possessed by individuals in the normal population varies. If 90 percent of the flies have black eyes and 10 percent have red eyes, this would be an eye-color polymorphism. “Normal” is often defined as meaning that at least one percent of the population has the variation. At the genetic level, it is now estimated that normal humans possess genetic codes that are as much as 99.9 percent identical. However, there are an estimated 3 million places in the human genetic code where a single letter is different in some individuals. In such a location, a letter might be “A” in 85 percent of the people and “G” in the remaining 15 percent. These “Single Nucleotide Polymorphisms” or SNPs represent and convey the “normal variation” between humans. Presumably, the number of polymorphisms would be much less if we considered only individuals in a particular race and would be progressively still less if we considered only a particular ethnic group, clan, tribe, or family. Extensive research is now under way to identify which SNPs are associated with which identifiable characteristics and to determine how SNPs vary between races, ethnic groups, and families.
If indeed there are 3 million SNPs (some estimates are as high as 10 million), then there are 2^3,000,000 possible combinations of those SNPs, a very, very large number of combinations. Every human is therefore unique and expresses a different combination. At the same time, as explained earlier, the probability of inheriting any part of the genetic code with some other part depends on the physical distance (genetic distance) between the two segments on a chromosome. SNPs that were close together on a single chromosome would tend to be inherited together. SNPs that were very close together would tend to be inherited as a unit that would tend not to be divided even after many generations. SNPs that were far apart or were on different chromosomes would be shuffled in every individual. Many SNPs are known to be clustered and their inheritance is therefore complicated. These details regarding the inheritance of variation are critical to theories of evolution such as the selfish gene theory.
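To give a rough sense of how large the number of SNP combinations is, the following sketch estimates how many decimal digits it has. It assumes, as the text does, two possible letters at each of 3 million independent sites; the function name is illustrative. Logarithms are used so that the number itself is never built.

```python
import math


def decimal_digits_of_power_of_two(exponent: int) -> int:
    """Number of decimal digits in 2 ** exponent, computed with logarithms."""
    return math.floor(exponent * math.log10(2)) + 1


SNP_COUNT = 3_000_000  # estimate quoted in the text; some estimates are higher

digits = decimal_digits_of_power_of_two(SNP_COUNT)
print(f"2^{SNP_COUNT:,} has about {digits:,} decimal digits")
```

A number with roughly 900,000 digits is vastly larger than the number of humans who have ever lived. This is consistent with the claim that every individual carries a unique combination, although clustering of SNPs means that not all combinations are equally likely.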
A main purpose of this section was to demonstrate that, far from being a fundamental property of life as was plausible in Darwin’s analog world, variation is implemented by a number of very complex evolved biological mechanisms that were necessary to provide variation in the actual genetic digital world. The variation is so extensive that in complex organisms, every individual expresses a unique combination of inherited characteristics. The details that have been revealed fairly recently about the structure of genes, mechanics of inheritance, and the mechanics of genetic diseases are critical to the more recent theories of aging and cast increasing doubt on the traditional theories that were developed before these details were available.
This brief overview does not begin to convey the complexity of genetics. A typical college-level genetics textbook is more than 3 inches thick and only provides relatively superficial coverage of what we think we understand about genetics. What we do not yet understand could obviously fill several or even many additional books. Genetics is like a matryoshka doll. Solving one puzzle often leads to yet another puzzle.
The Occam’s razor principle teaches that if two theories fit the facts, then the simpler theory is more probably correct. In genetics, and biology generally, there seems to be a “reverse Occam’s razor principle” which goes something like: “No matter how complex a biological process or system appears to be, in reality it is probably more complicated.”
Biological Plans and Schedules
Any major project involving the construction of anything, say a house, involves plans and schedules. The plan (in this context) describes the physical locations of various components. There is a window here and a door there. Plans normally do not get too detailed. They do not specify the position of every brick but merely specify that a wall of certain dimensions is to be built of a certain type of bricks. The schedule specifies the time sequence in which the components will be installed. Some tasks can be performed in parallel. Other tasks can only be performed following the performance of some other portion or portions of the work. We cannot install the roof until the underlying structure is installed. We must install the roof before installing materials that would be damaged by rain. More complex portions of the work usually take longer to complete and involve more of these sequential tasks. It is usually beneficial to optimize the schedule to result in finishing the project as rapidly as possible to save time and therefore money.
The growth of a biological organism involves the same kind of processes except that the plan and schedule are genetically transmitted such that an organism constructs itself. Similarly to house construction, there is an obvious competitive benefit to be gained from more rapid development to maturity. In addition, in organism growth, complex structures such as eyes tend to start development earlier. Finally, the genetic plan provides more detail for complex structures. Presumably, much more genetic code is involved in the specification of eyes and brain than large, simple, repetitive structures such as the gluteus maximus.
Although different cells contain the same genetic instructions, different genes are activated in the growth and subsequent life of different cells. This accounts for their physical and functional differences. One mechanism whereby the different activations are implemented is a framework of chemical signals. As an organism grows, cells produce an expanding array of chemical signals that affect activation of genes in subsequent cells that then form different structures and systems. Such signals can either enhance or inhibit gene expression.
Enrico Coen, in his book The Art of Genes, describes how the structure of a fruit fly is determined by this process. In humans and higher animals, at least one additional process, the progressive inactivation of certain genes as stem cells differentiate, also helps determine which genes are activated in a particular cell. The mechanics of this differential inactivation are not yet well understood.
In this connection, we tend to think of chemical signals such as hormones circulating throughout the body. However, depending on their solubility, diffusion characteristics, and other attributes, signals can be very local. Coen describes experiments in which transferring material from one part of a fly embryo to another resulted in the development of two-headed embryos and other structural abnormalities depending on the circumstances. Many signals are internal to individual cells and are even local to particular locations within a cell.
Even less well understood are the mechanics of biological scheduling. It is clear that development of complex organisms involves complex scheduling functions and that optimization of the schedule has an evolutionary benefit. Even at the cell level, there are many processes that must occur in a particular order. Chemical signals such as hormones and RNAs clearly play a role in scheduling of animal growth. The completion of a task often results in production of a chemical signal that activates genes to begin the next task and inhibits the genes that performed the completed function. Presumably, the scheduling aspects of an organism’s development evolved in parallel with its structural and behavioral aspects.
The need for gene activation and deactivation and more specifically for a scheduling system to handle sequential biological activities has implications for aging theory. “Mature adult” would presumably be the last stage in the life of a non-aging animal. There would be no need for the “schedule” to extend beyond “mature adult”. We would therefore not expect to see hormone levels change as a function of age beyond full maturity and the end of the schedule. A mutation to a gene can clearly cause a problem at any one of the points in the biological schedule where the mutated gene would have been activated. A disabling mutation to a gene which is normally active in, say, childhood development, would cause a problem when the person reached that stage in life. But how could we have a mutation to a normal gene which only causes a problem after the end of the schedule? By definition, there should not be any genes that are only activated after the end of the schedule. By definition, there should not be any way to activate genes only after the end of the schedule. Mutations to a gene should cause problems during the schedule but not after the end of the schedule. This would appear to be a major problem for traditional theories of aging which depend on the idea that random mutations exist which only cause a problem in older individuals or that genes can exist that have beneficial properties in youth but are adverse in older animals.
Chickens and Eggs
It is apparent from the discussion of genetic mechanisms that many issues along the lines of “which came first, chicken or egg” apply. It does not appear to make any difference what three letter sequence “codes for” a particular amino acid but the mechanism that reads the genetic code and assembles the amino acids must have the same rules that were used to write the code. For the system to work, the receiver of the message and the associated mechanism must have the same understanding as to the meaning of the various codons as the transmitter. (Some organisms have been found that have slightly different correspondence between codon sequence and amino acid.)
The myriad chemical signals involved in gene regulation represent a similar situation. Each signal that is sent has no meaning unless receivers for that particular signal exist. The receivers have no function unless signals are being sent. Evolution of such systems must take an extremely long time and be a very incremental process. Thus, genetics further validates Darwin’s idea that evolution occurs incrementally.
Evolutionary Genetic Processes
We can see from the foregoing that there are at least five separate processes involved in the evolutionary modification of the genetic code of organisms.
The first and most rapid process is the shuffling of the variable elements (i.e. single nucleotide polymorphisms) of the code through crossover and meiosis. Every individual animal represents a different combination of the variable elements. We can observe variation by looking at a single generation.
In the second process, natural selection or selective breeding increases or decreases the population density of specific variations. Eventually a particular variable element allele could be eliminated from the population or become universal. However, natural selection or selective breeding cannot alter the perhaps 99+ percent of the genetic code that does not vary between individuals. Natural selection or selective breeding also cannot create variable code elements that do not already exist. We can easily observe the effects of this second process in breeding experiments or in observations of domesticated species.
In the third process, errors in copying code (mutations) introduce new variable code elements in existing genes (e.g. single nucleotide substitutions). Presumably, because of the digital nature of the genetic system, many such new variations are sufficiently deleterious as to result in their being fairly immediately selected out. Some have fitness effects that are initially either positive, neutral, or sufficiently mildly negative so they avoid being selected out and eventually spread in the population to become polymorphisms which can participate in the second process. An error that causes a non-duplicated gene to have an entirely different protein product would appear to be unlikely to propagate even if it resulted in a potentially useful product because the original function of the gene and its benefit would be lost. A simple mutation such as a single letter substitution cannot alter a gene that does not already exist and therefore is limited in the scope of the changes it could cause in the design or behavior of an organism. In order to increase in complexity, an organism would presumably need to have more genes, not just changes to existing genes.
In the fourth process, entirely new genes are created by means of copying errors in which entire genes are duplicated. The third process could then differentially modify the two genes such that they produce different proteins and thereby have more substantially different functions. In this way, additional genetic functions can be produced.
In the fifth process, genes are moved, or transposed, to different positions in the genome. Although such transposition does not affect the gene’s function, it does, because of the genetic distance principle, alter the inheritance patterns of genes and thereby alter evolution.
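The limits of the second process described above can be sketched numerically. This is a minimal illustration under stated assumptions, not a population genetics model: it assumes a very large haploid population, two variants of a single site, and hypothetical fitness values chosen only for the example.

```python
def next_frequency(p: float, fitness_a: float = 1.05,
                   fitness_b: float = 1.0) -> float:
    """Frequency of variant A after one generation of selection.

    Assumptions: haploid organisms, a population large enough to ignore
    chance, and no mutation. The fitness values are hypothetical.
    """
    mean_fitness = p * fitness_a + (1.0 - p) * fitness_b
    return p * fitness_a / mean_fitness


def frequency_after(p0: float, generations: int) -> float:
    """Apply selection repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p)
    return p


# A variant already present in 15 percent of the population can become
# nearly universal if it carries a small advantage.
print(frequency_after(0.15, 500))
# A variant that does not exist (frequency 0) never appears through
# selection alone, however advantageous it would have been.
print(frequency_after(0.0, 500))
```

Selection multiplies frequencies that already exist, so a starting frequency of zero stays at zero. New variable elements have to come from the third process (mutation).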
Speciation appears to be more dependent on organization of the genome than on content. It is clear that the mechanics of sexual reproduction including chromosome pairing, meiosis, and gene crossover depend heavily on a very high degree of similarity in the genetic organization (such as the number of chromosomes and order of genes on chromosomes) of the parents. Wild animals are observed that are nearly identical but nevertheless belong to different species. Domestic animals (e.g. dogs) are observed to be drastically different but nevertheless belong to the same species. Speciation has a dramatic effect on the process of evolution because it prevents transmission of genetic characteristics to coexisting species.
Every species presumably inherits the vast majority of its genes from its ancestor species, some from very distant ancestors. Most of the genetic code therefore has a longer lifetime than the lifetime of any individual species.
Here are some examples of the potential complex interactions in these processes that are disclosed by our current fragmentary understanding of genetics.
Although alu elements have no known biological function, they could have a significant effect on the evolution of genetic code. The presence of alu elements in a particular region of genetic code increases the chance for subsequent duplications or deletions of code sequences (during meiosis and crossover) in that region and therefore affects the fourth process.
Although introns in gene coding regions have no known biological function, the presence of introns could also affect the probability of and process of duplication. An alu within an intron could affect the probability that part of a coding region might be duplicated or deleted, thus affecting the fourth process.
In addition to alu elements there are many other repeat patterns that could have similar effects on evolution of the genetic code.
We have to believe that the survival value of most variations is dependent on other variations. For example, a larger eyeball might be beneficial but only if accompanied by a larger eye socket. Now presumably there are genes that affect the size of the whole animal, there are genes that affect the size of the head relative to the rest of the animal, and there are genes that affect the size of the eyeball relative to the rest of the head. Similarly, physical characteristics of organisms must be matched by appropriate behaviors and behaviors must be matched by appropriate neurological systems. It is clear that evolution would be assisted if inheritance of certain genes was associated with inheritance of certain other genes such that, for example, eye size tended to be associated with eye socket size. We know that the degree of such association depends on the relative physical location of the genes in the genetic code of a chromosome. It “boggles the mind” to contemplate how long it could take to achieve these kinds of associations through the process of gene copying and transposition. Presumably, many such associations as well as their underlying genes are inherited from ancestor species and have long lifetimes relative to a “species lifetime”.
Because mutations in junk DNA presumably have no biological effect, such mutations can propagate easily through a population. The presence of these mutations could then affect the probability of subsequent mutations through various forms of “pattern sensitivity” and processes such as described above and thereby have significant long-term effect on the evolution of that species’ genetic code.
It is estimated that only about 1 percent of human DNA is in the form of gene exons. If we include in “functional DNA” all the regulatory regions, critical leading and trailing patterns in introns that cause them to be introns, the patterns in telomeres and centromeres that cause them to function, and all the other DNA that seems to have some fitness effect, the total functional DNA is probably less than 5 percent of the total genome.
Suppose we were to rearrange the genome of a mouse. We could take the same mouse genes (excepting the sex chromosomes) and place them in different positions on different chromosomes. We could even equip our new mouse with a different number of chromosomes. Because the new mouse has the same genes, the mice in a population of new mice should be physically indistinguishable from old mice. They should have the same fitness as old mice. The only difference is the order in which the genes are sequenced in their chromosomes.
We can make some more changes. We could add introns to some genes and delete introns in other genes. We can change the specific internal sequences of introns. We can add, or delete, or change the content, of other junk DNA. As far as is now known, none of these changes would affect the appearance, behavior, or fitness of the new mice relative to the old mice.
However, it is clear that we might be drastically altering the ability of the new mouse to adapt and evolve. Since genes are in a different order, genes that formerly tended to be inherited as a group because of short genetic distance would now be independent. Other genes formerly on different chromosomes could now be in clusters. Protein “modules” formerly available because of intron structure would no longer be available. Although the new mouse is physically and functionally identical to the old mouse, the mechanics of evolution available to the new mouse would appear to be significantly different. Because of the major differences in genome organization, the new mice would be unable to interbreed with old mice. The new mice would be members of a different, though physically identical, species.
Because of the large differences in genome organization between similar species (e.g. mammals) it is clear that the organization of the genome evolves. This evolution must necessarily be extremely “incremental” (more “tiny steps”) in order to maintain the ability of individual members to interbreed.
This entire scenario appears to be incompatible with classical Darwinism as follows.
Darwin’s theory holds that mutations that are beneficial to an organism increase its fitness and mutations that are adverse decrease its fitness. Mutations with positive or zero fitness impact can eventually become widely distributed. Period. In the above discussion we have identified a whole family of different types of mutational changes which have no immediate fitness effect but which plausibly benefit or detract from the ability of the organism to subsequently adapt through evolution by reducing or increasing the probability that certain types of subsequent mutation can occur. In effect, these mutations affect the future of the organism in terms of the descendent species it might produce or the “evolution of its species”. Although such mutations, either beneficial or adverse, could (since they are fitness neutral) spread through the population of a species and could be transmitted to any descendent species, they cannot spread to co-existing species. This suggests that “survival of the species”, that is, species that produce descendent species or which “evolve”, as opposed to becoming static or extinct, could play a much more important role relative to “survival of the fittest individual” than contemplated by classical Darwinism.
The purpose of this chapter has been to illustrate the complexity that has appeared as we discovered more about the mechanisms whereby evolution of genetic codes actually occurs. Darwin’s analog world is very simple when compared to the digital reality. Breeding experiments and heredity studies are generally confined to exploring variation and natural selection. Variation and natural selection are essentially the easily observable “tip of the iceberg” regarding the mechanics of genetic code evolution. The time scales of these different processes differ enormously. The selection of a trait that was represented in variations could take relatively few generations. Other traits, produced or affected by non-variable parts of the genome code, could be conserved for millions or billions of years. Details of the mechanics of the third, fourth, and fifth processes could explain why Darwin’s theory does not work for aging and other troublesome animal characteristics. Specifically, it appears that evolution could involve much longer times and more complex processes than contemplated by orthodox Darwinism and that therefore the importance of “individual” fitness could be less than considered by Darwin. At the same time, knowledge of these complex processes supports Darwin’s determination that sudden massive mutations were unlikely to have a significant role in evolution.
Copyright © 2004 Theodore C. Goldsmith