Chapter 2

Genetics

Mendelian genetics, population genetics, and genomics.

Genetics is the study of heredity and genetic variation in living organisms. It encompasses the mechanisms by which traits are passed from parents to offspring, the mathematical principles governing genetic frequencies in populations, and the comprehensive analysis of entire genomes. Understanding genetics is fundamental to biotechnology, medicine, agriculture, and evolutionary biology.

Mendelian Genetics

Historical Foundations

Gregor Mendel (1866) established the fundamental principles of inheritance through his work with pea plants. His laws form the foundation of modern genetics:

Law of Segregation

  • Each diploid organism has two alleles for each gene
  • These alleles segregate during gamete formation
  • Each gamete receives one allele

Law of Independent Assortment

  • Alleles of different genes assort independently during gamete formation
  • This holds provided the genes are on different chromosomes

Basic Genetic Concepts

Alleles and Genotypes

  • Allele: Alternative forms of a gene (A, a)
  • Homozygous: Two identical alleles (AA, aa)
  • Heterozygous: Two different alleles (Aa)
  • Genotype: Genetic constitution (AA, Aa, aa)
  • Phenotype: Observable characteristics

Monohybrid Crosses

For a single gene with complete dominance:

$$P_0: AA \times aa \rightarrow F_1: Aa \text{ (all dominant phenotype)}$$

$$F_1 \times F_1: Aa \times Aa \rightarrow F_2: 1\,AA : 2\,Aa : 1\,aa \text{ (3:1 phenotypic ratio)}$$
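The F1 and F2 ratios above can be checked by enumerating a Punnett square. The following is a minimal sketch, assuming single-letter alleles where uppercase means dominant; the helper names `cross` and `phenotype` are illustrative and do not come from the text.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes from every pairing of one allele per parent."""
    offspring = Counter()
    for a1, a2 in product(parent1, parent2):
        # Sort so that 'aA' and 'Aa' are counted as the same genotype
        offspring["".join(sorted((a1, a2)))] += 1
    return offspring

def phenotype(genotype):
    """Complete dominance: one uppercase allele gives the dominant phenotype."""
    return "dominant" if any(a.isupper() for a in genotype) else "recessive"

f1 = cross("AA", "aa")   # parental cross
f2 = cross("Aa", "Aa")   # F1 x F1

f2_phenotypes = Counter()
for genotype, n in f2.items():
    f2_phenotypes[phenotype(genotype)] += n

print("F1 genotypes:", dict(f1))
print("F2 genotypes:", dict(f2))
print("F2 phenotypes:", dict(f2_phenotypes))
```

Each cell of the square is equally likely, so the counts translate directly into the 1:2:1 genotypic and 3:1 phenotypic ratios.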

Dihybrid Crosses

For two independently assorting genes:

$$\text{Expected phenotypic ratio} = 9:3:3:1$$

For genes on the same chromosome, the ratio depends on the recombination frequency.
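The 9:3:3:1 ratio follows from counting all 16 gamete pairings of an AaBb x AaBb cross. This sketch assumes independent assortment (all four gamete types equally likely) and complete dominance at both genes; the gene letters and the `phenotype` helper are illustrative.

```python
from collections import Counter
from itertools import product

def phenotype(pair):
    """Return the dominant letter if present in the pair, else the recessive letter."""
    return pair[0].upper() if any(a.isupper() for a in pair) else pair[0].lower()

# Each gamete carries one allele of gene A and one allele of gene B
gametes = [a + b for a, b in product("Aa", "Bb")]

counts = Counter()
for g1, g2 in product(gametes, repeat=2):
    gene_a = g1[0] + g2[0]   # alleles inherited for gene A
    gene_b = g1[1] + g2[1]   # alleles inherited for gene B
    counts[(phenotype(gene_a), phenotype(gene_b))] += 1

ratio = [counts[("A", "B")], counts[("A", "b")],
         counts[("a", "B")], counts[("a", "b")]]
print("A_B_ : A_bb : aaB_ : aabb =", ratio)
```

Linked genes would violate the equal-gamete assumption, which is why the ratio then depends on recombination frequency.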

Deviations from Mendelian Ratios

Incomplete Dominance

$$RR \text{ (red)} \times WW \text{ (white)} \rightarrow RW \text{ (pink)}$$

Codominance

$$I^A I^A \text{ (A blood type)} \times I^B I^B \text{ (B blood type)} \rightarrow I^A I^B \text{ (AB blood type)}$$

Multiple Alleles

Human ABO blood groups: $I^A$, $I^B$, $i$ (3 alleles at one locus)

Epistasis

  • Interaction between genes at different loci
  • Example: 9:3:4 ratio in mouse coat color (B/b and C/c genes)
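The 9:3:4 ratio can be reproduced by applying an epistasis rule to the same 16-cell dihybrid square. This is a sketch under stated assumptions: the color labels (black, brown, albino) and the rule that `cc` masks the B/b gene follow a common textbook description of recessive epistasis and are not spelled out in the text above.

```python
from collections import Counter
from itertools import product

def coat_color(b_pair, c_pair):
    """Assumed rule: cc blocks pigment entirely, otherwise B/b sets the color."""
    if "C" not in c_pair:
        return "albino"
    return "black" if "B" in b_pair else "brown"

# Gametes of a BbCc parent, assuming independent assortment
gametes = [b + c for b, c in product("Bb", "Cc")]

counts = Counter()
for g1, g2 in product(gametes, repeat=2):
    counts[coat_color(g1[0] + g2[0], g1[1] + g2[1])] += 1

print(dict(counts))
```

The two classes that differ only at the B/b gene while carrying `cc` collapse into a single albino class, turning 9:3:3:1 into 9:3:4.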

Population Genetics

Hardy-Weinberg Equilibrium

The Hardy-Weinberg principle describes the genetic equilibrium in an ideal population:

$$p^2 + 2pq + q^2 = 1$$

Where:

  • $p$ = frequency of the dominant allele
  • $q$ = frequency of the recessive allele
  • $p^2$ = frequency of the homozygous dominant genotype
  • $2pq$ = frequency of the heterozygous genotype
  • $q^2$ = frequency of the homozygous recessive genotype
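A short numerical example makes the genotype terms concrete. This is a minimal sketch: the function name `hardy_weinberg` and the input values (p = 0.7, a recessive phenotype frequency of 1 in 2500) are hypothetical values chosen for illustration.

```python
import math

def hardy_weinberg(p):
    """Expected genotype frequencies for frequency p of allele A (q = 1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

freqs = hardy_weinberg(0.7)
print(freqs)

# Working backwards: if the recessive phenotype (aa) has frequency q^2,
# then q = sqrt(q^2) and the expected carrier frequency is 2pq.
recessive_phenotype_freq = 1 / 2500   # hypothetical value
q = math.sqrt(recessive_phenotype_freq)
carrier_freq = 2 * (1 - q) * q
print(f"q = {q:.3f}, carrier frequency = {carrier_freq:.4f}")
```

The backward calculation is only valid if the population is assumed to be in Hardy-Weinberg equilibrium at that locus.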

Assumptions for Equilibrium

  1. No mutations
  2. No gene flow
  3. Large population size
  4. Random mating
  5. No natural selection

Testing Hardy-Weinberg Equilibrium

Chi-square Test

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

Where $O$ is the observed count and $E$ is the expected count for each genotype class.

Factors Disrupting Equilibrium

Mutation

  • Forward mutation: $A \rightarrow a$ at rate $\mu$
  • Reverse mutation: $a \rightarrow A$ at rate $\nu$
  • Equilibrium: $\hat{p} = \frac{\nu}{\mu + \nu}$
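The equilibrium value can be verified by iterating the per-generation recursion $p' = p(1-\mu) + (1-p)\nu$ until it stops changing. This is a sketch with illustrative rates that are far higher than typical real mutation rates, chosen only so the loop converges quickly; `mutate` is a hypothetical helper name.

```python
def mutate(p, mu, nu):
    """One generation of mutation: A -> a at rate mu, a -> A at rate nu."""
    return p * (1 - mu) + (1 - p) * nu

mu, nu = 1e-3, 5e-4   # illustrative rates only
p = 0.9               # arbitrary starting frequency of allele A

for _ in range(20000):
    p = mutate(p, mu, nu)

p_hat = nu / (mu + nu)   # analytical equilibrium
print(f"iterated p = {p:.6f}, predicted equilibrium = {p_hat:.6f}")
```

The starting frequency does not matter: any value of `p` between 0 and 1 converges to the same equilibrium.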

Migration (Gene Flow)

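$$p_t' = p_t + m(p_s - p_t)$$

Where $m$ is the migration rate, $p_s$ is the allele frequency in the source population, $p_t$ is the allele frequency in the target population before migration, and $p_t'$ is its frequency after one generation of migration.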
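Under the standard one-way migration model, each generation a fraction $m$ of the target population is replaced by migrants, so the target frequency moves toward the source frequency by the factor $(1-m)$ per generation. The sketch below iterates that recursion and compares it with the closed form $p_s + (p_0 - p_s)(1-m)^t$; the parameter values and the `migrate` helper are assumptions for illustration.

```python
def migrate(p_target, p_source, m):
    """One generation of one-way gene flow from source into target."""
    return p_target + m * (p_source - p_target)

p_source, m = 0.8, 0.05   # illustrative values
p0 = 0.2                  # starting frequency in the target population
generations = 50

history = [p0]
for _ in range(generations):
    history.append(migrate(history[-1], p_source, m))

closed_form = p_source + (p0 - p_source) * (1 - m) ** generations
print(f"after {generations} generations: {history[-1]:.4f} (closed form {closed_form:.4f})")
```

With continued migration and no opposing force, the target frequency approaches the source frequency.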

Genetic Drift

  • Change in allele frequency due to random sampling
  • More pronounced in small populations

$$\text{Variance in allele frequency} = \frac{p(1-p)}{2N}$$

Where $N$ is the effective population size. This is the sampling variance of the allele frequency after a single generation.
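Drift can be illustrated with a simple Wright-Fisher style simulation, in which each generation's 2N allele copies are drawn at random from the previous generation. This is a sketch under simplifying assumptions (constant size, no selection, mutation, or migration); the function name and parameter values are illustrative.

```python
import random

def wright_fisher(p0, n_individuals, generations, rng):
    """Simulate allele frequency over time by binomial sampling of 2N copies."""
    p = p0
    trajectory = [p]
    copies_total = 2 * n_individuals
    for _ in range(generations):
        # Each of the 2N copies is allele A with probability p
        copies = sum(rng.random() < p for _ in range(copies_total))
        p = copies / copies_total
        trajectory.append(p)
    return trajectory

rng = random.Random(42)   # fixed seed so the run is repeatable
trajectory = wright_fisher(0.5, 10, 50, rng)

# Compare the one-generation variance with p(1-p)/(2N)
replicates = [wright_fisher(0.5, 10, 1, rng)[-1] for _ in range(2000)]
mean_p = sum(replicates) / len(replicates)
observed_var = sum((x - mean_p) ** 2 for x in replicates) / len(replicates)
expected_var = 0.5 * (1 - 0.5) / (2 * 10)

print(f"observed variance = {observed_var:.4f}, expected = {expected_var:.4f}")
```

Repeating the run with a larger `n_individuals` shrinks the variance in proportion to 1/(2N), which is why drift matters most in small populations.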

Natural Selection

Fitness and Selection Coefficient

  • Fitness ($W$): Relative reproductive success
  • Selection coefficient ($s$): $s = 1 - W$

Selection Models

  • Selection against recessive: $AA : Aa : aa = 1 : 1 : (1-s)$
  • Selection against dominant: $AA : Aa : aa = (1-s) : (1-hs) : 1$ ($h$ = dominance coefficient)
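The effect of these fitness schemes on allele frequency can be sketched with the standard one-locus recursion $p' = (p^2 W_{AA} + pq W_{Aa}) / \bar{W}$, where $\bar{W}$ is the mean fitness. The recursion itself is not stated in the text above, and the helper name and the value of `s` are assumptions for illustration.

```python
def next_p(p, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of viability selection."""
    q = 1 - p
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w

# Selection against the recessive homozygote: fitnesses 1 : 1 : (1 - s)
s = 0.2      # illustrative selection coefficient
p = 0.5
trajectory = [p]
for _ in range(10):
    p = next_p(p, 1.0, 1.0, 1.0 - s)
    trajectory.append(p)

print([round(x, 3) for x in trajectory])
```

Selection against a recessive allele slows down as the allele becomes rare, because most remaining copies are hidden in heterozygotes.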

Quantitative Genetics

Polygenic Inheritance

For traits controlled by multiple genes:

$$\text{Phenotypic variance} = \text{Genetic variance} + \text{Environmental variance}$$

$$V_P = V_G + V_E$$

Components of Genetic Variance

  • Additive variance ($V_A$): Due to average allelic effects
  • Dominance variance ($V_D$): Due to interaction within loci
  • Epistatic variance ($V_I$): Due to interaction between loci

Heritability

Narrow-sense heritability

$$h^2 = \frac{V_A}{V_P}$$

Broad-sense heritability

$$H^2 = \frac{V_G}{V_P}$$

Response to Selection

$$R = h^2 S$$

Where $R$ is the response to selection and $S$ is the selection differential.
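A worked example ties the variance components, both heritabilities, and the response to selection together. All numbers below are hypothetical, and the sketch assumes $V_G = V_A + V_D + V_I$ with no genotype-by-environment interaction.

```python
# Hypothetical variance components for a quantitative trait
V_A, V_D, V_I = 4.0, 1.0, 0.5   # additive, dominance, epistatic
V_E = 4.5                       # environmental

V_G = V_A + V_D + V_I           # total genetic variance
V_P = V_G + V_E                 # phenotypic variance

h2 = V_A / V_P                  # narrow-sense heritability
H2 = V_G / V_P                  # broad-sense heritability

# Selected parents exceed the population mean by S trait units
S = 10.0                        # hypothetical selection differential
R = h2 * S                      # expected shift in the offspring mean

print(f"h^2 = {h2:.2f}, H^2 = {H2:.2f}, R = {R:.1f}")
```

Only the additive fraction of the variance is passed on predictably, which is why the response uses $h^2$ rather than $H^2$.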

Genomics

Genome Structure and Organization

Prokaryotic Genomes

  • Size: Usually 0.5-10 Mb
  • Structure: Single circular chromosome
  • Gene density: High (85-90% coding)

Eukaryotic Genomes

  • Size: Variable (yeast: 12 Mb, humans: 3,200 Mb)
  • Structure: Linear chromosomes in nucleus
  • Gene density: Lower (only ~1.5% coding in humans)

Genomic Technologies

DNA Sequencing

Sanger Sequencing

  • Chain termination method using dideoxynucleotides
  • Read length: 600-1000 bp

Next-Generation Sequencing (NGS)

  • Parallel sequencing of millions of fragments
  • Read lengths: 50-300 bp (Illumina), 8,000+ bp (PacBio)

Genotyping Technologies

  • SNP arrays: High-throughput genotyping
  • Microarrays: Gene expression profiling
  • Whole-genome sequencing: Complete genome analysis

Comparative Genomics

Synteny Analysis

  • Conservation of gene order between species
  • Identifies evolutionary relationships

Phylogenetics

  • Reconstructs evolutionary relationships
  • Methods: distance-based, maximum parsimony, maximum likelihood, Bayesian

Modern Genomic Applications

Whole Genome Analysis

Annotation

  • Gene prediction: Identify protein-coding regions
  • Functional assignment: Assign functions to genes
  • Regulatory element identification: Promoters, enhancers, etc.

Variation Analysis

  • SNPs: Single nucleotide polymorphisms
  • Indels: Insertions/deletions
  • CNVs: Copy number variations
  • Structural variants: Large-scale rearrangements

Functional Genomics

Gene Expression Analysis

  • RNA-seq $\rightarrow$ Quantify transcript abundance
  • DEG analysis $\rightarrow$ Differentially expressed genes

Epigenomics

  • DNA methylation: Gene expression regulation
  • ChIP-seq: Protein-DNA interactions
  • ATAC-seq: Chromatin accessibility

Population Genomics

Genetic Diversity Measures

Heterozygosity

$$H_e = 1 - \sum p_i^2$$

Where $p_i$ is the frequency of allele $i$.
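The heterozygosity formula is easy to apply to any list of allele frequencies at a locus. This is a minimal sketch; the function name and the example frequencies are illustrative.

```python
def expected_heterozygosity(allele_freqs):
    """H_e = 1 - sum(p_i^2) for allele frequencies that sum to 1."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

print(expected_heterozygosity([0.5, 0.5]))         # two equally common alleles
print(expected_heterozygosity([0.25] * 4))         # four equally common alleles
print(expected_heterozygosity([0.9, 0.05, 0.05]))  # one common allele
```

For a fixed number of alleles, $H_e$ is highest when all alleles are equally frequent and drops toward zero as one allele approaches fixation.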

Nucleotide Diversity

$$\pi = \frac{1}{n(n-1)/2} \sum_{i<j} d_{ij}$$

Where $n$ is the number of sequences and $d_{ij}$ is the number of differences between sequences $i$ and $j$.
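The formula averages the number of differences over all $n(n-1)/2$ sequence pairs. The sketch below assumes equal-length, pre-aligned sequences with no gaps; the toy sequences and the function name are made up for illustration.

```python
from itertools import combinations

def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

sequences = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]   # toy alignment
pi = mean_pairwise_differences(sequences)
pi_per_site = pi / len(sequences[0])

print(f"pi = {pi:.3f} differences per pair, {pi_per_site:.3f} per site")
```

Dividing by the alignment length gives the per-site value that is usually reported when comparing regions of different lengths.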

Population Structure

F-statistics (Fixation Indices)

  • $F_{IS}$: Inbreeding within subpopulations
  • $F_{ST}$: Differentiation between subpopulations
  • $F_{IT}$: Total inbreeding in the total population

$$F_{ST} = \frac{H_T - H_S}{H_T}$$

Where $H_T$ is the expected heterozygosity of the total population and $H_S$ is the mean expected heterozygosity within subpopulations.
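For a biallelic locus, $F_{ST}$ can be computed directly from subpopulation allele frequencies. This sketch assumes equally sized subpopulations (so simple means are used) and expected heterozygosity $2p(1-p)$ in each; the function name and example frequencies are illustrative.

```python
def fst_biallelic(subpop_freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, equal subpopulation sizes."""
    n = len(subpop_freqs)
    mean_p = sum(subpop_freqs) / n
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / n   # mean within-subpopulation
    h_t = 2 * mean_p * (1 - mean_p)                        # pooled population
    if h_t == 0:
        return 0.0   # locus is fixed everywhere, no differentiation to measure
    return (h_t - h_s) / h_t

print(fst_biallelic([0.5, 0.5]))   # identical subpopulations
print(fst_biallelic([0.2, 0.8]))   # strongly diverged
print(fst_biallelic([0.0, 1.0]))   # fixed for different alleles
```

Values near 0 indicate little differentiation, while a value of 1 means the subpopulations are fixed for different alleles.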

Genome-Wide Association Studies (GWAS)

  • Statistical analysis to identify genetic variants associated with traits
  • Can identify loci contributing to complex diseases

Applications in Medicine

Medical Genetics

Single-Gene Disorders

  • Autosomal dominant: Huntington's disease, Marfan syndrome
  • Autosomal recessive: Cystic fibrosis, sickle cell anemia
  • X-linked: Duchenne muscular dystrophy, hemophilia

Complex Diseases

  • Multifactorial: Diabetes, heart disease, cancer
  • Polygenic: Continuous distribution of risk

Pharmacogenomics

  • Genetic basis for individual drug responses
  • Example: CYP2D6 polymorphisms affect drug metabolism

Computational Tools

Genetic Analysis Software

  • PLINK: Whole-genome association analysis
  • BEAGLE: Genotype imputation
  • STRUCTURE: Population structure analysis
  • PhyML, RAxML: Phylogenetic tree construction

Real-World Application: Population Bottleneck Analysis

Population bottlenecks significantly impact genetic diversity and can be studied using population genetics principles.

Bottleneck Analysis

# Population bottleneck and genetic drift analysis
population_data = {
    'initial_size': 10000,    # N0 (original population size)
    'bottleneck_size': 50,    # Nb (size during bottleneck)
    'bottleneck_duration': 5, # generations
    'recovery_time': 100,     # generations since recovery
    'mutation_rate': 2.5e-8,  # per site per generation
    'current_size': 50000     # N1 (current population size)
}

# Calculate heterozygosity after bottleneck
# Formula: Ht = H0 * (1 - 1/(2N))^t
# Where N is the harmonic mean of population sizes over the t generations

# Effective population size over the entire period (harmonic mean):
# Ne = t_total / sum(t_i / N_i)
# Simplification: the population is treated as being at current_size
# for the whole recovery period.
generations_total = population_data['bottleneck_duration'] + population_data['recovery_time']
N_e = generations_total / ((population_data['bottleneck_duration'] / population_data['bottleneck_size']) +
                           (population_data['recovery_time'] / population_data['current_size']))

# Expected heterozygosity after bottleneck
H0 = 0.0005  # Initial heterozygosity
Ht = H0 * (1 - 1/(2 * N_e)) ** generations_total

# Calculate nucleotide diversity reduction
# Expected reduction: π_post = π_pre * (1 - 1/(2*Nb))^t_bottleneck
pi_reduction = (1 - 1/(2 * population_data['bottleneck_size'])) ** population_data['bottleneck_duration']

# Effective number of founding individuals (genetic perspective)
# Using: Nb = (4*Nm*Nf) / (Nm + Nf) where Nm=male, Nf=female
# Simplified: assuming equal sex ratio for bottleneck
effective_founders = (4 * (population_data['bottleneck_size']/2) * (population_data['bottleneck_size']/2)) / population_data['bottleneck_size']

print("Population bottleneck analysis:")
print(f"  Original size: {population_data['initial_size']:,}")
print(f"  Bottleneck size: {population_data['bottleneck_size']}")
print(f"  Bottleneck duration: {population_data['bottleneck_duration']} generations")
print(f"  Effective population size (harmonic mean): {N_e:.1f}")
print(f"  Expected fraction of heterozygosity lost: {(1 - Ht/H0):.3f}")
print(f"  Fraction of nucleotide diversity lost during bottleneck: {(1 - pi_reduction):.3f}")
print(f"  Effective founding individuals: {effective_founders:.1f}")

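# Estimate time to recover the original diversity level
# Rough order-of-magnitude heuristic: about 4 * Ne generations.
# The real recovery time also depends on the mutation rate and the
# post-bottleneck population size, so treat this as a coarse guide only.
recovery_generations = 4 * N_e
print(f"  Approximate generations to recover original diversity: {recovery_generations:.0f}")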

# Interpretation
if effective_founders < 100:
    bottleneck_severity = "Severe - significant genetic drift expected"
elif effective_founders < 500:
    bottleneck_severity = "Moderate - some genetic drift"
else:
    bottleneck_severity = "Mild - minimal genetic drift impact"

print(f"  Bottleneck severity: {bottleneck_severity}")

Conservation Genetics Implications

Understanding population bottlenecks helps in conservation biology and species management.


Your Challenge: Hardy-Weinberg Analysis

Analyze genetic data to determine if a population is in Hardy-Weinberg equilibrium and calculate evolutionary parameters.

Goal: Use population genetics principles to analyze genetic data and assess evolutionary forces.

Population Data

import math

# SNP genotyping data for a population
genotype_counts = {
    'AA': 420,    # Number of homozygous dominant individuals
    'Aa': 480,    # Number of heterozygous individuals  
    'aa': 100     # Number of homozygous recessive individuals
}

# Calculate total individuals and allele frequencies
total_individuals = genotype_counts['AA'] + genotype_counts['Aa'] + genotype_counts['aa']
total_alleles = 2 * total_individuals

# Calculate allele frequencies
p = (2 * genotype_counts['AA'] + genotype_counts['Aa']) / total_alleles  # freq of A allele
q = (2 * genotype_counts['aa'] + genotype_counts['Aa']) / total_alleles  # freq of a allele

# Expected genotype frequencies under HWE
expected_AA = p**2 * total_individuals
expected_Aa = 2 * p * q * total_individuals
expected_aa = q**2 * total_individuals

# Chi-square test
observed_values = [genotype_counts['AA'], genotype_counts['Aa'], genotype_counts['aa']]
expected_values = [expected_AA, expected_Aa, expected_aa]

chi_square = sum([(O - E)**2 / E for O, E in zip(observed_values, expected_values)])

# Degrees of freedom = number of genotypes - 1 - number of estimated parameters
# For 3 genotypes with 1 estimated parameter (p), df = 3 - 1 - 1 = 1
df = 1

# Calculate FIS (inbreeding coefficient)
# FIS = (He - Ho) / He, where He is expected heterozygosity and Ho is observed heterozygosity
He = 2 * p * q  # Expected heterozygosity under HWE
Ho = genotype_counts['Aa'] / total_individuals  # Observed heterozygosity
FIS = (He - Ho) / He

# Calculate other population genetics parameters
heterozygosity_reduction = 1 - (Ho / He)  # Reduction from expected
allele_balance = abs(p - q)  # Difference in allele frequencies

# Assess evolutionary forces
evolutionary_forces = []
if abs(FIS) > 0.1:
    if FIS > 0:
        evolutionary_forces.append("Inbreeding")
    else:
        evolutionary_forces.append("Outbreeding")
if chi_square > 3.84:  # Critical value at alpha = 0.05 with df = 1
    evolutionary_forces.append("Deviates from HWE")

Analyze the population genetics data and determine the evolutionary forces at play.

Hint:

  • Calculate observed vs. expected genotype frequencies
  • Perform chi-square test to assess Hardy-Weinberg equilibrium
  • Calculate FIS to assess inbreeding/outbreeding
  • Consider the implications of observed allele frequencies

# TODO: Calculate genetics parameters
allele_frequency_A = 0    # Frequency of allele A
allele_frequency_a = 0    # Frequency of allele a
inbreeding_coefficient = 0  # FIS value
chi_square_statistic = 0   # Chi-square test result
hwe_status = ""            # In or out of equilibrium
evolutionary_force = ""    # Type of evolutionary pressure

# Calculate allele frequencies from counts
total_individuals = genotype_counts['AA'] + genotype_counts['Aa'] + genotype_counts['aa']
total_alleles = 2 * total_individuals
allele_frequency_A = (2 * genotype_counts['AA'] + genotype_counts['Aa']) / total_alleles
allele_frequency_a = (2 * genotype_counts['aa'] + genotype_counts['Aa']) / total_alleles

# Calculate expected frequencies under HWE
expected_AA = allele_frequency_A**2 * total_individuals
expected_Aa = 2 * allele_frequency_A * allele_frequency_a * total_individuals
expected_aa = allele_frequency_a**2 * total_individuals

# Calculate observed heterozygosity
observed_het = genotype_counts['Aa'] / total_individuals
expected_het = 2 * allele_frequency_A * allele_frequency_a

# Calculate FIS (inbreeding coefficient)
inbreeding_coefficient = (expected_het - observed_het) / expected_het

# Calculate chi-square statistic
chi_square_statistic = ((genotype_counts['AA'] - expected_AA)**2 / expected_AA + 
                        (genotype_counts['Aa'] - expected_Aa)**2 / expected_Aa + 
                        (genotype_counts['aa'] - expected_aa)**2 / expected_aa)

# Assess HWE status (critical value for df=1 at alpha=0.05 is 3.84)
if chi_square_statistic > 3.84:
    hwe_status = "Out of equilibrium"
else:
    hwe_status = "In equilibrium"

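# Determine evolutionary force based on FIS and HWE status.
# These labels are heuristic: a single-locus test cannot identify the cause.
if inbreeding_coefficient > 0.1:
    evolutionary_force = "Heterozygote deficit (inbreeding or population subdivision)"
elif inbreeding_coefficient < -0.1:
    evolutionary_force = "Heterozygote excess (outbreeding or selection)"
elif hwe_status == "Out of equilibrium":
    evolutionary_force = "Small but significant deviation (cause unclear)"
else:
    evolutionary_force = "Random mating (no significant force)"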

# Print results
print(f"Allele frequency A: {allele_frequency_A:.3f}")
print(f"Allele frequency a: {allele_frequency_a:.3f}")
print(f"Inbreeding coefficient (FIS): {inbreeding_coefficient:.3f}")
print(f"Chi-square statistic: {chi_square_statistic:.2f}")
print(f"Hardy-Weinberg status: {hwe_status}")
print(f"Dominant evolutionary force: {evolutionary_force}")

# Additional population genetics assessment
heterozygosity_ratio = observed_het / expected_het
if heterozygosity_ratio < 0.9:
    diversity_status = "Reduced heterozygosity"
elif heterozygosity_ratio > 1.1:
    diversity_status = "Increased heterozygosity"
else:
    diversity_status = "Normal heterozygosity"
    
print(f"Heterozygosity status: {diversity_status}")

How might the population's genetic structure be affected if the observed deviation from Hardy-Weinberg equilibrium is due to population subdivision?

ELI10 Explanation

Simple analogy for better understanding

Think of genetics like studying the family tree of traits - how characteristics get passed down from parents to children through the generations. Just like you might have your grandmother's eyes or your father's height, genetics explains how traits are inherited through tiny instructions called genes. It's like understanding the rules of a game where parents each contribute half of the 'cards' to their children, and sometimes certain 'cards' are more likely to show up than others. Genetics helps us understand not just how you inherit your traits, but also how traits can change over time in whole populations, and how we can read the entire instruction manual (genome) to understand all the possible traits and characteristics.

Self-Examination

Q1.

What are the key differences between Mendelian and polygenic inheritance patterns?

Q2.

How do Hardy-Weinberg equilibrium principles apply to population genetics?

Q3.

What are the main approaches used in modern genomics?