Genetics

Genetics is the study of heredity and genetic variation in living organisms. It encompasses the mechanisms by which traits are passed from parents to offspring, the mathematical principles governing genetic frequencies in populations, and the comprehensive analysis of entire genomes. Understanding genetics is fundamental to biotechnology, medicine, agriculture, and evolutionary biology.

Mendelian Genetics

Historical Foundations

Gregor Mendel (1866) established the fundamental principles of inheritance through his work with pea plants. His laws form the foundation of modern genetics:

Law of Segregation

\text{Each diploid organism has two alleles for each gene} \\ \text{These alleles segregate during gamete formation} \\ \text{Each gamete receives one allele}

Law of Independent Assortment

\text{Alleles of different genes assort independently during gamete formation} \\ \text{(Provided the genes are unlinked, e.g. on different chromosomes)}

Basic Genetic Concepts

Alleles and Genotypes

Allele: An alternative form of a gene (A, a)
Homozygous: Two identical alleles (AA, aa)
Heterozygous: Two different alleles (Aa)
Genotype: Genetic constitution (AA, Aa, aa)
Phenotype: Observable characteristics

Monohybrid Crosses

For a single gene with complete dominance:

P_0: AA \times aa \rightarrow F_1: Aa \text{ (all dominant phenotype)} \\ F_1 \times F_1: Aa \times Aa \rightarrow F_2: 1AA : 2Aa : 1aa \text{ (3:1 phenotypic ratio)}
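These ratios come from pairing every gamete of one parent with every gamete of the other, as in a Punnett square. Below is a minimal Python sketch. It assumes equal gamete frequencies and complete dominance of the uppercase allele. The helper names `cross` and `phenotype` are hypothetical.

```python
from collections import Counter
from itertools import product

def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a single-gene cross (Punnett square)."""
    offspring = Counter()
    for g1, g2 in product(parent1, parent2):
        # Sort so 'aA' and 'Aa' count as the same genotype (uppercase sorts first)
        offspring["".join(sorted(g1 + g2))] += 1
    return offspring

def phenotype(genotype: str) -> str:
    # Complete dominance: one uppercase allele gives the dominant phenotype
    return "dominant" if any(a.isupper() for a in genotype) else "recessive"

f1 = cross("AA", "aa")
f2 = cross("Aa", "Aa")
f2_phenotypes = Counter()
for genotype, n in f2.items():
    f2_phenotypes[phenotype(genotype)] += n

print(dict(f1))             # {'Aa': 4}
print(dict(f2))             # {'AA': 1, 'Aa': 2, 'aa': 1}
print(dict(f2_phenotypes))  # {'dominant': 3, 'recessive': 1}
```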

Dihybrid Crosses

For two independently assorting genes:

\text{Expected phenotypic ratio} = 9:3:3:1 \\ \text{For genes on same chromosome, ratio depends on recombination frequency}
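The 9:3:3:1 ratio can be checked the same way, by enumerating the four gamete types of an `AaBb` parent. This sketch makes two assumptions:

- The genes are unlinked, so all four gametes are equally frequent.
- Dominance is complete at both loci.

`gametes` and `phenotype` are illustrative helpers. `A_` means "at least one `A` allele".

```python
from collections import Counter
from itertools import product

def gametes(genotype: str) -> list[str]:
    """Gametes of a two-locus genotype written like 'AaBb' (genes assumed unlinked)."""
    locus1, locus2 = genotype[:2], genotype[2:]
    # Independent assortment: each allele at locus 1 pairs equally often with each allele at locus 2
    return [a + b for a, b in product(locus1, locus2)]

def phenotype(gamete1: str, gamete2: str) -> str:
    """Phenotype class of the zygote formed by two gametes, complete dominance at both loci."""
    locus1 = gamete1[0] + gamete2[0]
    locus2 = gamete1[1] + gamete2[1]
    return ("A_" if "A" in locus1 else "aa") + ("B_" if "B" in locus2 else "bb")

counts = Counter(
    phenotype(g1, g2) for g1, g2 in product(gametes("AaBb"), gametes("AaBb"))
)
print(sorted(counts.items()))  # [('A_B_', 9), ('A_bb', 3), ('aaB_', 3), ('aabb', 1)]
```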

Deviations from Mendelian Ratios

Incomplete Dominance

RR \text{ (red)} \times WW \text{ (white)} \rightarrow RW \text{ (pink)}

Codominance

I^A I^A \text{ (A blood type)} \times I^B I^B \text{ (B blood type)} \rightarrow I^A I^B \text{ (AB blood type)}

Multiple Alleles

\text{Human ABO blood groups: } I^A, I^B, i \text{ (3 alleles at one locus)}
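With three alleles, a single cross can yield four blood types. The sketch below enumerates the offspring of an $I^A i \times I^B i$ cross. It assumes $I^A$ and $I^B$ are codominant and both dominant to $i$. The string encoding (`"IA"`, `"IB"`, `"i"`) is hypothetical.

```python
from collections import Counter
from itertools import product

def blood_type(allele1: str, allele2: str) -> str:
    """ABO phenotype: IA and IB are codominant, both dominant to i."""
    alleles = {allele1, allele2}
    if alleles == {"IA", "IB"}:
        return "AB"
    if "IA" in alleles:
        return "A"
    if "IB" in alleles:
        return "B"
    return "O"  # genotype ii

parent1 = ("IA", "i")  # I^A i
parent2 = ("IB", "i")  # I^B i
offspring = Counter(blood_type(a, b) for a, b in product(parent1, parent2))
print(sorted(offspring.items()))  # [('A', 1), ('AB', 1), ('B', 1), ('O', 1)]
```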

Epistasis

\text{Interaction between genes at different loci} \\ \text{Example: 9:3:4 ratio for mouse coat color (B/b and C/c genes; cc masks B/b)}
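The 9:3:4 ratio arises when the recessive genotype at one locus masks the other locus. This sketch uses the usual reading of the mouse example:

- `B` (black) is dominant to `b` (brown).
- `cc` produces no pigment (albino), whatever the B/b genotype.

The function name `coat_color` is illustrative.

```python
from collections import Counter
from itertools import product

def coat_color(b_locus: str, c_locus: str) -> str:
    if "C" not in c_locus:  # cc: no pigment, which masks the B/b locus
        return "albino"
    return "black" if "B" in b_locus else "brown"

# Gametes of a BbCc dihybrid: one allele from each locus (independent assortment)
gametes = [b + c for b, c in product("Bb", "Cc")]
counts = Counter(
    coat_color(g1[0] + g2[0], g1[1] + g2[1]) for g1, g2 in product(gametes, gametes)
)
print(counts["black"], counts["brown"], counts["albino"])  # 9 3 4
```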

Population Genetics

Hardy-Weinberg Equilibrium

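The Hardy-Weinberg principle states that, in an idealized population, allele and genotype frequencies remain constant from one generation to the next. The genotype frequencies are given by: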

p^2 + 2pq + q^2 = 1

Where:

$p$ = frequency of dominant allele
$q$ = frequency of recessive allele
$p^2$ = frequency of homozygous dominant genotype
$2pq$ = frequency of heterozygous genotype
$q^2$ = frequency of homozygous recessive genotype

Assumptions for Equilibrium

No mutations
No gene flow
Very large population size (no genetic drift)
Random mating
No natural selection

Testing Hardy-Weinberg Equilibrium

Chi-square Test

\chi^2 = \sum \frac{(O-E)^2}{E}

Where $O$ is the observed and $E$ the expected count in each genotype class. Use numbers of individuals, not proportions.
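When a biallelic locus is tested against Hardy-Weinberg proportions, there are 3 genotype classes and 1 estimated parameter ($p$). That leaves 1 degree of freedom. A $\chi^2$ variable with 1 df is the square of a standard normal, so its p-value is $\operatorname{erfc}(\sqrt{x/2})$ and the test needs only the Python standard library. In this minimal sketch, the function name and the genotype counts are illustrative.

```python
import math

def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for HWE at one biallelic locus (df = 1)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [p * p * n, 2 * p * q * n, q * q * n]  # expected counts, not proportions
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 classes - 1 - 1 estimated parameter = 1 df; P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = hwe_chi_square(360, 480, 160)  # exactly p^2 : 2pq : q^2 with p = 0.6
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
chi2_b, p_b = hwe_chi_square(400, 400, 200)    # same p, but a heterozygote deficit
print(f"chi2 = {chi2_b:.2f}, p = {p_b:.2e}")
```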

Factors Disrupting Equilibrium

Mutation

Forward mutation: $A \rightarrow a$ at rate $\mu$
Reverse mutation: $a \rightarrow A$ at rate $\nu$
Equilibrium: $\hat{p} = \frac{\nu}{\mu + \nu}$
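Iterating the recurrence $p' = p(1-\mu) + (1-p)\nu$ shows the allele frequency settling at $\nu/(\mu+\nu)$. The rates below are hypothetical. They are deliberately much larger than typical per-locus mutation rates, so the approach to equilibrium fits in a short loop.

```python
def mutate(p: float, mu: float, nu: float) -> float:
    """One generation of recurrent mutation: A -> a at rate mu, a -> A at rate nu."""
    q = 1 - p
    return p * (1 - mu) + q * nu

mu, nu = 1e-4, 5e-5  # illustrative rates, chosen large so convergence is visible
p = 1.0              # start with allele A fixed
for generation in range(200_000):
    p = mutate(p, mu, nu)

p_hat = nu / (mu + nu)  # analytical equilibrium
print(f"simulated p = {p:.4f}, equilibrium nu/(mu+nu) = {p_hat:.4f}")
```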

Migration (Gene Flow)

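p_t' = p_t + m(p_s - p_t) = (1 - m)p_t + m p_s

Where:

- $m$ is the migration rate, i.e. the fraction of the target population's gene pool replaced by migrants each generation.
- $p_s$ is the source population allele frequency.
- $p_t$ and $p_t'$ are the target population allele frequencies before and after one generation of migration.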
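Under a one-way (continent-island) model, migrants from the source replace a fraction $m$ of the target population's gene pool each generation. So $p_t' = (1-m)\,p_t + m\,p_s$, and the gap to the source frequency shrinks by a factor $(1-m)$ per generation. A sketch with illustrative values:

```python
def migrate(p_t: float, p_s: float, m: float) -> float:
    """One generation of one-way gene flow from a source (p_s) into a target (p_t)."""
    return (1 - m) * p_t + m * p_s

p_s, p_t0, m = 0.8, 0.2, 0.05  # illustrative values
p_t = p_t0
for generation in range(50):
    p_t = migrate(p_t, p_s, m)

# Closed form: the difference from the source shrinks by (1 - m) each generation
closed_form = p_s + (p_t0 - p_s) * (1 - m) ** 50
print(f"after 50 generations: p_t = {p_t:.4f} (closed form {closed_form:.4f})")
```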

Genetic Drift

\text{Change in allele frequency due to random sampling of gametes} \\ \text{More pronounced in small populations} \\ \text{Variance in allele frequency change per generation} = \frac{p(1-p)}{2N}

Where $N$ is the effective population size.
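The drift variance can be checked by simulation under the Wright-Fisher model. In that model, each generation is formed by drawing $2N$ gene copies at random from the parents' gene pool. This sketch uses only the standard library. The seed, $N$, and the number of replicates are arbitrary choices.

```python
import random
import statistics

def wright_fisher_step(p: float, N: int, rng: random.Random) -> float:
    """Next-generation allele frequency: draw 2N gene copies, each A with probability p."""
    copies = 2 * N
    count_A = sum(1 for _ in range(copies) if rng.random() < p)
    return count_A / copies

rng = random.Random(42)  # fixed seed for reproducibility
p0, N, replicates = 0.5, 50, 4000
next_freqs = [wright_fisher_step(p0, N, rng) for _ in range(replicates)]

observed_var = statistics.variance(next_freqs)
expected_var = p0 * (1 - p0) / (2 * N)
print(f"observed variance {observed_var:.5f} vs p(1-p)/2N = {expected_var:.5f}")
```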

Natural Selection

Fitness and Selection Coefficient

Fitness (W): Relative reproductive success
Selection coefficient (s): $s = 1 - W$

Selection Models

Selection against a recessive allele (relative fitnesses): $AA : Aa : aa = 1 : 1 : (1-s)$
Selection against allele A with dominance coefficient $h$: $AA : Aa : aa = (1-s) : (1-hs) : 1$ ($h = 1$: A fully dominant; $h = 0$: A fully recessive)
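This minimal sketch applies one generation of selection under random mating, $p' = (p^2 W_{AA} + pq\,W_{Aa}) / \bar{W}$. It compares a deleterious recessive allele with a fully dominant one ($h = 1$). The selection coefficient and starting frequencies are illustrative. The point is that a recessive allele declines slowly: once rare, it sits mostly in heterozygotes, which selection cannot see.

```python
def select(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Frequency of allele A after one generation of selection (random mating assumed)."""
    q = 1 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A alleles come from AA zygotes (p^2) and half of the Aa zygotes (pq)
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness

s = 0.1  # illustrative selection coefficient
generations = 100

# Deleterious recessive allele a: fitnesses AA : Aa : aa = 1 : 1 : (1 - s)
q_recessive = 0.5
for _ in range(generations):
    q_recessive = 1 - select(1 - q_recessive, 1.0, 1.0, 1 - s)

# Deleterious fully dominant allele A (h = 1): fitnesses (1 - s) : (1 - s) : 1
p_dominant = 0.5
for _ in range(generations):
    p_dominant = select(p_dominant, 1 - s, 1 - s, 1.0)

print(f"recessive deleterious allele after {generations} generations: {q_recessive:.3f}")
print(f"dominant deleterious allele after {generations} generations: {p_dominant:.5f}")
```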

Quantitative Genetics

Polygenic Inheritance

For traits controlled by multiple genes:

\text{Phenotypic variance} = \text{Genetic variance} + \text{Environmental variance} \\ V_P = V_G + V_E

Components of Genetic Variance

Additive variance ($V_A$): Due to average allelic effects
Dominance variance ($V_D$): Due to interactions between alleles within a locus
Epistatic variance ($V_I$): Due to interactions between loci

Together these make up the genetic variance: $V_G = V_A + V_D + V_I$.

Heritability

Narrow-sense heritability

h^2 = \frac{V_A}{V_P}

Broad-sense heritability

H^2 = \frac{V_G}{V_P}
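Here is a worked example with hypothetical variance components. It assumes $V_G = V_A + V_D + V_I$ and no genotype-by-environment interaction or covariance, and it shows why $H^2 \ge h^2$.

```python
# Hypothetical variance components for a quantitative trait
V_A, V_D, V_I = 40.0, 10.0, 5.0  # additive, dominance, epistatic
V_E = 45.0                       # environmental

V_G = V_A + V_D + V_I  # total genetic variance
V_P = V_G + V_E        # phenotypic variance

h2 = V_A / V_P  # narrow-sense heritability
H2 = V_G / V_P  # broad-sense heritability
print(f"h^2 = {h2:.2f}, H^2 = {H2:.2f}")  # h^2 = 0.40, H^2 = 0.55
```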

Response to Selection

R = h^2 S

Where $R$ is response to selection and $S$ is selection differential.
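A worked example of the breeder's equation follows. It takes $S$ as the mean of the selected parents minus the population mean. All numbers are hypothetical.

```python
population_mean = 50.0        # mean trait value of the whole population
selected_parents_mean = 56.0  # mean of the individuals chosen as parents
h2 = 0.4                      # narrow-sense heritability (assumed)

S = selected_parents_mean - population_mean  # selection differential
R = h2 * S                                   # predicted response to selection
offspring_mean = population_mean + R
print(f"S = {S:.1f}, R = {R:.1f}, expected offspring mean = {offspring_mean:.1f}")
# S = 6.0, R = 2.4, expected offspring mean = 52.4
```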

Genomics

Genome Structure and Organization

Prokaryotic Genomes

Size: Usually 0.5-10 Mb
Structure: Usually a single circular chromosome, often with plasmids
Gene density: High (85-90% coding)

Eukaryotic Genomes

Size: Variable (yeast: 12 Mb, humans: 3,200 Mb)
Structure: Linear chromosomes in nucleus
Gene density: Lower (only ~1.5% coding in humans)

Genomic Technologies

DNA Sequencing

Sanger Sequencing

\text{Chain termination method using dideoxynucleotides} \\ \text{Read length: 600-1000 bp}

Next-Generation Sequencing (NGS)

\text{Parallel sequencing of millions of fragments} \\ \text{Read lengths: 50-300 bp (Illumina), 8,000+ bp (PacBio)}

Genotyping Technologies

SNP arrays: High-throughput genotyping
Expression microarrays: Gene expression profiling
Whole-genome sequencing: Complete genome analysis

Comparative Genomics

Synteny Analysis

\text{Conservation of gene order between species} \\ \text{Identifies evolutionary relationships}

Phylogenetics

\text{Reconstruct evolutionary relationships} \\ \text{Methods: distance-based, maximum parsimony, maximum likelihood, Bayesian}

Modern Genomic Applications

Whole Genome Analysis

Annotation

Gene prediction: Identify protein-coding regions
Functional assignment: Assign functions to genes
Regulatory element identification: Promoters, enhancers, etc.

Variation Analysis

SNPs: Single nucleotide polymorphisms
Indels: Insertions/deletions
CNVs: Copy number variations
Structural variants: Large-scale rearrangements

Functional Genomics

Gene Expression Analysis

\text{RNA-seq} \rightarrow \text{Quantify transcript abundance} \\ \text{DEG analysis} \rightarrow \text{Differentially expressed genes}

Epigenomics

DNA methylation: Gene expression regulation
ChIP-seq: Protein-DNA interactions
ATAC-seq: Chromatin accessibility

Population Genomics

Genetic Diversity Measures

Heterozygosity

H_e = 1 - \sum p_i^2

Where $p_i$ is the frequency of allele $i$.
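$H_e$ is the probability that two alleles drawn at random from the population differ. It therefore grows with the number of alleles and with how evenly they are represented. For $k$ equally frequent alleles, $H_e = 1 - 1/k$. A sketch:

```python
def expected_heterozygosity(freqs: list[float]) -> float:
    """H_e = 1 - sum(p_i^2): probability that two randomly drawn alleles differ."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)

print(expected_heterozygosity([0.5, 0.5]))         # 0.5
print(expected_heterozygosity([0.25] * 4))         # 0.75
print(expected_heterozygosity([0.9, 0.05, 0.05]))  # low: one allele dominates
```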

Nucleotide Diversity

\pi = \frac{1}{n(n-1)/2} \sum_{i<j} d_{ij}

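Where $d_{ij}$ is the number of differences between sequences $i$ and $j$, and $n$ is the number of sequences. Dividing by the number of aligned sites gives diversity per site.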
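This sketch computes $\pi$ from a toy alignment; the sequences are made up. It assumes equal-length, gap-free aligned sequences. It can optionally divide by the alignment length to give diversity per site.

```python
from itertools import combinations

def nucleotide_diversity(seqs: list[str], per_site: bool = True) -> float:
    """Average number of pairwise differences between aligned sequences (pi)."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    # d_ij summed over all pairs i < j
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2))
    pi = total / (n * (n - 1) / 2)
    return pi / len(seqs[0]) if per_site else pi

alignment = [  # toy alignment (hypothetical)
    "ACGTACGTAC",
    "ACGTACGTTC",
    "ACCTACGTAC",
    "ACGTACGTAC",
]
print(nucleotide_diversity(alignment, per_site=False))  # 1.0
print(nucleotide_diversity(alignment))                  # 0.1
```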

Population Structure

F-statistics (Fixation Indices)

$F_{IS}$: Inbreeding within subpopulations
$F_{ST}$: Differentiation between subpopulations
$F_{IT}$: Total inbreeding in total population

F_{ST} = \frac{H_T - H_S}{H_T}

Where $H_T$ is the expected heterozygosity of the total (pooled) population and $H_S$ is the mean expected heterozygosity within subpopulations.
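Consider a single biallelic locus and equal-sized subpopulations. Then $H_S$ is the mean of $2pq$ across subpopulations, and $H_T$ is $2\bar{p}(1-\bar{p})$ computed from the mean allele frequency. A sketch under those assumptions, with illustrative frequencies:

```python
def fst(subpop_freqs: list[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, equal-sized subpopulations."""
    k = len(subpop_freqs)
    # H_S: mean expected heterozygosity (2pq) within subpopulations
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / k
    # H_T: expected heterozygosity of the pooled population, from the mean frequency
    p_bar = sum(subpop_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("locus is monomorphic overall; F_ST is undefined")
    return (h_t - h_s) / h_t

print(f"{fst([0.5, 0.5]):.2f}")  # 0.00: identical subpopulations
print(f"{fst([0.1, 0.9]):.2f}")  # 0.64: strongly differentiated
print(f"{fst([1.0, 0.0]):.2f}")  # 1.00: alternative alleles fixed
```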

Genome-Wide Association Studies (GWAS)

\text{Statistical analysis to identify genetic variants associated with traits} \\ \text{Can identify loci contributing to complex diseases}
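At its simplest, a GWAS runs one association test per variant. The sketch below shows a single-SNP allelic test: a Pearson $\chi^2$ on the 2×2 table of allele counts in cases versus controls, using hypothetical counts. Real analyses go further:

- They correct for population structure and covariates.
- They apply a genome-wide significance threshold, commonly $5 \times 10^{-8}$.

```python
import math

def allelic_chi_square(case_counts: tuple[int, int],
                       control_counts: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-square on a 2x2 table: rows = cases/controls, columns = allele A/a counts."""
    table = [case_counts, control_counts]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # 1 degree of freedom
    return chi2, p_value

# Hypothetical allele counts (A, a) in 1,000 cases and 1,000 controls
chi2, p = allelic_chi_square((1200, 800), (1000, 1000))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```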

Applications in Medicine

Medical Genetics

Single-Gene Disorders

Autosomal dominant: Huntington's disease, Marfan syndrome
Autosomal recessive: Cystic fibrosis, sickle cell anemia
X-linked: Duchenne muscular dystrophy, hemophilia

Complex Diseases

Multifactorial: Diabetes, heart disease, cancer
Polygenic: Continuous distribution of risk

Pharmacogenomics

\text{Genetic basis for individual drug responses} \\ \text{Example: CYP2D6 polymorphisms affect drug metabolism}

Computational Tools

Genetic Analysis Software

PLINK: Whole-genome association analysis
BEAGLE: Genotype imputation
STRUCTURE: Population structure analysis
PhyML, RAxML: Phylogenetic tree construction

Real-World Application: Population Bottleneck Analysis

Population bottlenecks are sharp but temporary reductions in population size. They accelerate genetic drift and erode genetic diversity, and the loss can be estimated with the population genetics principles above.

Bottleneck Analysis

# Population bottleneck and genetic drift analysis (drift only: new mutations are ignored)
population_data = {
    'initial_size': 10000,    # N0 (original population size)
    'bottleneck_size': 50,    # Nb (size during bottleneck)
    'bottleneck_duration': 5, # generations spent at Nb
    'recovery_time': 100,     # generations since recovery (assumed spent at N1)
    'mutation_rate': 2.5e-8,  # per site per generation (not used by the drift-only formulas below)
    'current_size': 50000     # N1 (current population size)
}

Nb = population_data['bottleneck_size']
t_b = population_data['bottleneck_duration']
t_r = population_data['recovery_time']
N1 = population_data['current_size']

# Heterozygosity lost to drift: Ht = H0 * (1 - 1/(2*Ne))^t
# Over a fluctuating history, Ne is the harmonic mean of the per-generation sizes:
#   Ne = t_total / sum(1/N_i) = t_total / (t_b/Nb + t_r/N1)
generations_total = t_b + t_r
N_e = generations_total / (t_b / Nb + t_r / N1)

# Expected heterozygosity after bottleneck and recovery
H0 = 0.0005  # Initial heterozygosity (assumed value)
Ht = H0 * (1 - 1 / (2 * N_e)) ** generations_total

# Fraction of nucleotide diversity retained through the bottleneck generations alone:
#   pi_post / pi_pre = (1 - 1/(2*Nb))^t_b
pi_retained = (1 - 1 / (2 * Nb)) ** t_b

# Effective size from the sex ratio: Ne = 4*Nm*Nf / (Nm + Nf), Nm = males, Nf = females
# Assuming an equal sex ratio during the bottleneck (Nm = Nf = Nb/2), this equals Nb
Nm = Nf = Nb / 2
effective_founders = (4 * Nm * Nf) / (Nm + Nf)

print("Population bottleneck analysis:")
print(f"  Original size: {population_data['initial_size']:,}")
print(f"  Bottleneck size: {Nb}")
print(f"  Bottleneck duration: {t_b} generations")
print(f"  Effective population size (harmonic mean): {N_e:.1f}")
print(f"  Proportion of heterozygosity lost: {(1 - Ht / H0):.3f}")
print(f"  Proportion of nucleotide diversity lost in the bottleneck: {(1 - pi_retained):.3f}")
print(f"  Effective founding individuals: {effective_founders:.1f}")

# Rough heuristic: new mutations rebuild diversity over a timescale on the order of
# 4*N generations, where N is the current (post-recovery) population size
recovery_generations = 4 * N1
print(f"  Approximate generations to recover original diversity: {recovery_generations:,}")

# Interpretation
if effective_founders < 100:
    bottleneck_severity = "Severe - significant genetic drift expected"
elif effective_founders < 500:
    bottleneck_severity = "Moderate - some genetic drift"
else:
    bottleneck_severity = "Mild - minimal genetic drift impact"

print(f"  Bottleneck severity: {bottleneck_severity}")

Conservation Genetics Implications

Estimating how much diversity a bottleneck removed helps conservation managers judge inbreeding risk. It also informs whether to supplement a population with individuals from elsewhere.

Your Challenge: Hardy-Weinberg Analysis

Analyze genetic data to determine if a population is in Hardy-Weinberg equilibrium and calculate evolutionary parameters.

Goal: Use population genetics principles to analyze genetic data and assess evolutionary forces.

Population Data

import math

# SNP genotyping data for a population
genotype_counts = {
    'AA': 420,    # Number of homozygous dominant individuals
    'Aa': 480,    # Number of heterozygous individuals  
    'aa': 100     # Number of homozygous recessive individuals
}

# Calculate total individuals and allele frequencies
total_individuals = genotype_counts['AA'] + genotype_counts['Aa'] + genotype_counts['aa']
total_alleles = 2 * total_individuals

# Calculate allele frequencies
p = (2 * genotype_counts['AA'] + genotype_counts['Aa']) / total_alleles  # freq of A allele
q = (2 * genotype_counts['aa'] + genotype_counts['Aa']) / total_alleles  # freq of a allele

# Expected genotype frequencies under HWE
expected_AA = p**2 * total_individuals
expected_Aa = 2 * p * q * total_individuals
expected_aa = q**2 * total_individuals

# Chi-square test
observed_values = [genotype_counts['AA'], genotype_counts['Aa'], genotype_counts['aa']]
expected_values = [expected_AA, expected_Aa, expected_aa]

chi_square = sum([(O - E)**2 / E for O, E in zip(observed_values, expected_values)])

# Degrees of freedom = number of genotypes - 1 - number of estimated parameters
# For 3 genotypes with 1 estimated parameter (p), df = 3 - 1 - 1 = 1
df = 1

# Calculate FIS (inbreeding coefficient)
# FIS = (He - Ho) / He, where He is expected heterozygosity and Ho is observed heterozygosity
He = 2 * p * q  # Expected heterozygosity under HWE
Ho = genotype_counts['Aa'] / total_individuals  # Observed heterozygosity
FIS = (He - Ho) / He

# Calculate other population genetics parameters
heterozygosity_reduction = 1 - (Ho / He)  # Proportional heterozygote deficit (algebraically equal to FIS)
allele_balance = abs(p - q)  # Difference in allele frequencies

# Assess evolutionary forces
evolutionary_forces = []
if abs(FIS) > 0.1:
    if FIS > 0:
        evolutionary_forces.append("Inbreeding")
    else:
        evolutionary_forces.append("Outbreeding")
if chi_square > 3.84:  # Critical value at alpha = 0.05 with df = 1
    evolutionary_forces.append("Deviates from HWE")

Analyze the population genetics data and determine the evolutionary forces at play.

Hint:

Calculate observed vs. expected genotype frequencies
Perform chi-square test to assess Hardy-Weinberg equilibrium
Calculate FIS to assess inbreeding/outbreeding
Consider the implications of observed allele frequencies

# TODO: Calculate population genetic parameters
allele_frequency_A = 0    # Frequency of allele A
allele_frequency_a = 0    # Frequency of allele a
inbreeding_coefficient = 0  # FIS value
chi_square_statistic = 0   # Chi-square test result
hwe_status = ""            # In or out of equilibrium
evolutionary_force = ""    # Type of evolutionary pressure

# Calculate allele frequencies from counts
total_alleles = 2 * (genotype_counts['AA'] + genotype_counts['Aa'] + genotype_counts['aa'])
allele_frequency_A = (2 * genotype_counts['AA'] + genotype_counts['Aa']) / total_alleles
allele_frequency_a = (2 * genotype_counts['aa'] + genotype_counts['Aa']) / total_alleles

# Calculate expected frequencies under HWE
expected_AA = allele_frequency_A**2 * total_individuals
expected_Aa = 2 * allele_frequency_A * allele_frequency_a * total_individuals
expected_aa = allele_frequency_a**2 * total_individuals

# Calculate observed heterozygosity
observed_het = genotype_counts['Aa'] / total_individuals
expected_het = 2 * allele_frequency_A * allele_frequency_a

# Calculate FIS (inbreeding coefficient)
inbreeding_coefficient = (expected_het - observed_het) / expected_het

# Calculate chi-square statistic
chi_square_statistic = ((genotype_counts['AA'] - expected_AA)**2 / expected_AA + 
                        (genotype_counts['Aa'] - expected_Aa)**2 / expected_Aa + 
                        (genotype_counts['aa'] - expected_aa)**2 / expected_aa)

# Assess HWE status (critical value for df=1 at alpha=0.05 is 3.84)
if chi_square_statistic > 3.84:
    hwe_status = "Out of equilibrium"
else:
    hwe_status = "In equilibrium"

# Interpret FIS: positive = heterozygote deficit, negative = heterozygote excess
if inbreeding_coefficient > 0.1:
    evolutionary_force = "Inbreeding or population subdivision (heterozygote deficit)"
elif inbreeding_coefficient < -0.1:
    evolutionary_force = "Outbreeding or heterozygote advantage (heterozygote excess)"
else:
    evolutionary_force = "No strong departure from random mating (|FIS| <= 0.1)"

# Print results
print(f"Allele frequency A: {allele_frequency_A:.3f}")
print(f"Allele frequency a: {allele_frequency_a:.3f}")
print(f"Inbreeding coefficient (FIS): {inbreeding_coefficient:.3f}")
print(f"Chi-square statistic: {chi_square_statistic:.2f}")
print(f"Hardy-Weinberg status: {hwe_status}")
print(f"Dominant evolutionary force: {evolutionary_force}")

# Additional population genetics assessment
heterozygosity_ratio = observed_het / expected_het
if heterozygosity_ratio < 0.9:
    diversity_status = "Reduced heterozygosity"
elif heterozygosity_ratio > 1.1:
    diversity_status = "Increased heterozygosity"
else:
    diversity_status = "Normal heterozygosity"
    
print(f"Heterozygosity status: {diversity_status}")

How might the population's genetic structure be affected if the observed deviation from Hardy-Weinberg equilibrium is due to population subdivision?
