Chapter 1

Molecular Biology

DNA replication, transcription, translation, and gene regulation.

Molecular biology is the study of biological activity at the molecular level, particularly focusing on the interactions between DNA, RNA, and proteins that drive cellular processes. Understanding these fundamental processes is essential for comprehending how genetic information is stored, expressed, and regulated in living organisms.

DNA Structure and Replication

DNA Structure

The double helix structure of DNA was elucidated by Watson and Crick in 1953:

\text{DNA Structure} = \text{Antiparallel strands} + \text{Complementary base pairing} + \text{Right-handed helix}

Base Pairing Rules

  • Adenine (A) pairs with Thymine (T) via 2 hydrogen bonds
  • Guanine (G) pairs with Cytosine (C) via 3 hydrogen bonds

The stability relationship:

\text{Stability} \propto \text{GC content} \times \text{number of hydrogen bonds}
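
The proportionality above can be made concrete by counting hydrogen bonds directly. The sketch below is illustrative only: `gc_content` and `hydrogen_bonds` are hypothetical helper names, and the count ignores base stacking and other contributions to duplex stability.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def hydrogen_bonds(seq: str) -> int:
    """Watson-Crick hydrogen bonds in the duplex formed by seq.

    Uses the pairing rules above: 2 per A-T pair, 3 per G-C pair.
    """
    seq = seq.upper()
    at_pairs = seq.count("A") + seq.count("T")
    gc_pairs = seq.count("G") + seq.count("C")
    return 2 * at_pairs + 3 * gc_pairs


# Three made-up 8-mers with increasing GC content
for s in ("ATATATAT", "ATGCATGC", "GCGCGCGC"):
    print(s, f"GC={gc_content(s):.2f}", f"H-bonds={hydrogen_bonds(s)}")
```

For sequences of equal length, the hydrogen bond count rises with GC content, which is the trend the relationship describes.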

DNA Strands

  • 5' to 3' direction: Phosphate group at 5', hydroxyl at 3'
  • Antiparallel: The two strands run in opposite directions
  • Complementarity: The sequence of one strand determines the sequence of the other
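
Antiparallel orientation and complementarity together mean that the partner of a strand is its reverse complement. A minimal sketch, assuming an unambiguous A/T/G/C alphabet (the function name and example sequence are made up for illustration):

```python
# Translation table implementing the base pairing rules (A-T, G-C)
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


top = "ATGGCCTTA"                      # written 5' -> 3'
bottom = reverse_complement(top)       # also written 5' -> 3'
print("5'-" + top + "-3'")
print("3'-" + bottom[::-1] + "-5'")    # shown 3' -> 5' so paired bases line up
```

Applying the function twice returns the original strand, which is one way to check that the sequence of one strand fully determines the other.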

DNA Replication

DNA replication is semiconservative, meaning each new DNA molecule consists of one original strand and one newly synthesized strand:

\text{DNA} \xrightarrow{\text{DNA Polymerase}} \text{DNA (50\% parental, 50\% new)}
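
To see what semiconservative replication implies over several generations, the toy simulation below tracks which duplexes still carry an original strand. It is a sketch under simple assumptions: strands are represented only by the labels "parental" and "new", and `replicate` is a hypothetical helper.

```python
def replicate(population):
    """One round of semiconservative replication.

    Each duplex is a pair of strand labels; every existing strand
    serves as the template for one newly synthesized strand.
    """
    next_generation = []
    for strand_a, strand_b in population:
        next_generation.append((strand_a, "new"))
        next_generation.append((strand_b, "new"))
    return next_generation


population = [("parental", "parental")]
for generation in range(1, 4):
    population = replicate(population)
    hybrids = sum("parental" in duplex for duplex in population)
    print(f"Generation {generation}: {len(population)} duplexes, "
          f"{hybrids} contain a parental strand")
```

The number of duplexes doubles each round, while the number containing a parental strand stays at two, so the 50% parental composition holds only for the first generation.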

Replication Process

  1. Initiation

    • DNA helicase unwinds the double helix
    • Single-strand binding proteins (SSBs) stabilize unwound DNA
    • Primase synthesizes RNA primers
  2. Elongation

    • DNA polymerase III adds nucleotides in 5' to 3' direction
    • Leading strand: Continuous synthesis
    • Lagging strand: Discontinuous synthesis (Okazaki fragments)
  3. Termination

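    • RNA primers removed by DNA polymerase I (5' to 3' exonuclease) in prokaryotes, or by RNase H and FEN1 in eukaryotes
    • Gaps filled by DNA polymerase I (prokaryotes) or DNA polymerase δ (eukaryotes)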
    • DNA ligase seals nicks

Replication Enzymes

Enzyme | Function
DNA Helicase | Unwinds double helix
DNA Polymerase III | Main replication enzyme (prokaryotes)
DNA Polymerase α, δ, ε | Replication enzymes (eukaryotes)
Primase | Synthesizes RNA primers
DNA Ligase | Joins Okazaki fragments
Topoisomerase | Relieves supercoiling ahead of the replication fork

Proofreading and Repair

\text{Error rate} \approx 1 \times 10^{-10} \text{ per base pair (with proofreading and mismatch repair)}

  • 3' to 5' exonuclease: Immediate error correction
  • Mismatch repair: Post-replication correction
  • Nucleotide excision repair: Damage-induced repair

Transcription

RNA Polymerase Mechanism

Transcription converts DNA sequence information into RNA:

\text{DNA (template)} \xrightarrow{\text{RNA Polymerase}} \text{mRNA}

Transcription Process

  1. Initiation

    • RNA polymerase binds promoter region
    • Transcription factors assist binding
    • DNA unwinds to form transcription bubble
  2. Elongation

    • RNA polymerase moves 3' to 5' along template DNA
    • RNA transcript grows 5' to 3'
    • DNA helix reforms behind polymerase
  3. Termination

    • Intrinsic (Rho-independent) termination in prokaryotes: GC-rich hairpin followed by a U-rich sequence
    • Rho-dependent termination in prokaryotes: Rho helicase tracks along the RNA and releases the transcript
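
The elongation step can be sketched in a few lines: the polymerase reads the template 3' to 5' and the transcript grows 5' to 3'. This is a simplified illustration that ignores promoters and terminators; `transcribe` and the example template are made up.

```python
# Template DNA base -> RNA base added to the transcript
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA 5' -> 3' while reading the template strand 3' -> 5'."""
    return "".join(PAIRING[base] for base in template_3_to_5.upper())


template = "TACCGGAAT"         # template strand written 3' -> 5'
print(transcribe(template))    # AUGGCCUUA
```

The transcript matches the non-template (coding) strand, with U in place of T.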

Prokaryotic vs. Eukaryotic Transcription

Feature | Prokaryotes | Eukaryotes
Location | Cytoplasm | Nucleus
RNA Polymerases | Single enzyme | RNA Pol I, II, III
Coupling | Transcription/translation coupled | Sequential processes
Processing | Minimal | Extensive processing

Translation

The Genetic Code

The genetic code is degenerate and nearly universal:

\text{64 codons} \rightarrow \text{20 amino acids} + \text{start/stop signals}

Code Characteristics

  • Degenerate: Multiple codons for single amino acid
  • Nearly universal: Conserved across almost all organisms (mitochondria and a few species use variant codes)
  • Non-overlapping: Each base read once
  • Commaless: No punctuation between codons
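
Degeneracy can be checked by building the standard codon table and counting codons per amino acid. The sketch below encodes the standard code as a 64-character string in UCAG order, a common compact representation; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code for codons ordered UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid codes; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

degeneracy = Counter(CODON_TABLE.values())
print(f"Total codons: {len(CODON_TABLE)}")
print(f"Stop codons: {degeneracy['*']}")
for amino_acid, n_codons in sorted(degeneracy.items(), key=lambda item: -item[1]):
    if amino_acid != "*":
        print(amino_acid, n_codons)
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have one each.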

Translation Process

  1. Initiation

    • Small ribosomal subunit binds mRNA
    • tRNA carrying methionine (fMet in prokaryotes) binds start codon (AUG)
    • Large ribosomal subunit joins, forming complete ribosome
  2. Elongation

    • A site: Accepts incoming aminoacyl-tRNA
    • P site: Holds growing peptide chain
    • E site: Exit for uncharged tRNA
    • Peptide bond formation: Peptidyl transferase activity
  3. Termination

    • Stop codons (UAA, UAG, UGA) recognized by release factors
    • Polypeptide released from ribosome
    • Ribosome subunits dissociate
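
The three stages above map onto a short loop: find the start codon, read codons in frame, and stop at the first stop codon. This is a simplified sketch; real initiation depends on ribosome binding sites or cap-dependent scanning rather than simply the first AUG, and `translate` is a hypothetical helper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")                  # initiation at the start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # elongation, one codon at a time
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":                    # termination at UAA, UAG, or UGA
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("GGAUGGCCUUAUGAAA"))  # MAL
```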

tRNA Structure and Function

\text{tRNA} = \text{Amino acid attachment} + \text{Anticodon loop} + \text{Secondary structure}

  • Anticodon: 3-nucleotide sequence complementary to mRNA codon
  • Amino acid attachment: At 3' end (CCA sequence)
  • Secondary structure: Cloverleaf formation stabilized by H-bonds
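
Codon-anticodon pairing is antiparallel, so a perfectly matching anticodon is the reverse complement of the codon. A minimal sketch that ignores wobble pairing and modified bases (`anticodon_for` is a hypothetical name):

```python
RNA_COMPLEMENT = str.maketrans("AUGC", "UACG")


def anticodon_for(codon: str) -> str:
    """Anticodon (written 5' -> 3') that pairs exactly with an mRNA codon (5' -> 3')."""
    return codon.upper().translate(RNA_COMPLEMENT)[::-1]


print(anticodon_for("AUG"))  # CAU
```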

Gene Regulation

Prokaryotic Gene Regulation: The Lac Operon

The lac operon is a classic model of gene regulation:

\text{Lac Operon} = \text{Promoter} + \text{Operator} + \text{Structural genes} + \text{Regulatory gene}

Components

  • lacZ: β-galactosidase (breaks down lactose)
  • lacY: Permease (lactose transport)
  • lacA: Transacetylase (lactose metabolism)
  • lacI: Repressor gene (regulates operon)

Regulation Mechanism

  • Negative control: Repressor binding blocks transcription
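  • Induction: Allolactose, an isomer formed from lactose, binds and inactivates the repressor
  • Catabolite repression: Glucose lowers cAMP levels, so the cAMP-CRP activator complex does not form and lac transcription stays low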
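
The regulation logic can be summarized as a small truth table. The function below is a qualitative sketch of the classic model, not a quantitative one: the "off" state ignores the low basal (leaky) transcription that occurs in real cells, and the function name and output labels are illustrative.

```python
def lac_operon_expression(lactose: bool, glucose: bool) -> str:
    """Qualitative transcription level of the lac structural genes."""
    repressor_bound = not lactose     # without inducer, the repressor sits on the operator
    camp_crp_active = not glucose     # low glucose -> high cAMP -> CRP activates the promoter
    if repressor_bound:
        return "off"
    return "high" if camp_crp_active else "low"


for lactose in (False, True):
    for glucose in (False, True):
        level = lac_operon_expression(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

Only the combination of lactose present and glucose absent gives high expression, which reflects the two layers of control listed above.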

Eukaryotic Gene Regulation

Transcriptional Control

  1. Chromatin Remodeling

    • Histone modifications: Acetylation, methylation
    • DNA methylation: Generally repressive
    • Chromatin accessibility: Open vs. closed domains
  2. Transcription Factors

    • General TFs: Basal transcription machinery
    • Specific TFs: Bind enhancers and silencers to raise or lower transcription of particular genes
    • Coactivators/corepressors: Modulate TF activity

Post-transcriptional Control

  1. RNA Processing

    • 5' capping: Protection and ribosome binding
    • 3' polyadenylation: Stability and transport
    • Splicing: Intron removal and exon joining
  2. Alternative Splicing

\text{Exon selection} = f(\text{spliceosome}, \text{SR proteins}, \text{hnRNP proteins})

Post-translational Control

  • Protein modifications: Phosphorylation, glycosylation
  • Protein degradation: Ubiquitin-proteasome pathway
  • Regulatory proteins: Control protein activity

Advanced Topics in Gene Expression

RNA Processing in Eukaryotes

Pre-mRNA Splicing

\text{Primary transcript} \xrightarrow{\text{Spliceosome}} \text{Mature mRNA}

The spliceosome removes introns and joins exons:

  • Splice sites: Conserved sequences (GU-AG rule)
  • Branch point: Critical for splicing reaction
  • Lariat intermediate: Intron structure during splicing
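
As a rough illustration of intron removal, the sketch below excises introns given by their coordinates and checks the GU-AG rule at each boundary. The pre-mRNA is a made-up toy sequence, `splice` is a hypothetical helper, and the check does not model branch point recognition or the lariat.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) half-open index pairs and join the exons."""
    mature, position = [], 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"Intron at {start}-{end} violates the GU-AG rule")
        mature.append(pre_mrna[position:start])   # exon upstream of this intron
        position = end
    mature.append(pre_mrna[position:])            # final exon
    return "".join(mature)


# exon 1 + intron (starts with GU, ends with AG) + exon 2
pre_mrna = "AUGGCC" + "GUAAGUCCUACUAACUUUCAG" + "UUAUGA"
print(splice(pre_mrna, [(6, 27)]))  # AUGGCCUUAUGA
```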

Epigenetic Regulation

DNA Methylation

\text{Cytosine} \xrightarrow{\text{DNA methyltransferase}} \text{5-methylcytosine}

  • Context: Typically CpG dinucleotides
  • Effect: Generally repressive to transcription
  • Maintenance: Preserved during DNA replication
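
Candidate methylation sites can be located by scanning for CpG dinucleotides. A minimal sketch with a made-up sequence (`cpg_sites` is a hypothetical helper):

```python
def cpg_sites(seq):
    """Indices of the C in every CpG dinucleotide (5'-CG-3')."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


seq = "ATCGGCGCTACGTT"
print(cpg_sites(seq))  # [2, 5, 10]
```

Because 5'-CG-3' reads the same on the complementary strand, each site carries a cytosine on both strands, which is what allows the methylation pattern to be copied from the parental strand after replication.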

Histone Modifications

  • Acetylation: Generally activating (neutralizes positive charge)
  • Methylation: Can activate or repress (context-dependent)
  • Phosphorylation: Often involved in DNA damage response

Molecular Techniques

PCR (Polymerase Chain Reaction)

\text{Target DNA} \xrightarrow{\text{Thermal Cycling}} \text{Exponential amplification}

PCR Process

  • Denaturation: 94-98°C (DNA strands separate)
  • Annealing: 50-65°C (primers bind)
  • Extension: 72°C (DNA synthesis by Taq polymerase)
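
Each cycle of denaturation, annealing, and extension can at most double the target, so the copy number grows as N = N0 × (1 + E)^n, where E is the per-cycle efficiency. The sketch below assumes a constant efficiency, which real reactions do not maintain in late cycles; the function name and parameter values are illustrative.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies after a number of cycles.

    efficiency is the fraction of templates copied in each cycle
    (1.0 means perfect doubling).
    """
    return initial_copies * (1 + efficiency) ** cycles


print(f"Ideal, 30 cycles from 10 copies: {pcr_copies(10, 30):.3e}")
print(f"90% efficiency, same conditions: {pcr_copies(10, 30, efficiency=0.9):.3e}")
```

Small drops in efficiency compound over many cycles, which is why the final yield is sensitive to reaction conditions.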

PCR Applications

  • Diagnostic: Pathogen detection
  • Research: Gene cloning, sequencing
  • Forensic: DNA fingerprinting

Recombinant DNA Technology

Restriction Enzymes

\text{DNA} \xrightarrow{\text{Restriction enzyme}} \text{Specific recognition sequence cleavage}

  • Palindromic recognition: 4-8 base pairs
  • Sticky ends: Single-strand overhangs
  • Blunt ends: Both strands cut at the same position, leaving no overhang
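
A digest can be sketched by searching for the recognition sequence and cutting at a fixed offset. The example uses EcoRI (G^AATTC) on a made-up linear sequence and follows only the top strand; `is_palindromic` and `digest` are hypothetical helpers.

```python
import re

COMPLEMENT = str.maketrans("ATGC", "TACG")


def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == site.translate(COMPLEMENT)[::-1]


def digest(dna, site="GAATTC", cut_offset=1):
    """Cut linear DNA at every occurrence of site.

    cut_offset is the top-strand cut position within the site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments, previous = [], 0
    for match in re.finditer(site, dna):
        cut = match.start() + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


dna = "TTGAATTCAAGGGAATTCTT"
print(is_palindromic("GAATTC"))  # True
print(digest(dna))               # ['TTG', 'AATTCAAGGG', 'AATTCTT']
```

The internal fragments begin with AATT, the single-stranded overhang that makes EcoRI ends sticky.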

Modern Developments

CRISPR-Cas Systems

  • Guide RNA: Directs Cas nuclease to target
  • PAM sequence: Required for recognition
  • Versatility: Can target most genomic sequences that lie next to a PAM
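
The PAM requirement can be illustrated by scanning a sequence for candidate target sites. The sketch assumes the NGG PAM and 20-nucleotide protospacer used by the commonly used SpCas9 nuclease, searches only the given strand, and does no off-target scoring; `find_targets` and the example sequence are made up.

```python
import re


def find_targets(dna, protospacer_length=20):
    """Return (protospacer, PAM) pairs for every NGG PAM on the given strand."""
    dna = dna.upper()
    targets = []
    # A lookahead reports overlapping PAM candidates as well
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start >= protospacer_length:   # need a full protospacer 5' of the PAM
            protospacer = dna[pam_start - protospacer_length:pam_start]
            targets.append((protospacer, match.group(1)))
    return targets


dna = "ACGTACGTACGTACGTACGTAGGTT"
for protospacer, pam in find_targets(dna):
    print(protospacer, pam)
```

The guide RNA would be designed to match the protospacer; the PAM itself is in the genomic DNA and is not part of the guide.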

Single-cell Analysis

  • Single-cell RNA-seq: Transcriptome of individual cells
  • Spatial transcriptomics: Location-specific gene expression
  • Lineage tracing: Cell fate determination

Real-World Application: Antibiotic Resistance Mechanisms

Antibiotic resistance provides a practical example of molecular biology principles in action.

Mechanism Analysis

# Antibiotic resistance mechanisms at molecular level
antibiotic_resistance = {
    'ampicillin_resistance': {
        'mechanism': 'Beta-lactamase production',
        'gene': 'bla',
        'protein': 'Beta-lactamase enzyme',
        'function': 'Hydrolyzes beta-lactam ring'
    },
    'tetracycline_resistance': {
        'mechanism': 'Efflux pump expression',
        'gene': 'tetA',
        'protein': 'Tetracycline efflux protein',
        'function': 'Pumps antibiotic out of cell'
    },
    'kanamycin_resistance': {
        'mechanism': 'Enzymatic modification',
        'gene': 'aph(3\')-II',
        'protein': 'Aminoglycoside phosphotransferase',
        'function': 'Phosphorylates antibiotic, preventing binding'
    }
}

# Calculate mutation rates affecting resistance (order-of-magnitude values)
mutation_rate = 1e-10  # per base pair per generation
genome_size = 4.6e6  # base pairs for E. coli
per_genome_rate = mutation_rate * genome_size  # ~4.6e-4 mutations per genome per generation

# Estimate time to resistance development for a single lineage
# Upper bound: this treats any new mutation as a resistance mutation
bacterial_generations_per_day = 12  # assumes a 2-hour doubling time
resistance_probability = 1 - (1 - per_genome_rate)**bacterial_generations_per_day  # probability per day

print(f"Estimated bacterial mutations per genome per generation: {per_genome_rate:.2e}")
print(f"Resistance development probability per day: {resistance_probability:.2e}")
print(f"Average time to first resistance mutation: {1/resistance_probability/365:.1f} years (in ideal conditions)")

# Calculate selection pressure effects (illustrative index, not a formal selection coefficient)
drug_concentration = 10  # relative to MIC
fitness_cost = 0.05  # 5% fitness cost for resistance gene
selection_coefficient = drug_concentration * (1 - fitness_cost)

print(f"Selection coefficient with drug pressure: {selection_coefficient:.2f}")
print("This demonstrates how antibiotic use accelerates resistance evolution")

# Molecular mechanism of beta-lactam resistance
print(f"\nBeta-lactamase mechanism:")
print(f"  - Gene: {antibiotic_resistance['ampicillin_resistance']['gene']}")
print(f"  - Function: {antibiotic_resistance['ampicillin_resistance']['function']}")
print(f"  - Result: Antibiotic inactivation through hydrolysis")

Evolutionary Implications

Understanding molecular mechanisms helps explain the rapid evolution of antibiotic resistance.


Your Challenge: Gene Expression Analysis

Analyze the regulation of a hypothetical gene and predict how mutations would affect expression levels.

Goal: Use molecular biology principles to analyze gene regulation and predict expression outcomes.

Gene Regulatory Sequence

import math

# Hypothetical gene regulatory region
gene_data = {
    'promoter_strength': 0.8,  # Relative strength (0-1)
    'operator_sites': [
        {'type': 'activator', 'affinity': 0.9},   # High affinity binding site
        {'type': 'repressor', 'affinity': 0.6}    # Medium affinity binding site
    ],
    'upstream_enhancer': True,
    'polyadenylation_signal': 'AATAAA',
    'splice_sites': {
        'donor': 'GTATGGT',
        'acceptor': 'CAGG',
        'branch_point': 'TACTAAC'
    },
    'length': 8500  # base pairs (including introns)
}

# Calculate expression level based on regulatory elements
basal_expression = gene_data['promoter_strength'] * 100  # arbitrary units

# Calculate activator effect (positive regulation)
activator_affinity = gene_data['operator_sites'][0]['affinity']
activator_effect = basal_expression * activator_affinity * 0.5  # 50% increase potential

# Calculate repressor effect (negative regulation)  
repressor_affinity = gene_data['operator_sites'][1]['affinity']
repressor_effect = basal_expression * repressor_affinity * 0.3  # 30% decrease potential

# Calculate net expression level
net_expression = basal_expression + activator_effect - repressor_effect

# Calculate splicing efficiency
splice_strength = 0.85  # efficiency factor
processing_efficiency = 0.9 if gene_data['polyadenylation_signal'] == 'AATAAA' else 0.6

# Calculate mature mRNA abundance
mature_mrna = net_expression * splice_strength * processing_efficiency

# Calculate protein production (assuming 100% translation efficiency)
mrna_half_life = 4  # hours (assumed; this gene has introns and a poly(A) signal, so it is eukaryotic)
protein_synthesis_rate = mature_mrna * 10  # 10 proteins per mRNA per hour

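# Simulate mutation effects; each mutation acts on the step it disrupts
mutations_to_test = [
    {'name': 'promoter_mutation', 'target': 'promoter', 'effect': -0.3},   # promoter strength lowered by 0.3
    {'name': 'enhancer_deletion', 'target': 'enhancer', 'effect': -0.2},   # 20% decrease in transcription
    {'name': 'polyA_mutation', 'target': 'processing', 'effect': -0.4}     # 40% decrease in polyA processing
]

expression_effects = {}  # mature mRNA level for each mutant
for mutation in mutations_to_test:
    transcription = net_expression
    processing = processing_efficiency
    if mutation['target'] == 'promoter':
        mutated_strength = max(0.1, gene_data['promoter_strength'] + mutation['effect'])
        transcription = mutated_strength * 100 + activator_effect - repressor_effect
    elif mutation['target'] == 'enhancer':
        transcription = net_expression * (1 + mutation['effect'])
    else:
        processing = processing_efficiency * (1 + mutation['effect'])
    expression_effects[mutation['name']] = transcription * splice_strength * processing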

Analyze the gene regulation system and predict the effects of mutations on expression levels.

Hint:

  • Consider how regulatory elements (promoter, operator, enhancer) affect transcription
  • Calculate the combined effects of activators and repressors
  • Evaluate the impact of post-transcriptional modifications
  • Estimate protein production from mRNA levels

# TODO: Calculate gene expression parameters
basal_expression_level = 0  # Arbitrary units (0-100 scale)
net_regulation_effect = 0   # Combined activator/repressor effect
mature_mrna_amount = 0      # Molecules per cell
protein_concentration = 0   # Units per cell
half_life_hours = 0         # RNA stability
expression_fold_change = 0  # Effect of regulatory mutations

# Calculate basal expression from promoter
basal_expression_level = gene_data['promoter_strength'] * 100

# Calculate net regulation (activator + repressor effects)
activator_contribution = gene_data['operator_sites'][0]['affinity'] * 50  # Scale factor
repressor_contribution = gene_data['operator_sites'][1]['affinity'] * 30  # Scale factor
net_regulation_effect = activator_contribution - repressor_contribution

# Calculate mature mRNA considering processing efficiency
processing_efficiency = 0.9 if gene_data['polyadenylation_signal'] == 'AATAAA' else 0.6
mature_mrna_amount = (basal_expression_level + net_regulation_effect) * processing_efficiency

# Calculate protein concentration (assuming 5 proteins per mRNA)
protein_concentration = mature_mrna_amount * 5

# Calculate fold change with mutations
control_expression = mature_mrna_amount
mutant_expression = control_expression * 0.7  # Example with 30% reduction
expression_fold_change = mutant_expression / control_expression

# RNA half-life calculation
if gene_data['polyadenylation_signal'] == 'AATAAA':
    half_life_hours = 4  # Stable message
else:
    half_life_hours = 1  # Less stable

# Print results
print(f"Basal expression level: {basal_expression_level:.1f} units")
print(f"Net regulation effect: {net_regulation_effect:.1f} units")
print(f"Mature mRNA amount: {mature_mrna_amount:.1f} molecules/cell")
print(f"Protein concentration: {protein_concentration:.1f} molecules/cell")
print(f"RNA half-life: {half_life_hours} hours")
print(f"Expression fold change with mutations: {expression_fold_change:.2f}")

# Regulatory assessment
if expression_fold_change < 0.5:
    regulation_type = "Strong downregulation"
elif expression_fold_change < 0.8:
    regulation_type = "Moderate downregulation"
elif expression_fold_change > 2.0:
    regulation_type = "Strong upregulation"
else:
    regulation_type = "Normal regulation"
    
print(f"Regulation assessment: {regulation_type}")

What would be the most effective strategy to increase expression of this gene for protein production purposes?

ELI10 Explanation

Simple analogy for better understanding

Think of molecular biology like studying the most important instruction manual in your body - the one written in a language made of just four letters (A, T, G, C). This manual (your DNA) contains all the instructions for building and running your body. Molecular biology is like learning how to read this manual, how to copy it when cells divide, and how to use its instructions to make proteins - the tiny machines that do most of the work in your body. Just like a factory follows a blueprint to make products, your cells follow DNA instructions to make proteins that keep you alive and functioning. It's like learning the language of life itself!

Self-Examination

Q1.

What are the key differences between DNA replication, transcription, and translation?

Q2.

How does the lac operon regulate gene expression in bacteria?

Q3.

What is the role of RNA splicing in eukaryotic gene expression?