Chapter 6

Genetic Engineering

CRISPR, recombinant DNA, and synthetic biology.

Genetic engineering is the direct manipulation of an organism's genes using biotechnology. It involves the introduction of foreign DNA into host organisms or the modification of existing DNA to alter characteristics or produce biological products. This field encompasses a range of technologies from traditional recombinant DNA techniques to cutting-edge genome editing and synthetic biology approaches.

Recombinant DNA Technology

Historical Development

Recombinant DNA technology emerged in the early 1970s. In 1973, Stanley Cohen and Herbert Boyer cut a plasmid with a restriction enzyme, ligated foreign DNA into it, and introduced the recombinant plasmid into E. coli, creating the first genetically modified organism.

Key Components and Techniques

Restriction Enzymes

$$\text{DNA} \xrightarrow{\text{Restriction enzyme}} \text{DNA fragments (defined by recognition site)}$$
Recognition and Cleavage
$$\text{Recognition sequence: } \begin{array}{l} 5'\text{-GAATTC-}3' \\ 3'\text{-CTTAAG-}5' \end{array} \xrightarrow{\text{EcoRI}} \begin{array}{l} 5'\text{-G}\downarrow\text{AATTC-}3' \\ 3'\text{-CTTAA}\uparrow\text{G-}5' \end{array}$$
Types of Ends
  • Sticky ends: Single-stranded overhangs (typically 2-4 bases; EcoRI leaves a 4-base 5' overhang)
  • Blunt ends: Double-strand cuts producing flush termini
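
As a minimal sketch of how a recognition site defines the fragments, the snippet below splits the top strand of a linear sequence at every EcoRI site, cutting after the G of GAATTC. The function name `digest_top_strand` and the example sequence are illustrative assumptions; the model ignores the bottom strand, circular templates, and methylation sensitivity.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts the top strand between G and AATTC


def digest_top_strand(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Split a linear top-strand sequence at every occurrence of `site`."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # Remainder after the last cut
    return fragments


# Hypothetical 25-nt sequence containing two EcoRI sites
example = "ATGCGAATTCTTAGGCCGAATTCAA"
fragments = digest_top_strand(example)
print(fragments)
```

Each internal fragment begins with AATTC, which is the top-strand view of the 4-base AATT overhang that makes EcoRI ends "sticky".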

DNA Ligases

$$\text{DNA fragment}_1 + \text{DNA fragment}_2 \xrightarrow{\text{DNA ligase}} \text{Recombinant DNA}$$

DNA Vectors

Plasmids
$$\text{Vector} = \text{Origin of replication} + \text{Selectable marker} + \text{Multiple cloning site}$$
Essential Elements
  • ori: Origin of replication (allows plasmid to replicate)
  • antibiotic resistance gene: Selection marker
  • MCS: Multiple cloning site with restriction sites
Viral and High-Capacity Vectors
  • Bacteriophage λ: 8-22 kb insert capacity
  • Cosmids: 35-45 kb capacity (contain cos sites for packaging)
  • BACs: Bacterial Artificial Chromosomes (~300 kb capacity)
  • YACs: Yeast Artificial Chromosomes (~1000 kb capacity)
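
The capacities listed above suggest a simple rule of thumb: pick the smallest vector class that can hold the insert. The sketch below encodes that rule. The ~10 kb plasmid limit is an assumption not stated in the list, only upper limits are modelled (phage λ and cosmids also have minimum insert sizes for packaging), and `smallest_suitable_vector` is a hypothetical helper name.

```python
# Approximate upper insert capacities in kb.
# The plasmid value is an assumed rule of thumb; the rest follow the list above.
MAX_INSERT_KB = {
    "plasmid": 10,
    "bacteriophage lambda": 22,
    "cosmid": 45,
    "BAC": 300,
    "YAC": 1000,
}


def smallest_suitable_vector(insert_kb):
    """Return the vector class with the smallest capacity that still fits the insert."""
    for name, capacity in sorted(MAX_INSERT_KB.items(), key=lambda item: item[1]):
        if insert_kb <= capacity:
            return name
    return None  # Insert exceeds every capacity in the table


for size in (5, 40, 150, 2000):
    print(size, "kb ->", smallest_suitable_vector(size))
```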

Cloning Strategies

Traditional Cloning

$$\text{Target gene} + \text{Linearized vector} \xrightarrow{\text{Ligation}} \text{Recombinant plasmid}$$

TA Cloning

$$\text{PCR product (3'-A overhang)} + \text{Vector with 3'-T overhang} \xrightarrow{\text{Ligation}} \text{Clone}$$

Golden Gate Assembly

$$\text{Multiple parts} + \text{Type IIS restriction enzyme} \xrightarrow{\text{One-pot reaction}} \text{Assembled construct}$$
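
Because Type IIS enzymes cut outside their recognition site, each part can carry its own 4-nt overhangs, and the overhangs alone determine the assembly order. The sketch below models parts after digestion as `(left_overhang, body, right_overhang)` tuples and chains them by matching overhangs. The overhang and body sequences are illustrative placeholders, and `golden_gate_assemble` is a hypothetical helper, not part of any library.

```python
def golden_gate_assemble(parts, start_overhang):
    """Chain digested parts whose left overhang matches the previous right overhang."""
    by_left = {}
    for left, body, right in parts:
        if left in by_left:
            raise ValueError(f"overhang {left} is used by more than one part")
        by_left[left] = (body, right)

    construct = start_overhang
    current = start_overhang
    while current in by_left:
        body, right = by_left.pop(current)  # pop() also prevents endless cycles
        construct += body + right
        current = right
    if by_left:
        raise ValueError(f"parts left unassembled: {sorted(by_left)}")
    return construct


# Parts are listed out of order on purpose: the overhangs fix the order.
parts = [
    ("AATG", "atgaaacgt", "GCTT"),  # coding sequence (placeholder)
    ("GGAG", "ttgacagct", "TACT"),  # promoter (placeholder)
    ("TACT", "aggaggtaa", "AATG"),  # RBS (placeholder)
]
construct = golden_gate_assemble(parts, "GGAG")
print(construct)
```

Upper-case letters in the result mark the junction overhangs, lower-case letters the part bodies.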

Transformation and Selection

Bacterial Transformation

$$\text{Competent cells} + \text{Plasmid DNA} \rightarrow \text{Transformed cells}$$
Transformation Efficiency
$$\text{TE} = \frac{\text{Colonies formed}}{\mu\text{g DNA used}} \times \text{dilution factor}$$
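
A worked example of the transformation efficiency formula, using made-up numbers: 150 colonies from 1 ng (0.001 µg) of plasmid, with one tenth of the recovery culture plated (dilution factor 10). The function name and the input values are illustrative assumptions.

```python
def transformation_efficiency(colonies, dna_ug, dilution_factor=1.0):
    """Colony-forming units per microgram of DNA."""
    if dna_ug <= 0:
        raise ValueError("dna_ug must be positive")
    return colonies / dna_ug * dilution_factor


# Hypothetical plate count and DNA amount
te = transformation_efficiency(colonies=150, dna_ug=0.001, dilution_factor=10)
print(f"Transformation efficiency: {te:.2e} CFU/ug")
```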

Selection Methods

  • Antibiotic resistance: Most common selection marker
  • Auxotrophic complementation: Nutritional selection
  • Blue-white screening: Insertional inactivation of lacZα marks recombinant clones as white colonies on X-gal plates

CRISPR-Cas Systems

Discovery and Development

Bacterial Adaptive Immunity

$$\text{Acquisition} \rightarrow \text{Expression and processing} \rightarrow \text{Interference}$$
  1. Adaptation: New spacer acquisition from invader DNA
  2. Expression: Pre-crRNA transcription
  3. Processing: crRNA maturation
  4. Interference: Target recognition and cleavage

Mechanism of Action

Class 1 Systems
  • Type I: Cascade complex (multi-subunit effector)
  • Type III: Csm/Cmr complexes (targeting RNA)
Class 2 Systems
  • Type II: Single protein (Cas9)
  • Type V: Single protein (Cas12)
  • Type VI: Single protein (Cas13)

Cas9-Mediated Genome Editing

Mechanism

$$\text{sgRNA} + \text{Cas9} + \text{Target DNA} \rightarrow \text{R-loop formation} \rightarrow \text{Double-strand break}$$

Components

  • crRNA: CRISPR RNA (contains guide sequence)
  • tracrRNA: Trans-activating crRNA (base-pairs with crRNA; required for crRNA maturation and Cas9 binding)
  • sgRNA: Single guide RNA (fusion of crRNA and tracrRNA)

PAM Recognition

$$\text{PAM sequence required: NGG (SpCas9), TTTV (Cas12a/Cpf1), etc.}$$
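
The PAM requirement turns target-site discovery into a pattern search. As a minimal sketch for SpCas9, the snippet below reports every 20-nt protospacer immediately followed by an NGG PAM on the strand given. The function name and the example sequence are illustrative assumptions; a real design tool would also scan the reverse complement and score each candidate for specificity.

```python
import re


def find_spcas9_targets(sequence, guide_length=20):
    """Return (protospacer, pam, start) for each site followed by NGG on this strand."""
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_length)
    return [(m.group(1), m.group(2), m.start()) for m in pattern.finditer(sequence)]


# Hypothetical sequence with a single GG dinucleotide
demo = "TTGACCTGAAGCTACGTTACATAGGTC"
targets = find_spcas9_targets(demo)
for protospacer, pam, start in targets:
    print(start, protospacer, pam)
```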

R-Loop Formation

$$\text{sgRNA:DNA} = \text{Hybrid duplex} + \text{Displaced ssDNA}$$

CRISPR Variants

Modified Cas Systems

Base Editors
$$\text{Cas9(nickase)} + \text{Cytidine deaminase} \rightarrow \text{C} \to \text{T} \ (\text{or G} \to \text{A}) \text{ conversion}$$
$$\text{Cas9(nickase)} + \text{Adenine deaminase} \rightarrow \text{A} \to \text{G} \ (\text{or T} \to \text{C}) \text{ conversion}$$
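
To make the cytosine base editor outcome concrete, the sketch below converts every C inside an editing window of the protospacer to T. The window of positions 4-8 (counted from the PAM-distal end) is an assumed, commonly quoted range, and 100% conversion with no bystander filtering is an idealisation; `apply_cytosine_base_editor` is a hypothetical helper.

```python
def apply_cytosine_base_editor(protospacer, window=(4, 8)):
    """Convert every C inside the editing window to T (idealised outcome)."""
    first, last = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if first <= position <= last and base == "C":
            edited.append("T")
        else:
            edited.append(base)
    return "".join(edited)


# Hypothetical protospacer: the C at position 3 lies outside the window,
# the C at position 4 lies inside it.
original = "GACCTGAAGCTACGTTACAT"
edited = apply_cytosine_base_editor(original)
print(original)
print(edited)
```

Any additional C inside the window would also be converted, which is why bystander edits are a design concern for base editors.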
Prime Editors
$$\text{Reverse transcriptase fused to Cas9 nickase} + \text{pegRNA} \rightarrow \text{Precise insertions/deletions/substitutions}$$

Applications

  • Gene knockouts: Induce indels via NHEJ
  • Gene knock-ins: HDR-mediated insertion
  • Gene regulation: dCas9 (catalytically dead Cas9) fusions for transcriptional activation (CRISPRa) or repression (CRISPRi)
  • Epigenome editing: dCas fusions with epigenetic modifiers

Off-Target Effects and Safety

Predicting Off-Targets

$$\text{OT score} = \prod_{i=1}^{N} \text{mismatch tolerance}(i)$$
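
One way to read the product is that matched positions contribute a factor of 1 and each mismatched position contributes its tolerance, so a single mismatch in a poorly tolerated position drives the score toward zero. The sketch below implements that reading. The tolerance values (0.8 for the 8 PAM-distal positions, 0.1 for the 12 PAM-proximal seed positions) are made-up placeholders, not published weights.

```python
def off_target_score(guide, site, tolerance):
    """Multiply the tolerance of every mismatched position; matches contribute 1."""
    if not (len(guide) == len(site) == len(tolerance)):
        raise ValueError("guide, site, and tolerance must have equal length")
    score = 1.0
    for guide_base, site_base, weight in zip(guide, site, tolerance):
        if guide_base != site_base:
            score *= weight
    return score


# Hypothetical weights, written 5'->3' so that the PAM-proximal seed comes last
tolerance = [0.8] * 8 + [0.1] * 12

guide = "A" * 20
distal_mismatch = "C" + "A" * 19  # Mismatch at the PAM-distal end
seed_mismatch = "A" * 19 + "C"    # Mismatch next to the PAM

print(off_target_score(guide, guide, tolerance))
print(off_target_score(guide, distal_mismatch, tolerance))
print(off_target_score(guide, seed_mismatch, tolerance))
```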

Reducing Off-Targets

  • High-fidelity Cas9 variants: eSpCas9, SpCas9-HF1
  • Truncated gRNAs: 17-18 nt instead of 20 nt
  • Modified PAM requirements: SaCas9 KKH variant

Synthetic Biology

Definition and Scope

Synthetic biology is the design and construction of new biological parts, devices, and systems, or the redesign of existing natural biological systems for useful purposes.

Standardized Parts

BioBricks

$$\text{Part} = \text{Promoter} + \text{RBS} + \text{Coding sequence} + \text{Terminator}$$

Registry of Standard Biological Parts

  • Part categories: Promoters, RBSs, coding sequences, terminators
  • Characterization: Quantified expression levels, induction requirements

Genetic Circuits

Toggle Switch

$$\text{Promoter}_A \rightarrow \text{Repressor}_B \rightleftarrows \text{Repressor}_A \leftarrow \text{Promoter}_B$$
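
The defining property of a toggle switch is bistability: two mutually repressing genes settle into one of two stable states depending on where they start. The sketch below integrates a standard dimensionless two-repressor model, du/dt = α/(1 + vⁿ) − u and dv/dt = α/(1 + uⁿ) − v, with a simple forward-Euler loop. The parameter values (α = 10, n = 2) and the step size are illustrative assumptions rather than measured constants.

```python
def simulate_toggle(u0, v0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Forward-Euler integration of two mutually repressing genes."""
    u, v = u0, v0
    for _ in range(steps):
        du = alpha / (1.0 + v ** n) - u  # u is repressed by v
        dv = alpha / (1.0 + u ** n) - v  # v is repressed by u
        u += dt * du
        v += dt * dv
    return u, v


# Same circuit, two different starting points
state_a = simulate_toggle(u0=5.0, v0=0.1)
state_b = simulate_toggle(u0=0.1, v0=5.0)
print("Start with u high:", state_a)
print("Start with v high:", state_b)
```

Whichever repressor starts ahead ends up high and holds the other low, which is the memory behaviour the circuit is designed for.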

Oscillators (Repressilator)

$$\text{Gene}_A \rightarrow \text{Gene}_B \rightarrow \text{Gene}_C \rightarrow \text{Gene}_A$$

AND Gates

$$\text{Input 1} + \text{Input 2} \rightarrow \text{Output}$$
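
A common way to model a genetic AND gate is to multiply two activating Hill functions, so the output is high only when both inputs are well above their thresholds. The sketch below uses that approach. The threshold `k = 1` and Hill coefficient `n = 2` are illustrative assumptions, and the multiplicative form is one modelling choice among several.

```python
def hill(x, k=1.0, n=2.0):
    """Activating Hill function: fraction of maximal response at input level x."""
    return x ** n / (k ** n + x ** n)


def and_gate(input_1, input_2):
    """Normalised output (0-1) that requires both inputs to be present."""
    return hill(input_1) * hill(input_2)


# Truth-table style check with 'low' = 0 and 'high' = 10 (arbitrary units)
for a in (0.0, 10.0):
    for b in (0.0, 10.0):
        print(f"input 1 = {a:4.1f}, input 2 = {b:4.1f} -> output = {and_gate(a, b):.3f}")
```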

Applications

Metabolic Engineering

$$\text{Pathway} = \text{Precursor} \xrightarrow{\text{Enzyme}_1} \text{Intermediate}_1 \xrightarrow{\text{Enzyme}_2} \ldots \xrightarrow{\text{Enzyme}_n} \text{Product}$$

Biosensors

$$\text{Signal} \rightarrow \text{Receptor} \rightarrow \text{Reporter gene} \rightarrow \text{Output (fluorescence, color, etc.)}$$

Advanced Techniques

Homologous Recombination

Gene Targeting

$$\text{Linear DNA construct} + \text{Target locus} \xrightarrow{\text{HDR}} \text{Correctly integrated}$$

Gene Replacement

$$\text{5' homology} + \text{Selection cassette} + \text{3' homology} \xrightarrow{\text{Crossover}} \text{Targeted allele}$$

Conditional Mutagenesis

Cre-LoxP System

$$\text{LoxP sites} \xrightarrow{\text{Cre recombinase}} \text{Excision/insertion/inversion}$$
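
For the excision case, Cre recombines two loxP sites in the same orientation, removes the DNA between them, and leaves a single loxP site behind. The sketch below models this on a plain string using the canonical 34 bp loxP sequence. The flanking and "floxed" sequences are placeholders, `cre_excise` is a hypothetical helper, and inversion (sites in opposite orientation) is not modelled.

```python
# Canonical loxP: 13 bp inverted repeat + 8 bp spacer + 13 bp inverted repeat
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(sequence):
    """Remove the segment between the first two same-orientation loxP sites."""
    first = sequence.find(LOXP)
    if first == -1:
        return sequence  # No loxP site: nothing to recombine
    second = sequence.find(LOXP, first + len(LOXP))
    if second == -1:
        return sequence  # Only one site: excision needs a pair
    # Keep everything up to the first site, then resume at the second site,
    # so exactly one loxP copy remains.
    return sequence[:first] + sequence[second:]


# Placeholder flanks (AAAA / TTTT) around a floxed segment (GGGGCCCC)
floxed_allele = "AAAA" + LOXP + "GGGGCCCC" + LOXP + "TTTT"
recombined = cre_excise(floxed_allele)
print(len(floxed_allele), "->", len(recombined))
```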

Flp-FRT System

$$\text{FRT sites} \xrightarrow{\text{Flp recombinase}} \text{Site-specific recombination}$$

Epigenome Editing

DNA Methylation Editing

$$\text{dCas-DNMT} \rightarrow \text{Targeted methylation}$$
$$\text{dCas-TET} \rightarrow \text{Targeted demethylation}$$

Chromatin Remodeling

$$\text{dCas-SWI/SNF} \rightarrow \text{Chromatin accessibility modulation}$$

Regulatory Considerations

Risk Assessment

Environmental Risk

  • Gene flow: Transfer to wild populations
  • Fitness effects: Impact on ecosystem dynamics
  • Horizontal gene transfer: Movement between species

Human Health Risk

  • Allergenicity: Potential allergic reactions
  • Antibiotic resistance markers: Selection pressure
  • Toxicity: Production of harmful compounds

Ethical Considerations

Germline Editing

  • Heritable genetic modifications
  • Intergenerational consent issues
  • Designer baby concerns

Agricultural Applications

  • Coexistence: GM vs. non-GM crops
  • Labeling: Consumer right to know
  • Patenting: Ownership of genetic sequences

Current Applications

Medicine

Gene Therapy

$$\text{Corrective gene} + \text{Delivery vector} \xrightarrow{\text{Cell transduction}} \text{Therapeutic effect}$$

CAR-T Cell Therapy

$$\text{T cells} \xrightarrow{\text{Genetic modification}} \text{CAR-T cells} \xrightarrow{\text{Infusion}} \text{Cancer targeting}$$

Agriculture

Crop Improvement

  • Herbicide resistance: Roundup Ready crops
  • Biotic stress tolerance: Bt crops
  • Abiotic stress tolerance: Drought-resistant varieties
  • Nutritional enhancement: Golden Rice

Industrial Biotechnology

Biomanufacturing

  • Recombinant proteins: Insulin, growth hormones, antibodies
  • Enzymes: Industrial catalysts
  • Biofuels: Ethanol, biodiesel, alkanes
  • Bioplastics: PHA, PLA production

Computational Tools

Design Software

CRISPR Design

  • CHOPCHOP: Guide RNA design and specificity
  • CRISPOR: Comprehensive CRISPR design tool
  • Benchling: Molecular biology design platform

Pathway Design

  • Pathway Tools: Metabolic pathway analysis
  • KEGG: Kyoto Encyclopedia of Genes and Genomes
  • BioBuilder: Synthetic biology design platform

Analysis Tools

  • BLAST: Sequence similarity searches
  • Clustal Omega: Multiple sequence alignment
  • SnapGene: DNA visualization and cloning design

Real-World Application: CRISPR Therapeutic Development

The development of CRISPR-based therapeutics involves careful consideration of delivery, specificity, and safety.

Therapeutic CRISPR Design

import math

# CRISPR therapeutic development analysis
crispr_params = {
    'target_gene': 'HBB',  # Beta-globin gene for sickle cell disease
    'guide_length': 20,    # Nucleotides
    'pam_sequence': 'NGG',  # Required PAM for SpCas9
    'genome_build': 'hg38', # Human genome reference
    'off_target_score': 0.85,  # Specificity score (0-1, higher is better)
    'editing_efficiency': 0.75,  # 75% editing efficiency
    'delivery_method': 'electroporation',  # Method to deliver components
    'cell_type': 'hematopoietic_stem_cells',  # Target cells
    'therapeutic_strategy': 'correction'  # Type of edit needed
}

# Build a per-position mismatch penalty profile for the guide
# Using simplified specificity model (index 0 is taken as the PAM-proximal end)
mismatch_penalties = []
for i in range(crispr_params['guide_length']):  # For each position in guide RNA
    # Calculate mismatch tolerance
    if i < 12:  # Seed region - less tolerant of mismatches
        mismatch_penalty = 10
    else:  # Non-seed region - more tolerant of mismatches
        mismatch_penalty = 1
    mismatch_penalties.append(mismatch_penalty)

# Predict on/off-target binding preference
# Using a simplified thermodynamic (Boltzmann) model
target_binding_energy = -35  # kcal/mol (estimate)
off_target_binding_energy = -30  # kcal/mol (weaker binding)
gas_constant = 1.987e-3  # kcal/(mol*K), so the units match the energies above
temperature = 310  # K (37°C)
k_on_ratio = math.exp(-(target_binding_energy - off_target_binding_energy) / (gas_constant * temperature))

# Calculate editing outcomes
original_allele = 1.0
edited_allele = crispr_params['editing_efficiency'] * original_allele
remaining_original = original_allele - edited_allele

# For sickle cell correction (E6V to E6E)
# Need to either correct the mutation or upregulate fetal hemoglobin
correction_outcome = {
    'normal_alleles': edited_allele,
    'sickle_alleles': remaining_original,
    'compensated_alleles': 0  # If using alternative approach
}

# Estimate therapeutic threshold
therapeutic_threshold = 0.15  # Need 15% normal alleles for clinical improvement
therapeutic_success = edited_allele > therapeutic_threshold

# Calculate predicted clinical outcome
predicted_clinical_improvement = edited_allele / 2 * 100  # Assuming heterozygous state is beneficial

print(f"CRISPR therapeutic design for {crispr_params['target_gene']}:")
print(f"  Guide RNA length: {crispr_params['guide_length']} nt")
print(f"  PAM sequence: {crispr_params['pam_sequence']}")
print(f"  Editing efficiency: {crispr_params['editing_efficiency']*100:.1f}%")
print(f"  Predicted on/off-target ratio: {k_on_ratio:.2f}")
print(f"  Therapeutic threshold ({therapeutic_threshold*100}%): {'Achieved' if therapeutic_success else 'Not achieved'}")
print(f"  Predicted clinical improvement: {predicted_clinical_improvement:.1f}%")
print(f"  Delivery method: {crispr_params['delivery_method']}")

# Safety assessment
if crispr_params['off_target_score'] < 0.9:
    safety_concern = "High off-target risk - extensive validation needed"
else:
    safety_concern = "Acceptable specificity - proceed with development"

print(f"  Safety assessment: {safety_concern}")

# Potential complications
potential_issues = []
if crispr_params['editing_efficiency'] > 0.9:
    potential_issues.append("High efficiency may increase risk of unwanted modifications")
if crispr_params['editing_efficiency'] < 0.1:
    potential_issues.append("Low efficiency may not achieve therapeutic benefit")
if crispr_params['off_target_score'] < 0.8:
    potential_issues.append("High off-target risk needs mitigation")

print(f"  Potential issues: {potential_issues if potential_issues else ['None identified']}")

Clinical Trial Considerations

Factors in translating CRISPR to therapeutic applications.


Your Challenge: Vector Design and Cloning Strategy

Design a vector system for expressing a therapeutic protein and outline the cloning strategy.

Goal: Engineer a recombinant DNA construct for therapeutic protein production.

Design Parameters

import math

# Therapeutic protein design parameters
protein_design = {
    'target_protein': 'Human insulin',
    'accession_number': 'P01308',
    'length': 51,  # Amino acids
    'molecular_weight': 5808,  # Da
    'signal_peptide': True,    # Secreted protein
    'required_modifications': ['disulfide_bonds', 'glycosylation'],
    'expression_host': 'E.coli',
    'selection_marker': 'ampicillin',
    'promoter_type': 'inducible',  # Constitutive or inducible
    'copy_number': 'medium',       # Low, medium, or high copy plasmid
    'codon_optimization': True     # For expression host
}

# Calculate codon adaptation index (CAI) for E. coli expression
# Simplified calculation based on codon frequency
def calculate_cai(sequence, host_codons):
    # This would normally use a reference set of highly expressed genes
    # For this exercise, we'll simulate a CAI calculation
    cai_score = 0.75  # Simulated score
    return cai_score

# Vector backbone requirements
vector_features = {
    'ori': 'ColE1 origin',  # Medium copy number (pUC-type derivatives are high copy)
    'promoter': 'Ptac',     # IPTG-inducible
    'ribosome_binding_site': 'strong',  # AGGAGGT sequence
    'terminator': 'T1 from E. coli rrnB',  # Strong terminator
    'selection': 'ampR',    # Ampicillin resistance
    'multiple_cloning_site': ['BamHI', 'EcoRI', 'XhoI', 'XbaI']  # Common sites
}

# Calculate insert size for cloning
protein_coding_seq = 'ATGAAATTTATCATCGCCCTGGTGATCGTTATCCTGGCGCTGGCCCAGCCCGGCGAA'  # Illustrative placeholder for the start of the coding sequence (not the native insulin sequence)
poly_histidine_tag = 'CACCATCACCACCACCAC'  # 6xHis tag for purification
stop_codon = 'TAG'  # Stop codon (translation stop, not a transcription terminator)

full_insert = protein_coding_seq + poly_histidine_tag + stop_codon
insert_length = len(full_insert)

# Calculate expression optimization
if protein_design['codon_optimization']:
    codon_adaptation_index = 0.82  # Optimized for E.coli
else:
    codon_adaptation_index = 0.55  # Native sequence

# Predict expression level based on design parameters
promoter_strength = 0.8 if protein_design['promoter_type'] == 'inducible' else 1.0  # Inducible is usually strong
rbs_strength = 0.9 if vector_features['ribosome_binding_site'] == 'strong' else 0.5
copy_number_factor = 10 if protein_design['copy_number'] == 'high' else 3  # High vs medium copy

predicted_expression_level = codon_adaptation_index * promoter_strength * rbs_strength * copy_number_factor

# Consider potential issues
expression_issues = []
if codon_adaptation_index < 0.6:
    expression_issues.append("Codon bias may reduce expression")
if 'glycosylation' in protein_design['required_modifications']:
    if protein_design['expression_host'] == 'E.coli':
        expression_issues.append("E.coli lacks glycosylation machinery")
if predicted_expression_level < 1.0:
    expression_issues.append("Low expression level predicted")

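# Calculate production yield estimation
culture_volume = 1  # Liters
cell_density = 4  # OD600 (approximately 2 g/L dry weight)
dry_weight_per_od = 0.5  # g dry cell weight per liter per OD600 unit (rough assumption)
protein_fraction = 0.5  # Fraction of dry weight that is protein (rough assumption)
expression_level = 0.1  # Fraction of total protein as target

# Grams of target protein; numerically equal to g/L for the 1 L culture used here
estimated_yield = culture_volume * cell_density * dry_weight_per_od * protein_fraction * expression_level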

Design a recombinant DNA construct for therapeutic protein production.

Hint:

  • Consider the expression host and optimize for it
  • Include proper regulatory elements
  • Plan the cloning strategy with compatible restriction sites
  • Consider protein purification and secretion
# TODO: Design the recombinant construct
vector_backbone = ""  # Name of vector backbone to use
promoter_selected = ""  # Promoter to use for expression
cloning_strategy = ""  # Step-by-step cloning approach
expression_level_prediction = 0  # Predicted expression level (relative units; can exceed 1)
purification_tags = []  # Tags for protein purification
safety_considerations = []  # Safety considerations for therapeutic use

# Select appropriate vector
if protein_design['copy_number'] == 'high':
    vector_backbone = "pET series (T7 promoter)"
elif protein_design['copy_number'] == 'medium':
    vector_backbone = "pGEX series (glutathione S-transferase fusion)"
else:
    vector_backbone = "pACYC series (low copy, good for toxic proteins)"

# Select appropriate promoter
if protein_design['promoter_type'] == 'inducible':
    promoter_selected = "T7 or Ptac (IPTG-inducible)"
else:
    promoter_selected = "J23100 or another Anderson-collection promoter (constitutive)"

# Design cloning strategy
# Step 1: Design oligos for gene synthesis with optimal codons
# Step 2: PCR amplify with restriction sites for cloning
# Step 3: Digest vector and insert with compatible enzymes
# Step 4: Ligate and transform
# Step 5: Select and verify clones

cloning_strategy = [
    "Synthesize gene with optimized codons for E. coli",
    f"Add restriction sites for {vector_features['multiple_cloning_site'][0]} and {vector_features['multiple_cloning_site'][1]}",
    f"Digest vector with {vector_features['multiple_cloning_site'][0]} and {vector_features['multiple_cloning_site'][1]}",
    "Ligate insert into linearized vector",
    "Transform into competent E. coli cells",
    "Select on antibiotic plates and verify by sequencing"
]

# Calculate expression level
expression_level_prediction = predicted_expression_level

# Add purification tags
if protein_design['required_modifications'] and 'purification' in str(protein_design['required_modifications']):
    purification_tags = ["6xHistidine tag", "FLAG tag"]
else:
    purification_tags = ["6xHistidine tag"]  # Standard for E. coli

# Consider safety factors
safety_considerations = []
if protein_design['expression_host'] == 'E.coli':
    safety_considerations.append("Endotoxin removal required for therapeutic use")
if protein_design['required_modifications']:
    if 'glycosylation' in str(protein_design['required_modifications']):
        safety_considerations.append("E. coli cannot glycosylate proteins - may affect function")
if protein_design['selection_marker'] == 'ampicillin':
    safety_considerations.append("Antibiotic resistance marker requires removal for clinical use")


# Print results
print(f"Vector backbone: {vector_backbone}")
print(f"Promoter selected: {promoter_selected}")
print(f"Cloning strategy: {cloning_strategy}")
print(f"Expression level prediction: {expression_level_prediction:.2f}")
print(f"Purification tags: {purification_tags}")
print(f"Estimated yield: {estimated_yield:.3f} g/L")
print(f"Safety considerations: {safety_considerations}")

# Design validation
if expression_level_prediction > 0.5 and not expression_issues:
    design_assessment = "Promising design - likely successful expression"
elif expression_level_prediction > 0.2:
    design_assessment = "Workable design - moderate expression expected"
else:
    design_assessment = "Suboptimal design - consider improvements"
    
print(f"Design assessment: {design_assessment}")

How would you modify your design if the therapeutic protein required proper eukaryotic post-translational modifications?

ELI10 Explanation

Simple analogy for better understanding

Think of genetic engineering like being a molecular editor who can carefully cut, copy, paste, and rewrite the sentences in the book of life (DNA/RNA). Just like a writer can edit a manuscript to improve it, scientists can edit genes to fix problems, add new capabilities, or remove harmful traits. CRISPR is like a very precise molecular word processor - it can find a specific 'sentence' in the DNA book and make precise edits (like changing a single letter or deleting a paragraph). Recombinant DNA is like creating hybrid books by combining chapters from different manuals to create something new and useful. Synthetic biology is like writing entirely new books of instruction that don't exist in nature - creating brand new biological tools and systems from scratch. It's like having a toolkit for rewriting the code of life itself, allowing scientists to create new medicines, improve crops, develop biodegradable plastics, and potentially cure genetic diseases.

Self-Examination

Q1.

How does the CRISPR-Cas system work and what makes it so precise?

Q2.

What are the key steps involved in creating recombinant DNA molecules?

Q3.

What are the applications and potential risks of synthetic biology?