Chapter 10

Structural Biology & Drug Design

X-ray crystallography and cryo-electron microscopy techniques, protein folding and design, molecular docking and virtual screening, structure-based drug design, GPCR and ion channel targeting.


Structural biology is the study of the molecular architecture of biological macromolecules, particularly proteins and nucleic acids. Understanding the three-dimensional structure of biological molecules is crucial for understanding their function and for designing therapeutic interventions.

Protein Structure and Folding

Protein Structure Hierarchy

Primary Structure

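The primary structure is the ordered sequence of amino acid residues, read from the N-terminus to the C-terminus:

$$\text{Protein sequence} = (a_1, a_2, \ldots, a_n), \qquad a_i \in \{\text{20 standard amino acids}\}$$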

Secondary Structure

$$\text{Secondary structure} = f(\text{local interactions}, \text{amino acid properties})$$
α-Helix
$$\text{Pitch (rise per turn)} = 5.4\ \text{Å}, \quad \text{Residues per turn} = 3.6, \quad \text{Rise per residue} = 1.5\ \text{Å}$$
$$\phi = -57^\circ, \quad \psi = -47^\circ$$
β-Sheet
  • Parallel: $\phi \approx -119^\circ, \psi \approx 113^\circ$
  • Antiparallel: $\phi \approx -139^\circ, \psi \approx 135^\circ$

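Tertiary and Quaternary Structure

Tertiary structure is the overall three-dimensional fold of a single polypeptide chain. Quaternary structure is the arrangement of multiple chains (subunits) in a complex.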

Protein Folding Problem

$$\text{Sequence} \xrightarrow{\text{folding}} \text{Native structure}$$

Levinthal's Paradox

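Assume each residue can adopt about three backbone conformations. A chain of $N$ residues then has

$$\text{Possible conformations} = 3^N$$

For a 100-residue protein, $3^{100} \approx 5 \times 10^{47}$ possible conformations. Visiting them one by one would take far longer than the age of the universe. Yet proteins typically fold in microseconds to seconds, so folding cannot be an exhaustive random search.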
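The arithmetic behind the paradox can be checked directly. The sketch below assumes three states per residue, as above. It also assumes a sampling rate of $10^{13}$ conformations per second, a hypothetical, deliberately generous figure that is not from the text.

```python
# Levinthal-style estimate under idealized assumptions:
# each residue independently adopts one of `states_per_residue` conformations,
# and conformations are visited at a fixed (hypothetical) rate.
def levinthal_search_time(n_residues: int, states_per_residue: int = 3,
                          conformations_per_second: float = 1e13) -> tuple[float, float]:
    """Return (number of conformations, years needed to visit all of them)."""
    n_conformations = states_per_residue ** n_residues  # exact integer count
    seconds = n_conformations / conformations_per_second
    years = seconds / (365.25 * 24 * 3600)
    return float(n_conformations), years

n_conf, years = levinthal_search_time(100)
print(f"conformations: {n_conf:.2e}")
print(f"search time:   {years:.1e} years")
```

Even a much faster sampling rate would not bring the search time anywhere near biological timescales. That gap is the point of the paradox.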

Folding Thermodynamics

$$\Delta G_{fold} = \Delta H_{fold} - T\Delta S_{fold}$$
$$\Delta G_{fold} = -RT \ln K_{fold}, \qquad K_{fold} = \frac{[\text{folded}]}{[\text{unfolded}]}$$

A negative $\Delta G_{fold}$ ($K_{fold} > 1$) means the folded state is more stable than the unfolded state.
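This minimal two-state sketch converts between $K_{fold}$, $\Delta G_{fold}$ and the fraction of folded molecules. It assumes only folded and unfolded states are populated and uses 298.15 K as the default temperature.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)

def delta_g_fold(k_fold, temperature=298.15):
    """ΔG_fold = -RT ln K_fold, with K_fold = [folded]/[unfolded]."""
    return -R * temperature * math.log(k_fold)

def fraction_folded(dg_fold, temperature=298.15):
    """Two-state model: K = exp(-ΔG/RT), fraction folded = K / (1 + K)."""
    k = math.exp(-dg_fold / (R * temperature))
    return k / (1.0 + k)

dg = delta_g_fold(1000.0)  # 1000 folded molecules per unfolded one
print(f"ΔG_fold = {dg:.2f} kcal/mol")           # negative: folded state favored
print(f"fraction folded = {fraction_folded(dg):.4f}")
```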

Experimental Structure Determination

X-ray Crystallography

Structure Factor

$$F_{hkl} = \sum_j f_j \exp[-2\pi i(hx_j + ky_j + lz_j)]$$

Where $f_j$ is the scattering factor of atom $j$ at fractional coordinates $(x_j, y_j, z_j)$, and $(h, k, l)$ are the Miller indices of the reflection.
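The structure-factor sum can be evaluated directly with complex exponentials. This sketch uses the same sign convention as the formula. It treats each $f_j$ as a constant, which is a simplification: real scattering factors fall off with resolution and include thermal B-factors. The two-atom body-centred cell is a toy example chosen because it produces systematic absences.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """F_hkl = sum_j f_j * exp(-2*pi*i*(h*x_j + k*y_j + l*z_j)).

    `atoms` is a list of (f_j, (x_j, y_j, z_j)) with fractional coordinates.
    """
    h, k, l = hkl
    return sum(f * cmath.exp(-2j * math.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

# Toy body-centred cell: identical atoms at (0, 0, 0) and (1/2, 1/2, 1/2)
atoms = [(6.0, (0.0, 0.0, 0.0)), (6.0, (0.5, 0.5, 0.5))]
for hkl in [(1, 0, 0), (1, 1, 0), (2, 0, 0)]:
    f_hkl = structure_factor(hkl, atoms)
    print(hkl, f"|F| = {abs(f_hkl):.3f}", f"I ∝ {abs(f_hkl) ** 2:.3f}")
```

Reflections with odd $h + k + l$ cancel exactly in this cell. This illustrates how measured intensities encode the arrangement of atoms.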

Resolution

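$$d_{min} = \frac{\lambda}{2\sin\theta_{max}}$$

This follows from Bragg's law, $2d\sin\theta = n\lambda$, with $n = 1$. The highest Bragg angle $\theta_{max}$ at which reflections are measured sets the smallest resolvable spacing $d_{min}$; a smaller $d_{min}$ means higher resolution.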
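A small helper for the Bragg-law resolution limit is sketched below. It assumes the maximum angle is given as the scattering angle $2\theta$, as detector geometry is usually reported, so it is halved before use. The Cu Kα wavelength is only an example input.

```python
import math

def d_min(wavelength_angstrom, two_theta_max_deg):
    """Bragg's law (n = 1): d_min = lambda / (2 sin(theta_max)).

    Takes the scattering angle 2θ in degrees and halves it.
    """
    theta_max = math.radians(two_theta_max_deg / 2.0)
    return wavelength_angstrom / (2.0 * math.sin(theta_max))

# Example: Cu Kα radiation (1.5418 Å), reflections measured out to 2θ = 60°
print(f"{d_min(1.5418, 60.0):.3f} Å")
```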

Phase Problem

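Detectors record only intensities, $I_{hkl} \propto |F_{hkl}|^2$. The amplitudes are therefore measured, but the phases $\varphi_{hkl}$ are lost. The phases must be recovered, for example by molecular replacement or experimental phasing, before the electron density can be computed:

$$\text{Structure determination}: |F_{hkl}| \xrightarrow{\text{phasing}} F_{hkl} = |F_{hkl}|\,e^{i\varphi_{hkl}} \rightarrow \text{electron density}$$
$$\rho(x, y, z) = \frac{1}{V} \sum_{hkl} F_{hkl} \exp[2\pi i(hx + ky + lz)]$$

Here $V$ is the unit-cell volume.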

Data Processing Steps

  1. Indexing: Determine unit cell parameters
  2. Integration: Measure reflection intensities
  3. Scaling: Normalize for experimental effects
  4. Merging: Combine symmetry-related reflections

Data Quality Metrics

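$$R_{merge} = \frac{\sum_{hkl} \sum_i |I_i(hkl) - \langle I(hkl)\rangle|}{\sum_{hkl} \sum_i I_i(hkl)}$$
$$CC_{1/2} = \frac{\sigma^2_{\tau}}{\sigma^2_{\tau} + \sigma^2_{\varepsilon}}$$

In these formulas:
  • $I_i(hkl)$ is the $i$-th measurement of reflection $hkl$, and $\langle I(hkl)\rangle$ is the mean of its measurements.
  • $CC_{1/2}$ is the correlation between mean intensities from two random halves of the data.
  • $\sigma^2_\tau$ is the variance of the true intensities, and $\sigma^2_\varepsilon$ is the error variance of each half-dataset mean.

A $CC_{1/2}$ close to 1 indicates a strong signal.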
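Both merging statistics can be sketched in plain Python. Here observations are grouped by Miller index in a dictionary, and $CC_{1/2}$ is computed as a Pearson correlation between two half-dataset intensity lists. The data layout and the numbers are illustrative assumptions, not the format of any particular processing program.

```python
import math

def r_merge(observations):
    """R_merge = sum_hkl sum_i |I_i(hkl) - <I(hkl)>| / sum_hkl sum_i I_i(hkl).

    `observations` maps a Miller index (h, k, l) to its repeated or
    symmetry-related intensity measurements.
    """
    numerator = denominator = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator

def cc_half(half1, half2):
    """Pearson correlation between mean intensities of two random half-datasets."""
    n = len(half1)
    m1, m2 = sum(half1) / n, sum(half2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(half1, half2))
    var1 = sum((a - m1) ** 2 for a in half1)
    var2 = sum((b - m2) ** 2 for b in half2)
    return cov / math.sqrt(var1 * var2)

obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0]}
print(f"R_merge = {r_merge(obs):.3f}")  # 20 / 400 = 0.050
print(f"CC1/2   = {cc_half([10, 20, 30, 40], [11, 19, 31, 39]):.3f}")
```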

Cryo-Electron Microscopy (cryo-EM)

Resolution Limits

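$$p = \sqrt{2m_0 eV_{acc}\left(1 + \frac{eV_{acc}}{2m_0c^2}\right)}, \qquad \lambda = \frac{h}{p} \approx 0.0197\ \text{Å for 300 keV electrons}$$
$$\text{Theoretical limit}: \frac{\lambda}{2} = \frac{h}{2p} \approx 0.01\ \text{Å}$$

Here $m_0$ is the electron rest mass and $V_{acc}$ is the accelerating voltage. In practice, cryo-EM resolution is limited by radiation damage, beam-induced motion, detector performance, CTF estimation, and sample heterogeneity, not by the electron wavelength.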
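The relativistic correction matters at cryo-EM voltages. This sketch computes $\lambda = h/p$ from exact SI constants (CODATA values); the list of voltages is just a set of common examples.

```python
import math

H = 6.62607015e-34      # Planck constant, J*s
M0 = 9.1093837015e-31   # electron rest mass, kg
E = 1.602176634e-19     # elementary charge, C
C = 299_792_458.0       # speed of light, m/s

def electron_wavelength_angstrom(voltage):
    """Relativistic de Broglie wavelength lambda = h / p for an accelerating voltage (V)."""
    kinetic = E * voltage  # kinetic energy in joules
    p = math.sqrt(2 * M0 * kinetic * (1 + kinetic / (2 * M0 * C**2)))
    return H / p * 1e10    # metres -> Å

for kv in (120, 200, 300):
    lam = electron_wavelength_angstrom(kv * 1e3)
    print(f"{kv} kV: lambda = {lam:.4f} Å, lambda/2 = {lam / 2:.4f} Å")
```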

Contrast Transfer Function (CTF)

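$$\text{CTF}(u,v) = -\sin[\chi(u,v)] \cdot \exp\left[-\frac{B}{4}(u^2 + v^2)\right]$$
$$\chi(u,v) = \pi\lambda\Delta f\,(u^2 + v^2) - \frac{\pi}{2} C_s \lambda^3 (u^2 + v^2)^2$$

The symbols are:
  • $(u, v)$: spatial frequencies
  • $\Delta f$: defocus (positive for underfocus)
  • $C_s$: spherical aberration coefficient
  • $B$: decay parameter of the exponential envelope, which damps high-frequency signal

This form ignores amplitude contrast and astigmatism.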
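A radially symmetric version of the CTF above can be evaluated at a spatial frequency $k = \sqrt{u^2 + v^2}$. This is a sketch under the same simplifications (no amplitude contrast, no astigmatism). The defocus, $C_s$ and B values are illustrative, not recommended settings.

```python
import math

def ctf(k, wavelength, defocus, cs, b_factor=0.0):
    """CTF(k) = -sin(chi) * exp(-B k^2 / 4),
    chi = pi*lambda*df*k^2 - (pi/2)*Cs*lambda^3*k^4.

    k in 1/Å; wavelength, defocus and Cs in Å.
    """
    chi = math.pi * wavelength * defocus * k**2 - 0.5 * math.pi * cs * wavelength**3 * k**4
    return -math.sin(chi) * math.exp(-b_factor * k**2 / 4.0)

lam = 0.0197     # Å, 300 kV electrons
df = 15_000.0    # Å (1.5 µm underfocus), illustrative
cs = 2.7e7       # Å (2.7 mm), illustrative
for k in (0.02, 0.05, 0.1, 0.2):
    print(f"k = {k:.2f} 1/Å ({1 / k:.0f} Å): CTF = {ctf(k, lam, df, cs, b_factor=100.0):+.3f}")
```

With $C_s = 0$, the first zero falls at $k = 1/\sqrt{\lambda\Delta f}$. This is why the choice of defocus controls which resolution ranges are transferred well.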

Single-Particle Analysis

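Each particle image $X_i$ is modeled as a noisy 2D projection of the unknown 3D density $V$. The projection is taken at an unknown orientation given by the Euler angles $\phi_i, \theta_i, \psi_i$:

$$X_i = \mathcal{P}_{\phi_i,\theta_i,\psi_i}[V] + n_i, \qquad i = 1, \ldots, N$$

Reconstruction estimates the orientations and $V$ from all $N$ images. By the projection-slice theorem, the 2D Fourier transform of each projection is a central slice through the 3D Fourier transform of $V$.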
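The projection-slice theorem lets single-particle methods place each 2D image into a 3D Fourier volume. The NumPy sketch below checks it in two dimensions for one projection direction. A random array stands in for a density; this is a toy demonstration, not a reconstruction algorithm.

```python
import numpy as np

# 2D projection-slice theorem: the 1D Fourier transform of a projection equals
# the central line of the 2D Fourier transform perpendicular to the projection direction.
rng = np.random.default_rng(0)
density = rng.random((64, 64))               # toy 2D "density", indexed [y, x]

projection = density.sum(axis=0)             # project along y (line integrals)
ft_projection = np.fft.fft(projection)       # 1D transform of the projection
central_slice = np.fft.fft2(density)[0, :]   # k_y = 0 line of the 2D transform

print(np.allclose(ft_projection, central_slice))
```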

Nuclear Magnetic Resonance (NMR)

Chemical Shift

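$$\delta\ (\text{ppm}) = \frac{\nu_{sample} - \nu_{reference}}{\nu_{reference}} \times 10^6$$

Expressing chemical shifts in parts per million makes them independent of the spectrometer's field strength.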
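A one-line conversion from resonance frequencies to ppm is sketched below. The 600 MHz spectrometer and the 4800 Hz offset are example inputs.

```python
def chemical_shift_ppm(nu_sample_hz, nu_reference_hz):
    """delta = (nu_sample - nu_reference) / nu_reference, reported in ppm (x 1e6)."""
    return (nu_sample_hz - nu_reference_hz) / nu_reference_hz * 1e6

# A 1H signal 4800 Hz above the reference signal on a 600 MHz spectrometer
print(f"{chemical_shift_ppm(600.0048e6, 600.0e6):.2f} ppm")  # 8.00 ppm
```

The same 4800 Hz offset on a 400 MHz instrument would give 12 ppm. The ppm scale is field-independent because the offset itself scales with the field.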

Nuclear Overhauser Effect (NOE)

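$$\text{NOE} \propto \frac{1}{r^6}$$

Where $r$ is the distance between two protons. Because of the steep $r^{-6}$ dependence, NOEs are typically observed only for protons closer than about 5 to 6 Å. They serve as interproton distance restraints in NMR structure calculation.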
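Because intensity scales as $r^{-6}$, an unknown distance can be estimated against a reference proton pair of known distance. This sketch assumes the simple isolated-spin-pair approximation. The 1.78 Å reference (roughly a geminal CH2 proton pair) is an illustrative value.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Calibrate an interproton distance from an NOE intensity using I ∝ r^-6:
    r = r_ref * (I_ref / I) ** (1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

# Illustrative reference pair at 1.78 Å; a cross-peak 64 times weaker
print(f"{noe_distance(1.0, 64.0, 1.78):.2f} Å")  # 64x weaker -> 2x farther (3.56 Å)
```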

Protein Structure Prediction

Homology Modeling

$$\text{Query sequence} \xrightarrow{\text{align}} \text{Template structure} \xrightarrow{\text{model}} \text{Predicted structure}$$

Model Quality Assessment

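$$\text{DOPE score} = \sum_{i<j} \phi_{stat}(r_{ij})$$
$$\text{z-score} = \frac{\text{model score} - \langle\text{random score}\rangle}{\sigma_{random}}$$

DOPE (Discrete Optimized Protein Energy) is a statistical potential summed over atom pairs $i, j$ separated by distance $r_{ij}$. Lower (more negative) scores indicate more native-like models.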
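The z-score formula can be applied to any model score compared against a reference distribution. This sketch uses hypothetical DOPE-like energies for decoy models; computing DOPE itself requires the statistical potential and is not shown.

```python
import statistics

def z_score(model_score, random_scores):
    """z = (model score - mean(random scores)) / stdev(random scores)."""
    return (model_score - statistics.mean(random_scores)) / statistics.stdev(random_scores)

# Hypothetical DOPE-like energies (lower = better) for decoy models
decoys = [-100.0, -110.0, -90.0]
print(z_score(-130.0, decoys))  # -3.0: well below the decoy distribution
```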

AlphaFold & Deep Learning Approaches

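$$\text{Sequence + MSA} \xrightarrow{\text{Evoformer (attention)}} \text{MSA and pair representations} \xrightarrow{\text{Structure module}} \text{3D coordinates}$$

The first AlphaFold predicted inter-residue distance distributions (distograms), which were then converted into a structure. AlphaFold2 predicts atomic coordinates end to end from the sequence and its multiple sequence alignment (MSA).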

Attention Mechanism

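$$\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

Where $Q$, $K$, and $V$ are the query, key, and value matrices, and $d_k$ is the key dimension. The softmax is applied row-wise, so each output is a weighted average of the value vectors.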
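Here is a minimal NumPy version of scaled dot-product attention as written in the formula. It is only the core operation. The attention in AlphaFold2's Evoformer adds gating, multiple heads and biases from the pair representation, none of which are shown. The array sizes are toy values.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for 2D arrays (tokens x features)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))  # e.g. 5 residues with 8-dim embeddings (toy sizes)
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape, w.sum(axis=1))
```

If all keys are identical, every query attends uniformly, and each output row becomes the mean of the value vectors.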

Molecular Docking and Virtual Screening

Binding Free Energy

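$$\Delta G_{bind} = G_{complex} - (G_{ligand} + G_{protein})$$

Experimentally, $\Delta G_{bind}$ is related to the dissociation constant by $\Delta G_{bind} = RT \ln(K_d / c^\circ)$, with standard concentration $c^\circ = 1$ M.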
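Converting between $K_d$ and $\Delta G_{bind}$ is a routine step in comparing measured affinities with computed free energies. This sketch assumes a 1 M standard state and 298.15 K.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)

def dg_from_kd(kd_molar, temperature=298.15):
    """ΔG_bind = RT ln(Kd / 1 M); negative for Kd below the 1 M standard state."""
    return R * temperature * math.log(kd_molar)

for kd in (1e-3, 1e-6, 1e-9):
    print(f"Kd = {kd:.0e} M -> ΔG_bind = {dg_from_kd(kd):.1f} kcal/mol")
```

Each tenfold improvement in $K_d$ corresponds to about 1.36 kcal/mol at room temperature.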

Scoring Functions

Force Field-Based

$$E_{total} = E_{bond} + E_{angle} + E_{torsion} + E_{vdW} + E_{electrostatic}$$
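The two nonbonded terms usually dominate protein-ligand scoring. This sketch evaluates them for one atom pair, using a 12-6 Lennard-Jones form for $E_{vdW}$ and Coulomb's law with a uniform dielectric for $E_{electrostatic}$. The parameters are illustrative and not taken from any specific force field; real force fields also apply combination rules, cutoffs and scaling of 1-4 interactions.

```python
COULOMB_KCAL = 332.0636  # converts q_i q_j / r (charges in e, r in Å) to kcal/mol

def lennard_jones(r, epsilon, sigma):
    """E_vdW = 4 eps [(sigma/r)^12 - (sigma/r)^6]; minimum of -eps at r = 2^(1/6) sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q_i, q_j, dielectric=1.0):
    """E_elec = 332.0636 q_i q_j / (eps_r r), charges in e, r in Å."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r)

def pair_nonbonded(r, epsilon, sigma, q_i, q_j, dielectric=1.0):
    """E_vdW + E_electrostatic for a single atom pair."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q_i, q_j, dielectric)

# Illustrative parameters only
print(f"{pair_nonbonded(3.5, 0.15, 3.2, -0.5, 0.4, dielectric=4.0):.3f} kcal/mol")
```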

Empirical Scoring

$$\Delta G_{bind} = \sum_{i} w_i \cdot f_i$$

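Where $f_i$ are interaction terms (for example, hydrogen-bond counts or buried hydrophobic surface) and $w_i$ are weights fitted to experimental binding data.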

Knowledge-Based Potentials

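$$E_{pot} = -k_BT \ln\left(\frac{p_{obs}}{p_{ref}}\right)$$

Where $p_{obs}$ is the observed frequency of an interaction (for example, a given atom-pair type at a given distance) in known structures, and $p_{ref}$ is its frequency in a reference state.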
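This sketch derives a knowledge-based potential from counts in distance bins, using the inverse-Boltzmann formula above. The counts are hypothetical. The pseudocount is an added assumption to avoid $\ln 0$ for empty bins. Choosing the reference state is the hard part in practice and is not addressed here.

```python
import math

KB_KCAL = 0.0019872  # Boltzmann constant in molar units, kcal/(mol*K)

def knowledge_based_potential(obs_counts, ref_counts, temperature=298.15, pseudocount=1.0):
    """Per-bin inverse-Boltzmann potential E = -kT ln(p_obs / p_ref)."""
    obs = [c + pseudocount for c in obs_counts]
    ref = [c + pseudocount for c in ref_counts]
    n_obs, n_ref = sum(obs), sum(ref)
    kt = KB_KCAL * temperature
    return [-kt * math.log((o / n_obs) / (r / n_ref)) for o, r in zip(obs, ref)]

# Hypothetical counts for one atom-pair type in four distance bins
observed = [2, 30, 50, 18]
reference = [10, 25, 35, 30]
print([round(e, 2) for e in knowledge_based_potential(observed, reference)])
```

Enriched bins ($p_{obs} > p_{ref}$) receive negative, favorable energies; depleted bins receive positive ones.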

Docking Algorithms

Rigid Docking

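$$\text{minimize}: E(\mathbf{r}, \boldsymbol{\theta})$$

Where $\mathbf{r}$ is the ligand's translation and $\boldsymbol{\theta}$ is its orientation (three rotation angles). Both protein and ligand are held rigid.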

Flexible Docking

$$\text{minimize}: E(\mathbf{r}, \boldsymbol{\theta}, \boldsymbol{\phi})$$

Where $\boldsymbol{\phi}$ represents ligand torsions.

Structure-Based Drug Design

Lead Optimization

$$\text{Structure-activity relationship (SAR)}: IC_{50} = f(\text{molecular features})$$

Lipinski's Rule of Five

  • Molecular weight ≤ 500 Da
  • LogP ≤ 5
  • H-bond donors ≤ 5
  • H-bond acceptors ≤ 10
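Lipinski's guideline states that poor absorption or permeation is more likely when a compound violates more than one of these criteria. This sketch counts violations from precomputed properties. The property values in the example are illustrative inputs, not calculated from real structures; computing logP or donor and acceptor counts would need a cheminformatics toolkit.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count Rule-of-Five violations: MW <= 500 Da, logP <= 5, HBD <= 5, HBA <= 10."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])

def passes_rule_of_five(mw, logp, hbd, hba, max_violations=1):
    """Apply the rule as 'at most one violation'."""
    return lipinski_violations(mw, logp, hbd, hba) <= max_violations

# Illustrative property values
print(lipinski_violations(180.2, 1.2, 1, 4))   # 0
print(lipinski_violations(650.0, 6.1, 2, 8))   # 2
print(passes_rule_of_five(650.0, 6.1, 2, 8))   # False
```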

Fragment-Based Drug Discovery

$$\text{Low molecular weight fragments} \xrightarrow{\text{grow/link}} \text{High-affinity inhibitors}$$
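Fragments bind weakly, but often very efficiently per atom. That is why growing or linking them can yield potent inhibitors. This sketch compares ligand efficiency, $LE = -\Delta G_{bind}/N_{heavy}$, for a hypothetical millimolar fragment and a hypothetical 10 nM lead, assuming $\Delta G_{bind} = RT\ln K_d$ at 298.15 K.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)

def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """LE = -ΔG_bind / N_heavy, with ΔG_bind = RT ln(Kd)."""
    return -R * temperature * math.log(kd_molar) / heavy_atoms

# Hypothetical comparison: weak small fragment vs potent larger lead
print(f"fragment: LE = {ligand_efficiency(1e-3, 12):.2f} kcal/mol per heavy atom")
print(f"lead:     LE = {ligand_efficiency(1e-8, 38):.2f} kcal/mol per heavy atom")
```

The fragment binds about 100,000 times more weakly than the lead, yet it has the higher ligand efficiency. This is the rationale for using fragments as starting points.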

Pharmacophore Modeling

$$\text{Pharmacophore} = \{\text{features}, \text{geometry}\}$$

Required Features

  • Hydrogen bond donors/acceptors
  • Hydrophobic regions
  • Aromatic rings
  • Excluded volumes
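A pharmacophore combines feature types with their geometry. A basic match test therefore checks that a ligand presents the required features at compatible distances. This sketch makes several simplifying assumptions:
  • one point per feature type
  • only pairwise-distance comparison within a tolerance
  • no excluded-volume or alignment step

The feature labels and coordinates are hypothetical.

```python
import math
from itertools import combinations

def matches_pharmacophore(ligand_features, pharmacophore, tolerance=1.0):
    """True if the ligand has every pharmacophore feature and all pairwise
    feature distances agree within `tolerance` Å.

    Both arguments map a feature label (e.g. 'donor') to (x, y, z) coordinates.
    """
    if set(pharmacophore) - set(ligand_features):
        return False  # a required feature type is missing
    for a, b in combinations(pharmacophore, 2):
        d_ref = math.dist(pharmacophore[a], pharmacophore[b])
        d_lig = math.dist(ligand_features[a], ligand_features[b])
        if abs(d_ref - d_lig) > tolerance:
            return False
    return True

# Hypothetical three-point pharmacophore and ligand feature positions (Å)
model = {"donor": (0.0, 0.0, 0.0), "acceptor": (4.0, 0.0, 0.0), "aromatic": (0.0, 5.0, 0.0)}
ligand = {"donor": (1.0, 1.0, 0.0), "acceptor": (5.2, 1.0, 0.0), "aromatic": (1.0, 6.3, 0.0)}
print(matches_pharmacophore(ligand, model))
```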

Target Classes

G-Protein Coupled Receptors (GPCRs)

Structural Features

  • Seven transmembrane helices (TM1-TM7)
  • Orthosteric binding pocket (class A): Within the transmembrane helix bundle, open to the extracellular side
  • Conformational states: Active/inactive

Modeling Challenges

  • Conformational complexity: Greater than for typical soluble proteins (multiple active and inactive states)
  • Membrane environment: Requires specialized modeling (for example, an explicit lipid bilayer or an implicit membrane model)

GPCR Classification

  • Class A (Rhodopsin-like)
  • Class B (Secretin-like)
  • Class C (Metabotropic glutamate)
  • Class F (Frizzled)

Ion Channels

Structural Considerations

  • Gating mechanism: Conformational changes
  • Selectivity filter: Discrimination between ions
  • Voltage sensitivity: Voltage-dependent gates

Kinases

The ATP-binding site is a common target for anticancer drugs.

Kinase Inhibitors

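  • Type I: Bind the ATP site in the active (DFG-in) conformation
  • Type II: Bind an inactive (DFG-out) conformation, extending into a pocket adjacent to the ATP site
  • Type III: Allosteric inhibitors binding next to, but not in, the ATP site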

Drug-Target Interactions

Binding Determinants

$$\Delta G_{bind} = \Delta G_{desolvation} + \Delta G_{interaction} - T\Delta S$$

Where $-T\Delta S$ collects entropic contributions, such as the loss of conformational freedom on binding.

Intermolecular Forces

  • Hydrogen bonds: 1-5 kcal/mol
  • Hydrophobic interactions: 0.5-2 kcal/mol
  • Electrostatic interactions: 1-5 kcal/mol
  • π-stacking: 1-5 kcal/mol

Induced Fit

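$$\text{P} + \text{L} \rightleftharpoons \text{PL} \rightleftharpoons \text{P}^{*}\text{L}$$

In the induced-fit model, the ligand (L) first binds the protein (P). Binding then induces a conformational change to the complex $\text{P}^{*}\text{L}$, which improves the complementarity between protein and ligand.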

ADMET Prediction

Absorption, Distribution, Metabolism, Excretion, Toxicity

$$\text{ADMET properties} = f(\text{physicochemical parameters})$$

Prediction Models

  • Caco-2 permeability: Intestinal absorption
  • P-gp substrate: Efflux potential
  • CYP450 metabolism: Drug-drug interactions
  • hERG inhibition: Cardiotoxicity

Advanced Techniques

Free Energy Perturbation (FEP)

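$$\Delta A = -k_BT \ln \left\langle \exp[-\beta \Delta H] \right\rangle_0$$

This is the Zwanzig relation. The symbols are:
  • $\beta = 1/k_BT$
  • $\Delta H = H_1 - H_0$: the energy (Hamiltonian) difference between the final and initial states
  • $\langle \cdot \rangle_0$: an average over configurations sampled from the initial state 0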
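The exponential average can be estimated from energy differences evaluated on configurations sampled in state 0. This sketch uses a log-sum-exp shift for numerical stability. It assumes $k_BT = 0.5925$ kcal/mol (about 298 K), and the sample values are hypothetical. The estimator converges only when the two states overlap well, which is why transformations are usually split into alchemical windows.

```python
import math

def zwanzig_free_energy(delta_h, kt=0.5925):
    """ΔA = -kT ln < exp(-ΔH/kT) >_0 from samples of ΔH = H_1 - H_0 (kcal/mol)
    taken on configurations drawn from state 0."""
    beta = 1.0 / kt
    x = [-beta * dh for dh in delta_h]
    x_max = max(x)  # log-sum-exp shift avoids overflow/underflow
    log_mean = x_max + math.log(sum(math.exp(xi - x_max) for xi in x) / len(x))
    return -kt * log_mean

samples = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical ΔH values, kcal/mol
mean_dh = sum(samples) / len(samples)
print(f"ΔA = {zwanzig_free_energy(samples):.3f} kcal/mol (mean ΔH = {mean_dh:.3f})")
```

By Jensen's inequality, the estimate never exceeds the plain average of $\Delta H$. It is pulled toward the lowest-energy samples.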

Molecular Dynamics Simulations

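$$m_i \frac{d^2\mathbf{r}_i}{dt^2} = \mathbf{F}_i = -\nabla_{\mathbf{r}_i} U(\mathbf{r}_1, \ldots, \mathbf{r}_N)$$

The force on atom $i$ depends on the positions of all atoms through the potential energy $U$ defined by the force field.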
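MD codes integrate these equations in small time steps; velocity Verlet is a common choice. This one-dimensional sketch applies it to a harmonic "bond" in reduced units, which is a toy system chosen because the exact solution, $x(t) = \cos t$, is known. Production MD handles 3N coordinates, a full force field and thermostats.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m d^2x/dt^2 = F(x) with the velocity Verlet scheme (1D for brevity)."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt**2   # position update
        a_new = force(x) / mass            # force at the new position
        v = v + 0.5 * (a + a_new) * dt     # velocity update with averaged acceleration
        a = a_new
    return x, v

K_SPRING, MASS = 1.0, 1.0  # reduced units

def harmonic_force(x):
    return -K_SPRING * x

def total_energy(x, v):
    return 0.5 * MASS * v**2 + 0.5 * K_SPRING * x**2

x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, harmonic_force, MASS, dt=0.01, n_steps=1000)
drift = abs(total_energy(x1, v1) - total_energy(x0, v0)) / total_energy(x0, v0)
print(f"x(t=10) = {x1:.5f} (exact {math.cos(10.0):.5f}), relative energy drift = {drift:.1e}")
```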

Alchemical Transformations

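$$\text{State A} \xrightarrow{\lambda} \text{State B}$$

Where $\lambda$ controls the transformation, running from 0 (state A) to 1 (state B). In the simplest, linear scheme, $U(\lambda) = (1-\lambda)\,U_A + \lambda\,U_B$. The free energy change is accumulated over a series of intermediate $\lambda$ windows.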
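One way to combine the windows is thermodynamic integration (TI), a different estimator from the Zwanzig exponential average in the FEP formula: $\Delta A = \int_0^1 \langle \partial U/\partial\lambda \rangle_\lambda \, d\lambda$. This sketch applies the trapezoidal rule to hypothetical window averages; in a real workflow, each average would come from a simulation at that $\lambda$.

```python
def thermodynamic_integration(lambdas, mean_du_dlambda):
    """ΔA = ∫_0^1 <∂U/∂λ>_λ dλ, estimated with the trapezoidal rule over λ windows."""
    total = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        total += 0.5 * width * (mean_du_dlambda[i] + mean_du_dlambda[i - 1])
    return total

# Hypothetical window averages <∂U/∂λ> (kcal/mol)
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
du_dl = [12.0, 7.5, 3.1, -0.4, -2.9]
print(f"ΔA(A→B) ≈ {thermodynamic_integration(lambdas, du_dl):.2f} kcal/mol")
```

Where $\langle \partial U/\partial\lambda \rangle$ changes sharply, adding more $\lambda$ windows reduces the quadrature error.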

Drug Discovery Pipeline

Hit Identification

$$\text{Compound library} \xrightarrow{\text{screening}} \text{Hits} \xrightarrow{\text{confirmation}} \text{Validated hits}$$

Lead Optimization

$$\text{Hit} \xrightarrow{\text{SAR}} \text{Lead compound} \xrightarrow{\text{Optimization}} \text{Candidate}$$

Preclinical Development

$$\text{In vitro studies} \rightarrow \text{In vivo studies} \rightarrow \text{Pharmacokinetics} \rightarrow \text{Toxicology} \rightarrow \text{IND filing}$$

Challenges and Future Directions

Computational Limitations

  • Force field accuracy: Systematic errors
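  • Implicit solvent models: Miss specific water-mediated interactions
  • Conformational sampling: Limited exploration
  • Quantum effects: Fixed-charge classical force fields neglect electronic polarization and cannot describe bond breaking or formation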

Emerging Technologies

  • AI-driven design: Generative models
  • Quantum computing: Quantum chemical calculations
  • Cryo-EM advances: Higher resolution structures
  • Machine learning: Predictive models

Real-World Application: Structure-Based Drug Design for SARS-CoV-2

Structure-based drug design has been crucial in developing antivirals for SARS-CoV-2.

SARS-CoV-2 Main Protease (Mpro) Inhibitors

# Structure-based drug design for SARS-CoV-2 Mpro (illustrative numbers, not measured data)
import math

target_info = {
    'enzyme': 'Main protease (Mpro, also called 3CLpro)',
    'function': 'Cleaves the viral polyproteins; essential for viral replication',
    'substrate_cleavage_site': 'Leu-Gln|(Ser/Ala/Gly): cleavage after Gln at P1',
    'binding_pocket': 'Cys145 and His41 form the catalytic dyad',
    'resolution_structure': 2.1,  # Angstroms (example value)
    'binding_affinity_known': 0.016,  # micromolar (hypothetical reference compound)
    'active_conformation': 'Homodimer; domains I and II form chymotrypsin-like β-barrels'
}

# Binding free energy of the reference compound: ΔG = RT ln(Kd), with Kd in M
kbt = 0.593  # RT in kcal/mol at 298 K
binding_energy_reference = kbt * math.log(target_info['binding_affinity_known'] * 1e-6)  # μM -> M; negative = favorable

# Structure-based design considerations (approximate, illustrative)
binding_pocket_volume = 150  # Å³
pocket_hydrophobicity = 0.6  # Fraction of hydrophobic contacts
pocket_flexibility = 0.3  # Fraction of flexible residues

# Ligand efficiency: LE = -ΔG / N_heavy (kcal/mol per heavy atom; ~0.3 is a common benchmark)
molecular_weight = 500  # Da (upper Rule-of-Five limit)
heavy_atoms = 35  # Number of non-hydrogen atoms
ligand_efficiency = -binding_energy_reference / heavy_atoms

# Synthetic accessibility score (SAscore scale: 1 = easy, 10 = very difficult)
sas_score = 3.2  # Typical for a drug-like compound

# Toy projection of lead optimization: assume each added heavy atom contributes
# about -LE kcal/mol (a rule-of-thumb assumption, not a validated model)
added_heavy_atoms = 3
delta_delta_g = -ligand_efficiency * added_heavy_atoms
base_affinity = target_info['binding_affinity_known']  # μM
projected_affinity = base_affinity * math.exp(delta_delta_g / kbt)  # μM

print("SARS-CoV-2 Mpro structure-based drug design:")
print(f"  Target enzyme: {target_info['enzyme']}")
print(f"  Function: {target_info['function']}")
print(f"  Catalytic residues: {target_info['binding_pocket']}")
print(f"  Structure resolution: {target_info['resolution_structure']} Å")
print(f"  Reference binding affinity: {target_info['binding_affinity_known']} μM")
print(f"  Estimated binding free energy: {binding_energy_reference:.1f} kcal/mol")
print(f"  Binding pocket volume: ~{binding_pocket_volume} Å³")
print(f"  Pocket hydrophobicity: {pocket_hydrophobicity * 100:.0f}%")
print(f"  Ligand efficiency: {ligand_efficiency:.2f} kcal/mol per heavy atom")

# Assess druggability from the reference binding free energy (kcal/mol)
if binding_energy_reference < -8:
    druggability = "High - strong binding predicted"
elif binding_energy_reference < -6:
    druggability = "Moderate - good binding potential"
else:
    druggability = "Low - challenging target"

print(f"  Target druggability: {druggability}")

# Potency outlook for the projected affinity (μM)
if projected_affinity < 0.01:
    clinical_potential = "High - low-nanomolar inhibitors plausible"
elif projected_affinity < 0.1:
    clinical_potential = "Moderate - sub-micromolar inhibitors feasible"
else:
    clinical_potential = "Low - significant optimization needed"

print(f"  Projected affinity after optimization: {projected_affinity:.4f} μM")
print(f"  Predicted clinical potential: {clinical_potential}")
print(f"  Synthetic accessibility score: {sas_score} (1 = easy, 10 = very difficult)")

Structure-Based Optimization

Analysis of how structure-guided design improves drug efficacy.


Your Challenge: Structure-Based Drug Design

Design a drug molecule targeting a specific protein using structure-based design principles.

Goal: Create a potential inhibitor for a protein target based on structural information.

Target Protein Analysis

import math

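# Target protein structure data (illustrative values for this exercise, not curated from a specific structure)
target_data = {
    'name': 'ACE2',
    'function': 'Angiotensin-converting enzyme 2 (zinc carboxypeptidase)',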
    'binding_site_volume': 250,  # Angstroms cubed
    'binding_site_hydrophobicity': 0.7,  # 0-1 scale
    'binding_site_flexibility': 0.2,     # 0-1 scale
    'known_ligand_affinity': 15,         # nM (nanomolar)
    'catalytic_residues': ['His353', 'Glu384', 'Lys357'],
    'surface_charge': -1.5,  # Overall charge (negative)
    'interacting_residues': ['His353', 'Lys357', 'Gln388', 'Asp355']
}

# Available chemical fragments for design
fragments = {
    'benzene': {'weight': 78, 'hydrophobic': 0.8, 'h_bond_donor': 0, 'h_bond_acceptor': 0, 'rotatable_bonds': 0},
    'imidazole': {'weight': 68, 'hydrophobic': 0.4, 'h_bond_donor': 1, 'h_bond_acceptor': 1, 'rotatable_bonds': 0},  # one N-H donor, one N acceptor
    'carboxylic_acid': {'weight': 46, 'hydrophobic': 0.1, 'h_bond_donor': 1, 'h_bond_acceptor': 2, 'rotatable_bonds': 1},
    'guanidine': {'weight': 59, 'hydrophobic': 0.1, 'h_bond_donor': 3, 'h_bond_acceptor': 1, 'rotatable_bonds': 1},  # guanidine, CH5N3
    'amide': {'weight': 43, 'hydrophobic': 0.3, 'h_bond_donor': 1, 'h_bond_acceptor': 1, 'rotatable_bonds': 1}
}

# Calculate target binding pocket properties
pocket_volume = target_data['binding_site_volume']
hydrophobic_complementarity = target_data['binding_site_hydrophobicity']
flexibility_complementarity = target_data['binding_site_flexibility']

# Calculate desired ligand properties
desired_molecular_weight = 300  # Da (optimal for drug-like properties)
max_rotatable_bonds = 10       # Flexibility constraint
optimal_h_bond_donors = 2      # For solubility and binding
optimal_h_bond_acceptors = 5   # For binding interactions

# Calculate binding energy prediction
# Simplified relationship, treating IC50 as an approximation to Kd: -ΔG_bind = -RT ln(Kd)
kbt = 0.593  # RT in kcal/mol at 298 K
reference_affinity = target_data['known_ligand_affinity'] * 1e-9  # Convert nM to M
reference_binding_energy = -kbt * math.log(reference_affinity)  # -ΔG_bind in kcal/mol (positive)

# Design approach: maximize complementarity while maintaining drug-likeness
selected_fragments = []

# Select fragments based on complementarity with binding site
hydrophobic_fragments = ['benzene']  # Fragments with high hydrophobicity
polar_fragments = ['carboxylic_acid', 'guanidine', 'amide', 'imidazole']  # Fragments with H-bond capacity

# Select fragments based on target complementarity
if hydrophobic_complementarity > 0.5:
    selected_fragments.extend(hydrophobic_fragments)
if target_data['surface_charge'] < 0:  # Negative surface favors positive groups
    selected_fragments.extend(['guanidine'])  # Positively charged group
else:
    selected_fragments.extend(['carboxylic_acid'])  # Negatively charged group

# Calculate predicted properties
total_weight = sum(fragments[frag]['weight'] for frag in selected_fragments)
total_h_donors = sum(fragments[frag]['h_bond_donor'] for frag in selected_fragments)
total_h_acceptors = sum(fragments[frag]['h_bond_acceptor'] for frag in selected_fragments)
total_rotatable = sum(fragments[frag]['rotatable_bonds'] for frag in selected_fragments)

# Estimate binding affinity relative to the reference ligand (toy model)
# Better H-bond coverage, a closer hydrophobic match, and fewer rotatable bonds -> better binding
h_bond_score = (min(total_h_donors, optimal_h_bond_donors) + min(total_h_acceptors, optimal_h_bond_acceptors)) / (optimal_h_bond_donors + optimal_h_bond_acceptors)
ligand_hydrophobicity = sum(fragments[frag]['hydrophobic'] for frag in selected_fragments) / len(selected_fragments)
hydrophobic_score = 1 - abs(ligand_hydrophobicity - hydrophobic_complementarity)  # 1.0 = ligand matches pocket hydrophobicity
complexity_penalty = 1 / (1 + total_rotatable * 0.1)  # Penalty for rotatable bonds

# Match factor in (0, 1]: a perfect match reproduces the reference affinity; poorer matches weaken it
match_factor = h_bond_score * hydrophobic_score * complexity_penalty
estimated_ic50 = reference_affinity / match_factor  # M

Design a potential inhibitor for the target protein based on structural considerations.

Hint:

  • Consider structural complementarity (hydrophobic vs polar interactions)
  • Balance molecular properties (molecular weight, H-bond donors/acceptors)
  • Account for binding site flexibility and charge complementarity
  • Estimate potential binding affinity improvement
# TODO: Calculate drug design parameters
predicted_binding_affinity = 0  # nM (predicted IC50 in nanomolar)
ligand_efficiency = 0          # kcal/mol per heavy atom
drug_likeness_score = 0        # Overall drug-likeness metric (0-1 scale)
optimization_potential = 0     # Potential for further improvements (%)
recommended_fragments = []     # List of fragments to incorporate in design

# Calculate predicted binding affinity
predicted_binding_affinity = estimated_ic50 * 1e9  # Convert from M to nM

# Calculate ligand efficiency (binding energy per heavy atom)
binding_energy = -kbt * math.log(predicted_binding_affinity * 1e-9)  # -ΔG_bind in kcal/mol (positive)
ligand_efficiency = binding_energy / (total_weight / 12)  # Heavy atoms approximated as MW / 12 Da (rough)

# Calculate drug-likeness score
mw_score = 1 if total_weight <= 500 else 0.5 * math.exp(-(total_weight-500)/200)  # Penalty for large MW
hbd_score = 1 if total_h_donors <= 5 else 1/(total_h_donors-4)  # Penalty for too many H-bond donors
hba_score = 1 if total_h_acceptors <= 10 else 1/(total_h_acceptors-9)  # Penalty for too many H-bond acceptors
rotb_score = 1 if total_rotatable <= 10 else 1/(total_rotatable-9)  # Penalty for flexibility

drug_likeness_score = (mw_score + hbd_score + hba_score + rotb_score) / 4

# Calculate optimization potential
# Based on current design vs ideal properties
deviation_from_ideal = abs(total_weight - desired_molecular_weight)/desired_molecular_weight + \
                      abs(total_h_donors - optimal_h_bond_donors)/optimal_h_bond_donors + \
                      abs(total_h_acceptors - optimal_h_bond_acceptors)/optimal_h_bond_acceptors

optimization_potential = max(0, (1 - deviation_from_ideal) * 100)  # Percentage

# Select fragments for design
recommended_fragments = selected_fragments

# Print results
print(f"Predicted binding affinity: {predicted_binding_affinity:.2f} nM")
print(f"Ligand efficiency: {ligand_efficiency:.3f} kcal/mol per heavy atom")
print(f"Drug-likeness score: {drug_likeness_score:.3f}")
print(f"Optimization potential: {optimization_potential:.1f}%")
print(f"Recommended fragments: {recommended_fragments}")
print(f"Total molecular weight: {total_weight} Da")
print(f"H-bond donors: {total_h_donors}, H-bond acceptors: {total_h_acceptors}")

# Design assessment
if drug_likeness_score > 0.7 and predicted_binding_affinity < 100:
    design_assessment = "High-quality design with good drug-likeness and predicted potency"
elif drug_likeness_score > 0.5 and predicted_binding_affinity < 1000:
    design_assessment = "Moderate design with room for optimization"
else:
    design_assessment = "Needs significant optimization for drug-likeness or potency"
    
print(f"Design assessment: {design_assessment}")

# Suggested improvements
improvement_suggestions = []
if total_weight > 500:
    improvement_suggestions.append("Reduce molecular weight (consider fragment-based approach)")
if total_h_donors > 5:
    improvement_suggestions.append("Reduce H-bond donors to comply with Lipinski's rules")
if total_h_acceptors > 10:
    improvement_suggestions.append("Reduce H-bond acceptors")
if total_rotatable > 10:
    improvement_suggestions.append("Reduce flexibility to improve binding affinity")

print(f"Suggested improvements: {improvement_suggestions}")

How would your drug design approach change if you were targeting a protein-protein interface rather than a traditional enzyme active site?

ELI10 Explanation

Simple analogy for better understanding

Think of structural biology like being a molecular architect who can build 3D models of the tiniest LEGO sculptures in existence - these 'LEGO sculptures' are proteins, which are the working machines inside our cells. Just like a LEGO structure has a specific shape that determines what it can do (a LEGO car looks different from a LEGO house and functions differently), proteins have specific 3D shapes that determine what they do in the cell. Structural biology is like having a super-powerful microscope that can see these tiny molecular sculptures in incredible detail, allowing scientists to see exactly where drugs can 'fit' into proteins like a key fits into a lock. Drug design is then like custom-building special keys that fit perfectly into these molecular locks to either activate them, deactivate them, or fix them when they're broken. Understanding the 3D shape of proteins is like having the blueprint for every machine in the cell's factory, allowing scientists to design new medicines that fit precisely into their target proteins.

Self-Examination

Q1.

How do X-ray crystallography and cryo-electron microscopy differ in determining protein structures?

Q2.

What is the role of molecular docking in drug discovery and design?

Q3.

How do structure-based drug design approaches improve therapeutic specificity?