Systems Biology & Omics Technologies
Genomics, transcriptomics, proteomics, metabolomics, multi-omics integration and analysis, systems modeling and network analysis, single-cell sequencing technologies, spatial transcriptomics and proteomics.
Systems biology is an interdisciplinary field that studies biological systems as networks of interacting components, using computational and mathematical modeling. Rather than examining genes, proteins, or metabolites one at a time, it analyzes how many components interact in order to explain system-level behaviors and emergent properties.
Omics Technologies Overview
Genomics
Whole Genome Sequencing
Variant Calling
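Bayesian approach for genotype calling: the posterior probability of genotype $G$ given the observed reads $D$ is
$$P(G \mid D) = \frac{P(D \mid G)\,P(G)}{\sum_{G'} P(D \mid G')\,P(G')}$$
where $P(D \mid G)$ is the likelihood of the reads under genotype $G$ and $P(G)$ is the prior. The called genotype is the one with the highest posterior.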
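A minimal sketch of this calculation for one biallelic site in a diploid sample, assuming independent reads, a single per-read error rate, and a binomial likelihood. The prior values and the error rate are hypothetical defaults chosen for illustration, not values taken from any particular variant caller.

```python
import math


def genotype_posteriors(ref_reads, alt_reads, error=0.01, priors=None):
    """Posterior P(G | D) over diploid genotypes RR, RA, AA at one site.

    Toy model: reads are independent, and each read shows the alt allele with
    probability `error` (RR), 0.5 (RA) or 1 - `error` (AA).
    """
    if priors is None:
        # Hypothetical prior favoring the reference genotype
        priors = {"RR": 0.998, "RA": 0.001, "AA": 0.001}
    p_alt = {"RR": error, "RA": 0.5, "AA": 1 - error}
    n = ref_reads + alt_reads
    joint = {}
    for g, p in p_alt.items():
        likelihood = math.comb(n, alt_reads) * p**alt_reads * (1 - p) ** ref_reads  # P(D | G)
        joint[g] = likelihood * priors[g]  # P(D | G) P(G)
    evidence = sum(joint.values())  # P(D), the normalizing constant
    return {g: value / evidence for g, value in joint.items()}


post = genotype_posteriors(ref_reads=6, alt_reads=5)
call = max(post, key=post.get)
print(call, {g: round(p, 6) for g, p in post.items()})
```

With roughly balanced reference and alternative reads, the heterozygous genotype wins despite its small prior, because its likelihood is many orders of magnitude larger than either homozygote's.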
Structural Variation Detection
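Read-depth methods compare coverage in genomic windows against a reference using $R = \log_2\left(D_{\text{sample}} / D_{\text{reference}}\right)$, where $R$ represents the read depth ratio: values near 0 indicate no copy-number change, positive values suggest gains, and negative values suggest losses.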
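To make the depth-ratio idea concrete, the sketch below normalizes per-window depths by each library's total, takes log2 ratios against a reference, and applies fixed cut-offs. The thresholds (0.3 for gains, -0.5 for losses) are hypothetical; for orientation, a one-copy gain or loss in a pure diploid sample shifts the ratio to about log2(3/2) ≈ 0.58 or log2(1/2) = -1.

```python
import math


def depth_ratio_calls(sample_depth, reference_depth, gain=0.3, loss=-0.5):
    """Log2 read-depth ratio per window with naive gain/loss calls.

    Depths are normalized by each library's total before taking the ratio.
    Windows with zero depth would need a pseudocount (not handled here).
    """
    sample_total, reference_total = sum(sample_depth), sum(reference_depth)
    results = []
    for s, r in zip(sample_depth, reference_depth):
        ratio = math.log2((s / sample_total) / (r / reference_total))
        if ratio > gain:
            call = "gain"
        elif ratio < loss:
            call = "loss"
        else:
            call = "neutral"
        results.append((ratio, call))
    return results


# Hypothetical depths for five windows
windows = depth_ratio_calls([100, 100, 150, 50, 100], [100, 100, 100, 100, 100])
for ratio, call in windows:
    print(f"{ratio:+.2f} {call}")
```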
Transcriptomics
RNA Sequencing (RNA-Seq)
Differential Expression Analysis
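$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$
where $y_{ij}$ is the (log-scale) expression of gene $i$ in sample $j$, $\mu$ is the overall mean, $\alpha_i$ is the gene effect, $\beta_j$ is the sample effect, and $\varepsilon_{ij}$ is residual error.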
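The expression model separates gene and sample effects; differential expression then asks whether a gene's level shifts between conditions. A minimal sketch for a single gene, assuming log2-scale values and two groups, computes the log2 fold change (the difference of group means) and Welch's t-statistic. The values are hypothetical, and this is a simplification of what dedicated RNA-seq tools do.

```python
import math
from statistics import mean, variance


def log2fc_and_welch_t(group_a, group_b):
    """Log2 fold change (difference of log2 means) and Welch's t for one gene."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Welch's standard error does not assume equal variances in the two groups
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    diff = mean_a - mean_b
    return diff, diff / se


tumor = [4.2, 3.8, 4.5]   # Hypothetical log2 expression values
normal = [2.1, 2.3]
lfc, t = log2fc_and_welch_t(tumor, normal)
print(f"log2FC = {lfc:.2f}, fold change = {2 ** lfc:.2f}, Welch t = {t:.2f}")
```

Turning t into a p-value also requires the Welch-Satterthwaite degrees of freedom and a t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.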
Proteomics
Mass Spectrometry-Based Proteomics
Label-Free Quantification
Isobaric Tagging (TMT/iTRAQ)
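$$\hat{r}_c = \frac{I_c}{\sum_{c'} I_{c'}}$$
where $I_c$ represents the reporter ion signal intensity in channel $c$, so $\hat{r}_c$ is the share of a peptide's reporter signal contributed by the sample labeled with that channel.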
Metabolomics
Targeted vs Untargeted Metabolomics
Metabolite Identification
Data Integration and Analysis
Multi-Omics Integration
Concatenation Approach
Factor Analysis Approaches
$$X_m = W_m Z + E_m$$
where $X_m$ is the data matrix for omics layer $m$, $W_m$ is the loading matrix, $Z$ is the shared factor matrix, and $E_m$ is the noise matrix.
Network-Based Integration
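$$G = (V, E), \qquad A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$
where $V$ is the set of genes, $E$ is the set of regulatory edges, and $A$ is the adjacency matrix.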
Dimensionality Reduction
Principal Component Analysis (PCA)
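PCA finds the directions of maximal variance, that is, the leading eigenvectors of the sample covariance matrix. The sketch below recovers only the first component by power iteration on a small hypothetical dataset, using the standard library only; real analyses would typically call an SVD routine from a numerical library.

```python
import math


def first_principal_component(X, iters=100):
    """Leading PCA direction of the rows of X via power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # Starting vector; must not be orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]  # PC1 coordinates
    return v, variance, scores


# Hypothetical 2-feature data lying on the line y = 2x
loading, explained_var, pc1 = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(loading, explained_var)
```

Because the points lie exactly on a line, the first component points along (1, 2)/√5 and captures all of the variance.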
t-SNE and UMAP
Clustering and Classification
Unsupervised Clustering
Supervised Classification
Network Biology
Gene Regulatory Networks
Boolean Networks
$$\mathbf{x}(t+1) = F\big(\mathbf{x}(t)\big), \qquad x_i(t) \in \{0, 1\}$$
where $\mathbf{x}(t)$ is the state vector at time $t$.
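A sketch of synchronous updating, assuming a hypothetical three-gene negative-feedback loop (C represses A, A activates B, B activates C) rather than any network from the text. Iterating $F$ from every possible start state and recording the cycle it falls into enumerates the network's attractors.

```python
from itertools import product


def update(state):
    """Synchronous update x(t+1) = F(x(t)) for a hypothetical 3-gene loop."""
    a, b, c = state
    return (int(not c), a, b)  # C represses A, A activates B, B activates C


def find_attractor(state):
    """Iterate from `state` until a state repeats; return the repeating cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = update(state)
    return seen[seen.index(state):]


attractors = set()
for initial in product((0, 1), repeat=3):
    cycle = find_attractor(initial)
    start = cycle.index(min(cycle))  # Rotate so identical cycles compare equal
    attractors.add(tuple(cycle[start:] + cycle[:start]))

for attractor in sorted(attractors, key=len):
    print(len(attractor), attractor)
```

For this loop every trajectory ends in either a 2-state or a 6-state cycle. There is no fixed point, which is the Boolean analogue of an oscillator.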
Differential Equation Models
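$$\frac{d\mathbf{x}}{dt} = f(\mathbf{x}, \mathbf{u}, \boldsymbol{\theta})$$
where $\mathbf{x}$ is the vector of state variables, $\mathbf{u}$ are inputs, and $\boldsymbol{\theta}$ are parameters.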
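As an illustration of $f(\mathbf{x}, \mathbf{u}, \boldsymbol{\theta})$, the sketch below assumes a hypothetical two-variable mRNA/protein model, $dm/dt = k_m u - \gamma_m m$ and $dp/dt = k_p m - \gamma_p p$, and integrates it with forward Euler steps. The parameter values are arbitrary, and a production model would normally use an adaptive ODE solver.

```python
def rhs(x, u, theta):
    """f(x, u, theta) for a hypothetical mRNA (m) / protein (p) model."""
    m, p = x
    k_m, gamma_m, k_p, gamma_p = theta
    return [k_m * u - gamma_m * m, k_p * m - gamma_p * p]


def simulate_euler(x0, u, theta, dt=0.01, t_end=30.0):
    """Forward Euler integration: x(t + dt) = x(t) + dt * f(x, u, theta)."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        dx = rhs(x, u, theta)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


theta = (2.0, 1.0, 1.0, 0.5)  # k_m, gamma_m, k_p, gamma_p (arbitrary values)
m_end, p_end = simulate_euler([0.0, 0.0], u=1.0, theta=theta)
print(f"m = {m_end:.3f}, p = {p_end:.3f}")
```

The analytic steady state is $m^* = k_m u / \gamma_m$ and $p^* = k_p m^* / \gamma_p$ (2 and 4 here), which the simulation approaches.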
Protein-Protein Interaction Networks
Network Topology Measures
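Betweenness centrality and clustering coefficient:
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}, \qquad C_i = \frac{2e_i}{k_i(k_i - 1)}$$
where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$, $\sigma_{st}(v)$ is the number of those paths that pass through $v$, $e_i$ is the number of edges between neighbors of $i$, and $k_i$ is the degree of node $i$.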
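Both measures can be computed directly on a small graph. This sketch uses a hypothetical five-protein interaction network stored as a dictionary of neighbor sets, counts neighbor-neighbor edges for the clustering coefficient, and uses Brandes' algorithm (breadth-first search plus dependency accumulation) for unnormalized betweenness, counting each unordered pair of endpoints once.

```python
from collections import deque


def clustering_coefficient(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); defined as 0 when k_i < 2."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    e = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return 2 * e / (k * (k - 1))


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was visited from both ends
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical protein interaction network
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
for node in sorted(adj):
    print(node, len(adj[node]), round(clustering_coefficient(adj, node), 3), bc[node])
```

Protein A bridges the B-C pair to D and E, so it has the highest betweenness, while B and C sit in a closed triangle and have a clustering coefficient of 1.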
Metabolic Networks
Flux Balance Analysis
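$$S\mathbf{v} = \mathbf{0}$$
where $S$ is the stoichiometric matrix and $\mathbf{v}$ is the flux vector; this is the steady-state mass-balance constraint.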
Optimization Problem
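Flux balance analysis is usually posed as a linear program: choose fluxes that maximize an objective (often a biomass reaction) subject to steady-state mass balance and flux bounds. In the usual notation, with $\mathbf{c}$ a weight vector selecting the objective reaction(s) and $\mathbf{v}_{\min}$, $\mathbf{v}_{\max}$ the lower and upper flux bounds:

$$\max_{\mathbf{v}} \; \mathbf{c}^{\top}\mathbf{v} \quad \text{subject to} \quad S\mathbf{v} = \mathbf{0}, \qquad \mathbf{v}_{\min} \le \mathbf{v} \le \mathbf{v}_{\max}$$

Any LP solver can be used. For example, `scipy.optimize.linprog` minimizes its objective, so the objective vector is passed as $-\mathbf{c}$ and the mass balance as the equality constraints.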
Single-Cell Technologies
Single-Cell RNA Sequencing (scRNA-Seq)
Unique Molecular Identifier (UMI) Quantification
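UMI counting collapses reads that share a cell barcode, UMI and gene into a single molecule. The sketch assumes exact UMI matching on hypothetical reads; production pipelines typically also merge UMIs that differ only by sequencing errors.

```python
from collections import defaultdict

# Hypothetical aligned reads: (cell_barcode, umi, gene)
reads = [
    ("AAAC", "TTGCA", "GAPDH"),
    ("AAAC", "TTGCA", "GAPDH"),  # PCR duplicate: same barcode, UMI and gene
    ("AAAC", "GGCAT", "GAPDH"),
    ("AAAC", "CCTAG", "CD3E"),
    ("TTTG", "TTGCA", "GAPDH"),  # Same UMI in a different cell is a different molecule
]


def umi_counts(reads):
    """Count distinct UMIs per (cell, gene); exact matching only, no error correction."""
    molecules = defaultdict(set)
    for barcode, umi, gene in reads:
        molecules[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


print(umi_counts(reads))
```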
Quality Control Metrics
Normalization Approaches
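A sketch of three common per-cell QC metrics (total UMIs, detected genes, and percentage of mitochondrial UMIs) followed by log-normalization, in which each cell's counts are scaled to a fixed total before applying log(1 + x). The "MT-" prefix (human mitochondrial gene naming) and the scale factor of 10,000 are assumptions, and sensible filtering thresholds depend on the dataset.

```python
import math


def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC: total UMIs, genes detected, and percent mitochondrial UMIs."""
    total = sum(counts)
    n_detected = sum(1 for c in counts if c > 0)
    mito = sum(c for c, g in zip(counts, gene_names) if g.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return total, n_detected, pct_mito


def log_normalize(counts, scale=10_000):
    """Scale counts to `scale` total per cell, then apply log(1 + x).

    Assumes the cell has at least one UMI.
    """
    total = sum(counts)
    return [math.log1p(c / total * scale) for c in counts]


genes = ["GAPDH", "ACTB", "MT-CO1", "CD3E"]  # Hypothetical genes
cell = [10, 5, 5, 0]                          # Hypothetical UMI counts for one cell
print(qc_metrics(cell, genes))
print([round(x, 3) for x in log_normalize(cell)])
```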
Single-Cell ATAC-Seq
Spatial Transcriptomics
Spatial Coordinates and Gene Expression
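$$\mathcal{D} = \{(x_s, y_s, \mathbf{g}_s)\}_{s=1}^{N}$$
where $(x_s, y_s)$ are the spatial coordinates of spot $s$ and $\mathbf{g}_s$ is its gene expression vector.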
Spatial Clustering
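Spatially aware clustering weights pairs of spots by $w_{ij}$, the spatial weight between locations $i$ and $j$, which typically decreases with the distance between them so that nearby spots are encouraged to share a cluster.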
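One way to use the spatial weight $w_{ij}$ is to smooth each spot's expression with its neighbors before clustering, so spatially adjacent spots end up with similar features. This sketch assumes Gaussian kernel weights $w_{ij} = \exp(-d_{ij}^2 / 2\sigma^2)$ and uses hypothetical coordinates and values; individual spatial clustering methods define their own weighting schemes.

```python
import math


def gaussian_weights(coords, sigma):
    """w_ij = exp(-d_ij^2 / (2 sigma^2)) for every pair of spots (including i = j)."""
    return [[math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2 * sigma ** 2))
             for (xj, yj) in coords] for (xi, yi) in coords]


def spatial_smooth(values, weights):
    """Weighted average of each spot's value over all spots, using row i of w."""
    return [sum(w * v for w, v in zip(row, values)) / sum(row) for row in weights]


# Hypothetical spots along a line; the last one is far from the others
coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
expression = [1.0, 1.0, 4.0, 0.0]
smoothed = spatial_smooth(expression, gaussian_weights(coords, sigma=1.0))
print([round(x, 3) for x in smoothed])
```

The high value at the third spot is pulled toward its neighbors, while the isolated fourth spot keeps its own value; the smoothed features would then go into an ordinary clustering algorithm.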
Single-Cell Analysis Methods
Dimensionality Reduction for scRNA-Seq
Clustering in Single-Cell Data
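$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
where $A$ is the adjacency matrix, $k_i$ is the degree of node $i$, $m$ is the total number of edges, $c_i$ is the cluster of node $i$, and $\delta(c_i, c_j) = 1$ if $c_i = c_j$ and 0 otherwise.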
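This clustering objective (Newman's modularity $Q$) can be evaluated directly for a given partition. The sketch uses a hypothetical graph of two triangles joined by a single edge: treating each triangle as a cluster gives $Q = 5/14 \approx 0.36$, while putting every node in one cluster gives $Q = 0$. Community-detection methods commonly used on single-cell neighbor graphs, such as Louvain and Leiden, search for partitions with high modularity.

```python
def modularity(A, communities):
    """Newman modularity Q for an undirected, unweighted adjacency matrix A."""
    n = len(A)
    k = [sum(row) for row in A]  # Node degrees
    two_m = sum(k)  # 2m: every edge is counted from both ends
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # delta(c_i, c_j)
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

print(round(modularity(A, [0, 0, 0, 1, 1, 1]), 4))  # Two triangles as clusters
print(round(modularity(A, [0] * 6), 4))              # Everything in one cluster
```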
Pseudotime Analysis
Computational Methods
Machine Learning in Systems Biology
Supervised Learning
Unsupervised Learning
Deep Learning
Network Inference
Correlation-Based Networks
Partial Correlation
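$$\rho_{ij \cdot \text{rest}} = -\frac{P_{ij}}{\sqrt{P_{ii} P_{jj}}}, \qquad P = R^{-1}$$
where $R$ is the correlation matrix and $P$ is its inverse (the precision matrix).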
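Partial correlations can be read off the inverse of the correlation matrix. This sketch follows that precision-matrix route with a small Gauss-Jordan inversion so it runs on the standard library alone. The hypothetical correlation matrix is built so that genes 1 and 2 are correlated (r = 0.4) only through gene 3: because $r_{12} = r_{13} r_{23}$, their partial correlation given gene 3 is zero.

```python
import math


def invert(matrix):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(R):
    """rho_ij = -P_ij / sqrt(P_ii P_jj) with P = R^{-1} (the precision matrix)."""
    P = invert(R)
    n = len(R)
    return [[1.0 if i == j else -P[i][j] / math.sqrt(P[i][i] * P[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical correlations: genes 1 and 2 are linked only through gene 3
R = [[1.0, 0.4, 0.8],
     [0.4, 1.0, 0.5],
     [0.8, 0.5, 1.0]]
pc = partial_correlations(R)
for row in pc:
    print([round(x, 3) for x in row])
```

With more genes than samples, the sample correlation matrix is singular and cannot be inverted directly; regularized estimators are then needed.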
Differential Network Analysis
Applications in Medicine and Research
Disease Subtyping
Drug Response Prediction
Biomarker Discovery
Challenges and Considerations
Computational Challenges
- Scale: Millions of cells × thousands of features
- Noise: Technical and biological variability
- Integration: Combining different data modalities
Statistical Considerations
- Multiple testing: Bonferroni, FDR corrections
- Batch effects: ComBat, Harmony, Seurat integration
- Confounding factors: Cell cycle, quality metrics
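The two multiple-testing corrections named above are short enough to write out. This sketch returns Bonferroni-adjusted p-values (each p-value multiplied by the number of tests, capped at 1) and Benjamini-Hochberg adjusted p-values, which control the false discovery rate; the input p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # Indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # Hypothetical per-gene p-values
print("Bonferroni:", [round(p, 3) for p in bonferroni(pvals)])
print("BH (FDR):  ", [round(p, 3) for p in benjamini_hochberg(pvals)])
```

At a 0.05 cut-off, Bonferroni keeps two of the four genes while Benjamini-Hochberg keeps all four, illustrating the extra power of FDR control when many tests are run.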
Biological Interpretation
- Causality: Correlation vs causation
- Context dependence: Cell type, tissue, condition
- Validation: Experimental follow-up required
Real-World Application: Cancer Multi-Omics Analysis
Integrating multiple omics layers reveals the complex molecular landscape of cancer.
Cancer Multi-Omics Integration
```python
# Multi-omics integration in cancer genomics (illustrative toy values)
cancer_data = {
    'genomics': {
        'mutations': 12,      # Number of candidate somatic mutations considered below (toy count)
        'cnv_burden': 0.8,    # Fraction of genome with CNV
        'tmb': 15,            # Tumor mutational burden (mutations per Mb)
        'msi_status': 'MSS',  # Microsatellite instability status (MSS = stable)
    },
    'transcriptomics': {
        'tumor_purity': 0.7,          # Estimated fraction of tumor cells
        'immune_infiltration': 0.4,   # Immune cell abundance
        'proliferation_score': 0.65,  # Cell cycle activity
        'subtype_signature': 'Basal',
    },
    'clinical': {
        'tumor_stage': 'III',
        'grade': 3,
        'patient_age': 58,
        'treatment_response': 'Partial',
    },
}

# Illustrative pathway deregulation scores (0-1), assumed to summarize
# mutation patterns and expression changes
oncogenic_pathways = {
    'p53_pathway': 0.9,   # Strongly altered (frequently mutated)
    'RAS_MAPK': 0.4,      # Moderate
    'PI3K_AKT': 0.7,      # High
    'cell_cycle': 0.8,    # High (consistent with proliferation)
    'immune_check': 0.6,  # Moderate immune checkpoint activity
}

# Toy synthetic-lethality score for strongly deregulated pathways
# (placeholder heuristic, not a validated method)
synthetic_lethality_targets = []
for pathway, activity in oncogenic_pathways.items():
    if activity > 0.7:
        score = (1 - activity) * cancer_data['genomics']['tmb'] / 10
        synthetic_lethality_targets.append((pathway, round(score, 2)))

# Heuristic immunotherapy-response score from TMB, MSI status and immune infiltration
tmb_score = min(cancer_data['genomics']['tmb'] / 10, 1.0)  # Normalize, capped at 1.0
msi_score = 1.0 if cancer_data['genomics']['msi_status'] == 'MSI' else 0.3
infiltration_score = cancer_data['transcriptomics']['immune_infiltration']
immunotherapy_response = (tmb_score * 0.4 + msi_score * 0.3 + infiltration_score * 0.3) * 100

# Placeholder driver probabilities; real driver prediction uses known cancer
# genes, mutation type and recurrence, not the order in which mutations are listed
driver_probabilities = {}
for i in range(cancer_data['genomics']['mutations']):
    driver_probabilities[f'mutation_{i}'] = 0.15 if i < 5 else 0.05
predicted_drivers = sum(1 for p in driver_probabilities.values() if p > 0.1)

print("Cancer multi-omics analysis:")
print("  Genomics:")
print(f"    TMB: {cancer_data['genomics']['tmb']} mutations/Mb")
print(f"    CNV burden: {cancer_data['genomics']['cnv_burden']}")
print(f"    MSI status: {cancer_data['genomics']['msi_status']}")
print("  Transcriptomics:")
print(f"    Tumor purity: {cancer_data['transcriptomics']['tumor_purity']}")
print(f"    Immune infiltration: {cancer_data['transcriptomics']['immune_infiltration']}")
print(f"    Proliferation: {cancer_data['transcriptomics']['proliferation_score']}")
print(f"  Predicted driver mutations: {predicted_drivers}")
print(f"  Immunotherapy response score: {immunotherapy_response:.1f}/100")
print(f"  Synthetic-lethality candidates (pathway, score): {synthetic_lethality_targets}")

# Pathways above a looser threshold, listed as possible therapeutic targets
high_activity_pathways = [path for path, act in oncogenic_pathways.items() if act > 0.6]
print(f"  High-activity pathways: {high_activity_pathways}")

# Crude heterogeneity proxy: treats tumor purity as the clonal fraction.
# Clonality is normally estimated from variant allele frequency, purity and copy number.
clonal_mutations = int(cancer_data['genomics']['mutations'] * cancer_data['transcriptomics']['tumor_purity'])
subclonal_mutations = cancer_data['genomics']['mutations'] - clonal_mutations
heterogeneity_index = subclonal_mutations / cancer_data['genomics']['mutations']
print(f"  Estimated clonal mutations: {clonal_mutations}")
print(f"  Estimated subclonal mutations: {subclonal_mutations}")
print(f"  Heterogeneity index: {heterogeneity_index:.2f}")
print("  Higher heterogeneity is often associated with more aggressive disease")
```
Therapeutic Implications
Using systems biology to predict therapeutic responses.
Your Challenge: Network Analysis of Gene Expression Data
Analyze a gene expression dataset to identify regulatory networks and potential therapeutic targets.
Goal: Construct and analyze a gene co-expression network to identify key regulatory genes.
Gene Expression Dataset
```python
import math

# Simulated gene expression dataset
expression_data = {
    'genes': ['TP53', 'BRCA1', 'MYC', 'EGFR', 'PTEN', 'AKT1', 'RB1', 'CCND1'],
    'samples': ['tumor_1', 'tumor_2', 'tumor_3', 'normal_1', 'normal_2'],
    'expression_matrix': [  # Expression values (log2 transformed)
        [4.2, 3.8, 4.5, 2.1, 2.3],  # TP53
        [3.1, 3.3, 2.9, 4.2, 4.0],  # BRCA1
        [5.8, 6.2, 5.5, 2.8, 3.0],  # MYC
        [4.9, 5.1, 4.7, 1.9, 2.0],  # EGFR
        [2.1, 2.2, 2.0, 3.8, 3.9],  # PTEN
        [4.5, 4.3, 4.7, 2.2, 2.1],  # AKT1
        [3.3, 3.1, 3.4, 4.5, 4.4],  # RB1
        [5.1, 5.3, 4.9, 1.8, 1.9],  # CCND1
    ],
    'sample_classes': ['tumor', 'tumor', 'tumor', 'normal', 'normal'],  # Tumor vs normal
}

# Calculate the Pearson correlation matrix
n_genes = len(expression_data['genes'])
n_samples = len(expression_data['samples'])
correlation_matrix = [[0.0 for _ in range(n_genes)] for _ in range(n_genes)]

# Mean expression for each gene
gene_means = [sum(row) / n_samples for row in expression_data['expression_matrix']]

for i in range(n_genes):
    for j in range(n_genes):
        if i == j:
            correlation_matrix[i][j] = 1.0
            continue
        # Pearson correlation coefficient
        numerator = 0.0
        sum_sq_diff_i = 0.0
        sum_sq_diff_j = 0.0
        for k in range(n_samples):
            diff_i = expression_data['expression_matrix'][i][k] - gene_means[i]
            diff_j = expression_data['expression_matrix'][j][k] - gene_means[j]
            numerator += diff_i * diff_j
            sum_sq_diff_i += diff_i ** 2
            sum_sq_diff_j += diff_j ** 2
        denominator = math.sqrt(sum_sq_diff_i * sum_sq_diff_j)
        # Correlation is undefined for a gene with constant expression; use 0
        correlation_matrix[i][j] = numerator / denominator if denominator != 0 else 0.0

# Build the network: connect genes whose absolute correlation exceeds the threshold
corr_threshold = 0.7
network_edges = []
for i in range(n_genes):
    for j in range(i + 1, n_genes):
        if abs(correlation_matrix[i][j]) > corr_threshold:
            network_edges.append({
                'gene1': expression_data['genes'][i],
                'gene2': expression_data['genes'][j],
                'correlation': correlation_matrix[i][j],
                'significant': True,  # Passes the threshold; no statistical test is applied
            })

# Node degree = number of edges touching each gene
node_degrees = [0 for _ in range(n_genes)]
for edge in network_edges:
    node_degrees[expression_data['genes'].index(edge['gene1'])] += 1
    node_degrees[expression_data['genes'].index(edge['gene2'])] += 1

# Hub genes: degree at least 0.5 above the average. With only five samples
# dominated by the tumor/normal split, most gene pairs pass the threshold,
# so degrees can tie and this rule may return no hubs at all.
avg_degree = sum(node_degrees) / n_genes
hub_genes = [expression_data['genes'][i] for i in range(n_genes)
             if node_degrees[i] >= avg_degree + 0.5]

# Differential expression (tumor vs normal), using the sample labels
tumor_idx = [k for k, c in enumerate(expression_data['sample_classes']) if c == 'tumor']
normal_idx = [k for k, c in enumerate(expression_data['sample_classes']) if c == 'normal']
diff_expr_results = {}
for i, gene in enumerate(expression_data['genes']):
    row = expression_data['expression_matrix'][i]
    mean_tumor = sum(row[k] for k in tumor_idx) / len(tumor_idx)
    mean_normal = sum(row[k] for k in normal_idx) / len(normal_idx)
    diff_expr_results[gene] = {
        'fold_change': 2 ** (mean_tumor - mean_normal),  # Back-transform from log2 scale
        'tumor_mean': mean_tumor,
        'normal_mean': mean_normal,
        'is_significant': abs(mean_tumor - mean_normal) > 1.0,  # |log2 FC| > 1; no test
    }

# Candidate targets: upregulated in tumors (FC > 2) and correlated with a hub gene
potential_targets = []
for gene, stats in diff_expr_results.items():
    if stats['is_significant'] and stats['fold_change'] > 2:
        gene_idx = expression_data['genes'].index(gene)
        is_connected_to_hub = any(
            abs(correlation_matrix[gene_idx][expression_data['genes'].index(hub)]) > 0.5
            for hub in hub_genes
        )
        if is_connected_to_hub:
            potential_targets.append(gene)
```
Analyze the gene co-expression network to identify key regulatory genes and therapeutic targets.
Hint:
- Calculate correlation coefficients between gene expression profiles
- Create a network from correlations whose absolute value exceeds a threshold
- Identify hub genes with high connectivity
- Consider differential expression between conditions
- Identify genes that are both deregulated and highly connected
```python
# Network analysis results. correlation_matrix, network_edges, hub_genes and
# node_degrees come from the code above; re-initializing them here would
# discard those results.
differential_genes = {}             # Genes with significant tumor vs normal differences
potential_therapeutic_targets = []  # Differentially expressed genes linked to hubs

# Differentially expressed genes
for gene, stats in diff_expr_results.items():
    if stats['is_significant']:
        differential_genes[gene] = stats

# Annotate candidate targets (potential_targets is non-empty only if hubs exist)
for gene in potential_targets:
    gene_idx = expression_data['genes'].index(gene)
    potential_therapeutic_targets.append({
        'gene': gene,
        'fold_change': diff_expr_results[gene]['fold_change'],
        'correlation_with_hub': max(
            abs(correlation_matrix[gene_idx][expression_data['genes'].index(hub)])
            for hub in hub_genes
        ),
        'degree': node_degrees[gene_idx],
    })

# Print results
print("Network analysis results:")
print(f"  Number of genes: {len(expression_data['genes'])}")
print(f"  Number of correlations above threshold: {len(network_edges)}")
print(f"  Identified hub genes: {hub_genes}")
print(f"  Number of differentially expressed genes: {len(differential_genes)}")
print(f"  Potential therapeutic targets: {[t['gene'] for t in potential_therapeutic_targets]}")

# Network properties
if hub_genes:
    connectivity_analysis = f"Average connectivity: {sum(node_degrees) / len(node_degrees):.2f}"
else:
    connectivity_analysis = "No network hubs identified"
print(f"  Network connectivity: {connectivity_analysis}")

# Rank targets by fold change x degree
target_scores = [(t['gene'], t['fold_change'] * t['degree']) for t in potential_therapeutic_targets]
target_scores.sort(key=lambda x: x[1], reverse=True)
if target_scores:
    print(f"  Top potential therapeutic target: {target_scores[0][0]}")
else:
    print("  No significant therapeutic targets identified")

# Functional implications: is any off-diagonal correlation strong?
n = len(correlation_matrix)
if any(abs(correlation_matrix[i][j]) > 0.8 for i in range(n) for j in range(n) if i != j):
    functional_implication = "Highly co-regulated gene modules identified"
else:
    functional_implication = "Limited co-regulation detected"
print(f"  Functional implication: {functional_implication}")
```
How might the network analysis results differ if you analyzed single-cell RNA-seq data compared to bulk RNA-seq data?
Self-Examination
How do different omics technologies complement each other in systems biology?
What are the challenges in integrating multi-omics data?
How can network analysis reveal regulatory mechanisms in biological systems?