Chapter 11

Systems Biology & Omics Technologies

Genomics, transcriptomics, proteomics, metabolomics, multi-omics integration and analysis, systems modeling and network analysis, single-cell sequencing technologies, spatial transcriptomics and proteomics.

Systems Biology & Omics Technologies

Systems biology is an interdisciplinary field focusing on the study of complex interactions within biological systems using computational and mathematical modeling. It emphasizes the systematic study of the interactions between multiple biological components to understand complex behaviors and emergent properties.

Omics Technologies Overview

Genomics

Genomics={DNA sequence,genomic variation,regulatory elements}\text{Genomics} = \{\text{DNA sequence}, \text{genomic variation}, \text{regulatory elements}\}

Whole Genome Sequencing

Coverage=Total bases sequencedGenome size\text{Coverage} = \frac{\text{Total bases sequenced}}{\text{Genome size}} Coverage uniformity=Standard deviation of coverageMean coverage\text{Coverage uniformity} = \frac{\text{Standard deviation of coverage}}{\text{Mean coverage}}

Variant Calling

P(genotypedata)P(datagenotype)×P(genotype)P(\text{genotype}|\text{data}) \propto P(\text{data}|\text{genotype}) \times P(\text{genotype})

Bayesian approach for genotype calling.

Structural Variation Detection

Copy number=RtestRreference×2\text{Copy number} = \frac{R_{\text{test}}}{R_{\text{reference}}} \times 2

Where RR represents read depth ratios.

Transcriptomics

RNA Sequencing (RNA-Seq)

Expression level=Reads mapped to geneGene length (kb)×Total mapped reads (millions)=FPKM\text{Expression level} = \frac{\text{Reads mapped to gene}}{\text{Gene length (kb)} \times \text{Total mapped reads (millions)}} = \text{FPKM} TPM=Reads per kilobase(Reads per kilobase)/1,000,000\text{TPM} = \frac{\text{Reads per kilobase}}{\sum (\text{Reads per kilobase})/1,000,000}

Differential Expression Analysis

Log fold change=log2(condition Acondition B)\text{Log fold change} = \log_2 \left(\frac{\text{condition A}}{\text{condition B}}\right) Statistical model:yij=μ+αi+βj+εij\text{Statistical model}: y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}

Where yijy_{ij} is expression for gene ii, sample jj; αi\alpha_i is gene effect; βj\beta_j is sample effect.

Proteomics

Mass Spectrometry-Based Proteomics

Peptide identification:Observed massCalculated massTolerance\text{Peptide identification}: \frac{\text{Observed mass}}{\text{Calculated mass}} \leq \text{Tolerance} Quantification:Intensity=f(protein abundance,detection efficiency)\text{Quantification}: \text{Intensity} = f(\text{protein abundance}, \text{detection efficiency})

Label-Free Quantification

LFQ abundance=median(intensities) after normalization\text{LFQ abundance} = \text{median}(\text{intensities}) \text{ after normalization}

Isobaric Tagging (TMT/iTRAQ)

Ratio calculation:Rij=Schannel iSchannel j\text{Ratio calculation}: R_{ij} = \frac{S_{\text{channel i}}}{S_{\text{channel j}}}

Where SS represents reporter ion signal intensities.

Metabolomics

Targeted vs Untargeted Metabolomics

Targeted:Known metabolites with standardsQuantitative accuracy\text{Targeted}: \text{Known metabolites with standards} \rightarrow \text{Quantitative accuracy} Untargeted:All detectable metabolitesDiscovery potential\text{Untargeted}: \text{All detectable metabolites} \rightarrow \text{Discovery potential}

Metabolite Identification

Identification confidence=f(mass accuracy,retention time,MS/MS fragmentation,database match)\text{Identification confidence} = f(\text{mass accuracy}, \text{retention time}, \text{MS/MS fragmentation}, \text{database match})

Data Integration and Analysis

Multi-Omics Integration

Concatenation Approach

Xintegrated=[Xgenomics,Xtranscriptomics,Xproteomics,Xmetabolomics]\mathbf{X}_{\text{integrated}} = [\mathbf{X}_{\text{genomics}}, \mathbf{X}_{\text{transcriptomics}}, \mathbf{X}_{\text{proteomics}}, \mathbf{X}_{\text{metabolomics}}]

Factor Analysis Approaches

MOFA (Multi-Omics Factor Analysis):X(k)=W(k)Z+E(k)\text{MOFA (Multi-Omics Factor Analysis)}: \mathbf{X}^{(k)} = \mathbf{W}^{(k)} \mathbf{Z} + \mathbf{E}^{(k)}

Where X(k)\mathbf{X}^{(k)} is the data matrix for omics layer kk, W(k)\mathbf{W}^{(k)} is the loading matrix, Z\mathbf{Z} is the shared factor matrix, and E(k)\mathbf{E}^{(k)} is the noise matrix.

Network-Based Integration

Gene regulatory network:G=(V,E,A)\text{Gene regulatory network}: G = (V, E, A)

Where VV is genes, EE is regulatory edges, and AA is adjacency matrix.

Dimensionality Reduction

Principal Component Analysis (PCA)

X=UΣVT\mathbf{X} = \mathbf{U\Sigma V}^T PCi=j=1pXjloadingsij\text{PC}_i = \sum_{j=1}^{p} \mathbf{X}_j \cdot \text{loadings}_{ij}

t-SNE and UMAP

t-SNE:Pji=exp(xixj2/2σi2)kiexp(xixk2/2σi2)\text{t-SNE}: P_{j|i} = \frac{\exp(-||\mathbf{x}_i - \mathbf{x}_j||^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-||\mathbf{x}_i - \mathbf{x}_k||^2 / 2\sigma_i^2)} UMAP:Optimization of fuzzy topological structure preservation\text{UMAP}: \text{Optimization of fuzzy topological structure preservation}

Clustering and Classification

Unsupervised Clustering

K-means objective:mini=1kxjSixjμi2\text{K-means objective}: \min \sum_{i=1}^{k} \sum_{\mathbf{x}_j \in S_i} ||\mathbf{x}_j - \mathbf{\mu}_i||^2

Supervised Classification

Accuracy=TP+TNTP+TN+FP+FN\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} AUC=01TPR(FPR1(x))dx\text{AUC} = \int_0^1 \text{TPR}(FPR^{-1}(x)) dx

Network Biology

Gene Regulatory Networks

Boolean Networks

x(t+1)=f(x(t))\mathbf{x}(t+1) = f(\mathbf{x}(t))

Where x(t)\mathbf{x}(t) is the state vector at time tt.

Differential Equation Models

dxdt=f(x,u,t;θ)\frac{d\mathbf{x}}{dt} = \mathbf{f}(\mathbf{x}, \mathbf{u}, t; \mathbf{\theta})

Where x\mathbf{x} is the state variables, u\mathbf{u} are inputs, and θ\mathbf{\theta} are parameters.

Protein-Protein Interaction Networks

GPPI=(Vprotein,Einteraction)G_{PPI} = (V_{protein}, E_{interaction})

Network Topology Measures

Degree centrality:CD(v)=deg(v)V1\text{Degree centrality}: C_D(v) = \frac{\deg(v)}{|V|-1} Betweenness centrality:CB(v)=svtσst(v)σst\text{Betweenness centrality}: C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} Clustering coefficient:Ci=2Eiki(ki1)\text{Clustering coefficient}: C_i = \frac{2|\mathcal{E}_i|}{k_i(k_i-1)}

Where σst\sigma_{st} is number of shortest paths from ss to tt, σst(v)\sigma_{st}(v) passes through vv, Ei\mathcal{E}_i is edges between neighbors of ii, and kik_i is degree of node ii.

Metabolic Networks

Flux Balance Analysis

dXdt=Sv(X,v)\frac{d\mathbf{X}}{dt} = \mathbf{Sv}(\mathbf{X}, \mathbf{v}) Steady-state:Sv=0\text{Steady-state}: \mathbf{Sv} = \mathbf{0}

Where S\mathbf{S} is stoichiometric matrix, v\mathbf{v} is flux vector.

Optimization Problem

maxcTv subject to Sv=0,vminvvmax\max \mathbf{c}^T \mathbf{v} \text{ subject to } \mathbf{Sv} = \mathbf{0}, \mathbf{v}_{min} \leq \mathbf{v} \leq \mathbf{v}_{max}

Single-Cell Technologies

Single-Cell RNA Sequencing (scRNA-Seq)

Unique Molecular Identifier (UMI) Quantification

Gene expression count={unique UMIs for gene}\text{Gene expression count} = |\{\text{unique UMIs for gene}\}|

Quality Control Metrics

Library complexity:Rseen, new2=(dCountXXdCountTot)2\text{Library complexity}: R^2_{\text{seen, new}} = \left(\frac{\text{dCountXX}}{\text{dCountTot}}\right)^2 Mitochondrial content:(mt genes)(all genes)\text{Mitochondrial content}: \frac{\sum(\text{mt genes})}{\sum(\text{all genes})}

Normalization Approaches

Log normalization:log(xij+1) where xij=Xijsize factorj×104\text{Log normalization}: \log(x_{ij} + 1) \text{ where } x_{ij} = \frac{X_{ij}}{\text{size factor}_j} \times 10^4 SCTransform:Xcorrected=Pearson residuals from regularized negative binomial regression\text{SCTransform}: \mathbf{X}_{\text{corrected}} = \text{Pearson residuals from regularized negative binomial regression}

Single-Cell ATAC-Seq

Accessibility score=normalized read count in genomic region\text{Accessibility score} = \text{normalized read count in genomic region} Motif enrichment:P-value from Fisher’s exact test=f(accessible peaks with motif)\text{Motif enrichment}: \text{P-value from Fisher's exact test} = f(\text{accessible peaks with motif})

Spatial Transcriptomics

Spatial Coordinates and Gene Expression

Xspatial=[x,y,g1,g2,...,gn]\mathbf{X}_{\text{spatial}} = [x, y, g_1, g_2, ..., g_n]

Where (x,y)(x,y) are spatial coordinates and gig_i are gene expressions.

Spatial Clustering

Moran’s I:I=Ni,jwiji,jwij(xixˉ)(xjxˉ)i(xixˉ)2\text{Moran's I}: I = \frac{N}{\sum_{i,j} w_{ij}} \frac{\sum_{i,j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}

Where wijw_{ij} is spatial weight between locations ii and jj.

Single-Cell Analysis Methods

Dimensionality Reduction for scRNA-Seq

PCA in gene space:Xgenes=USVT\text{PCA in gene space}: \mathbf{X}_{\text{genes}} = \mathbf{USV}^T Graph-based embedding:Neighbors defined by gene expression similarity\text{Graph-based embedding}: \text{Neighbors defined by gene expression similarity}

Clustering in Single-Cell Data

Louvain algorithm:maxQ=12mij[Aijkikj2m]δ(ci,cj)\text{Louvain algorithm}: \max Q = \frac{1}{2m} \sum_{ij} \left[A_{ij} - \frac{k_i k_j}{2m}\right] \delta(c_i, c_j)

Where AijA_{ij} is adjacency matrix, kik_i is degree of node ii, mm is total edges, cic_i is cluster of node ii.

Pseudotime Analysis

Diffusion pseudotime:Time along differentiation trajectory\text{Diffusion pseudotime}: \text{Time along differentiation trajectory} Branching events:Points where cellular states diverge\text{Branching events}: \text{Points where cellular states diverge}

Computational Methods

Machine Learning in Systems Biology

Supervised Learning

Random Forest:f(x)=1Bb=1BTb(x;Θb)\text{Random Forest}: f(\mathbf{x}) = \frac{1}{B} \sum_{b=1}^{B} T_b(\mathbf{x}; \mathbf{\Theta}_b)

Unsupervised Learning

t-SNE:Pij=Pij+Pji2N\text{t-SNE}: P_{ij} = \frac{P_{i|j} + P_{j|i}}{2N}

Deep Learning

Autoencoder:z=encoder(x),x=decoder(z)\text{Autoencoder}: \mathbf{z} = \text{encoder}(\mathbf{x}), \mathbf{x'} = \text{decoder}(\mathbf{z})

Network Inference

Correlation-Based Networks

rij=k=1n(xikxiˉ)(xjkxjˉ)k=1n(xikxiˉ)2k=1n(xjkxjˉ)2r_{ij} = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x_i})(x_{jk} - \bar{x_j})}{\sqrt{\sum_{k=1}^{n}(x_{ik} - \bar{x_i})^2 \sum_{k=1}^{n}(x_{jk} - \bar{x_j})^2}}

Partial Correlation

ρijothers=[C1]ij[C1]ii[C1]jj\rho_{ij \cdot others} = -\frac{[C^{-1}]_{ij}}{\sqrt{[C^{-1}]_{ii}[C^{-1}]_{jj}}}

Where CC is correlation matrix.

Differential Network Analysis

A(condition1)A(condition2)Regulatory rewiring\mathbf{A}^{(condition1)} \neq \mathbf{A}^{(condition2)} \Rightarrow \text{Regulatory rewiring} ΔConnectivityi=jAij(1)Aij(2)\Delta \text{Connectivity}_i = \sum_{j} |A^{(1)}_{ij} - A^{(2)}_{ij}|

Applications in Medicine and Research

Disease Subtyping

Molecular subtypes=f(omics profiles)Clinical outcomes\text{Molecular subtypes} = f(\text{omics profiles}) \rightarrow \text{Clinical outcomes}

Drug Response Prediction

Response model:R=f(molecular features,drug properties)\text{Response model}: R = f(\text{molecular features}, \text{drug properties})

Biomarker Discovery

Biomarker=argmaxfeatureAUC(diagnosis)\text{Biomarker} = \arg\max_{\text{feature}} \text{AUC}(\text{diagnosis})

Challenges and Considerations

Computational Challenges

  • Scale: Millions of cells × thousands of features
  • Noise: Technical and biological variability
  • Integration: Combining different data modalities

Statistical Considerations

  • Multiple testing: Bonferroni, FDR corrections
  • Batch effects: ComBat, Harmony, Seurat integration
  • Confounding factors: Cell cycle, quality metrics

Biological Interpretation

  • Causality: Correlation vs causation
  • Context dependence: Cell type, tissue, condition
  • Validation: Experimental follow-up required

Real-World Application: Cancer Multi-Omics Analysis

Integrating multiple omics layers reveals the complex molecular landscape of cancer.

Cancer Multi-Omics Integration

# Multi-omics integration in cancer genomics
cancer_data = {
    'genomics': {
        'mutations': 12,      # Number of mutations per MB
        'cnv_burden': 0.8,   # Fraction of genome with CNV
        'tmb': 15,           # Tumor mutational burden (mutations per MB)
        'msi_status': 'MSS'  # Microsatellite instability status
    },
    'transcriptomics': {
        'tumor_purity': 0.7,  # Estimated fraction of tumor cells
        'immune_infiltration': 0.4,  # Immune cell abundance
        'proliferation_score': 0.65, # Cell cycle activity
        'subtype_signature': 'Basal'
    },
    'clinical': {
        'tumor_stage': 'III',
        'grade': 3,
        'patient_age': 58,
        'treatment_response': 'Partial'
    }
}

# Calculate oncogenic pathway activity
# Based on mutation patterns and expression changes
oncogenic_pathways = {
    'p53_pathway': 0.9,      # High activity (frequently mutated)
    'RAS_MAPK': 0.4,         # Moderate activity
    'PI3K_AKT': 0.7,         # High activity
    'cell_cycle': 0.8,       # High activity (consistent with proliferation)
    'immune_check': 0.6      # Moderate immune checkpoint activity
}

# Calculate synthetic lethality scores
# For potential drug targeting
synthetic_lethality_targets = []
for pathway, activity in oncogenic_pathways.items():
    if activity > 0.7:
        # Identify potential synthetic lethal partners
        score = (1 - activity) * cancer_data['genomics']['tmb'] / 10
        synthetic_lethality_targets.append((pathway, score))

# Predict immunotherapy response
# Based on TMB, MSI, and immune infiltration
tmb_score = min(cancer_data['genomics']['tmb'] / 10, 1.0)  # Normalize
msi_score = 1.0 if cancer_data['genomics']['msi_status'] == 'MSI' else 0.3
infiltration_score = cancer_data['transcriptomics']['immune_infiltration']

immunotherapy_response = (tmb_score * 0.4 + msi_score * 0.3 + infiltration_score * 0.3) * 100

# Calculate driver mutation probability
driver_probabilities = {}
for i in range(cancer_data['genomics']['mutations']):
    # Based on known cancer genes and mutation types
    prob = 0.15 if i < 5 else 0.05  # Higher probability for early mutations
    driver_probabilities[f'mutation_{i}'] = prob

predicted_drivers = sum(1 for p in driver_probabilities.values() if p > 0.1)

print(f"Cancer multi-omics analysis:")
print(f"  Genomics:")
print(f"    TMB: {cancer_data['genomics']['tmb']} mutations/MB")
print(f"    CNV burden: {cancer_data['genomics']['cnv_burden']}")
print(f"    MSI status: {cancer_data['genomics']['msi_status']}")
print(f"  Transcriptomics:")
print(f"    Tumor purity: {cancer_data['transcriptomics']['tumor_purity']}")
print(f"    Immune infiltration: {cancer_data['transcriptomics']['immune_infiltration']}")
print(f"    Proliferation: {cancer_data['transcriptomics']['proliferation_score']}")
print(f"  Predicted driver mutations: {predicted_drivers}")
print(f"  Immunotherapy response score: {immunotherapy_response:.1f}%")

# Identify potential therapeutic targets
high_activity_pathways = [path for path, act in oncogenic_pathways.items() if act > 0.6]
print(f"  High-activity pathways: {high_activity_pathways}")

# Assess tumor heterogeneity
clonal_mutations = int(cancer_data['genomics']['mutations'] * cancer_data['transcriptomics']['tumor_purity'])
subclonal_mutations = cancer_data['genomics']['mutations'] - clonal_mutations
heterogeneity_index = subclonal_mutations / cancer_data['genomics']['mutations']

print(f"  Estimated clonal mutations: {clonal_mutations}")
print(f"  Estimated subclonal mutations: {subclonal_mutations}")
print(f"  Heterogeneity index: {heterogeneity_index:.2f}")
print(f"  Higher heterogeneity suggests more aggressive tumor")

Therapeutic Implications

Using systems biology to predict therapeutic responses.


Your Challenge: Network Analysis of Gene Expression Data

Analyze a gene expression dataset to identify regulatory networks and potential therapeutic targets.

Goal: Construct and analyze a gene co-expression network to identify key regulatory genes.

Gene Expression Dataset

import math

# Simulated gene expression dataset
expression_data = {
    'genes': ['TP53', 'BRCA1', 'MYC', 'EGFR', 'PTEN', 'AKT1', 'RB1', 'CCND1'],
    'samples': ['tumor_1', 'tumor_2', 'tumor_3', 'normal_1', 'normal_2'],
    'expression_matrix': [
        [4.2, 3.8, 4.5, 2.1, 2.3],  # TP53
        [3.1, 3.3, 2.9, 4.2, 4.0],  # BRCA1
        [5.8, 6.2, 5.5, 2.8, 3.0],  # MYC
        [4.9, 5.1, 4.7, 1.9, 2.0],  # EGFR
        [2.1, 2.2, 2.0, 3.8, 3.9],  # PTEN
        [4.5, 4.3, 4.7, 2.2, 2.1],  # AKT1
        [3.3, 3.1, 3.4, 4.5, 4.4],  # RB1
        [5.1, 5.3, 4.9, 1.8, 1.9]   # CCND1
    ],  # Expression values (log2 transformed)
    'sample_classes': ['tumor', 'tumor', 'tumor', 'normal', 'normal']  # Tumor vs normal
}

# Calculate Pearson correlation matrix
n_genes = len(expression_data['genes'])
n_samples = len(expression_data['samples'])
correlation_matrix = [[0 for _ in range(n_genes)] for _ in range(n_genes)]

# Calculate mean expression for each gene
gene_means = []
for i in range(n_genes):
    mean_expr = sum(expression_data['expression_matrix'][i]) / n_samples
    gene_means.append(mean_expr)

# Calculate correlations
for i in range(n_genes):
    for j in range(n_genes):
        if i == j:
            correlation_matrix[i][j] = 1.0
        else:
            # Pearson correlation coefficient
            numerator = 0
            sum_sq_diff_i = 0
            sum_sq_diff_j = 0
            
            for k in range(n_samples):
                diff_i = expression_data['expression_matrix'][i][k] - gene_means[i]
                diff_j = expression_data['expression_matrix'][j][k] - gene_means[j]
                numerator += diff_i * diff_j
                sum_sq_diff_i += diff_i**2
                sum_sq_diff_j += diff_j**2
            
            denominator = math.sqrt(sum_sq_diff_i * sum_sq_diff_j)
            if denominator != 0:
                corr_val = numerator / denominator
            else:
                corr_val = 0
            correlation_matrix[i][j] = corr_val

# Create network based on correlation threshold
corr_threshold = 0.7  # Absolute correlation threshold
network_edges = []
for i in range(n_genes):
    for j in range(i+1, n_genes):
        if abs(correlation_matrix[i][j]) > corr_threshold:
            network_edges.append({
                'gene1': expression_data['genes'][i],
                'gene2': expression_data['genes'][j],
                'correlation': correlation_matrix[i][j],
                'significant': True
            })

# Calculate network properties
node_degrees = [0 for _ in range(n_genes)]
for edge in network_edges:
    idx1 = expression_data['genes'].index(edge['gene1'])
    idx2 = expression_data['genes'].index(edge['gene2'])
    node_degrees[idx1] += 1
    node_degrees[idx2] += 1

# Identify hub genes (nodes with high connectivity)
avg_degree = sum(node_degrees) / n_genes
hub_genes = []
for i in range(n_genes):
    if node_degrees[i] >= avg_degree + 0.5:
        hub_genes.append(expression_data['genes'][i])

# Calculate differential expression (tumor vs normal)
diff_expr_results = {}
for i, gene in enumerate(expression_data['genes']):
    tumor_expr = [expression_data['expression_matrix'][i][k] for k in range(3)]
    normal_expr = [expression_data['expression_matrix'][i][k] for k in range(3,5)]
    
    mean_tumor = sum(tumor_expr) / len(tumor_expr)
    mean_normal = sum(normal_expr) / len(normal_expr)
    
    fold_change = 2**(mean_tumor - mean_normal)  # Convert from log2 scale
    diff_expr_results[gene] = {
        'fold_change': fold_change,
        'tumor_mean': mean_tumor,
        'normal_mean': mean_normal,
        'is_significant': abs(mean_tumor - mean_normal) > 1.0  # Simple threshold
    }

# Identify potential therapeutic targets
potential_targets = []
for gene, stats in diff_expr_results.items():
    if stats['is_significant'] and stats['fold_change'] > 2:
        # Check if it's connected to hub genes
        gene_idx = expression_data['genes'].index(gene)
        is_connected_to_hub = any(
            abs(correlation_matrix[gene_idx][expression_data['genes'].index(hub)]) > 0.5
            for hub in hub_genes
        )
        if is_connected_to_hub:
            potential_targets.append(gene)

Analyze the gene co-expression network to identify key regulatory genes and therapeutic targets.

Hint:

  • Calculate correlation coefficients between gene expression profiles
  • Create network based on significant correlations
  • Identify hub genes with high connectivity
  • Consider differential expression between conditions
  • Identify genes that are both deregulated and highly connected
# TODO: Calculate network analysis results
correlation_matrix = [[0 for _ in range(len(expression_data['genes']))] for _ in range(len(expression_data['genes']))]  # Gene-gene correlation matrix
network_edges = []  # List of significant gene-gene connections
hub_genes = []      # Genes with high connectivity
differential_genes = {}  # Genes with significant tumor vs normal differences
potential_therapeutic_targets = []  # Hub and differential genes combined

# Calculate correlation matrix (already computed above)
# Calculate network edges (already computed above)
# Identify hub genes (already computed above)

# Calculate differential expression
for gene, stats in diff_expr_results.items():
    if stats['is_significant']:
        differential_genes[gene] = stats

# Identify potential therapeutic targets
potential_therapeutic_targets = []
for gene in potential_targets:
    potential_therapeutic_targets.append({
        'gene': gene,
        'fold_change': diff_expr_results[gene]['fold_change'],
        'correlation_with_hub': max([abs(correlation_matrix[expression_data['genes'].index(gene)][expression_data['genes'].index(hub)]) for hub in hub_genes]),
        'degree': node_degrees[expression_data['genes'].index(gene)]
    })

# Print results
print(f"Network analysis results:")
print(f"  Number of genes: {len(expression_data['genes'])}")
print(f"  Number of significant correlations: {len(network_edges)}")
print(f"  Identified hub genes: {hub_genes}")
print(f"  Number of differentially expressed genes: {len(differential_genes)}")
print(f"  Potential therapeutic targets: {[t['gene'] for t in potential_therapeutic_targets]}")

# Network properties
if hub_genes:
    connectivity_analysis = f"Average connectivity: {sum(node_degrees)/len(node_degrees):.2f}"
else:
    connectivity_analysis = "No network hubs identified"

print(f"  Network connectivity: {connectivity_analysis}")

# Therapeutic target ranking
target_scores = [(target['gene'], target['fold_change'] * target['degree']) for target in potential_therapeutic_targets]
target_scores.sort(key=lambda x: x[1], reverse=True)

if target_scores:
    top_target = target_scores[0][0]
    print(f"  Top potential therapeutic target: {top_target}")
else:
    print(f"  No significant therapeutic targets identified")

# Functional implications
if any(abs(corr) > 0.8 for row in correlation_matrix for corr in row if corr != 1):
    functional_implication = "Highly co-regulated gene modules identified"
else:
    functional_implication = "Limited co-regulation detected"

print(f"  Functional implication: {functional_implication}")

How might the network analysis results differ if you analyzed single-cell RNA-seq data compared to bulk RNA-seq data?

ELI10 Explanation

Simple analogy for better understanding

Think of systems biology like being a conductor of a massive orchestra where each musician is a different molecule in the cell. Instead of studying individual musicians (single genes or proteins), systems biology studies the whole concert - how all the musicians play together, how they respond to the conductor's cues, and how the music changes when different sections are emphasized or muted. Omics technologies are like having special microphones that can listen to every single instrument in the orchestra simultaneously - genomics listens to the sheet music (DNA), transcriptomics hears the notes being played (RNA), proteomics sees the musicians performing (proteins), and metabolomics tastes the sounds (small molecules). When you combine all this information, you can understand not just how individual parts work, but how the entire biological symphony comes together in health and disease. It's like shifting from listening to solo performances to appreciating the full orchestral experience.

Self-Examination

Q1.

How do different omics technologies complement each other in systems biology?

Q2.

What are the challenges in integrating multi-omics data?

Q3.

How can network analysis reveal regulatory mechanisms in biological systems?