RNA-seq Analysis Pipeline
Build a complete RNA-seq differential expression analysis workflow using Python and R nodes.
Prerequisites
- A counts matrix file (CSV/TSV format)
- Sample metadata file (CSV/TSV format)
Step 1: Load the Data
Add a Python node to load your count data and sample metadata:
import pandas as pd
# read_csv assumes comma-separated input; for TSV files, pass sep="\t"
counts = pd.read_csv("/app/data/counts.csv", index_col=0)
samples = pd.read_csv("/app/data/samples.csv", index_col=0)
print(f"Loaded {counts.shape[0]} genes, {counts.shape[1]} samples")
display(counts.head())
Step 2: Filter Low-Count Genes
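The filter in this step keeps a gene only if enough samples reach a minimum count. A minimal sketch on a toy matrix, assuming the thresholds used in this tutorial (10 counts, 3 samples); the function name `filter_low_counts` and the toy gene and sample names are hypothetical:

```python
import pandas as pd


def filter_low_counts(counts: pd.DataFrame, min_count: int = 10,
                      min_samples: int = 3) -> pd.DataFrame:
    """Keep genes (rows) with >= min_count reads in >= min_samples samples."""
    # Boolean matrix: True where a sample reaches the count threshold
    passes = counts >= min_count
    # Count passing samples per gene, then compare with the sample threshold
    keep = passes.sum(axis=1) >= min_samples
    return counts.loc[keep]


# Toy data (hypothetical): rows are genes, columns are samples
toy = pd.DataFrame(
    {"s1": [50, 0, 12], "s2": [40, 3, 9], "s3": [60, 25, 11], "s4": [5, 1, 30]},
    index=["geneA", "geneB", "geneC"],
)
toy_filtered = filter_low_counts(toy)
print(list(toy_filtered.index))  # ['geneA', 'geneC']
```

Here geneB is dropped because only one sample reaches 10 counts, while geneA and geneC each have three samples at or above the threshold.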
Add another Python node connected to the first:
import numpy as np
# Keep genes with at least 10 counts in at least 3 samples
mask = (counts >= 10).sum(axis=1) >= 3
filtered = counts[mask]
print(f"Filtered: {filtered.shape[0]} genes remaining")
Step 3: Differential Expression with R
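DESeq2 expects the columns of the count matrix and the rows of the sample metadata to refer to the same samples in the same order. A minimal sketch of a check that could run in a Python node before the R step, assuming sample IDs are the column names of the counts table and the index of the metadata table; `align_samples` and the toy data are hypothetical:

```python
import pandas as pd


def align_samples(counts: pd.DataFrame, samples: pd.DataFrame) -> pd.DataFrame:
    """Return the metadata rows reordered to match the count matrix columns."""
    # Every sample in the count matrix needs a metadata row
    missing = [s for s in counts.columns if s not in samples.index]
    if missing:
        raise ValueError(f"Samples missing from metadata: {missing}")
    # Reorder metadata to the column order of the counts;
    # metadata rows with no matching count column are dropped
    return samples.loc[counts.columns]


# Toy data (hypothetical): the metadata is deliberately in a different order
toy_counts = pd.DataFrame({"s1": [10, 0], "s2": [20, 5]}, index=["geneA", "geneB"])
toy_samples = pd.DataFrame({"condition": ["treated", "control"]}, index=["s2", "s1"])

aligned = align_samples(toy_counts, toy_samples)
print(list(aligned.index))  # ['s1', 's2']
```

Raising an error on missing samples, rather than silently dropping them, is a design choice in this sketch; a mismatch at this point usually signals a naming problem worth fixing at the source.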
Add an R node and connect the filtered counts:
library(DESeq2)
# Create DESeq2 dataset
dds <- DESeqDataSetFromMatrix(
  countData = filtered,
  colData = samples,
  design = ~condition
)
# Run differential expression analysis
dds <- DESeq(dds)
results <- results(dds)
# Output significant genes
sig <- subset(results, padj < 0.05 & abs(log2FoldChange) > 1)
print(paste("Found", nrow(sig), "significant genes"))
Step 4: Visualize Results
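DESeq2 can report NA adjusted p-values for some genes (for example, those removed by its independent filtering), and these have no position on a -log10 axis. A minimal sketch of preparing a table for the volcano plot, assuming the results arrive in Python as a pandas DataFrame with `log2FoldChange` and `padj` columns; `prepare_volcano` and the toy values are hypothetical, and the cutoffs mirror those used in Step 3:

```python
import numpy as np
import pandas as pd


def prepare_volcano(res: pd.DataFrame, padj_cutoff: float = 0.05,
                    lfc_cutoff: float = 1.0) -> pd.DataFrame:
    """Drop genes without usable statistics and flag significant ones."""
    # Remove rows where either plotted quantity is missing
    table = res.dropna(subset=["log2FoldChange", "padj"]).copy()
    # y-axis value for the volcano plot
    table["neg_log10_padj"] = -np.log10(table["padj"])
    # Same rule as the R subset: padj < 0.05 and |log2FoldChange| > 1
    table["significant"] = (
        (table["padj"] < padj_cutoff)
        & (table["log2FoldChange"].abs() > lfc_cutoff)
    )
    return table


# Toy results (hypothetical values); geneD has a missing adjusted p-value
toy = pd.DataFrame(
    {"log2FoldChange": [2.5, -0.3, -1.8, 1.2],
     "padj": [0.001, 0.8, 0.01, np.nan]},
    index=["geneA", "geneB", "geneC", "geneD"],
)
volcano = prepare_volcano(toy)
print(volcano["significant"].tolist())  # [True, False, True]
```

The `significant` column could then be used to color points in a scatter plot, so that the genes counted in Step 3 stand out visually.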
Add a final Python node for visualization:
import matplotlib.pyplot as plt
import numpy as np  # np is used below; imported here in case nodes do not share imports
# Volcano plot
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(results['log2FoldChange'], -np.log10(results['padj']), alpha=0.5)
ax.set_xlabel('Log2 Fold Change')
ax.set_ylabel('-Log10 Adjusted P-value')
ax.set_title('Volcano Plot')
plt.tight_layout()
plt.show()
Step 5: Run the Pipeline
Click Run All and watch each step execute. Data passes between the Python and R nodes automatically.