RNA-seq Analysis Pipeline

Build a complete RNA-seq differential expression analysis workflow using Python and R nodes.

Prerequisites

A counts matrix file (CSV/TSV format) with genes as rows and samples as columns
A sample metadata file (CSV/TSV format) with one row per sample and a condition column

Step 1: Load the Data

Add a Python node to load your count data and sample metadata:


import pandas as pd
 
counts = pd.read_csv("/app/data/counts.csv", index_col=0)
samples = pd.read_csv("/app/data/samples.csv", index_col=0)
 
print(f"Loaded {counts.shape[0]} genes, {counts.shape[1]} samples")
display(counts.head())
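
DESeq2 (used in Step 3) expects the rows of the sample metadata to line up with the columns of the count matrix, so it can help to check this right after loading. The following is a minimal sketch on made-up toy tables; align_samples is a hypothetical helper written for this example, not part of pandas or of the workflow tool.

```python
import pandas as pd


def align_samples(counts: pd.DataFrame, samples: pd.DataFrame) -> pd.DataFrame:
    """Return sample metadata reordered to match the count matrix columns.

    Hypothetical helper for illustration; raises if any sample lacks metadata.
    """
    missing = [s for s in counts.columns if s not in samples.index]
    if missing:
        raise ValueError(f"Samples without metadata: {missing}")
    return samples.loc[counts.columns]


# Toy data standing in for counts.csv and samples.csv (values are made up)
toy_counts = pd.DataFrame(
    {"s1": [5, 20], "s2": [0, 35], "s3": [12, 8]},
    index=["geneA", "geneB"],
)
toy_samples = pd.DataFrame(
    {"condition": ["treated", "control", "control"]},
    index=["s3", "s1", "s2"],
)

aligned = align_samples(toy_counts, toy_samples)
print(list(aligned.index))
```

With your own data, the same call would be samples = align_samples(counts, samples), assuming the first column of the metadata file holds the sample names used as column headers in the counts file.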

Step 2: Filter Low-Count Genes

Add another Python node connected to the first:


import numpy as np
 
# Keep genes with at least 10 counts in at least 3 samples
mask = (counts >= 10).sum(axis=1) >= 3
filtered = counts[mask]
 
print(f"Filtered: {filtered.shape[0]} genes remaining")
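
To see how the mask behaves, here is the same filter applied to a small made-up matrix. The gene names, sample names, and counts are invented for illustration; the thresholds match the ones used above.

```python
import pandas as pd

# Toy count matrix: 4 genes x 4 samples (values are made up)
toy = pd.DataFrame(
    {
        "s1": [12, 3, 50, 10],
        "s2": [15, 0, 9, 10],
        "s3": [11, 25, 8, 9],
        "s4": [0, 30, 7, 10],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

min_count = 10    # a sample "passes" for a gene when its count is >= 10
min_samples = 3   # a gene is kept when at least 3 samples pass

# Per gene, count how many samples reach the threshold
samples_passing = (toy >= min_count).sum(axis=1)
mask = samples_passing >= min_samples
toy_filtered = toy[mask]

print(samples_passing.to_dict())
print(list(toy_filtered.index))
```

Here geneA and geneD are kept because three of their four samples reach 10 counts. geneC is dropped even though one sample has 50 counts, since the rule looks at how many samples pass, not at the total.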

Step 3: Differential Expression with R

Add an R node and connect the filtered counts:


library(DESeq2)
 
# Create DESeq2 dataset (rows of samples must match the columns of filtered, in the same order)
dds <- DESeqDataSetFromMatrix(
  countData = filtered,
  colData = samples,
  design = ~condition
)
 
# Run differential expression analysis
dds <- DESeq(dds)
results <- results(dds)
 
# Output significant genes
sig <- subset(results, padj < 0.05 & abs(log2FoldChange) > 1)
print(paste("Found", nrow(sig), "significant genes"))
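
The padj column holds p-values adjusted for multiple testing; by default DESeq2 uses the Benjamini-Hochberg method. As a rough illustration of what that adjustment does, here is a plain NumPy sketch. bh_adjust is a hypothetical helper, and its output will not reproduce DESeq2's padj exactly, because DESeq2 also applies independent filtering before adjusting.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (illustrative sketch only)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Made-up raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
print(adj)
```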

Step 4: Visualize Results

Add a final Python node for visualization:


import matplotlib.pyplot as plt
import numpy as np
 
# Volcano plot: effect size (x) against significance (y)
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(results['log2FoldChange'], -np.log10(results['padj']), alpha=0.5)
ax.set_xlabel('Log2 Fold Change')
ax.set_ylabel('-Log10 Adjusted P-value')
ax.set_title('Volcano Plot')
plt.tight_layout()
plt.show()
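
DESeq2 can report NA for padj (for example, for genes removed by its independent filtering), and those genes cannot be placed on the plot. The sketch below shows one way to prepare the results before plotting, assuming they arrive in Python as a pandas DataFrame with log2FoldChange and padj columns. volcano_table is a hypothetical helper, the toy values are made up, and the cutoffs mirror the ones used in Step 3.

```python
import numpy as np
import pandas as pd


def volcano_table(res: pd.DataFrame, alpha: float = 0.05, lfc: float = 1.0) -> pd.DataFrame:
    """Drop genes without usable values and flag significant ones."""
    out = res.dropna(subset=["padj", "log2FoldChange"]).copy()
    # Clip padj away from zero so -log10 stays finite
    out["neg_log10_padj"] = -np.log10(out["padj"].clip(lower=np.finfo(float).tiny))
    out["significant"] = (out["padj"] < alpha) & (out["log2FoldChange"].abs() > lfc)
    return out


# Toy results table (values are made up); geneD has no adjusted p-value
toy_res = pd.DataFrame(
    {
        "log2FoldChange": [2.5, -1.8, 0.3, 1.2],
        "padj": [0.001, 0.04, 0.5, np.nan],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

table = volcano_table(toy_res)
print(table[["neg_log10_padj", "significant"]])
```

The significant column can then be used to color the points, for example by calling ax.scatter once for table[table["significant"]] and once for the remaining rows.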

Step 5: Run the Pipeline

Click Run All to execute the steps in order. Data is passed between the Python and R nodes automatically.

Next: Using the AI Chat