RNA-seq Analysis Pipeline

Build a complete RNA-seq differential expression analysis workflow using Python and R nodes.

Prerequisites

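  • A counts matrix file (CSV/TSV format) with gene IDs in the first column and one column per sample
  • A sample metadata file (CSV/TSV format) with sample IDs in the first column and a condition column, which Step 3 uses in the design formula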

Step 1: Load the Data

Add a Python node to load the counts matrix and the sample metadata:

    import pandas as pd

    counts = pd.read_csv("/app/data/counts.csv", index_col=0)
    samples = pd.read_csv("/app/data/samples.csv", index_col=0)

    print(f"Loaded {counts.shape[0]} genes, {counts.shape[1]} samples")
    display(counts.head())
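DESeq2 expects the columns of the count matrix to appear in the same order as the rows of the sample metadata, so it can help to check this right after loading. The following is a minimal sketch on toy data, not part of the original pipeline; the helper name `align_samples` and the toy sample IDs are hypothetical.

```python
import pandas as pd


def align_samples(counts: pd.DataFrame, samples: pd.DataFrame) -> pd.DataFrame:
    """Return the metadata rows reordered to match the count matrix columns.

    Raises ValueError if the two tables do not describe the same sample IDs.
    """
    missing = set(counts.columns) - set(samples.index)
    extra = set(samples.index) - set(counts.columns)
    if missing or extra:
        raise ValueError(
            f"Sample ID mismatch: no metadata for {sorted(missing)}, "
            f"no counts for {sorted(extra)}"
        )
    return samples.loc[counts.columns]


# Toy tables (hypothetical values) with the metadata rows deliberately shuffled.
toy_counts = pd.DataFrame(
    {"s1": [5, 120], "s2": [0, 98], "s3": [12, 143]},
    index=["geneA", "geneB"],
)
toy_samples = pd.DataFrame(
    {"condition": ["treated", "control", "control"]},
    index=["s3", "s1", "s2"],
)

aligned = align_samples(toy_counts, toy_samples)
print(list(aligned.index))
```

In the pipeline itself, the equivalent call would be `samples = align_samples(counts, samples)`, assuming the sample IDs in both files use identical spelling.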

Step 2: Filter Low-Count Genes

Add another Python node connected to the first:

    import numpy as np

    # Keep genes with at least 10 counts in at least 3 samples
    mask = (counts >= 10).sum(axis=1) >= 3
    filtered = counts[mask]

    print(f"Filtered: {filtered.shape[0]} genes remaining")
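To see what the mask does, here is a small sketch that applies the same rule to a toy matrix. The function name `filter_low_counts`, its parameters, and the toy values are illustrative assumptions; the thresholds default to those used above.

```python
import pandas as pd


def filter_low_counts(counts: pd.DataFrame, min_count: int = 10,
                      min_samples: int = 3) -> pd.DataFrame:
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    # For each gene (row), count how many samples reach the threshold.
    samples_passing = (counts >= min_count).sum(axis=1)
    return counts[samples_passing >= min_samples]


# Toy matrix (hypothetical values): rows are genes, columns are samples.
toy = pd.DataFrame(
    {
        "s1": [0, 15, 250, 9],
        "s2": [3, 12, 310, 40],
        "s3": [1, 8, 275, 22],
        "s4": [0, 30, 290, 5],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

kept = filter_low_counts(toy)
print(list(kept.index))  # geneB (3 samples pass) and geneC (4 samples pass)
```

geneA never reaches 10 counts, and geneD reaches it in only two samples, so both are dropped.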

Step 3: Differential Expression with R

Add an R node and connect the filtered counts:

    library(DESeq2)

    # Create DESeq2 dataset
    dds <- DESeqDataSetFromMatrix(
      countData = filtered,
      colData = samples,
      design = ~condition
    )

    # Run differential expression analysis
    dds <- DESeq(dds)
    results <- results(dds)

    # Output significant genes
    sig <- subset(results, padj < 0.05 & abs(log2FoldChange) > 1)
    print(paste("Found", nrow(sig), "significant genes"))

Step 4: Visualize Results

Add a final Python node for visualization:

    import matplotlib.pyplot as plt
    import numpy as np

    # Volcano plot
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(results['log2FoldChange'], -np.log10(results['padj']), alpha=0.5)
    ax.set_xlabel('Log2 Fold Change')
    ax.set_ylabel('-Log10 Adjusted P-value')
    ax.set_title('Volcano Plot')
    plt.tight_layout()
    plt.show()
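A volcano plot is often easier to read when significant genes are highlighted. The sketch below labels each gene using the same thresholds as Step 3 (padj < 0.05 and |log2FoldChange| > 1). It assumes the results reach Python as a pandas DataFrame with `log2FoldChange` and `padj` columns; the function name `classify_genes` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def classify_genes(res: pd.DataFrame, padj_cutoff: float = 0.05,
                   lfc_cutoff: float = 1.0) -> pd.Series:
    """Label each gene as 'up', 'down', or 'not significant'."""
    # Comparisons against NaN are False, so genes without an adjusted
    # p-value (DESeq2 reports NA for some genes) end up 'not significant'.
    significant = (res["padj"] < padj_cutoff) & (res["log2FoldChange"].abs() > lfc_cutoff)
    labels = np.where(
        significant & (res["log2FoldChange"] > 0),
        "up",
        np.where(significant, "down", "not significant"),
    )
    return pd.Series(labels, index=res.index, name="status")


# Toy results table (hypothetical values).
toy_res = pd.DataFrame(
    {
        "log2FoldChange": [2.3, -1.8, 0.4, 3.1],
        "padj": [0.001, 0.02, 0.6, np.nan],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

status = classify_genes(toy_res)
print(status.to_dict())
```

The resulting labels can be mapped to colors, for example `status.map({"up": "red", "down": "blue", "not significant": "grey"})`, and passed to `ax.scatter` through its `c` argument.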

Step 5: Run the Pipeline

Click Run All and watch each step execute, passing data between Python and R nodes automatically.


