How to Use Scrublet

2 min read 18-10-2024

Scrublet is a tool that helps researchers clean up single-cell RNA sequencing (scRNA-seq) data. It identifies doublets, which arise when two cells are captured together (for example, in the same droplet) and sequenced as if they were a single cell. In this guide, we will walk you through how to use Scrublet effectively.

What is Scrublet?

Scrublet is a Python package that employs a simulated doublet approach to detect doublets in scRNA-seq datasets. It generates synthetic doublets and compares them to the actual data, enabling you to assess the likelihood that a given cell is a doublet.
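
To make the idea of a simulated doublet concrete, here is a minimal NumPy sketch. It is not Scrublet's actual implementation: it assumes a small cells-by-genes count matrix and builds each synthetic doublet by summing the counts of two randomly chosen cells. The function name `simulate_doublets` and the toy data are hypothetical.

```python
import numpy as np

def simulate_doublets(counts, n_doublets, seed=0):
    """Build synthetic doublets by summing random pairs of observed cells.

    counts: array of shape (n_cells, n_genes) holding raw counts.
    Returns (doublets, pairs); pairs[i] holds the two parent row indices.
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    # Pick two parent cells for every synthetic doublet (parents may coincide).
    pairs = rng.integers(0, n_cells, size=(n_doublets, 2))
    doublets = counts[pairs[:, 0]] + counts[pairs[:, 1]]
    return doublets, pairs

# Toy data: 6 cells x 4 genes of Poisson-distributed counts.
toy_counts = np.random.default_rng(1).poisson(2.0, size=(6, 4))
sim, parents = simulate_doublets(toy_counts, n_doublets=12)
print(sim.shape)  # (12, 4)
```

Scrublet then scores each observed cell by how many simulated doublets sit among its nearest neighbors, a step this sketch leaves out.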

Installation

Before you can use Scrublet, you need to install it. Follow these steps:

Prerequisites: Make sure you have Python (preferably 3.6 or higher) and pip installed on your machine.
Install Scrublet: Open your terminal or command prompt and execute the following command:
```
pip install scrublet
```

Preparing Your Data

Scrublet operates on the raw (unnormalized) count matrix from an scRNA-seq experiment. It expects rows to represent cells and columns to represent genes, so transpose your matrix first if it is stored as genes by cells.

Load Your Data: You can load your data into a Python environment using libraries such as Pandas, NumPy, or SciPy. Scrublet accepts a NumPy array or a SciPy sparse matrix of shape (cells, genes).
Create a Scrublet Object: Import Scrublet and create a Scrublet object, passing your gene expression matrix to it:
```
import scrublet as scr
scrub = scr.Scrublet(expression_matrix)
```
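
Many tools export count matrices with genes as rows, while Scrublet's documentation describes its input as cells by genes. The sketch below shows one way to make the orientation explicit before building the Scrublet object. The helper `to_cells_by_genes` and the toy matrix are hypothetical, and the helper relies on you knowing how your file is laid out.

```python
import numpy as np

def to_cells_by_genes(matrix, genes_in_rows):
    """Return the matrix with cells as rows and genes as columns."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2:
        raise ValueError("expected a 2-D count matrix")
    # Transpose only when the input is laid out as genes x cells.
    return matrix.T if genes_in_rows else matrix

# Toy example: 3 genes x 5 cells, as a genes-as-rows export would give.
genes_by_cells = np.arange(15).reshape(3, 5)
expression_matrix = to_cells_by_genes(genes_by_cells, genes_in_rows=True)
print(expression_matrix.shape)  # (5, 3)
```

Checking the shape against the number of cells you expect is a cheap way to catch a transposed matrix before it produces misleading scores.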

Running Scrublet

Now that you have set up Scrublet, you can run its core functions to identify doublets:

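Run the Doublet Detection: Use the scrub_doublets method to perform doublet detection:
```
doublet_scores, predicted_doublets = scrub.scrub_doublets()
```
This returns two values with one entry per cell: a doublet score, where a higher value means the cell looks more like a doublet, and a boolean doublet call based on a threshold that Scrublet picks automatically. If Scrublet cannot find a threshold on its own, the second value is None and you need to set one yourself.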
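Analyze Results: You can visualize the results using a histogram to see the distribution of doublet scores:
```
scrub.plot_histogram()
```
The method does not take the scores as an argument; it plots the scores stored on the Scrublet object for both the observed cells and the simulated doublets. This plot will help you judge whether a threshold score cleanly separates doublets from singlets.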

Setting a Threshold

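You need to decide on a threshold score to classify your cells. Scrublet sets one automatically when you run scrub_doublets, and you can override it with scrub.call_doublets(threshold=0.25) if the histogram suggests a better cutoff (0.25 is only an example value). A simpler heuristic is to set the threshold at the 95th percentile of the doublet scores, but note that this flags about 5% of cells no matter how many doublets the data actually contains. You can calculate it easily:

import numpy as np
threshold = np.percentile(doublet_scores, 95)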
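
The behavior of a percentile cutoff can be checked on made-up numbers. The sketch below uses random scores as a stand-in for Scrublet output (an assumption for illustration only) and shows that a 95th-percentile threshold calls a fixed share of cells doublets by construction.

```python
import numpy as np

# Made-up scores standing in for Scrublet's per-cell doublet scores.
rng = np.random.default_rng(0)
example_scores = rng.beta(2, 8, size=1000)

# Cutoff at the 95th percentile of the scores.
example_threshold = np.percentile(example_scores, 95)
called_doublets = example_scores > example_threshold

# With 1000 distinct scores, 50 cells land above the cutoff.
print(called_doublets.sum())  # 50
```

Because the flagged fraction is fixed in advance, it is worth comparing this cutoff with the doublet rate you expect from your experiment before trusting it.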

Filtering Doublets

Once you have established a threshold, you can filter out the identified doublets from your dataset:

# Cells are the rows of Scrublet's input matrix, so filter rows, not columns
singlet_indices = doublet_scores < threshold
filtered_data = expression_matrix[singlet_indices, :]
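
When you filter, any per-cell metadata such as barcodes has to be filtered with the same mask, or rows and labels drift out of sync. Here is a minimal sketch of that idea, assuming a dense cells-by-genes NumPy matrix; the helper `keep_singlets`, the scores, and the barcodes are all made up for illustration.

```python
import numpy as np

def keep_singlets(matrix, scores, threshold, barcodes=None):
    """Drop cells whose doublet score is at or above the threshold.

    matrix: (n_cells, n_genes) counts; scores: one value per cell.
    """
    scores = np.asarray(scores)
    if matrix.shape[0] != scores.shape[0]:
        raise ValueError("need one score per row (cell) of the matrix")
    singlet_mask = scores < threshold
    # Apply the same mask to the barcodes so labels stay aligned with rows.
    kept = None if barcodes is None else np.asarray(barcodes)[singlet_mask]
    return matrix[singlet_mask], kept

toy_matrix = np.arange(12).reshape(4, 3)         # 4 cells x 3 genes
toy_scores = np.array([0.05, 0.60, 0.10, 0.45])  # made-up scores
toy_barcodes = ["AAAC", "AAAG", "AACA", "AACC"]  # made-up barcodes

filtered, kept = keep_singlets(toy_matrix, toy_scores, 0.4, toy_barcodes)
print(filtered.shape, kept.tolist())  # (2, 3) ['AAAC', 'AACA']
```

The length check is the useful part: it fails loudly if the matrix was transposed after the scores were computed.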

Conclusion

Scrublet is a robust tool for detecting doublets in scRNA-seq data, helping you to clean your dataset and ensure more accurate analysis. By following the steps outlined above, you can implement Scrublet effectively in your research. Happy analyzing!

This guide should provide a clear understanding of how to utilize Scrublet for your single-cell RNA sequencing projects. If you have any questions, feel free to consult the official documentation or relevant literature for deeper insights into the tool's capabilities.