I used to think I was decent at data science. Then I started using Pandas-Profiling, BitRook, Mito, and SweetViz. Now I'm 10x faster at my job.
Data science tools can help you quickly crunch through large datasets, allowing you to answer analytical questions and make data-driven decisions. Without these tools, such tasks would be difficult, if not impossible, to accomplish.
In this article, we take a look at 10 tools that you can use to speed up your data science workflow. Here are some of the best data science tools you can use right now, all of which are either open source or have a free version.
1. Pandas Profiling
Pandas Profiling creates a full-featured exploratory data analysis report and handles even large datasets. Basically, it's an extensive EDA report in three lines of code, and a no-brainer way to get a sense of a dataset in seconds.
Features
- Column Type Detection
- Unique values, missing values
- Quantile statistics
- Descriptive statistics
- Most frequent values
- Correlations
- Missing values matrix and counts
- Duplicate rows
- Text analysis to learn about categories
Installation
pip install pandas-profiling
Usage
With three lines of code, you can turn a DataFrame into an interactive HTML report or even a notebook widget on your data.
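Here's a minimal sketch of that workflow; the file name and report title below are my own placeholders:
# Build an interactive HTML EDA report from a DataFrame
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv('data.csv')  # placeholder file
profile = ProfileReport(df, title='EDA Report')
profile.to_file('report.html')  # or profile.to_widgets() inside a notebook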
2. BitRook
BitRook is a unique desktop app that is more like a data science Swiss Army knife. It uses ML to analyze and help clean your data, and it even generates a Python script to automate your cleaning. I used to spend a lot of time copying and pasting code from other data cleaning projects, and this has completely removed that issue. On top of that, it helps you analyze your data: instead of you searching for issues, it raises them to your attention and can tell you in seconds whether a dataset is predictive. It handles large datasets and loads data 10x faster than Excel. Definitely worth checking out; even the free version is amazing.
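To give a sense of what automated cleaning code looks like, here's a hypothetical pandas snippet of the kind such a generated script might contain; the column names are made up and this is not BitRook's actual output:
# Hypothetical auto-generated cleaning steps (illustrative only, not BitRook output)
import pandas as pd

df = pd.read_csv('customers.csv')  # made-up input file
df['signup_date'] = pd.to_datetime(df['signup_date'], errors='coerce')  # standardize column types
df['age'] = df['age'].fillna(df['age'].median())  # fill missing values
df = df.drop_duplicates()  # remove duplicate rows
df.to_csv('customers_clean.csv', index=False)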
Features
- Generates a Python script for you
- Predictive Data Detection (Correlation Matrix & Predictive Power Score)
- Handles large data
- Column Type Detection & Type Standardization
- Common Data Cleaning Functions Built-In
- Unique values, missing values
- Quantile statistics
- Descriptive statistics
- Most frequent values (category, letter frequency & word frequency)
- Outlier handling
- PII Data Detection
- Data validation script generation
- Binning (including WOE binning)
- Viewing CSVs faster than Excel, with EDA built in
- Splitting data
- Data profiling script generation
Installation
It’s a simple downloadable desktop app from bitrook.com
Usage
Great video tutorials and support
3. Mito
I like to think of Mito as Excel in your Jupyter notebook. It gives you a lot of the capabilities and ease of use of Excel, but it generates the Python code for the changes you make. I can easily see this being a great way for people to learn pandas, too.
Features
- Pivot tables
- Generates the code for each edit
- Exploratory graphs
- Dataframe merging
- Excel-like formulas
- Exploratory data analysis
- Data filtering
Installation
python -m pip install mitoinstaller
python -m mitoinstaller install
Usage
import mitosheet
mitosheet.sheet()
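If you already have DataFrames loaded, you can pass them straight into the sheet; a small sketch, assuming a placeholder CSV:
# Open a Mito sheet on an existing DataFrame; each edit generates the equivalent pandas code
import pandas as pd
import mitosheet

df = pd.read_csv('data.csv')  # placeholder file
mitosheet.sheet(df)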
4. SweetViz
Sweetviz is a Python library that generates EDA visualizations in a fully self-contained HTML application. It covers the common data points like missing values, distinct values, and duplicates, but Sweetviz can also compare training data vs. test data and show how a target value relates to other features. It's really simple to use: in most cases it's just two lines of code, which is why the docs are a little light.
Features
- Overview of data
- Descriptive statistics
- Automatically detects data types
- Missing values charts
- Correlations
- Visual Comparisons (training vs test data)
- Target analysis
- Comparing 2 datasets together (training vs test data)
Installation
pip install sweetviz
Usage
import sweetviz
report = sweetviz.analyze(df)
report.show_html('report.html')
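Since the training-vs-test comparison is one of its standout features, here's a sketch of that call, assuming train_df, test_df, and a 'target' column already exist:
# Compare two datasets (e.g. train vs. test) against a target feature
import sweetviz

compare_report = sweetviz.compare([train_df, 'Train'], [test_df, 'Test'], 'target')
compare_report.show_html('compare_report.html')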