10x Data Science Tools You Should Know

Stephen
3 min read · Oct 15, 2021

I used to think I was decent at data science. Then I started using Pandas-Profiling, BitRook, Mito, and Sweetviz. Now I’m 10x faster at my job.

Data science tools can help you quickly crunch through large datasets, allowing you to answer analytical questions and make data-driven decisions. Without these tools, such tasks would be difficult, if not impossible, to accomplish.

In this article, we take a look at four tools that you can use to speed up your data science workflow. All of them are either open source or have a free version.

1. Pandas Profiling

Pandas Profiling creates a full-featured exploratory data analysis (EDA) report and even handles large datasets. Basically, you get an extensive EDA report in 3 lines of code. It’s a no-brainer way to get a sense of a dataset in seconds.

Features

  1. Column Type Detection
  2. Unique values, missing values
  3. Quantile statistics
  4. Descriptive statistics
  5. Most frequent values
  6. Correlations
  7. Missing values matrix and counts
  8. Duplicate rows
  9. Text analysis: learn about the categories in text data
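
To get a feel for what the report bundles together, here is a rough sketch of how several of these statistics could be computed by hand with plain pandas. The toy DataFrame and variable names are made up for illustration, and this is not how Pandas Profiling computes them internally.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame(
    {
        "age": [25, 32, 47, 32, None],
        "city": ["Oslo", "Lima", "Oslo", "Lima", "Oslo"],
        "income": [40_000, 52_000, 88_000, 52_000, 61_000],
    }
)

column_types = df.dtypes                                # column type detection
unique_counts = df.nunique()                            # unique values per column
missing_counts = df.isna().sum()                        # missing values per column
quantiles = df["income"].quantile([0.25, 0.5, 0.75])    # quantile statistics
summary = df.describe()                                 # descriptive statistics
top_city = df["city"].value_counts().idxmax()           # most frequent value
correlations = df[["age", "income"]].corr()             # pairwise Pearson correlations
duplicate_rows = int(df.duplicated().sum())             # count of repeated rows
```

Each line maps to one entry in the feature list; the report computes all of them for every column and lays them out in one place.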

Installation

pip install pandas-profiling

Usage

With 3 lines of code, you can turn a DataFrame into an interactive HTML report or even a notebook widget.
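
A minimal sketch of those three lines, wrapped in a small helper. The helper name `build_profile_report`, the report title, the output path, and the toy DataFrame are assumptions for illustration; `ProfileReport` and `to_file` come from the `pandas_profiling` package installed above (newer releases of the project are published as `ydata-profiling`).

```python
import pandas as pd


def build_profile_report(df: pd.DataFrame, output_path: str = "report.html") -> None:
    """Write a pandas-profiling HTML report for df to output_path."""
    # Imported here so the rest of the script still loads if the package is absent.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title="Profiling Report")
    profile.to_file(output_path)
    # In a notebook, profile.to_widgets() renders the report inline instead.


# Hypothetical toy data; replace with your own DataFrame.
df = pd.DataFrame({"age": [25, 32, 47], "city": ["Oslo", "Lima", "Oslo"]})

# build_profile_report(df)  # uncomment to generate report.html
```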

2. BitRook

BitRook is a unique desktop app that works like a data science Swiss Army knife. It uses ML to analyze and help clean your data, and it even generates a Python script to automate your cleaning. I used to spend a LOT of time copying and pasting code from other data cleaning projects, and this has completely removed that issue. On top of that, it helps you analyze your data: instead of you searching for issues, it brings them to your attention, and it can tell you in seconds if a dataset is predictive. It handles large datasets and is 10x faster at loading data than Excel. Definitely worth checking out; even the free version is amazing.

Features

  1. Generates a Python script for you
  2. Predictive Data Detection (Correlation Matrix & Predictive Power Score)
  3. Handles large data
  4. Column Type Detection & Type Standardization
  5. Common Data Cleaning Functions Built-In
  6. Unique values, missing values
  7. Quantile statistics
  8. Descriptive statistics
  9. Most frequent values (category, letter frequency & word frequency)
  10. Outlier handling
  11. PII Data Detection
  12. Data validation script generation
  13. Binning (including WOE binning)
  14. CSV viewing that is faster than Excel, with EDA built in
  15. Splitting data
  16. Data profiling script generation
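
For a sense of what an automated cleaning script covers, here is a hand-written pandas sketch of three of the steps listed above: type standardization, missing-value handling, and outlier flagging. It is not BitRook's generated output. The `clean` function, the `price` column, and the 1.5 * IQR rule are assumptions picked for illustration.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few common cleaning steps and return a new DataFrame."""
    out = df.drop_duplicates().copy()

    # Type standardization: coerce the hypothetical "price" column to numbers;
    # values that cannot be parsed become NaN.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")

    # Missing values: fill numeric gaps with the column median.
    out["price"] = out["price"].fillna(out["price"].median())

    # Outlier handling: flag values more than 1.5 * IQR outside the quartiles.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["price_outlier"] = ~out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    return out.reset_index(drop=True)


# Hypothetical raw data with a duplicate row, a bad value, and an extreme value.
raw = pd.DataFrame(
    {
        "item": ["a", "b", "b", "c", "d", "e"],
        "price": ["10", "12", "12", "n/a", "11", "500"],
    }
)
cleaned = clean(raw)
```

Writing and maintaining this kind of script by hand for every dataset is the copy-and-paste work that a generated script is meant to replace.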

Installation

It’s a simple desktop app that you can download from bitrook.com.

Usage

Great video tutorials and support

3. Mito

I like to think of Mito as Excel in your Jupyter notebook. It gives you a lot of the capabilities and ease of use of Excel, but it also generates Python code for the changes you make. I can easily see this being a great way for people to learn pandas.

Features

  1. Pivot tables
  2. Generates the code for each edit
  3. Exploratory graphs
  4. Dataframe merging
  5. Excel-like formulas
  6. Exploratory data analysis
  7. Data filtering

Installation

python -m pip install mitoinstaller
python -m mitoinstaller install

Usage

import mitosheet
mitosheet.sheet()
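
To illustrate the kind of pandas code that spreadsheet edits correspond to, here is a hand-written sketch of a filter, a pivot table, and an Excel-like formula column. This is not Mito's actual generated output, and the `sales` data and column names are invented for the example.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "product": ["A", "B", "A", "B"],
        "revenue": [120, 80, 95, 30],
    }
)

# Data filtering: keep rows with revenue of at least 50.
filtered = sales[sales["revenue"] >= 50]

# Pivot table: total revenue per region.
pivot = filtered.pivot_table(index="region", values="revenue", aggfunc="sum")

# Excel-like formula: add a column computed from another one.
filtered = filtered.assign(
    revenue_share=filtered["revenue"] / filtered["revenue"].sum()
)
```

Seeing each click paired with code like this is what makes the tool useful for learning pandas.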

4. Sweetviz

Sweetviz is a Python library that generates EDA visualizations in a fully self-contained HTML application. It has all the common data points like missing, distinct, and duplicate values, but Sweetviz can also compare training data vs test data and show how a target value relates to other features. It’s really simple to use: the main use cases usually take just 2 lines of code, so the docs are a little light.

Features

  • Overview of data
  • Descriptive statistics
  • Automatically detects data types
  • Missing values charts
  • Correlations
  • Visual Comparisons (training vs test data)
  • Target analysis
  • Comparing 2 datasets together (training vs test data)

Installation

pip install sweetviz

Usage

import sweetviz
report = sweetviz.analyze(df)
report.show_html('report.html')
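
The training vs test comparison and target analysis mentioned earlier can be sketched with `sweetviz.compare`. The helper name `compare_train_test`, the toy data, the `survived` target column, and the output file name are assumptions for illustration.

```python
import pandas as pd


def compare_train_test(train: pd.DataFrame, test: pd.DataFrame, target: str,
                       output_path: str = "compare.html") -> None:
    """Write a Sweetviz report comparing train and test, highlighting target."""
    import sweetviz  # imported lazily so the data prep below runs without it

    report = sweetviz.compare([train, "Training"], [test, "Test"], target_feat=target)
    report.show_html(output_path)


# Hypothetical toy data with a boolean target column.
data = pd.DataFrame(
    {
        "age": [22, 38, 26, 35, 54, 2, 27, 14],
        "fare": [7.3, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1, 30.1],
        "survived": [False, True, True, True, False, False, True, True],
    }
)
train, test = data.iloc[:6], data.iloc[6:]

# compare_train_test(train, test, target="survived")  # uncomment to build the report
```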
