10 Simple Hacks to Speed Up Your Data Analysis in Python


Tips and tricks can be valuable, particularly in programming. Sometimes a small hack saves a great deal of time, and occasionally a little shortcut or add-on proves to be a real productivity booster. The following are some of my favorite tips and techniques for speeding up data analysis in Python. Some may be well known and others relatively new, but I am sure you will find them helpful the next time you work on a data analysis project.

Profiling the Pandas DataFrame

Profiling is a procedure that helps us understand our data, and pandas-profiling (imported as pandas_profiling) is a Python library that does just that. It is a simple and fast way to perform exploratory data analysis (EDA) on a pandas DataFrame. Typically, the df.describe() and df.info() functions serve as the first step in the EDA process. However, they provide only a basic overview of the data and are not much help with large data sets.

In contrast, pandas profiling extends the pandas DataFrame with df.profile_report() for quick data analysis. It presents the data with a single line of code, in an interactive HTML report.


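The pandas profiling module computes statistics for a given dataset, including inferred column types, unique and missing value counts, quantile and descriptive statistics, most frequent values, histograms, and correlations.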

Usage

Let’s use the Titanic dataset to demonstrate what the profiler can do.

import pandas as pd

import pandas_profiling

titanic_df = pd.read_csv('/kaggle/input/titanic/train.csv')

This single line of code is all that is required to display the data profiling report in a Jupyter notebook. The report is quite detailed and includes charts wherever they are needed.

titanic_df.profile_report()

The report can also be exported as an interactive HTML file using the code below.

profile = titanic_df.profile_report(title='Pandas Profiling Report')

profile.to_file(output_file="Titanic data profiling.html")

Adding Interactivity to Pandas Plots

Pandas’ DataFrame class has a built-in .plot() method. However, the visuals produced by this function are not interactive, which makes them less engaging. On the other hand, the ease of plotting charts with pandas.DataFrame.plot() cannot be dismissed either. What if we could use pandas to create interactive charts without making big changes to the code? You can do exactly that with the help of the Cufflinks library.

The Cufflinks library combines the power of plotly with the flexibility of pandas for easy plotting.

Both libraries can be installed with the commands below:

pip install plotly # Plotly is a pre-requisite before installing cufflinks

pip install cufflinks

Usage

#importing Pandas

import pandas as pd

#importing plotly and cufflinks in offline mode

import cufflinks as cf

import plotly.offline

cf.go_offline()

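cf.set_config_file(offline=False, world_readable=True)

With this setup in place, calling df.iplot() on a DataFrame instead of df.plot() produces an interactive chart.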

Magic Commands

Magic commands are a set of convenient functions in Jupyter Notebooks. They were designed to solve some of the most common problems in standard data analysis. Running %lsmagic lists all available magics.

Magic commands come in two kinds, each identified by its prefix. Line magics are prefixed by a single % character and operate on a single line of input. Cell magics are prefixed by a double %% and operate on multiple lines of input. If automagic is set to 1, line magics can be called without typing the initial %.

Let’s look at a few of them that help with common data analysis tasks:

%pastebin – The %pastebin function uploads code to an online pastebin service and returns the corresponding URL. A pastebin is an online service for storing and sharing plain text, such as source code snippets. GitHub gist is similar to a pastebin but adds version control.

%matplotlib notebook – The %matplotlib inline command renders static matplotlib charts inside a Jupyter notebook. Replace inline with notebook to get zoomable and resizable charts. Ensure that the function is called before the matplotlib library is imported.

%run – The %run function executes a Python script inside a notebook.

%%writefile – %%writefile writes a cell’s contents to a file. For example, a cell that begins with %%writefile foo.py stores the rest of its code in the current directory as a file called foo.py.

%%latex – The %%latex function renders the contents of a cell as LaTeX. It is handy for writing mathematical equations and formulas in a cell.

Finding and Eliminating Errors

The interactive debugger is also a magic function, but I have given it a category of its own. If an error occurs when executing a code cell, enter %debug on a new line and run it. This launches an interactive debugging environment and takes you to the point where the error occurred.

You can also use it to check the values of variables assigned in the program and to perform operations on them. Press q to quit the debugger.

Pretty Printing

pprint is the appropriate tool to use if you want to generate visually appealing representations of your data structures. It is especially useful for printing dictionaries and JSON data. It helps to compare the output of print and pprint on the same data.
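As a minimal sketch of the difference, the snippet below prints the same nested dictionary with both functions. The passenger record is invented purely for illustration and does not come from any real dataset.

```python
from pprint import pformat, pprint

# Hypothetical nested record, made up for this illustration.
passenger = {
    "name": "Example Passenger",
    "tickets": [{"class": 3, "fare": 7.25}, {"class": 1, "fare": 71.28}],
    "ports": {"embarked": "Southampton", "destination": "New York"},
}

# print() writes the whole structure on one long line.
print(passenger)

# pprint() wraps and indents the structure once it exceeds the given width.
pprint(passenger, width=60)

# pformat() returns the same pretty layout as a string instead of printing it.
formatted = pformat(passenger, width=60)
```

The width argument is optional. It is set here only so that the wrapping is easy to see on a small example.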

Making the Notes Stand Out

In Jupyter Notebooks, we can use alert and note boxes to draw attention to anything important. The color of the box depends on the type of alert selected. Add any of the codes below to a cell that needs to be highlighted.

Blue Alert Box: Info

<div class="alert alert-block alert-info">
<b>Tip:</b> Use blue boxes (alert-info) for tips and notes.
If it’s a note, you don’t have to include the word “Note”.
</div>

Yellow Alert Box: Warning

<div class="alert alert-block alert-warning">
<b>Example:</b> Yellow boxes are generally used to include additional examples or mathematical formulas.
</div>

Green Alert Box: Success

<div class="alert alert-block alert-success">
Use green boxes only when necessary, for example to display links to related content.
</div>

Red Alert Box: Danger

<div class="alert alert-block alert-danger">
It is good to avoid red boxes, but they can be used to alert users not to delete some important part of the code, etc.
</div>
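The same boxes can also be built from a code cell. The sketch below assumes the boxes use the alert-info, alert-warning, alert-success and alert-danger classes. The alert_box helper is a hypothetical name introduced here for illustration, not part of Jupyter.

```python
from html import escape

# Alert kinds assumed to match the four boxes described above.
ALERT_KINDS = ("info", "warning", "success", "danger")


def alert_box(kind, message):
    """Return the HTML for an alert box (hypothetical helper, not a Jupyter API)."""
    if kind not in ALERT_KINDS:
        raise ValueError(f"unknown alert kind: {kind!r}")
    # Escape the message so characters such as & and < show up literally.
    return f'<div class="alert alert-block alert-{kind}">\n{escape(message)}\n</div>'


tip_html = alert_box("info", "Use blue boxes for tips & notes.")
print(tip_html)
```

Inside a notebook, passing the returned string to IPython.display.HTML should render the box, provided the notebook front end ships the alert styles.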

Printing all the Outputs of a Cell

Consider a Jupyter Notebook cell containing the following lines of code:

In [1]: 10+5

11+6

Out [1]: 17

By default, a cell displays only its last output; to see the others, we would need to add print() calls. It turns out that we can display all of the outputs by adding the following snippet at the top of the notebook.

from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

To return to the initial configuration:

InteractiveShell.ast_node_interactivity = "last_expr"
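To see what the setting changes, here is a rough emulation in plain Python. It is only a sketch of the idea and not how IPython is implemented. The run_cell function and its interactivity parameter are hypothetical names chosen for this illustration.

```python
import ast


def run_cell(source, interactivity="last_expr"):
    """Emulate which values a notebook cell would display (illustrative only)."""
    tree = ast.parse(source)
    namespace = {}
    shown = []
    last_index = len(tree.body) - 1
    for index, node in enumerate(tree.body):
        if isinstance(node, ast.Expr):
            # Bare expressions produce a value that may be displayed.
            code = compile(ast.Expression(node.value), "<cell>", "eval")
            value = eval(code, namespace)
            if value is not None and (interactivity == "all" or index == last_index):
                shown.append(value)
        else:
            # Statements such as assignments are executed but never displayed.
            code = compile(ast.Module(body=[node], type_ignores=[]), "<cell>", "exec")
            exec(code, namespace)
    return shown


cell = "10+5\n11+6"
last_only = run_cell(cell)                        # default behavior
everything = run_cell(cell, interactivity="all")  # every expression is shown
print(last_only, everything)
```

With the default mode only the value of the final expression is kept, while the "all" mode keeps the value of every top-level expression in the cell.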

Running Python Scripts with the ‘-i’ Option

The typical command line syntax for executing a Python script is python hello.py. However, running the same script with an extra -i, as in python -i hello.py, has additional benefits. Let’s see what they are.

First, Python does not exit the interpreter once the end of the program is reached. Thus, we can check the values of the variables and the correctness of the functions defined in the program.

Second, given that we are still within the interpreter, it is pretty simple for us to start a Python debugger by doing the following:

import pdb

pdb.pm()

This takes us to the point where the exception occurred, so we can then work on the code from there.
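The debugger is interactive, so it is hard to show in a listing. The sketch below does not start pdb. Instead it walks the traceback by hand to show the kind of information a post-mortem session gives access to. The average function is a made-up example.

```python
import sys


def average(values):
    total = sum(values)
    return total / len(values)  # raises ZeroDivisionError for an empty list


try:
    average([])
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # Walk to the innermost frame, where the exception was actually raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = dict(tb.tb_frame.f_locals)

# A post-mortem debugger session lets us inspect this same frame interactively.
print(failing_function, failing_locals)
```

In an interactive session started with python -i, calling pdb.pm() after an uncaught exception opens the debugger on that traceback, so none of this manual walking is needed.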

Commenting out Code Automatically

Ctrl/Cmd + / comments out the selected lines in the cell automatically. Hitting the combination again uncomments the same lines.

To Delete is Human; to Restore Divine

Have you ever deleted a cell by accident while working in a Jupyter Notebook? If so, the following shortcuts will undo the deletion.

If you have deleted a cell’s contents, you can quickly recover it by hitting CTRL/CMD+Z.

If you need to recover an entire deleted cell, hit ESC+Z or EDIT > Undo Delete Cells.

FINAL WORDS

I hope you have found this post helpful. These tips will give you an excellent foundation to work from if you are just starting out. Once you get more comfortable with Python and Jupyter Notebooks, plenty of other resources are available to help you learn even more. Stay curious and keep learning; that is the best way to grow as a programmer. Thanks for reading!

Frequently Asked Questions

Is it challenging to do data analysis using Python?

Python and R are free, open-source programming languages that may be executed on various operating systems, including Windows, macOS, and Linux. Both are capable of doing almost any data analysis activity, and both are regarded as languages that are not too difficult to master, particularly for those just starting.

Is R more challenging to learn than Python?

Because R code is less standardized, learning R can be challenging for novices. For most learners, Python is simpler to understand and has a smoother learning curve. In addition, Python tends to reduce the time spent coding, since it is simpler to maintain and has a syntax that reads much like English.

Should I spend more time learning R or Python?

It is recommended that you first get familiar with Python before moving on to R. There are still plenty of jobs where R is required, so if you have the time, it doesn’t hurt to learn both. Still, I’d suggest that these days Python is becoming the dominant programming language for data scientists and is the better first choice to focus on.