Tips and tricks can be very valuable, particularly in programming. Sometimes a small hack saves a great deal of time, and occasionally a little shortcut or add-on proves to be a godsend and a real productivity enhancer. The following are some of my favorite tips and techniques for speeding up data analysis in Python. Some may be well-known, while others are relatively new, but I am sure you will find them helpful the next time you work on a data analysis project.
Profiling is a procedure that helps us understand our data, and pandas-profiling is a Python library that does just that. It offers a simple and practical way to run exploratory data analysis on a pandas DataFrame. Typically, the df.describe() and df.info() functions serve as the first step in the EDA process. However, they provide only a rudimentary overview of the data and are not very useful for huge data sets.
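For comparison, the baseline step looks roughly like this (the DataFrame contents here are invented for illustration):

```python
import pandas as pd

# A tiny stand-in DataFrame; the real article uses the Titanic CSV later
df = pd.DataFrame({"age": [22, 38, 26], "fare": [7.25, 71.28, 7.93]})

print(df.describe())  # count, mean, std, min, quartiles, max per numeric column
df.info()             # column dtypes, non-null counts, memory usage
```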
In contrast, pandas-profiling extends the pandas DataFrame with a df.profile_report() method for rapid data analysis. With a single line of code, it presents the actual data in an interactive HTML report.
The pandas-profiling module computes statistics for a given dataset such as type inference, unique and missing values, quantile and descriptive statistics, most frequent values, histograms, and correlations.
Let’s illustrate the capabilities of this adaptable Python profiler using the Titanic dataset.
import pandas as pd
titanic_df = pd.read_csv('/kaggle/input/titanic/train.csv')
Running titanic_df.profile_report() in a cell displays the report directly in a Jupyter notebook; that single line of code is all that is required. The report is quite comprehensive and includes charts wherever they are needed.
The report can also be exported as an interactive HTML file using the code below.
profile = titanic_df.profile_report(title='Pandas Profiling Report')
profile.to_file(output_file='Titanic data profiling.html')
Pandas’ DataFrame class has a built-in .plot() method. However, the visuals it produces are not interactive, which makes them less engaging. Still, the convenience of plotting charts with pandas.DataFrame.plot() is hard to give up. What if we could use pandas to create interactive graphs without making big changes to our code? You can accomplish exactly that with the Cufflinks library.
Cufflinks combines the power of Plotly with the flexibility of pandas to make plotting easier.
You can do the installations using the below codes –
pip install plotly # plotly is a prerequisite for cufflinks
pip install cufflinks
Hack Usage –
import pandas as pd

# import cufflinks and put plotly into offline mode
import cufflinks as cf
cf.go_offline()
# DataFrame.iplot() now renders interactive plotly charts
“Magic commands” refers to a set of convenience functions built into Jupyter notebooks that help with some of the most common tasks in standard data analysis. Running the %lsmagic command lists all available magics.
Line magics are prefixed by a single % character and operate on a single line of input, while cell magics are prefixed by %% and operate on multiple lines. If automagic is enabled (via the %automagic magic), line magics can even be called without typing the leading %.
Let’s have a look at a few that help with frequent data analysis tasks:
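As an illustrative sketch (the statements being timed are arbitrary), a few widely used magics look like this in a notebook:

```
%timeit sum(range(1000))   # line magic: benchmark a single statement
%who                       # line magic: list variables defined in the session
%history -n 1-3            # line magic: show recent inputs with their numbers

%%time
# cell magic: report wall time for the whole cell
total = sum(range(10_000))
```

These run only inside IPython/Jupyter, not in a plain Python interpreter.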
The interactive debugger is also a magic function, but it deserves a separate mention. If an error occurs while executing a code cell, type %debug in a new cell and run it. This launches an interactive debugging environment and brings you to the position where the error occurred. You can check the values of variables assigned in the program and perform operations there. Press q to exit the debugger.
pprint, from the standard-library pprint module, is the appropriate tool for generating visually appealing representations of your data structures. It is especially useful for printing dictionaries and JSON data. Let’s see an example that displays the same data using both print and pprint.
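A minimal sketch (the dictionary contents are invented for illustration):

```python
import json
from pprint import pprint

data = {"name": "Grace", "languages": ["Python", "R"],
        "scores": {"math": 91, "cs": 95}}

print(data)             # everything on one long line
pprint(data, width=40)  # nested structure laid out for readability

# pprint is equally handy for parsed JSON
pprint(json.loads('{"a": [1, 2, 3], "b": {"c": 4}}'), width=20)
```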
In your Jupyter notebooks, you can draw attention to anything significant that needs highlighting by using alert and note boxes. The alert type selected determines the color of the message box. Insert one of the snippets below into a Markdown cell.
<div class="alert alert-block alert-info">
<b>Tip:</b> Use blue boxes (alert-info) for tips and notes.
If it's a note, you don't have to include the word "Note".
</div>
<div class="alert alert-block alert-warning">
<b>Example:</b> Yellow boxes are generally used for additional examples or mathematical formulas.
</div>
<div class="alert alert-block alert-success">
Use green boxes sparingly, e.g., to display links to related content.
</div>
<div class="alert alert-block alert-danger">
It is best to avoid red boxes, but they can be used to warn users not to delete an important part of the code.
</div>
Consider a Jupyter Notebook cell containing the following lines of code:
In : 10+5
     10+7
Out : 17
By default, the cell writes only the most recent output; printing any of the others requires the print() function. However, including the following snippet at the very beginning of the notebook makes cells print all of their outputs.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
To return to the initial configuration:
InteractiveShell.ast_node_interactivity = "last_expr"
The typical command-line syntax for executing a Python script is python hello.py. However, running the same script with an extra -i, as in python -i hello.py, has additional benefits. What do we get?
First, Python does not quit the interpreter when the program ends, so we can check the values of variables and try out the defined functions.
Second, since we are still inside the interpreter, it is simple to start the Python debugger:
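A sketch under assumed names (hello.py and its contents are invented for illustration):

```python
# hello.py (hypothetical script)
def greet(name):
    return "Hello, " + name

message = greet("world")

# Run it with:  python -i hello.py
# The interpreter stays open afterwards, so state can be inspected:
#   >>> message
#   'Hello, world'
# If the script raised an exception, start a post-mortem debugger with:
#   >>> import pdb; pdb.pm()
```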
After this, we are brought to the point where the exception occurred and can work on the code from there.
Ctrl/Cmd + / comments out the selected lines in a cell; hitting the combination again uncomments the same lines.
When working in Jupyter Notebook, have you ever deleted a cell by accident? If so, the following shortcuts reverse that deletion.
If you have deleted a cell’s contents, you can quickly recover it by hitting CTRL/CMD+Z.
If you need to recover an entire deleted cell, hit ESC+Z or EDIT > Undo Delete Cells.
I hope you have found this post helpful. These tips will give you an excellent foundation to work from if you start coding. Once you get more comfortable with Python and Jupyter Notebooks, plenty of other resources are available to help you learn even more. Stay curious and keep learning; that is the best way to grow as a programmer. Thanks for reading!
Python and R are free, open-source programming languages that may be executed on various operating systems, including Windows, macOS, and Linux. Both are capable of doing almost any data analysis activity, and both are regarded as languages that are not too difficult to master, particularly for those just starting.
Because R’s code is not standardized, learning R can be challenging for novices. For most students, Python is simpler to pick up and has a gentler learning curve. Python code is also easier to maintain and has a syntax close to plain English, which reduces the time spent coding.
It is recommended that you first get familiar with Python before moving on to R. There are still plenty of jobs that require R, so if you have the time, it doesn’t hurt to learn both. Still, I’d suggest that these days, Python is becoming the dominant programming language for data scientists and is the better first choice to focus on.