Let us look at the most important and useful Python libraries for data science.
Data science, as you all know, is the process of studying data. Yes, all you have to do is study the data and draw new insights from it. There is no need to focus on implementing algorithms from scratch or learning new ones; all you need to learn is how to approach the data and solve the problem. One of the key things to know is how to use the appropriate libraries to solve a data science problem. This article is all about giving you context on the important libraries used in data science. Before delving into the topic, I would like to introduce the five basic steps involved in solving a data science problem. I sat down and designed these steps from scratch, so there is no right or wrong answer; the correct answer depends on how you approach the data.
You can find more tutorials and code for data science and Python on my GitHub repository shown below:
The five important steps involved in data science are shown below:
- Getting the data
- Cleaning the data
- Exploring the data
- Building the data
- Presenting the data
These steps are based on my own experience, so don't assume they are the universal answer. But when you sit down and think about a problem, these steps will make a lot more sense.
1. Getting the data
This is one of the most important steps in solving a data science problem, because you first have to think of a problem and then work out how to solve it. Two of the best ways to get data are scraping it from the internet and downloading a data set from Kaggle. How and where you get the data is up to you. I have found Kaggle to be one of the best sources. Below is the link to the official Kaggle website; I encourage you to spend some time using Kaggle.
Alternatively, you can scrape the data from websites, which requires specific techniques and tools. Below is my article where I show how to scrape data from websites.
Some of the most important libraries used to get or scrape data from the internet are shown below:
- Beautiful Soup
- Requests
- Pandas
Beautiful Soup: It is a Python library used to extract data from HTML and XML files. Below is the official documentation of the Beautiful Soup library; I recommend you go through the link.
To manually install Beautiful Soup, just type the command below. I have included the manual install command for every library here; make sure you have Python installed first. That said, I recommend using Google Colab to type and practice your code, because Colab comes with these libraries preinstalled: you just write "import library_name" and Colab imports the library for you.
pip install beautifulsoup4
To use Beautiful Soup, you need to import it and parse a page as shown below:
from bs4 import BeautifulSoup
# page_name is the response object returned by requests.get()
soup = BeautifulSoup(page_name.text, 'html.parser')
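To make the extraction step more concrete, here is a minimal sketch of pulling values out of a parsed page. The HTML string, the "lang" class name, and the example.com link are made up for illustration; a real page will need its own tag and class names.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet standing in for a downloaded page.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Top languages</h1>
    <ul>
      <li class="lang">Python</li>
      <li class="lang">R</li>
      <li class="lang">Julia</li>
    </ul>
    <a href="https://example.com/more">More</a>
  </body>
</html>
"""

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Text inside the <title> tag.
title = soup.title.string

# Text of every <li> element that carries the (hypothetical) "lang" class.
languages = [li.get_text(strip=True) for li in soup.find_all("li", class_="lang")]

# The href attribute of the first <a> tag.
link = soup.find("a")["href"]

print(title)
print(languages)
print(link)
```

In practice you would pass the text of a downloaded page to BeautifulSoup instead of a hard-coded string.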
Requests: The Requests library in Python is used to send HTTP requests in a simple, friendly way. The library offers many methods; one of the most commonly used is requests.get(), which sends a GET request to the URL you pass and returns a response object whose status code tells you whether the request succeeded or failed. Below is the documentation of the Requests library; I recommend you go through it for more details.
To manually install Requests, type the following command:
pip install requests
To import the Requests library and fetch a page, you need to use:
import requests
# 'url_name' is a placeholder for the address of the page you want
page_name = requests.get('url_name')
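As a rough sketch of how you might check whether a request succeeded, the helpers below wrap requests.get() and read the status from the returned response. The function names fetch_page and describe are my own for illustration and are not part of the Requests library.

```python
import requests


def fetch_page(url, timeout=10):
    """Send a GET request and return the Response object."""
    # A timeout keeps the call from waiting forever on a slow server.
    return requests.get(url, timeout=timeout)


def describe(response):
    """Summarise whether a Response indicates success or failure."""
    # response.ok is True when the status code is below 400.
    outcome = "success" if response.ok else "failure"
    return f"{response.status_code}: {outcome}"
```

You could then call describe(fetch_page(some_url)) with any URL you want to check. If you would rather have an error raised on a failed request, the response object also has a raise_for_status() method.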
Pandas: pandas is a high-performance, easy-to-use data structure and data analysis library for the Python programming language. It gives us the DataFrame, which stores data in a clear, tabular form. Below is the official documentation of the pandas library.
To manually install pandas, just run the command:
pip install pandas
To import the pandas library, all you have to do is:
import pandas as pd
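To show what storing data in a data frame looks like, here is a minimal sketch. The rows and column names are invented for illustration; in a real project they would come from your scraped or downloaded data.

```python
import pandas as pd

# Made-up records, for example values collected while scraping a page.
rows = [
    {"language": "Python", "stars": 5},
    {"language": "R", "stars": 4},
    {"language": "Julia", "stars": 3},
]

# Each dictionary becomes one row; the keys become the column names.
df = pd.DataFrame(rows)

# Peek at the first rows and check the size of the table.
print(df.head())
print(df.shape)

# Simple column-wise calculation on the numeric column.
mean_stars = df["stars"].mean()
print(mean_stars)
```

From here, the same data frame can be cleaned, explored, and saved, for example with df.to_csv().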