
Data Scraping and Data Cleaning

1. Introduction to Beautiful Soup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. The official website[1] highlights the following features, among others:

(1) Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need.

(2) Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don’t have to think about encodings, unless the document doesn’t specify an encoding and Beautiful Soup can’t detect one. Then you just have to specify the original encoding.
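To make these features concrete, here is a minimal sketch of navigating, searching, and modifying a parse tree, followed by decoding bytes whose encoding we state ourselves. The HTML snippet and the variable names are made up for illustration; only the Beautiful Soup calls come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for illustration.
html = """
<html><body>
  <h1>Teams</h1>
  <p class="team">Boston Celtics</p>
  <p class="team">Miami Heat</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access walks down the parse tree.
heading = soup.h1.get_text()

# Searching: find_all returns every matching tag.
teams = [p.get_text() for p in soup.find_all("p", class_="team")]

# Modifying: assigning to .string replaces the text of a tag.
soup.h1.string = "NBA Teams"
new_heading = soup.h1.get_text()

# Encoding: bytes go in, Unicode text comes out. The snippet declares no
# encoding, so we specify the original one explicitly.
raw = "<p>café</p>".encode("latin-1")
decoded = BeautifulSoup(raw, "html.parser", from_encoding="latin-1").p.get_text()

print(heading, teams, new_heading, decoded)
```

We pass "html.parser" here because it ships with Python; other parsers can be swapped in if they are installed.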

2. Installation of Beautiful Soup

At the time of writing, the latest release of Beautiful Soup is version 4.6.3, which works on both Python 2 (2.7+) and Python 3. You can run the following command to check which version of Python is installed:

python --version

For Mac users, I recommend installing Beautiful Soup with pip.

2.1 Install pip

We can use the following command to install pip in terminal:

We can also consider upgrading pip via the following command:

pip install --upgrade pip

If you happen to see the following message:

distributed 1.21.8 requires msgpack, which is not installed.

Just type the following command:

pip install msgpack

2.2 Install beautifulsoup4

Now we are ready to install beautifulsoup4 by typing the following command:

pip install beautifulsoup4

2.3 Install other libraries that we may use

pip install pandas

pip install requests
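After installing, it can help to confirm that the packages are visible to the Python interpreter we are using. The sketch below is one possible check; `check_packages` is a hypothetical helper name, not part of any of these libraries. Note that beautifulsoup4 is imported under the name bs4.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# beautifulsoup4 is imported as "bs4"; pandas and requests keep their names.
status = check_packages(["bs4", "pandas", "requests"])

print("Python", sys.version.split()[0])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```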

3. Sources of NBA Statistics

From this website, we download 20 CSV files covering Schedule and Results, Team Per Game Stats, Opponent Per Game Stats, and Miscellaneous Stats from 2013 to 2018.

We also use web scraping with Beautiful Soup to obtain team statistics. The details are given in part 4.

4. Web Scraping (Our code)

4.1 Import

First of all, we import the libraries that we plan to use.

Requests is a library used for getting the source code of a website. For more details, see the website[2].
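As a rough sketch of how these libraries fit together (not necessarily the exact code used in this project), we can download a page with Requests and turn one HTML table into a data frame. The helper names `fetch_html` and `table_to_frame`, and the table id passed to them, are assumptions made for illustration.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML source as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def table_to_frame(html, table_id):
    """Parse the table with the given id into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id {table_id!r}")
    rows = table.find_all("tr")
    # The first row is treated as the header, the rest as data.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    body = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(body, columns=header)
```

A call would look like `table_to_frame(fetch_html(url), "some-table-id")`, where both the URL and the id depend on the page being scraped. All values come back as strings, so numeric columns still need converting afterwards.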

5. Data Cleaning

5.1 Import Raw Game Results Data (data frame named df)

In this part, we import the game results of five regular seasons: 2013–2014, 2014–2015, 2015–2016, 2016–2017, and 2017–2018.
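One way to sketch this import step is to read one CSV per season with pandas. The column names, the added `Season` column, and the in-memory stand-in data below are assumptions for illustration; in practice each source would be the path of a downloaded CSV file.

```python
import io

import pandas as pd


def load_seasons(sources):
    """Read one CSV per season and tag each frame with its season label.

    `sources` maps a season label to a file path or file-like object.
    """
    frames = {}
    for season, source in sources.items():
        frame = pd.read_csv(source)
        frame["Season"] = season  # assumed helper column, not in the raw files
        frames[season] = frame
    return frames


# Made-up in-memory stand-ins for two of the season files.
demo_sources = {
    "2013-2014": io.StringIO("Visitor,PTS_V,Home,PTS_H\nTeam A,90,Team B,95\n"),
    "2014-2015": io.StringIO("Visitor,PTS_V,Home,PTS_H\nTeam B,101,Team A,99\n"),
}
season_frames = load_seasons(demo_sources)
print(season_frames["2013-2014"].head())
```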

5.2 Data Preprocessing

In this part, we first combine the five data frames above. The combined data contains no missing or abnormal values.
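A minimal sketch of this step, assuming made-up column names and values, is to concatenate the frames and then count missing values per column. The non-positive score check is only one example of what "abnormal" could mean here.

```python
import pandas as pd

# Made-up stand-ins for two of the season data frames.
first = pd.DataFrame({"Home": ["Team A"], "PTS_H": [100]})
second = pd.DataFrame({"Home": ["Team B"], "PTS_H": [95]})

# Stack the frames and renumber the rows from 0.
df = pd.concat([first, second], ignore_index=True)

# Count missing values per column.
missing_per_column = df.isna().sum()
has_missing = bool(missing_per_column.any())

# A simple sanity check for abnormal values: scores should be positive.
has_bad_scores = bool((df["PTS_H"] <= 0).any())

print(missing_per_column)
print(has_missing, has_bad_scores)
```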

Because the Charlotte Bobcats changed their name to the Charlotte Hornets, we unify the team name across seasons.
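The renaming can be sketched with pandas `replace`. The column names `Visitor` and `Home` are assumptions about the layout of the game results, and the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; "Visitor" and "Home" are assumed column names.
df = pd.DataFrame(
    {
        "Visitor": ["Charlotte Bobcats", "Miami Heat"],
        "Home": ["Boston Celtics", "Charlotte Bobcats"],
    }
)

# Replace the old name wherever it appears in a team column.
team_columns = ["Visitor", "Home"]
df[team_columns] = df[team_columns].replace("Charlotte Bobcats", "Charlotte Hornets")

print(df)
```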

We also notice that some games went to overtime, so we add a column named ‘Overtime’ that records the number of overtime periods in each game. The values in this column range from 0 to 4.

For this data frame, we select the columns needed for our project. Here is the head of our first data frame:
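How the ‘Overtime’ column is built depends on how the raw file marks overtime. As a sketch, assume the raw data has a marker column (called `OT_marker` here, a hypothetical name) that is empty for regulation games and holds values such as "OT" or "2OT" otherwise.

```python
import pandas as pd


def count_overtimes(marker):
    """Convert a marker such as 'OT' or '3OT' into a number of overtimes."""
    if pd.isna(marker) or str(marker).strip() == "":
        return 0  # no marker means the game ended in regulation
    text = str(marker).strip().upper()
    if text == "OT":
        return 1
    return int(text[:-2])  # strip the trailing 'OT', e.g. '2OT' -> 2


# Made-up marker values; the column name is an assumption.
df = pd.DataFrame({"OT_marker": [None, "OT", "2OT", "", "4OT"]})
df["Overtime"] = df["OT_marker"].map(count_overtimes)

print(df)
```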

5.3 Additional Datasets

5.3.1 Import Miscellaneous Stats from 2013 to 2018 (Data frame named: dfM)

In this part, we merge the five Miscellaneous Stats data frames. We then unify the team name Charlotte Hornets, drop the columns we do not need (‘Rk’, ‘Arena’, ‘Attend’, ‘Attend./G’), and rename the remaining columns. Here is the head of this data frame:
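These steps can be sketched as follows. The dropped column names come from the text above; the remaining columns (`W`, `L`), the new names in the rename mapping, and all values are assumptions made up for illustration.

```python
import pandas as pd

# Made-up row standing in for the combined Miscellaneous Stats data.
dfM = pd.DataFrame(
    {
        "Rk": [1],
        "Team": ["Charlotte Bobcats"],
        "W": [40],
        "L": [42],
        "Arena": ["Example Arena"],
        "Attend": [1000],
        "Attend./G": [100],
    }
)

# Unify the team name, then drop the columns we do not need.
dfM["Team"] = dfM["Team"].replace("Charlotte Bobcats", "Charlotte Hornets")
dfM = dfM.drop(columns=["Rk", "Arena", "Attend", "Attend./G"])

# Rename the remaining columns (hypothetical mapping).
dfM = dfM.rename(columns={"W": "Wins", "L": "Losses"})

print(dfM.head())
```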

5.3.2 Import Opponent Per Game Stats from 2013 to 2018 (Data frame named: dfO)

5.3.3 Import Team Per Game Stats from 2013 to 2018 (Data frame named: dfT)

5.4 Merge all Dataframes
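
One possible way to combine the team-level data frames is to merge them pairwise on shared key columns. This is a sketch only: the join keys (`Team` and `Season`), the stat column names, and the values are assumptions, and `merge_all` is a hypothetical helper.

```python
from functools import reduce

import pandas as pd

KEYS = ["Team", "Season"]  # assumed join keys

# Made-up stand-ins for dfT, dfO, and dfM.
dfT = pd.DataFrame({"Team": ["Team A", "Team B"], "Season": ["2013-2014"] * 2, "PTS": [100.0, 98.0]})
dfO = pd.DataFrame({"Team": ["Team A", "Team B"], "Season": ["2013-2014"] * 2, "Opp_PTS": [97.0, 99.0]})
dfM = pd.DataFrame({"Team": ["Team A", "Team B"], "Season": ["2013-2014"] * 2, "Pace": [95.0, 93.0]})


def merge_all(frames, keys):
    """Left-merge the frames pairwise on the shared key columns."""
    return reduce(lambda left, right: left.merge(right, on=keys, how="left"), frames)


team_stats = merge_all([dfT, dfO, dfM], KEYS)
print(team_stats)
```

A left merge keeps every row of the first frame, so a team or season missing from a later frame shows up as missing values rather than silently disappearing.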

References
