arsalandywriter.com

Mastering Exploratory Data Analysis with Python: A Hands-On Guide


Chapter 1: Introduction to the Fortune 500 Dataset

Before writing any analysis code, import the libraries used throughout this guide. Pandas reads the CSV file, renames columns, and sorts the data, while Matplotlib draws the charts. Seaborn is imported as well, although the examples shown here rely only on Matplotlib.

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

data = pd.read_csv("fortune500.csv")

data.head()

To keep column references short and unambiguous, I renamed the "Revenue (in millions)" column to "Revenue". Next, I filtered the dataset down to the entries for the year 1971.

data.rename(columns={"Revenue (in millions)": "Revenue"}, inplace=True)

data_1971 = data[data['Year'] == 1971]
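The rename and filter steps can be tried on a small stand-in table without the CSV. This is only a sketch: the sample rows are made up, and the names sample, renamed, and subset_1971 are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for fortune500.csv (values are illustrative only).
sample = pd.DataFrame({
    "Year": [1970, 1971, 1971],
    "Company": ["Alpha Corp", "Beta Inc", "Gamma Co"],
    "Revenue (in millions)": [100.0, 250.0, 175.0],
})

# Without inplace=True, rename() returns a new DataFrame and leaves `sample` unchanged.
renamed = sample.rename(columns={"Revenue (in millions)": "Revenue"})

# The comparison builds a boolean mask; indexing with it keeps only the 1971 rows.
# .copy() makes the subset an independent DataFrame.
subset_1971 = renamed[renamed["Year"] == 1971].copy()

print(subset_1971)
```

Returning a new object instead of using inplace=True is a matter of taste here; both approaches produce the same column names.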

I split the code into small steps for clarity. A new variable holds the 1971 data sorted by revenue in descending order, and .head(20) keeps only the top 20 companies.

data_sorted = data_1971.sort_values('Revenue', ascending=False).head(20)

data_sorted
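As a side note, sorting in descending order and taking the first rows gives the same companies as pandas' nlargest. The sketch below checks this on made-up data; the table and variable names are hypothetical.

```python
import pandas as pd

# Made-up revenues for four hypothetical companies.
sample = pd.DataFrame({
    "Company": ["Alpha Corp", "Beta Inc", "Gamma Co", "Delta Ltd"],
    "Revenue": [100.0, 250.0, 175.0, 90.0],
})

# Approach used in this guide: sort descending, then keep the first n rows.
top_sorted = sample.sort_values("Revenue", ascending=False).head(2)

# Alternative: ask pandas directly for the n largest values in a column.
top_nlargest = sample.nlargest(2, "Revenue")

print(top_sorted["Company"].tolist())
print(top_nlargest["Company"].tolist())
```

Either form works for a table of this size; the sort-then-head version is arguably easier to read when the sorted table is also displayed.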

For the visualization, I used Matplotlib to draw a simple bar chart with a red line traced over the bar tops. The line makes it easier to see that the last five companies differ only slightly in revenue.

plt.figure(figsize=(15, 9))

plt.plot(data_sorted['Company'], data_sorted['Revenue'], color='red')

plt.bar(data_sorted['Company'], data_sorted['Revenue'], color='lightgrey')

plt.xlabel('Company Name')

plt.ylabel('Revenue (in Millions)')

plt.title('Top 20 Company Revenues in 1971')

plt.xticks(rotation=45, ha='right')

plt.show()
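If the goal is to draw attention only to the closely spaced companies at the end of the ranking, one option is to draw the red line over just those bars. The following is a minimal sketch with made-up values, not the chart from this guide; demo and tail are hypothetical names, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up revenues, already sorted in descending order.
demo = pd.DataFrame({
    "Company": [f"Company {i}" for i in range(1, 9)],
    "Revenue": [900, 700, 520, 300, 290, 285, 280, 278],
})

# The last five rows are the ones with nearly identical revenue.
tail = demo.tail(5)

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(demo["Company"], demo["Revenue"], color="lightgrey")
# Red line with markers over the last five bars only.
ax.plot(tail["Company"], tail["Revenue"], color="red", marker="o")
ax.set_xlabel("Company Name")
ax.set_ylabel("Revenue (in Millions)")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
plt.close(fig)  # in a notebook, plt.show() would be used instead
```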

Chapter 2: Analyzing Profit Growth (1990-1999)

Next, we will identify the 10 companies whose profits grew the most between 1990 and 1999. To do this, I stored the rows for each of the two years in separate variables.

data_1990 = data[data['Year'] == 1990]

data_1999 = data[data['Year'] == 1999]

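Using pd.merge, I joined the two subsets on the company name so that the 1990 and 1999 figures sit side by side. The default inner join keeps only companies that appear in both years, and the suffixes distinguish the columns the two subsets have in common.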

merged_data = pd.merge(data_1990, data_1999, on='Company', suffixes=('_1990', '_1999'))

merged_data.dtypes
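Because pd.merge performs an inner join by default, companies listed in only one of the two years drop out of the result. The sketch below illustrates this on made-up data and uses an outer join with indicator=True to show which rows were lost; the tables and names are hypothetical.

```python
import pandas as pd

# Made-up profit tables for two years; only two companies appear in both.
y1990 = pd.DataFrame({
    "Company": ["Alpha Corp", "Beta Inc", "Gamma Co"],
    "Profit": [10.0, 20.0, 30.0],
})
y1999 = pd.DataFrame({
    "Company": ["Beta Inc", "Gamma Co", "Delta Ltd"],
    "Profit": [25.0, 45.0, 5.0],
})

# Default how="inner": keeps only companies present in both tables.
inner = pd.merge(y1990, y1999, on="Company", suffixes=("_1990", "_1999"))

# how="outer" keeps everything; indicator=True adds a _merge column
# saying whether each row came from the left table, the right table, or both.
outer = pd.merge(y1990, y1999, on="Company", how="outer",
                 suffixes=("_1990", "_1999"), indicator=True)
unmatched = outer[outer["_merge"] != "both"]

print(inner)
print(unmatched[["Company", "_merge"]])
```

Checking the unmatched rows is a quick way to see how many companies the comparison silently excludes.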

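So that the subtraction below works on numbers, I converted the profit columns with pd.to_numeric; errors='coerce' turns any entry that cannot be parsed into NaN. I then dropped the columns that are not needed for the comparison. Note that the code assumes the profit column is named "Profit" in the CSV, which is why it appears as Profit_1990 and Profit_1999 after the merge.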

merged_data['Profit_1999'] = pd.to_numeric(merged_data['Profit_1999'], errors='coerce')

merged_data['Profit_1990'] = pd.to_numeric(merged_data['Profit_1990'], errors='coerce')

merged_data.drop(columns=['Revenue_1990', 'Revenue_1999', 'Rank_1990', 'Rank_1999'], inplace=True)
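To see what errors='coerce' does, here is a small sketch on a made-up column of strings. The placeholder "N.A." is an assumption about how missing profits might be written, not something confirmed for this dataset.

```python
import pandas as pd

# Made-up profit values stored as text, with one unparseable placeholder.
raw = pd.Series(["1200.5", "N.A.", "-35", "87"])

# Entries that cannot be parsed become NaN instead of raising an error.
profits = pd.to_numeric(raw, errors="coerce")

# Counting the NaN values shows how many entries were lost in the conversion.
n_missing = int(profits.isna().sum())

print(profits)
print("Unparseable entries:", n_missing)
```

Counting the coerced values before moving on helps confirm that only a few rows are affected.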

Because "greatest profit increase" can mean either the largest dollar gain or the largest relative gain, I calculated both the absolute and the percentage increase.

merged_data['Profit_Increase'] = merged_data['Profit_1999'] - merged_data['Profit_1990']

merged_data['Profit_Percentage_Increase'] = ((merged_data['Profit_1999'] - merged_data['Profit_1990']) / merged_data['Profit_1990']) * 100
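One caveat with the percentage formula: when the 1990 profit is zero or negative, the result is infinite or has a misleading sign. The sketch below, on made-up numbers, computes the percentage only where the base year is positive. This masking rule is my own assumption for illustration, not part of the original analysis.

```python
import pandas as pd

# Made-up profits, including a loss and a zero in the base year.
demo = pd.DataFrame({
    "Company": ["Alpha Corp", "Beta Inc", "Gamma Co", "Delta Ltd"],
    "Profit_1990": [100.0, -50.0, 0.0, 10.0],
    "Profit_1999": [150.0, 50.0, 20.0, 40.0],
})

demo["Profit_Increase"] = demo["Profit_1999"] - demo["Profit_1990"]

# Keep the base only where it is positive; other rows become NaN,
# so their percentage is reported as missing rather than misleading.
base = demo["Profit_1990"].where(demo["Profit_1990"] > 0)
demo["Profit_Percentage_Increase"] = demo["Profit_Increase"] / base * 100

# The two measures can rank companies differently (idxmax skips NaN).
top_abs = demo.loc[demo["Profit_Increase"].idxmax(), "Company"]
top_pct = demo.loc[demo["Profit_Percentage_Increase"].idxmax(), "Company"]

print(demo)
print(top_abs, top_pct)
```

In this toy table the largest absolute gain and the largest percentage gain belong to different companies, which is why reporting both measures is useful.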

I created a variable to store the top 10 companies based on absolute profit increase and reset the index.

top_10_increases_absolute = merged_data.nlargest(10, 'Profit_Increase').reset_index(drop=True)

top_10_increases_absolute.head(5)
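The effect of nlargest followed by reset_index(drop=True) can be seen on a small made-up table. The names below are hypothetical, and the missing value stands in for a profit that could not be converted to a number.

```python
import pandas as pd

# Made-up profit increases; one value is missing.
demo = pd.DataFrame({
    "Company": ["Alpha Corp", "Beta Inc", "Gamma Co", "Delta Ltd"],
    "Profit_Increase": [5.0, 40.0, float("nan"), 12.0],
})

# nlargest returns the rows with the biggest values, largest first,
# and keeps their original index labels (here 1 and 3).
top2 = demo.nlargest(2, "Profit_Increase")

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels.
top2_clean = top2.reset_index(drop=True)

print(top2_clean)
```

Resetting the index is optional, but it makes the row number double as the rank position minus one.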

As in the first chart, I added a red line over the bars so the differences in profit increase are easier to compare.

plt.figure(figsize=(10, 6))

plt.bar(top_10_increases_absolute['Company'], top_10_increases_absolute['Profit_Increase'], color='skyblue')

plt.plot(top_10_increases_absolute['Company'], top_10_increases_absolute['Profit_Increase'], color='red')

plt.xlabel('Company')

plt.ylabel('Profit Increase in $')

plt.title('Top 10 Companies with the Most Profit Increase (1990-1999)')

plt.xticks(rotation=45, ha='right')

plt.tight_layout()

plt.show()
