Exploring Zomato Restaurant Insights in Indian Cities with Python
Chapter 1: Introduction to Zomato Restaurant Analysis
In this blog post, we will delve into a dataset featuring 900 restaurants spread across 13 major Indian cities. By leveraging Python and its visualization libraries, we will uncover valuable insights into dining and delivery ratings, customer preferences, popular cuisines, and more. Our aim is to distill complex data into easily digestible insights, making it accessible to food enthusiasts, analysts, and business owners alike.
Section 1.1: Overview of the Dataset
This dataset from Kaggle encompasses a wide array of information about the restaurant landscape in 13 metropolitan areas of India. It includes customer ratings for dining and delivery services, reviews, the most popular dishes, and pricing data. This extensive dataset not only allows for an examination of dining trends but also facilitates comparisons among different restaurants and cuisines.
Section 1.2: Preparing for Data Analysis
To kickstart our analysis, we need to load several essential libraries:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
Next, we'll load the Zomato dataset for our analysis:
df_zomato = pd.read_csv("zomato_dataset.csv", skipinitialspace=True)
Chapter 2: Data Exploration and Insights
In this chapter, we will explore the dataset, check for data types, and clean the data.
Section 2.1: Data Cleaning and Preparation
First, we inspect the first few rows, the column data types, and the non-null counts to spot missing values.
df_zomato.head()
df_zomato.info()
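The output of info() can be hard to scan when many columns have gaps. As a minimal sketch, the snippet below builds a compact table of data type, missing count, and missing share per column. The `sample` frame is a hypothetical stand-in for df_zomato; its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_zomato with a few missing ratings.
sample = pd.DataFrame({
    "City": ["Pune", "Pune", "Delhi", "Delhi"],
    "DiningRating": [4.1, np.nan, 3.8, 4.5],
    "DeliveryRating": [np.nan, np.nan, 4.0, 4.2],
    "Prices": [350.0, 500.0, 250.0, 900.0],
})

# One row per column: dtype, number of missing values, and missing share in percent.
summary = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "missing": sample.isna().sum(),
    "missing_pct": (sample.isna().mean() * 100).round(1),
})
print(summary)
```

Running the same three expressions on df_zomato should show which rating columns need attention before any aggregation.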
We then replace missing values in the 'DiningRating' and 'DeliveryRating' columns with zero:
df_zomato['DeliveryRating'] = df_zomato['DeliveryRating'].fillna(0)
df_zomato['DiningRating'] = df_zomato['DiningRating'].fillna(0)
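Filling with zero keeps later arithmetic simple, since adding a number to NaN yields NaN. The trade-off is that an unrated restaurant is then treated as if it had been rated 0, which pulls averages down. The sketch below shows the effect on a small hypothetical series; the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical delivery ratings: two restaurants have no rating at all.
ratings = pd.Series([4.0, np.nan, 3.0, np.nan], name="DeliveryRating")

mean_ignoring_missing = ratings.mean()        # NaN values are skipped by default
mean_zero_filled = ratings.fillna(0).mean()   # missing ratings counted as 0

print(mean_ignoring_missing, mean_zero_filled)
```

If averages matter for a given insight, one option is to compute them on the unfilled column and use the zero-filled column only where a sum is needed.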
Section 2.2: Extracting Insights from the Data
We define a helper that draws, for each city, a horizontal bar chart of the five highest-priced rows, labeled by a chosen column:
def get_plots(df, criteria, title):
    # One horizontal bar chart per city, showing the five highest-priced rows.
    list_cities = list(df['City'].unique())
    my_colors = ['#FA8072', '#6495ED', '#40E0D0', '#808080', '#28B463']
    # Size the grid from the data instead of hard-coding the number of cities.
    n_cities = len(list_cities)
    fig, axs = plt.subplots(n_cities, 1, figsize=(10, 4 * n_cities),
                            facecolor='w', edgecolor='k', squeeze=False)
    fig.subplots_adjust(hspace=.5, wspace=.001)
    axs = axs.ravel()
    for i, city in enumerate(list_cities):
        df_filtered = df[df['City'] == city]
        # Sort into a new frame; an in-place sort on a filtered slice triggers SettingWithCopyWarning.
        df_filtered = df_filtered.sort_values('Prices', ascending=False).head(5)
        bars = axs[i].barh(df_filtered[criteria], df_filtered['Prices'], color=my_colors)
        axs[i].set_title(f'{title} {criteria} in {city}')
        axs[i].bar_label(bars)
Insight 1: Most Expensive Cuisine by City
df_city = df_zomato.groupby(['City', 'Cuisine'], as_index=False)['Prices'].max()
idx = df_city.groupby('City')['Prices'].idxmax()
df_city_max_price = df_city.loc[idx]
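The snippet above uses a two-step pattern: aggregate per (City, Cuisine), then use idxmax to find the row label of the highest price within each city. Note that it ranks a cuisine by its single most expensive entry, not by its average price. The sketch below walks through the same steps on a small hypothetical frame; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample with the same column names as the dataset.
sample = pd.DataFrame({
    "City": ["Pune", "Pune", "Pune", "Delhi", "Delhi"],
    "Cuisine": ["Chinese", "Italian", "Chinese", "Mughlai", "Cafe"],
    "Prices": [300.0, 800.0, 450.0, 1200.0, 400.0],
})

# Step 1: highest price seen for each (City, Cuisine) pair.
by_cuisine = sample.groupby(["City", "Cuisine"], as_index=False)["Prices"].max()

# Step 2: row label of the priciest cuisine within each city.
idx = by_cuisine.groupby("City")["Prices"].idxmax()

# Step 3: select those rows.
top = by_cuisine.loc[idx].reset_index(drop=True)
print(top)
```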
Insight 2: Most Popular Restaurants
We will calculate a 'Total Rating' by summing the dining and delivery ratings:
df_zomato['Total_rating'] = df_zomato['DiningRating'] + df_zomato['DeliveryRating']
df_rating = df_zomato.groupby(['City', 'RestaurantName'], as_index=False)['Total_rating'].max()
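The grouped frame holds one score per restaurant per city, but it is not yet ranked. One possible next step, sketched here on hypothetical data, is to sort within each city and keep the top entries. The sample values and the choice of two entries per city are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample; a rating of 0.0 stands for a zero-filled missing value.
sample = pd.DataFrame({
    "City": ["Pune", "Pune", "Delhi", "Delhi", "Delhi", "Delhi"],
    "RestaurantName": ["A", "B", "C", "D", "C", "E"],
    "DiningRating": [4.0, 3.5, 4.0, 0.0, 4.5, 3.0],
    "DeliveryRating": [4.0, 4.0, 0.0, 3.5, 4.0, 3.0],
})
sample["Total_rating"] = sample["DiningRating"] + sample["DeliveryRating"]

# Best score per restaurant within each city, as in the snippet above.
rating = sample.groupby(["City", "RestaurantName"], as_index=False)["Total_rating"].max()

# Keep the two highest-scoring restaurants in each city.
top = (
    rating.sort_values(["City", "Total_rating"], ascending=[True, False])
    .groupby("City")
    .head(2)
    .reset_index(drop=True)
)
print(top)
```

Restaurant D illustrates a side effect of the summed score: a place with only one of the two ratings scores lower than a place with both, regardless of quality.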
Insight 3: Cities with the Most Restaurants
num_rest = df_zomato['City'].value_counts().nlargest(12).sort_values(ascending=False)
plt.figure(figsize=(12, 6))
ax = num_rest.plot(kind='barh', color='#6495ED')
plt.xlabel("Count of Restaurants")
plt.ylabel("City")
plt.title("Number of Restaurants in Various Cities", fontsize=8, weight='bold')
ax.invert_yaxis()
plt.tight_layout()
Insight 4: Best Sellers Analysis
df_bestseller = df_zomato['BestSeller'].value_counts().nlargest(5).sort_values(ascending=False)
plt.figure(figsize=(12, 6))
plt.pie(df_bestseller, labels=df_bestseller.index, autopct='%1.1f%%', pctdistance=0.85)
plt.title('% Share of Top 5 Best Sellers')
plt.show()
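When reading this pie chart, keep in mind that plt.pie normalizes by the sum of the values it receives. The percentages are therefore shares within the top five, not shares of all rows. The sketch below contrasts the two on a hypothetical series; the tag names and the cut-off of two are placeholders.

```python
import pandas as pd

# Hypothetical best-seller tags: 6 x "Tag A", 3 x "Tag B", 1 x "Tag C".
tags = pd.Series(["Tag A"] * 6 + ["Tag B"] * 3 + ["Tag C"], name="BestSeller")

counts = tags.value_counts()
top = counts.nlargest(2)

# Share within the selected top group (what the pie's autopct would display).
share_within_top = (top / top.sum() * 100).round(1)

# Share of every tagged row in the data.
share_of_all = (top / counts.sum() * 100).round(1)

print(share_within_top)
print(share_of_all)
```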
Insight 5: Top 5 Restaurants Post-Consolidation
Some rows list a Bangalore neighborhood in the 'City' column. We relabel those neighborhoods as 'Bangalore' before counting how often each restaurant appears:
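BLR_areas = ['Banaswadi', 'Ulsoor', 'Malleshwaram', 'Magrath Road']
df_BLR = df_zomato.copy()
# Assign the result back; replace(..., inplace=True) on a selected column may not update df_BLR.
df_BLR['City'] = df_BLR['City'].replace(BLR_areas, 'Bangalore')
# Count how many times each restaurant appears in each city.
df_BLR_cnt = df_BLR.groupby(['City', 'RestaurantName']).size().reset_index(name='count')
# get_restaurant_plots is not defined in this post; it is presumably a variant of get_plots that ranks by count.
get_restaurant_plots(df_BLR_cnt, "Top 5 Restaurants in ")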
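The helper get_restaurant_plots is called here but its definition is not shown in the post. The sketch below is one possible implementation, assuming it mirrors get_plots and ranks restaurants by a count column. The column name `count`, the parameters `count_col` and `top_n`, and the sample data are all assumptions, not the author's code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def get_restaurant_plots(df, title, count_col="count", top_n=5):
    """Hypothetical helper: one bar chart per city of the most frequent restaurants."""
    cities = list(df["City"].unique())
    fig, axs = plt.subplots(len(cities), 1, figsize=(10, 4 * len(cities)), squeeze=False)
    for ax, city in zip(axs.ravel(), cities):
        # Keep the top_n restaurants with the highest counts in this city.
        top = df[df["City"] == city].nlargest(top_n, count_col)
        bars = ax.barh(top["RestaurantName"], top[count_col], color="#6495ED")
        ax.set_title(f"{title}{city}")
        ax.bar_label(bars)
        ax.invert_yaxis()  # largest count at the top
    fig.tight_layout()
    return fig


# Hypothetical counts table with the assumed column layout.
sample = pd.DataFrame({
    "City": ["Bangalore", "Bangalore", "Pune"],
    "RestaurantName": ["A", "B", "C"],
    "count": [3, 1, 2],
})
fig = get_restaurant_plots(sample, "Top 5 Restaurants in ")
```

Returning the figure rather than calling plt.show() inside the helper leaves it to the caller to display or save the result.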
Chapter 3: Conclusion
In this analysis, we used Python to surface patterns in pricing, best sellers, and customer ratings across restaurants in several Indian cities. Food lovers and business owners alike can apply the same steps to ground their decisions in data. We hope you found this exploration insightful and engaging!