Analyzing the Paris 2024 Olympic Dataset with Python and Power BI

The Paris 2024 Summer Olympic promise to be a grand event, and analyzing the related datasets in Power BI and provide valuable insights into various aspects of the games. In this blog, we will explore how to use Python to download and manage the dataset from Kaggle and then utilize Power BI to visualize the insights for effective analysis.

This project involves several steps, from downloading the data to preparing it for visualization. Here’s a breakdown of the entire process:

Step 1: Setting Up Kaggle API and Python Environment

First, we need to set up the Kaggle API to access and download datasets. Follow these steps:

Sign up on Kaggle: If you don’t have a Kaggle account, sign up here.
Create an API Token: In your Kaggle account settings, go to the “API” section and click “Create New API Token.” This will download a kaggle.json file containing your credentials.
Save the API Token: Place the kaggle.json file in a secure directory on your machine, where it will be used to authenticate the Kaggle API.

Python Environment Setup

Make sure you have Python installed along with the necessary libraries:

bash

pip install kaggle pandas

Step 2: Python Script for Downloading the Olympic Dataset

Below is the Python script that automates the process of downloading the Paris 2024 Olympic dataset from Kaggle and loading it into Pandas DataFrames for analysis.

python

import kaggle
import os
import pandas as pd

# Set Kaggle API credentials directory
os.environ['KAGGLE_CONFIG_DIR'] = 'C:/Users/faisa/.kaggle'  # Update this path to your Kaggle configuration directory

# Specify the dataset identifier
dataset = 'piterfm/paris-2024-olympic-summer-games'

# Set the download path
download_path = 'C:/Users/faisa/Downloads/Power BI_Imp Summary/Olympic/Source' # Change this to your preferred download directory

# Remove existing files in the folder to prevent duplicates or outdated files
for file in os.listdir(download_path):
    file_path = os.path.join(download_path, file)
    try:
        if os.path.isfile(file_path):
            os.unlink(file_path)  # Delete the file
            print(f"Deleted {file_path}")
    except Exception as e:
        print(f"Error deleting {file_path}: {e}")

# Download the dataset using the Kaggle API and unzip the files
kaggle.api.dataset_download_files(dataset, path=download_path, unzip=True)

# List of CSV files to be imported
csv_files = [
    'athletes.csv',
    'events.csv',
    'medallists.csv',
    'medals.csv',
    'medals_total.csv',
    'schedules.csv',
    'schedules_preliminary.csv',
    'teams.csv',
    'torch_route.csv',
    'venues.csv'
]

# Initialize a dictionary to hold DataFrames
dataframes = {}

# Iterate through each CSV file and load it into a DataFrame
for file in csv_files:
    # Construct the full path to the CSV file
    file_path = os.path.join(download_path, file)
    
    # Load the CSV file into a Pandas DataFrame
    df = pd.read_csv(file_path)
    
    # Add the DataFrame to the dictionary using the file name as the key
    table_name = file.split('.')[0]  # Remove the .csv extension
    dataframes[table_name] = df

Explanation of the Code

Setting the Environment: We start by setting the KAGGLE_CONFIG_DIR to point to the directory containing the kaggle.json file. This enables the script to authenticate and access Kaggle’s API.
Dataset Identifier: The dataset identifier piterfm/paris-2024-olympic-summer-games is used to specify which dataset to download.
Clearing Existing Files: To prevent duplicates or outdated data, the script deletes any existing files in the download directory before proceeding with the new download.
Downloading and Unzipping: The kaggle.api.dataset_download_files function downloads and unzips the dataset files to the specified path.
Loading CSV Files into DataFrames: Finally, the script iterates through the list of CSV files, loading each into a Pandas DataFrame. These DataFrames are stored in a dictionary for easy access using the file name as the key.

Step 3: Importing Olympic Data into Power BI

With the dataset downloaded and organized into DataFrames, we can now import this data into Power BI for visualization.

Connecting Power BI to Python

Open Power BI: Start by opening Power BI Desktop.
Get Data: Click on “Get Data” in the Home tab and choose “Python script” as the data source.
Load the Script: Copy and paste the following Python script into the Power BI editor to load the DataFrames:

python

# Import necessary libraries
import pandas as pd
import os

# Define the path to the downloaded CSV files
download_path = 'C:/Users/faisa/Downloads/Power BI_Imp Summary/Olympic/Source'

# Load the data into DataFrames
athletes = pd.read_csv(os.path.join(download_path, 'athletes.csv'))
events = pd.read_csv(os.path.join(download_path, 'events.csv'))
medallists = pd.read_csv(os.path.join(download_path, 'medallists.csv'))
medals = pd.read_csv(os.path.join(download_path, 'medals.csv'))
medals_total = pd.read_csv(os.path.join(download_path, 'medals_total.csv'))
schedules = pd.read_csv(os.path.join(download_path, 'schedules.csv'))
schedules_preliminary = pd.read_csv(os.path.join(download_path, 'schedules_preliminary.csv'))
teams = pd.read_csv(os.path.join(download_path, 'teams.csv'))
torch_route = pd.read_csv(os.path.join(download_path, 'torch_route.csv'))
venues = pd.read_csv(os.path.join(download_path, 'venues.csv'))

Load the Data: Once the script is executed, Power BI will load the data from the CSV files into tables that can be used for building reports.

Step 4: Creating Visualizations in Power BI

With the data loaded into Power BI, we can now create compelling visualizations to analyze the Olympic dataset. Here are a few examples:

Visual 1: Olympic Athlete Participation by Country

Type: Bar Chart
Fields: Use the athletes table to plot the number of athletes participating from each country. This provides a quick overview of which countries have the most representation in the Olympics.

Visual 2: Olympic Medal Count by Country

Type: Pie Chart
Fields: Utilize the medals_total table to display the total medal count for each country. This visualization helps identify the top-performing nations in terms of medals won.

Visual 3: Olympic Event Schedule

Type: Table
Fields: Use the schedules table to create a detailed schedule of events, including event names, dates, and venues. This allows viewers to plan their viewing schedule and follow specific events.

Visual 4: Olympic Torch Route Map

Type: Map
Fields: Employ the torch_route table to visualize the Olympic torch route on a map. This visualization adds geographical context to the relay and highlights the journey of the Olympic flame.

Visual 5: Olympic Medal Trends Over Time

Type: Line Chart
Fields: Use the medals table to display trends in medal wins over time, broken down by country or sport. This visualization offers insights into how countries’ performances have changed across different Olympics.

Conclusion

By leveraging Python and Power BI, we’ve efficiently downloaded, organized, and visualized the Paris 2024 Olympic dataset. This project demonstrates the power of data analysis and visualization in understanding the dynamics of a global event like the Olympics.

Through this exploration, we gain valuable insights into athlete participation, medal trends, event schedules, and more. As the 2024 Summer Olympics draw nearer, this analysis provides a fascinating glimpse into the event’s potential outcomes and highlights the immense scale and diversity of the games.

Further Exploration

Advanced Analytics: Dive deeper into advanced analytics by applying machine learning models to predict medal outcomes or identify key performance indicators for athletes.
Dynamic Dashboards: Enhance Power BI reports with dynamic dashboards, providing interactive and real-time updates on Olympic events and statistics.

Feel free to download the code and datasets, and experiment with your own visualizations and analyses. Happy analyzing!

Download Olympics Dashboard

Olympic with Python and Power BI