Analyzing the Paris 2024 Olympic Dataset with Python and Power BI
The Paris 2024 Summer Olympic promise to be a grand event, and analyzing the related datasets in Power BI and provide valuable insights into various aspects of the games. In this blog, we will explore how to use Python to download and manage the dataset from Kaggle and then utilize Power BI to visualize the insights for effective analysis.
This project involves several steps, from downloading the data to preparing it for visualization. Here’s a breakdown of the entire process:
Step 1: Setting Up Kaggle API and Python Environment
First, we need to set up the Kaggle API to access and download datasets. Follow these steps:
- Sign up on Kaggle: If you don’t have a Kaggle account, sign up here.
- Create an API Token: In your Kaggle account settings, go to the “API” section and click “Create New API Token.” This will download a
kaggle.jsonfile containing your credentials. - Save the API Token: Place the
kaggle.jsonfile in a secure directory on your machine, where it will be used to authenticate the Kaggle API.
Python Environment Setup
Make sure you have Python installed along with the necessary libraries:
bash
pip install kaggle pandas
Step 2: Python Script for Downloading the Olympic Dataset
Below is the Python script that automates the process of downloading the Paris 2024 Olympic dataset from Kaggle and loading it into Pandas DataFrames for analysis.
python
import kaggle
import os
import pandas as pd
# Set Kaggle API credentials directory
os.environ['KAGGLE_CONFIG_DIR'] = 'C:/Users/faisa/.kaggle' # Update this path to your Kaggle configuration directory
# Specify the dataset identifier
dataset = 'piterfm/paris-2024-olympic-summer-games'
# Set the download path
download_path = 'C:/Users/faisa/Downloads/Power BI_Imp Summary/Olympic/Source' # Change this to your preferred download directory
# Remove existing files in the folder to prevent duplicates or outdated files
for file in os.listdir(download_path):
file_path = os.path.join(download_path, file)
try:
if os.path.isfile(file_path):
os.unlink(file_path) # Delete the file
print(f"Deleted {file_path}")
except Exception as e:
print(f"Error deleting {file_path}: {e}")
# Download the dataset using the Kaggle API and unzip the files
kaggle.api.dataset_download_files(dataset, path=download_path, unzip=True)
# List of CSV files to be imported
csv_files = [
'athletes.csv',
'events.csv',
'medallists.csv',
'medals.csv',
'medals_total.csv',
'schedules.csv',
'schedules_preliminary.csv',
'teams.csv',
'torch_route.csv',
'venues.csv'
]
# Initialize a dictionary to hold DataFrames
dataframes = {}
# Iterate through each CSV file and load it into a DataFrame
for file in csv_files:
# Construct the full path to the CSV file
file_path = os.path.join(download_path, file)
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv(file_path)
# Add the DataFrame to the dictionary using the file name as the key
table_name = file.split('.')[0] # Remove the .csv extension
dataframes[table_name] = df
Explanation of the Code
- Setting the Environment: We start by setting the
KAGGLE_CONFIG_DIRto point to the directory containing thekaggle.jsonfile. This enables the script to authenticate and access Kaggle’s API. - Dataset Identifier: The dataset identifier
piterfm/paris-2024-olympic-summer-gamesis used to specify which dataset to download. - Clearing Existing Files: To prevent duplicates or outdated data, the script deletes any existing files in the download directory before proceeding with the new download.
- Downloading and Unzipping: The
kaggle.api.dataset_download_filesfunction downloads and unzips the dataset files to the specified path. - Loading CSV Files into DataFrames: Finally, the script iterates through the list of CSV files, loading each into a Pandas DataFrame. These DataFrames are stored in a dictionary for easy access using the file name as the key.
Step 3: Importing Olympic Data into Power BI
With the dataset downloaded and organized into DataFrames, we can now import this data into Power BI for visualization.
Connecting Power BI to Python
- Open Power BI: Start by opening Power BI Desktop.
- Get Data: Click on “Get Data” in the Home tab and choose “Python script” as the data source.
- Load the Script: Copy and paste the following Python script into the Power BI editor to load the DataFrames:
python
# Import necessary libraries import pandas as pd import os # Define the path to the downloaded CSV files download_path = 'C:/Users/faisa/Downloads/Power BI_Imp Summary/Olympic/Source' # Load the data into DataFrames athletes = pd.read_csv(os.path.join(download_path, 'athletes.csv')) events = pd.read_csv(os.path.join(download_path, 'events.csv')) medallists = pd.read_csv(os.path.join(download_path, 'medallists.csv')) medals = pd.read_csv(os.path.join(download_path, 'medals.csv')) medals_total = pd.read_csv(os.path.join(download_path, 'medals_total.csv')) schedules = pd.read_csv(os.path.join(download_path, 'schedules.csv')) schedules_preliminary = pd.read_csv(os.path.join(download_path, 'schedules_preliminary.csv')) teams = pd.read_csv(os.path.join(download_path, 'teams.csv')) torch_route = pd.read_csv(os.path.join(download_path, 'torch_route.csv')) venues = pd.read_csv(os.path.join(download_path, 'venues.csv'))
- Load the Data: Once the script is executed, Power BI will load the data from the CSV files into tables that can be used for building reports.
Step 4: Creating Visualizations in Power BI
With the data loaded into Power BI, we can now create compelling visualizations to analyze the Olympic dataset. Here are a few examples:
Visual 1: Olympic Athlete Participation by Country
- Type: Bar Chart
- Fields: Use the
athletestable to plot the number of athletes participating from each country. This provides a quick overview of which countries have the most representation in the Olympics.
Visual 2: Olympic Medal Count by Country
- Type: Pie Chart
- Fields: Utilize the
medals_totaltable to display the total medal count for each country. This visualization helps identify the top-performing nations in terms of medals won.
Visual 3: Olympic Event Schedule
- Type: Table
- Fields: Use the
schedulestable to create a detailed schedule of events, including event names, dates, and venues. This allows viewers to plan their viewing schedule and follow specific events.
Visual 4: Olympic Torch Route Map
- Type: Map
- Fields: Employ the
torch_routetable to visualize the Olympic torch route on a map. This visualization adds geographical context to the relay and highlights the journey of the Olympic flame.
Visual 5: Olympic Medal Trends Over Time
- Type: Line Chart
- Fields: Use the
medalstable to display trends in medal wins over time, broken down by country or sport. This visualization offers insights into how countries’ performances have changed across different Olympics.
Conclusion
By leveraging Python and Power BI, we’ve efficiently downloaded, organized, and visualized the Paris 2024 Olympic dataset. This project demonstrates the power of data analysis and visualization in understanding the dynamics of a global event like the Olympics.
Through this exploration, we gain valuable insights into athlete participation, medal trends, event schedules, and more. As the 2024 Summer Olympics draw nearer, this analysis provides a fascinating glimpse into the event’s potential outcomes and highlights the immense scale and diversity of the games.
Further Exploration
- Advanced Analytics: Dive deeper into advanced analytics by applying machine learning models to predict medal outcomes or identify key performance indicators for athletes.
- Dynamic Dashboards: Enhance Power BI reports with dynamic dashboards, providing interactive and real-time updates on Olympic events and statistics.
Feel free to download the code and datasets, and experiment with your own visualizations and analyses. Happy analyzing!


