Python

Customer Segmentation: Clustering

In this project, I perform unsupervised clustering on customer records from a grocery firm’s database. Customer segmentation is the practice of separating customers into groups so that the customers within each group resemble one another. I divide customers into segments to optimize the significance of each customer to the business and to tailor products to the distinct needs and behaviors of each segment. Segmentation also helps the business cater to the concerns of different types of customers.
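As a rough illustration of what "unsupervised clustering" means here, the sketch below runs a plain k-means loop with NumPy. It makes several assumptions: k-means is only one possible algorithm and may not be the one used in the project, the `kmeans` helper is hypothetical, and the six "customers" with income and spending columns are made-up toy values, not rows from the real dataset.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (illustrative helper); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy, made-up customers: [annual income, annual spending], both in thousands.
customers = np.array([
    [20.0, 2.0], [22.0, 2.5], [25.0, 3.0],
    [80.0, 15.0], [85.0, 16.0], [90.0, 18.0],
])

labels, centroids = kmeans(customers, k=2)
print(labels)     # cluster id assigned to each customer
print(centroids)  # one row per segment: its average income and spending
```

On real data the features would normally be scaled first, so that a column with large values (such as income) does not dominate the distance calculation, and the number of clusters would be chosen with something like the elbow method rather than fixed in advance.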

Sales Analysis

In this project, I used pandas and Matplotlib to analyze 12 months’ worth of sales data and answer business questions about it. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
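A typical question for this kind of data is "which month had the highest sales?". The sketch below shows one way to answer it with pandas. The four rows and the column names (`order_date`, `product`, `quantity`, `unit_price`) are made up for illustration and are assumptions about the schema, not the project's real data.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not the project's real schema.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-03", "2019-02-14"]
    ),
    "product": ["USB cable", "Monitor", "USB cable", "Headphones"],
    "quantity": [2, 1, 3, 1],
    "unit_price": [10.0, 150.0, 10.0, 100.0],
})

# Revenue per order line, then total revenue per calendar month.
orders["revenue"] = orders["quantity"] * orders["unit_price"]
monthly_revenue = orders.groupby(orders["order_date"].dt.month)["revenue"].sum()

# Month number (1 = January) with the highest total revenue.
best_month = monthly_revenue.idxmax()
print(monthly_revenue)
print("Best month:", best_month)
```

From here, `monthly_revenue.plot(kind="bar")` would give a quick Matplotlib bar chart of the same totals.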

Correlation in Movie Industry

For this project, the first questions that came to my mind were: Is the movie industry dying? Is Netflix the new entertainment king? The best way to answer them was to analyze a dataset covering four decades using pandas, Matplotlib and Seaborn, which also helped me understand other factors at play in this industry, such as actors, genres and user ratings.

R

Analyzing Super Bowl Ads

Like millions of viewers who tune into the big game year after year, I love Super Bowl commercials. I love them so much, in fact, that I wanted to know everything about them… by analyzing and categorizing them, of course. I dug into the defining characteristics of a Super Bowl ad, then grouped commercials based on which criteria they shared — and let me tell you, I found some really interesting clusters of commercials.

Transit Cost Project

The goal of this work is to figure out how to deliver more high-capacity transit projects for a fraction of the cost in countries like the United States. Additionally, we hope that our site will be a useful resource for elected officials, planners, researchers, journalists, advocates, and others interested in contextualizing transit-infrastructure costs and fighting for better projects.

Multiple Linear Regression

I used the built-in dataset (meapsingle) from the wooldridge package to study the effects of single-parent households on student math performance, using multiple linear regression. These data are for a subset of schools in southeast Michigan for the year 2000. The socioeconomic variables are obtained at the ZIP code level (where ZIP code is assigned to schools based on their mailing addresses).

SQL

SQL Data Exploration

For this beginner-to-intermediate project, I used a Covid dataset as my base and wrote queries to explore the information it contains. Skills used: Joins, CTEs, Temp Tables, Window Functions, Aggregate Functions, Creating Views, Converting Data Types.

Data Cleaning in SQL

Continuing my SQL projects, I perform more advanced data cleaning here. The dataset covers Nashville housing and includes the Property Address, Sales Date, Sales Price, Owner Name and much more.

Tableau Projects

Here you will find my projects using this powerful visualization tool, Tableau.
