NYC Taxi & Limousine Commission Data Analytics
Video Demo
Technology Used
Python (Pandas)
SQL (BigQuery)
Looker Studio
GCP (Engine)
Mage AI (ETL Tool)
Results: Insights
- Total Revenue: The total revenue generated by the taxi trips in the dataset is $1,639,072.09.
- Average Trip Distance: The average distance traveled by the taxi trips in the dataset is 3.03 miles.
- Average Fare Amount: The average fare amount charged by the taxi trips in the dataset is $13.25.
- Average Tip Amount: The average tip amount given by the passengers to the taxi drivers in the dataset is $1.87.
- Payment Type: The most common payment type used by the passengers in the dataset is Credit card (34.9%), followed by Dispute (25.2%), No charge (20.5%), and Cash (19.4%).
- Rate Code: The most common rate code applied to the taxi trips in the dataset is Standard rate (75.5%), followed by JFK (11.8%), Negotiated fare (6.7%), Newark (3.8%), and Nassau or Westchester (2.2%).
Problem Statement
- The New York City Taxi and Limousine Commission (TLC), created in 1971, is the agency responsible for licensing and regulating New York City's Medallion (Yellow) taxi cabs, for-hire vehicles (community-based liveries, black cars, and luxury limousines), commuter vans, and paratransit vehicles.
- The problem is to analyze the data from the TLC website in terms of revenue, distance, fare, tip, rate codes, and payment types.
- Creating a simple data visualization tool that can be accessed daily so one can make data-driven decisions for the company.
Data Extraction Using Python
- Extracting data from the NYC website as parquest files.
- Converting the parquest files to pandas dataframe.
- Checked garbage values or outliers in the data frame in every column and marked them.
- Checked for blank or null values in every column and marked them as missing.
Data Cleaning and Transformation(ETL) using Python and Mage AI
- Performed data cleaning and transformation to prepare the data for analysis.
- Removed all records with blank or null values in all columns.
- Created custom columns by combining columns and slicing information.
- Performing the ETL process using the mage ai on the Google Compute Engine and loading data to the BigQuery database.
Data Analysis Using SQL (BigQuery)
- Performed analysis to extract insights from the transaction and date tables.
- Joined the columns using joins and applied filters by rate codes, payment types, and fare amounts to select the relevant data.
- Calculated the total fare amount, number of rides, and other metrics of interest using aggregation functions.
Data Visualization using Looker Studio
- Performed data visualization using Looker Studio to create interactive and insightful reports and dashboards.
- Created measures to define custom calculations and metrics for the data.
- Created bar, pie, and map charts to visualize the data and compare categories and trends.
- Create vendor ID, payment type, rate code, Trip distance, passenger count, & and total amount slicers and dropdowns to filter the data and enable dynamic analysis.