NYC Taxi & Limousine Commission Data Analytics

Video Demo

Technology Used

Python (Pandas)
SQL (BigQuery)
Looker Studio
GCP (Engine)
Mage AI (ETL Tool)

Results: Insights

  • Total Revenue: The total revenue generated by the taxi trips in the dataset is $1,639,072.09.
  • Average Trip Distance: The average distance traveled by the taxi trips in the dataset is 3.03 miles.
  • Average Fare Amount: The average fare amount charged by the taxi trips in the dataset is $13.25.
  • Average Tip Amount: The average tip amount given by the passengers to the taxi drivers in the dataset is $1.87.
  • Payment Type: The most common payment type used by the passengers in the dataset is Credit card (34.9%), followed by Dispute (25.2%), No charge (20.5%), and Cash (19.4%).
  • Rate Code: The most common rate code applied to the taxi trips in the dataset is Standard rate (75.5%), followed by JFK (11.8%), Negotiated fare (6.7%), Newark (3.8%), and Nassau or Westchester (2.2%).

Problem Statement

  • The New York City Taxi and Limousine Commission (TLC), created in 1971, is the agency responsible for licensing and regulating New York City's Medallion (Yellow) taxi cabs, for-hire vehicles (community-based liveries, black cars, and luxury limousines), commuter vans, and paratransit vehicles.
  • The problem is to analyze the data from the TLC website in terms of revenue, distance, fare, tip, rate codes, and payment types.
  • Creating a simple data visualization tool that can be accessed daily so one can make data-driven decisions for the company.

Data Extraction Using Python

  • Extracting data from the NYC website as parquest files.
  • Converting the parquest files to pandas dataframe.
  • Checked garbage values or outliers in the data frame in every column and marked them.
  • Checked for blank or null values in every column and marked them as missing.

Data Cleaning and Transformation(ETL) using Python and Mage AI

  • Performed data cleaning and transformation to prepare the data for analysis.
  • Removed all records with blank or null values in all columns.
  • Created custom columns by combining columns and slicing information.
  • Performing the ETL process using the mage ai on the Google Compute Engine and loading data to the BigQuery database.

Data Analysis Using SQL (BigQuery)

  • Performed analysis to extract insights from the transaction and date tables.
  • Joined the columns using joins and applied filters by rate codes, payment types, and fare amounts to select the relevant data.
  • Calculated the total fare amount, number of rides, and other metrics of interest using aggregation functions.

Data Visualization using Looker Studio

  • Performed data visualization using Looker Studio to create interactive and insightful reports and dashboards.
  • Created measures to define custom calculations and metrics for the data.
  • Created bar, pie, and map charts to visualize the data and compare categories and trends.
  • Create vendor ID, payment type, rate code, Trip distance, passenger count, & and total amount slicers and dropdowns to filter the data and enable dynamic analysis.