Summary of AI/ML Projects
AI/ML - Industries - Developed/Created/Expanded Work
Projects in this section are listed by industry/business domain.
Since it is sometimes difficult for me to decide which domain a particular project falls into, I created this page to define the domains.
BFSI (Banking, Financial Services, and Insurance)
BFSI includes financial institutions, banks, insurance companies, investment firms, and other entities offering services such as lending, investment, wealth management, and financial protection. This sector is heavily regulated and technology-driven for security and risk management.
Credit-Fraud-Detection
DoeJones-Prediction-with-News
Loan-Approval
HR (Human Resources)
This domain covers employee management, recruitment, training, compensation, and workplace culture. It also includes HR technology and services related to people management and organizational development.
HR Analysis of Employee Attrition & Performance
- Github Repo
- Colab
- HR Analysis of Employee Attrition & Performance - R
- HR Analysis of Employee Attrition & Performance - Python
- Github dataset
- Objective: Uncover the factors that lead to employee attrition and explore important questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. This is a fictional data set created by IBM data scientists.
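A question such as "compare average monthly income by education and attrition" is a two-key group-by. The minimal stdlib sketch below assumes the column names `Education`, `Attrition`, and `MonthlyIncome` from IBM's attrition CSV (check them against the file header); the rows are made-up placeholders, not values from the dataset, and `mean_income_by` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

def mean_income_by(rows, keys=("Education", "Attrition"), value="MonthlyIncome"):
    """Average `value` for every combination of the grouping `keys`."""
    groups = defaultdict(list)
    for row in rows:
        # Build a tuple key such as (3, "Yes") and collect the incomes for it.
        groups[tuple(row[k] for k in keys)].append(row[value])
    # Sort the groups so the output order is stable.
    return {group: mean(values) for group, values in sorted(groups.items())}

# Hypothetical rows for illustration only.
rows = [
    {"Education": 3, "Attrition": "Yes", "MonthlyIncome": 4000},
    {"Education": 3, "Attrition": "Yes", "MonthlyIncome": 5000},
    {"Education": 3, "Attrition": "No", "MonthlyIncome": 6500},
    {"Education": 4, "Attrition": "No", "MonthlyIncome": 8000},
]

for (education, attrition), avg in mean_income_by(rows).items():
    print(f"Education={education} Attrition={attrition}: {avg:.0f}")
```

With pandas, the same aggregation is `df.groupby(["Education", "Attrition"])["MonthlyIncome"].mean()`; the "distance from home by job role and attrition" question follows the same pattern with different keys.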
Health
The health domain includes healthcare providers, hospitals, pharmaceuticals, health insurance, and healthcare technology focused on improving patient care, medical research, and public health initiatives. This vertical does not include projects related to health-infrastructure development.
Liver Patient Analysis
Breast-Cancer-Prediction
Chest-XRay - Effusion Segmentation
Chest-XRay - Effusion Classification
Covid-worldwide-EDA
India-Covid-Graphs
Malaria-Detection_dep
pnemonia_prediction
Energy
This domain involves the production, management, and distribution of energy, including fossil fuels, renewable energy (solar, wind, hydro), nuclear power, and energy conservation technologies, along with grid management.
UK-Energy-Consumption
AirQuality-Prediction
Climate
Climate and energy are interrelated; to avoid confusion, energy-related projects are listed under Energy rather than Climate. This domain focuses on climate science, environmental monitoring, and sustainability initiatives, including research and development on climate change, renewable energy, environmental policy, and green technologies that reduce carbon footprints.
Acea Smart Water Analytics & Prediction
Objective
- The Acea Group deals with four different types of waterbodies: water springs (three datasets), a lake (one dataset), a river (one dataset), and aquifers (four datasets).
- The competition uses nine datasets that are completely independent and not linked to each other. Each dataset can represent a different kind of waterbody, and because every waterbody is different, each dataset has its own set of features.
- Crucially, some features that appear in every dataset, such as rainfall and temperature, do not act on the same date as the measurements. Rainfall and temperature affect features like level, flow, depth to groundwater, and hydrometry only after a delay: rain that falls on 1 January does not affect those features the same day but some time later. Since it is not known how many days, weeks, or months that delay is, it must also be taken into account when analyzing the data.
eCommerce
The e-commerce domain comprises online platforms and businesses that facilitate buying and selling goods and services over the internet. It includes marketplaces, payment processing, logistics, and digital retailing.
Black Friday Sales Data Analysis Prediction
- Github Repo
- Colab
- Dataset
- Black Friday Sales Data Analysis Prediction - Kaggle
- About Dataset: This dataset comprises sales transactions captured at a retail store. It is a classic dataset covering many different shopping experiences. This is a regression problem. The dataset has 550,069 rows and 12 columns.
Amazon Sentiment Analysis
Bigdata-AmazonReviews
Online-Retail-Customer-Clustering
Recommendation System Amazon Electronics
Economics and International Trade
This field involves the study and application of economic theories, policies, and data analysis to understand markets, consumer behavior, global trade, and financial trends. It serves as the foundation for economic research, policy-making, and financial planning.
Economy-Analysis
Prosperity-Clustering
Marine Consultant - GOI
- Github Repo - This is a prototype GPT on ChatGPT that helps plan strategy for bilateral or multilateral engagements with other countries.
Electronics
The electronics domain includes the design, manufacturing, and distribution of electronic devices and components, such as semiconductors, consumer electronics, computing hardware, and embedded systems.
Hand Gesture Recognition
Industrial Safety
Industrial safety focuses on workplace safety standards, risk management, and protocols to protect employees and prevent accidents in industrial environments. It includes safety training, hazard assessments, and regulatory compliance.
Industrial Accident Cause Analysis
OSHA Accidents and Injury
Tourism, Hospitality, Hotel, Restaurant and Event Management
The hospitality domain involves businesses that provide accommodation, food, and leisure services, such as hotels, resorts, restaurants, and cafes, focusing on guest experiences and comfort.
Zomato Review
Indian Food Item Recommendations in Restaurants
FoodDemand Forcast
Travel & Logistics
The Travel & Logistics domain encompasses the movement of people and goods. It includes various industries such as transportation, warehousing, distribution, and supply chain management for both individuals and businesses. The focus in this domain is on efficient, timely, and cost-effective transport, as well as providing seamless travel experiences. This sector is heavily influenced by technology for tracking, route optimization, and resource management. This domain has some overlap with eCommerce and Sales.
Flight Delay Analysis using Hive
This dataset contains the 2004-2005 flight data from the 2009 ASA Statistical Computing and Graphics Data Expo, which covers flight arrival and departure details for all commercial flights on major carriers within the United States of America from October 1987 to April 2008.
Activities (pipeline) in the project:
- Create a Hive table (for storage) from the external files.
- Create the partitioned table schema.
- Partition the Hive table by year and load the data into the partitioned table.
- Run SQL queries on the partitioned table.
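The partitioning steps above can be mimicked in plain Python to show what partitioning buys. This is a stdlib analogy, not Hive's API: rows are written into Hive-style `year=YYYY` directories, and a query for one year reads only that directory, which is the idea behind Hive partition pruning. The file name `part-0.csv`, the helpers `write_partitioned` and `avg_arr_delay`, and the sample rows are all hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

FIELDS = ["Year", "UniqueCarrier", "ArrDelay"]

def write_partitioned(rows, root):
    """Write rows into root/year=YYYY/part-0.csv, one directory per year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["Year"]].append(row)
    for year, year_rows in by_year.items():
        part_dir = Path(root) / f"year={year}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-0.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(year_rows)

def avg_arr_delay(root, year):
    """Average ArrDelay for one year, reading only that year's partition."""
    part_file = Path(root) / f"year={year}" / "part-0.csv"
    with open(part_file, newline="") as f:
        delays = [float(r["ArrDelay"]) for r in csv.DictReader(f)]
    return sum(delays) / len(delays)

rows = [  # hypothetical sample rows
    {"Year": 2004, "UniqueCarrier": "AA", "ArrDelay": 10},
    {"Year": 2004, "UniqueCarrier": "UA", "ArrDelay": -5},
    {"Year": 2005, "UniqueCarrier": "AA", "ArrDelay": 30},
    {"Year": 2005, "UniqueCarrier": "DL", "ArrDelay": 0},
]

with tempfile.TemporaryDirectory() as root:
    write_partitioned(rows, root)
    print(sorted(p.name for p in Path(root).iterdir()))  # → ['year=2004', 'year=2005']
    print(avg_arr_delay(root, 2005))  # → 15.0
```

In Hive itself, the equivalent is a table declared with `PARTITIONED BY (year INT)` and loaded with a dynamic-partition `INSERT ... PARTITION (year)` statement; a `WHERE year = 2005` filter then scans only that partition.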
Links
Flight Delay Analysis - 2008 (Bigdata)
The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.
Apache Hive is a data-warehousing and SQL-like query engine built on top of Hadoop. Hadoop provides distributed storage through the Hadoop Distributed File System (HDFS) and distributed processing of the data stored there. Hive is designed for batch queries over very large datasets (billions of rows): aggregation and filter queries written in HiveQL run in parallel across the cluster, so they complete on data volumes that a single-machine database would struggle with. Hive supports INSERT, and it supports UPDATE and DELETE on transactional (ACID) tables.
The dataset contains daily airline flight information such as Origin, Dest, Distance, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay. The airlines wanted to analyze the data of the last 20 years.
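As a sketch of the kind of question these columns answer, the snippet below computes, per `UniqueCarrier`, the share of operated flights that arrived 15 or more minutes late, which is the threshold BTS uses to classify a flight as delayed. Plain dictionaries stand in for parsed CSV rows (the rows are hypothetical, and `carrier_delay_rates` is an illustrative helper); in Hive the same aggregation would be a `GROUP BY UniqueCarrier` query.

```python
from collections import defaultdict

DELAY_THRESHOLD_MIN = 15  # BTS treats arrivals 15+ minutes late as delayed

def carrier_delay_rates(rows):
    """Per UniqueCarrier, the share of operated flights that arrived late."""
    counts = defaultdict(lambda: [0, 0])  # carrier -> [delayed, operated]
    for row in rows:
        # Cancelled and diverted flights typically have no ArrDelay, so skip them.
        if row["Cancelled"] == 1 or row["Diverted"] == 1:
            continue
        stats = counts[row["UniqueCarrier"]]
        stats[1] += 1
        if row["ArrDelay"] >= DELAY_THRESHOLD_MIN:
            stats[0] += 1
    return {carrier: delayed / operated
            for carrier, (delayed, operated) in counts.items()}

# Hypothetical rows standing in for parsed CSV records.
rows = [
    {"UniqueCarrier": "AA", "ArrDelay": 20, "Cancelled": 0, "Diverted": 0},
    {"UniqueCarrier": "AA", "ArrDelay": 5, "Cancelled": 0, "Diverted": 0},
    {"UniqueCarrier": "AA", "ArrDelay": None, "Cancelled": 1, "Diverted": 0},
    {"UniqueCarrier": "UA", "ArrDelay": 15, "Cancelled": 0, "Diverted": 0},
]
print(carrier_delay_rates(rows))  # → {'AA': 0.5, 'UA': 1.0}
```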
Links
Tech Stack: Hadoop/HDFS, Hive, SQL, HiveQL, ORC (Optimized Row Columnar) or Parquet, Python, Matplotlib/Seaborn.
Flight Delay and Cancellation Analysis - 2015
The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.
NYC Parking - 2008
NYC Parking - 2004-2005 (Bigdata and pySpark)
NYC Parking - 2015
NYC Parking - 2017
Driver Availability Prediction
- Github Repo
- Colab
- Dataset & ping.csv
Uber Cancellation
Vehicle Classification
Vehicle Tracking
Entertainment, Games & Sports
This sector covers the creation/production, distribution, and consumption of media, including film, music, gaming, and live performances. It involves production houses, streaming services, and digital content platforms.
Movies-Recommendations
Olympic-QA-System-with-GPT
Media and Publication
This domain includes businesses involved in publishing content across print, digital, and broadcast formats. It covers books, news, newspapers, magazines, digital media platforms, and content creation and distribution.
Media+Publication-TWO - Talk with Osho
- Github Repo - This is an educational GPT on ChatGPT, based on selected books by Osho. It is a prototype because ChatGPT limits how many books can be loaded; when this constraint is removed, the project will be updated with more books.
Media+Publication-TWSV - Talk with Swami Vivekananda
- Github Repo - This is an educational GPT on ChatGPT, based on the eight volumes of the Complete Works of Swami Vivekananda.
Youtube - to - MP3 - Text Transcription - Hindi and English
- Github Repo - This project transcribes English- or Hindi-language YouTube videos into Latin script. Multiple APIs and tools were tried to evaluate performance, cost, and accuracy. It is a work-in-progress project, and I will keep updating it as the technology improves.
Fakenews-Detection
HBQAS
NewsClassification-20Groups
Multiclass classification. Overall test accuracy: 0.8838
The 20 news classes are: rec.sport.hockey, rec.motorcycles, rec.sport.baseball, rec.autos, talk.politics.guns, talk.religion.misc, sci.med, sci.electronics, sci.space, sci.crypt, misc.forsale, comp.os.ms-windows.misc, comp.graphics, comp.sys.ibm.pc.hardware, comp.windows.x, comp.sys.mac.hardware, soc.religion.christian, talk.politics.mideast, alt.atheism, talk.politics.misc
Class  Precision  Recall  F-score  Support
0      0.46       0.50    0.48     12
1      0.83       0.52    0.64     29
2      0.62       0.69    0.66     29
3      0.67       0.52    0.58     31
4      0.85       0.68    0.76     25
5      0.89       0.92    0.91     26
6      0.89       0.69    0.77     35
7      0.90       0.89    0.89     108
8      0.88       0.97    0.93     117
9      0.94       0.97    0.96     121
10     0.95       0.99    0.97     124
11     1.00       0.97    0.98     31
12     0.67       0.73    0.70     45
13     0.75       0.88    0.81     43
14     0.81       0.92    0.86     38
15     0.92       0.55    0.69     20
16     0.87       0.97    0.92     103
17     0.93       0.93    0.93     14
18     1.00       0.39    0.56     18
19     0.91       0.89    0.90     81
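The table reports per-class precision, recall, F-score (2PR / (P + R)) and support. A minimal stdlib sketch of how such figures are derived from true and predicted labels follows; the labels are toy values, not this project's predictions, and `per_class_report` is a hypothetical helper. scikit-learn's `precision_recall_fscore_support` returns the same four per-class arrays.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Per-class (precision, recall, F1, support) plus overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    # True positives: positions where the prediction matches the label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)   # how often each class was predicted
    true_count = Counter(y_true)   # support: how often each class occurs
    report = {}
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1, true_count[c])
    accuracy = sum(tp.values()) / len(y_true)
    return report, accuracy

# Toy labels for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
report, acc = per_class_report(y_true, y_pred)
for c, (p, r, f, s) in report.items():
    print(f"Class {c}: precision {p:.2f}, recall {r:.2f}, fscore {f:.2f}, support {s}")
print(f"accuracy {acc:.4f}")
```

Note that overall accuracy equals the support-weighted average of per-class recall when both are computed on the same test set.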
SDSHL
Toxic-Comment
Twitter-Sentiment-Analysis
YELP-Review-Prediction
- Github Repo
- Colab - Fine_Tuning_Transformer_BERT_Customer_Review
- Colab - Yelp customer_review_classification
Sales
Sales overlaps with e-commerce and retail. To avoid confusion, projects about selling big-ticket items such as cars, houses, or other capital goods are placed in the Sales domain, whether the sale happens online or in a physical shop.
House-Price-Prediction
CarPrice
CarSales
Lead-Conversion
Telecom
This domain covers telecommunications services and infrastructure, including phone networks, internet providers, satellite communications, and emerging technologies like 5G, enabling global connectivity and communication.
Telecom Churn
Public Safety and Security
This sector involves efforts to maintain public order, safety, and security in communities. It includes law enforcement, emergency services, disaster response, and security solutions for protecting people and assets.
Barcelona Accidents
Indian Judiciary - Verdict Dataset
Agriculture
This domain encompasses activities related to farming, crop production, livestock management, and the broader agricultural supply chain. It also includes agricultural technology (agritech), sustainable farming practices, and rural development.
Education
This sector covers educational institutions, e-learning platforms, and educational technology (edtech) designed to facilitate teaching, learning, and research. It spans K-12, higher education, corporate training, and continuous learning.
Infrastructure (Infra) Development
This domain encompasses the planning, design, and construction of physical facilities and systems, including transportation, telecommunications, water supply, and utilities essential for supporting economic activity.