AI/ML - Industries - Developed/ Created/ Expanded work

Projects in this section are listed according to Industry/Business Domain.

It is sometimes difficult to decide which domain a particular project falls into, so I created this page to settle the domain for each project.


BFSI (Banking, Financial Services, and Insurance)

BFSI includes financial institutions, banks, insurance companies, investment firms, and other entities offering services such as lending, investment, wealth management, and financial protection. This sector is heavily regulated and technology-driven for security and risk management.

Credit-Fraud-Detection

DoeJones-Prediction-with-News

Loan-Approval


HR (Human Resources)

This domain covers employee management, recruitment, training, compensation, and workplace culture. It also includes HR technology and services related to people management and organizational development.

HR Analysis of Employee Attrition & Performance


Health

The health domain includes healthcare providers, hospitals, pharmaceuticals, health insurance, and healthcare technology focused on improving patient care, medical research, and public health initiatives. This vertical does not include projects related to health infrastructure development.

Liver Patient Analysis

Breast-Cancer-Prediction

Chest-XRay - Effusion Segmentation

Chest-XRay - Effusion Classification

Covid-worldwide-EDA

India-Covid-Graphs

Malaria-Detection_dep

pnemonia_prediction


Energy

This domain involves the production, management, and distribution of energy, including fossil fuels, renewable energy (solar, wind, hydro), nuclear power, and energy conservation technologies, along with grid management.

UK-Energy-Consumption

AirQuality-Prediction


Climate

Climate and energy are interrelated, so to avoid confusion, any project related to energy is listed under Energy rather than in this vertical. This domain focuses on climate science, environmental monitoring, and sustainability initiatives, including research and development on climate change, renewable energy, environmental policy, and green technologies to reduce carbon footprints.

Acea Smart Water Analytics & Prediction

Objective

  • The Acea Group deals with four different types of waterbodies: water springs (for which three datasets are provided), a lake (one dataset), a river (one dataset), and aquifers (four datasets).
  • This competition uses nine different datasets, completely independent and not linked to each other. Each dataset can represent a different kind of waterbody. Because each waterbody is different, the related features also differ from one dataset to another.
  • It is important to note that some features, such as rainfall and temperature, which are present in every dataset, do not take effect on the date on which they are recorded. Both rainfall and temperature affect features like level, flow, depth to groundwater, and hydrometry only some time after they occur. For instance, rain that fell on 1st January does not affect these features on the same day but some time later. Because we don’t know how many days, weeks, or months later rainfall affects these features, this lag is another aspect to take into consideration when analyzing the datasets.
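
One simple way to explore the delayed effect of rainfall is to compare the target against rainfall shifted by different numbers of days and see which shift correlates best. The following is a minimal sketch in plain Python, assuming two daily series of equal length; the function names `pearson` and `best_lag`, the `max_lag` parameter, and the toy data are illustrative and are not taken from the Acea datasets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(rainfall, target, max_lag):
    """Return (lag, correlation) for the lag in 0..max_lag with the largest |correlation|.

    A lag of k pairs rainfall on day t with the target on day t + k.
    """
    results = []
    for lag in range(max_lag + 1):
        earlier_rain = rainfall[: len(rainfall) - lag]
        later_target = target[lag:]
        results.append((lag, pearson(earlier_rain, later_target)))
    return max(results, key=lambda item: abs(item[1]))


# Toy data (hypothetical): the water level responds to rain three days later.
rain = [0, 5, 0, 0, 8, 0, 0, 0, 3, 0, 0, 6, 0, 0, 0, 0]
level = [10.0] * 3 + [10.0 + 0.5 * r for r in rain[:-3]]

lag, corr = best_lag(rain, level, max_lag=7)
print(lag, round(corr, 3))
```

On real data the response is usually spread over several days rather than concentrated at one lag, so a scan like this is only a starting point for choosing lagged or rolling-sum features.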

eCommerce

The e-commerce domain comprises online platforms and businesses that facilitate buying and selling goods and services over the internet. It includes marketplaces, payment processing, logistics, and digital retailing.

Black Friday Sales Data Analysis Prediction

Amazon Sentiment Analysis

Bigdata-AmazonReviews

Online-Retail-Customer-Clustering

Recommendation System Amazon Electronics


Economics and International Trade

This field involves the study and application of economic theories, policies, and data analysis to understand markets, consumer behavior, global trade, and financial trends. It serves as the foundation for economic research, policy-making, and financial planning.

Economy-Analysis

Prosperity-Clustering

Marine Consultant - GOI

  • Github Repo - This is a prototype GPT on ChatGPT that helps them plan strategy for bilateral or multilateral engagements with other countries.

Electronics

The electronics domain includes the design, manufacturing, and distribution of electronic devices and components, such as semiconductors, consumer electronics, computing hardware, and embedded systems.

Hand Gesture Recognition


Industrial Safety

Industrial safety focuses on workplace safety standards, risk management, and protocols to protect employees and prevent accidents in industrial environments. It includes safety training, hazard assessments, and regulatory compliance.

Industrial Accident Cause Analysis

OSHA Accidents and Injury


Tourism, Hospitality, Hotel, Restaurant and Event Management

The hospitality domain involves businesses that provide accommodation, food, and leisure services, such as hotels, resorts, restaurants, and cafes, focusing on guest experiences and comfort.

Zomato Review

Indian Food Item Recommendations in Restaurants

FoodDemand Forcast


Travel & Logistics

The Travel & Logistics domain encompasses the movement of people and goods. It includes various industries such as transportation, warehousing, distribution, and supply chain management for both individuals and businesses. The focus in this domain is on efficient, timely, and cost-effective transport, as well as providing seamless travel experiences. This sector is heavily influenced by technology for tracking, route optimization, and resource management. This domain has some overlap with eCommerce and Sales.

Flight Delay Analysis using Hive

This dataset contains the 2004-2005 flight data from the 2009 ASA Statistical Computing and Graphics Data Expo. The full Data Expo dataset consists of flight arrival and departure details for all commercial flights on major carriers within the United States of America from October 1987 to April 2008.

Activities (Pipeline) in project:

  • Create a Hive table (for storage) from the external files.
  • Create the partitioned table schema.
  • Partition the Hive table by year and load the data into the partitioned table.
  • Run SQL queries on the partitioned table.
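
The four steps above can be sketched as HiveQL statements. The Python snippet below only builds the statements as strings so they can be reviewed or handed to a Hive client. The table names (`flights_raw`, `flights_by_year`), the HDFS path, the `year` column, and the reduced column list are assumptions for illustration, not the project's actual schema. A real external table would have to declare every CSV column in file order.

```python
# Hypothetical subset of columns; the real files contain many more fields.
COLUMNS = [
    ("UniqueCarrier", "STRING"),
    ("FlightNum", "INT"),
    ("Origin", "STRING"),
    ("Dest", "STRING"),
    ("ArrDelay", "INT"),
    ("DepDelay", "INT"),
    ("Cancelled", "INT"),
]


def build_statements(raw_table, part_table, hdfs_location):
    """Return HiveQL strings for the four pipeline steps, in order."""
    col_defs = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in COLUMNS)
    col_names = ", ".join(name for name, _ in COLUMNS)

    # Step 1: external table over the raw CSV files already stored in HDFS.
    create_raw = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {raw_table} (\n"
        f"  year INT,\n  {col_defs}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'"
    )
    # Step 2: partitioned table; the partition column is declared separately.
    create_part = (
        f"CREATE TABLE IF NOT EXISTS {part_table} (\n  {col_defs}\n)\n"
        "PARTITIONED BY (year INT)\n"
        "STORED AS ORC"
    )
    # Step 3: dynamic-partition insert; the partition column goes last in SELECT.
    load_part = (
        "SET hive.exec.dynamic.partition=true;\n"
        "SET hive.exec.dynamic.partition.mode=nonstrict;\n"
        f"INSERT OVERWRITE TABLE {part_table} PARTITION (year)\n"
        f"SELECT {col_names}, year FROM {raw_table}"
    )
    # Step 4: example query; filtering on year lets Hive read a single partition.
    query = (
        "SELECT UniqueCarrier, AVG(ArrDelay) AS avg_arr_delay\n"
        f"FROM {part_table}\n"
        "WHERE year = 2004\n"
        "GROUP BY UniqueCarrier"
    )
    return [create_raw, create_part, load_part, query]


statements = build_statements("flights_raw", "flights_by_year", "/data/flights/raw")
for statement in statements:
    print(statement + ";\n")
```

Partitioning by year means a query with `WHERE year = 2004` only scans that year's directory instead of the whole table, which is the main reason for the extra partitioned table.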

Flight Delay Analysis - 2008 (Bigdata)

The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.

Apache Hive is a data warehouse and SQL-like query engine built on top of Hadoop. Hadoop provides the Hadoop Distributed File System (HDFS) for distributed storage, and Hive runs distributed processing over the data stored there. Hive can handle tables with billions of records, so large aggregations and filters can be written as ordinary SQL-style queries without worrying about whether they will ever complete. Hive supports insert, update, and delete operations on transactional (ACID) tables, although it is designed for batch analytics rather than row-level transaction processing.

The dataset contains daily airline flight information such as Origin, Dest, Distance, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay. Airlines wanted to analyze the data of the last 20 years.

Tech Stack: Hadoop/HDFS, Hive, SQL, HiveQL, ORC (Optimized Row Columnar) or Parquet, Python, Matplotlib/Seaborn.

Flight Delay and Cancellation Analysis - 2015

The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.

NYC Parking - 2008

NYC Parking - 2004-2005 (Bigdata and pySpark)

NYC Parking - 2015

NYC Parking - 2017

Driver Availability Prediction

Uber Cancellation

Vehicle Classification

Vehicle Tracking


Entertainment, Games & Sports

This sector covers the creation/production, distribution, and consumption of media, including film, music, gaming, and live performances. It involves production houses, streaming services, and digital content platforms.

Movies-Recommendations

Olympic-QA-System-with-GPT


Media and Publication

This domain includes businesses involved in publishing content across print, digital, and broadcast formats. It covers books, news, newspapers, magazines, digital media platforms, and content creation and distribution.

Media+Publication-TWO - Talk with Osho

  • Github Repo - This is an educational GPT on ChatGPT, based on selected books of Osho. It is a prototype because ChatGPT limits how many books can be loaded. When this constraint is removed in the future, the project will be updated with more books.

Media+Publication-TWSV - Talk with Swami Vivekananda

  • Github Repo - This is an educational GPT on ChatGPT. It is based on the 8 volumes of the Complete Works of Swami Vivekananda.

Youtube - to - MP3 - Text Transcription - Hindi and English

  • Github Repo - This project transcribes English or Hindi YouTube videos into Latin script. Multiple APIs and tools were tried in order to evaluate performance, cost, and accuracy. It is a work in progress, and I will keep updating it as the technology improves.

Fakenews-Detection

HBQAS

NewsClassification-20Groups

Multiclass classification. Overall test accuracy: 0.883809506893158
The 20 news classes are: rec.sport.hockey, rec.motorcycles, rec.sport.baseball, rec.autos, talk.politics.guns, talk.religion.misc, sci.med, sci.electronics, sci.space, sci.crypt, misc.forsale, comp.os.ms-windows.misc, comp.graphics, comp.sys.ibm.pc.hardware, comp.windows.x, comp.sys.mac.hardware, soc.religion.christian, talk.politics.mideast, alt.atheism, talk.politics.misc

Class 0: precision: 0.46, recall: 0.50, fscore: 0.48, support: 12
Class 1: precision: 0.83, recall: 0.52, fscore: 0.64, support: 29
Class 2: precision: 0.62, recall: 0.69, fscore: 0.66, support: 29
Class 3: precision: 0.67, recall: 0.52, fscore: 0.58, support: 31
Class 4: precision: 0.85, recall: 0.68, fscore: 0.76, support: 25
Class 5: precision: 0.89, recall: 0.92, fscore: 0.91, support: 26
Class 6: precision: 0.89, recall: 0.69, fscore: 0.77, support: 35
Class 7: precision: 0.90, recall: 0.89, fscore: 0.89, support: 108
Class 8: precision: 0.88, recall: 0.97, fscore: 0.93, support: 117
Class 9: precision: 0.94, recall: 0.97, fscore: 0.96, support: 121
Class 10: precision: 0.95, recall: 0.99, fscore: 0.97, support: 124
Class 11: precision: 1.00, recall: 0.97, fscore: 0.98, support: 31
Class 12: precision: 0.67, recall: 0.73, fscore: 0.70, support: 45
Class 13: precision: 0.75, recall: 0.88, fscore: 0.81, support: 43
Class 14: precision: 0.81, recall: 0.92, fscore: 0.86, support: 38
Class 15: precision: 0.92, recall: 0.55, fscore: 0.69, support: 20
Class 16: precision: 0.87, recall: 0.97, fscore: 0.92, support: 103
Class 17: precision: 0.93, recall: 0.93, fscore: 0.93, support: 14
Class 18: precision: 1.00, recall: 0.39, fscore: 0.56, support: 18
Class 19: precision: 0.91, recall: 0.89, fscore: 0.90, support: 81
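
For readers unfamiliar with these columns, the sketch below shows how per-class precision, recall, F-score, and support are typically derived from true and predicted labels. It is written in plain Python under the usual definitions. The function name `per_class_report` and the toy labels are illustrative, and this is not necessarily the code or library that produced the figures above.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return ({label: (precision, recall, fscore, support)}, overall accuracy)."""
    true_pos = Counter()            # correct predictions per class
    pred_count = Counter(y_pred)    # how often each class was predicted
    support = Counter(y_true)       # how often each class truly occurs
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1

    report = {}
    for label in sorted(support):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / support[label]
        denom = precision + recall
        fscore = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, fscore, support[label])

    accuracy = sum(true_pos.values()) / len(y_true)
    return report, accuracy


# Toy labels (hypothetical), three classes.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]

report, accuracy = per_class_report(y_true, y_pred)
for label, (p, r, f, s) in report.items():
    print(f"Class {label}: precision: {p:.2f}, recall: {r:.2f}, fscore: {f:.2f}, support: {s}")
print(f"Overall accuracy: {accuracy:.2f}")
```

Classes with low support, such as those with fewer than 20 test samples above, give noisy per-class scores, so a single misclassification can move precision or recall by several points.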

SDSHL

Toxic-Comment

Twitter-Sentiment-Analysis

YELP-Review-Prediction


Sales

Sales overlaps with eCommerce and Retail. To avoid confusion, anything related to the sale of big-ticket items such as cars, houses, or other capital goods is placed in the Sales domain, whether the sale happens online or in a physical shop.

House-Price-Prediction

CarPrice

CarSales

Lead-Conversion


Telecom

This domain covers telecommunications services and infrastructure, including phone networks, internet providers, satellite communications, and emerging technologies like 5G, enabling global connectivity and communication.

Telecom Churn


Public Safety and Security

This sector involves efforts to maintain public order, safety, and security in communities. It includes law enforcement, emergency services, disaster response, and security solutions for protecting people and assets.

Barcelona Accidents

Indian Judiciary - Verdict Dataset


Agriculture

This domain encompasses activities related to farming, crop production, livestock management, and the broader agricultural supply chain. It also includes agricultural technology (agritech), sustainable farming practices, and rural development.

Education

This sector covers educational institutions, e-learning platforms, and educational technology (edtech) designed to facilitate teaching, learning, and research. It spans K-12, higher education, corporate training, and continuous learning.

Infrastructure (Infra) Development

This domain encompasses the planning, design, and construction of physical facilities and systems, including transportation, telecommunications, water supply, and utilities essential for supporting economic activity.


Datasets
