
Topic Modeling with BERT

Natural Language Processing (NLP) Data Analysis & Visualization AI/ML Models Language Models (LLMs) NLP Applications Text Analysis Natural Language Processing (NLP) Machine Learning (ML) Text Mining


Topic Modeling with BERT

The key steps in BERTopic modeling are as follows:

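  • Use a sentence-embedding model (e.g., from sentence-transformers) to embed each document, such as the sentences of an article.
  • Reduce the dimensionality of the embeddings using UMAP.
  • Cluster the reduced embeddings using HDBSCAN; each cluster becomes a topic.
  • Use c-TF-IDF (class-based TF-IDF) to extract keywords for each cluster: all documents in a cluster are treated as one document, and each term's frequency in that cluster is weighted by how rare the term is across clusters.
  • Use MMR (Maximal Marginal Relevance) to refine each topic's keywords, keeping words that are relevant to the topic but not redundant with each other, so that a small set of words represents the topic well.
  • Visualize the topics with an intertopic distance map.
  • Use a similarity matrix (heatmap) and a dendrogram (hierarchical map) to visualize the topics and their keywords.
  • Track topics over time: some may become irrelevant, while others gain or lose traction.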
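The keyword steps above (c-TF-IDF, then MMR) can be sketched in plain Python. This is a minimal, stdlib-only illustration under stated assumptions, not BERTopic's implementation: `c_tf_idf` uses the weighting W = tf × log(1 + A / f) as we understand it from the BERTopic documentation (tf is a term's normalized frequency within a topic, f its total frequency across all topics, A the average number of words per topic), and `mmr` uses the common score (1 - diversity) × relevance - diversity × redundancy. All function names, the toy documents and the 2-D word vectors are hypothetical; in a real pipeline the vectors would come from the embedding model.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Class-based TF-IDF: treat all documents of a topic as one "class document".

    docs_per_topic maps a topic id to a list of tokenized documents.
    Returns {topic_id: {term: weight}} using W = tf * log(1 + A / f), where
    tf is the term's L1-normalized frequency in the topic, f is the term's
    total frequency across all topics and A is the average words per topic.
    """
    # Concatenate each topic's documents into a single bag of words.
    class_counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_per_topic.items()
    }
    # f: frequency of each term summed over all topics.
    total_freq = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    # A: average number of words per topic.
    avg_words = sum(total_freq.values()) / len(class_counts)

    weights = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[topic] = {
            term: (count / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
    return weights


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(topic_vec, candidates, word_vecs, top_n=3, diversity=0.5):
    """Maximal Marginal Relevance: greedily pick keywords that are relevant
    to the topic but not redundant with keywords already picked."""
    selected, remaining = [], list(candidates)

    def score(word):
        relevance = cosine(word_vecs[word], topic_vec)
        redundancy = max(
            (cosine(word_vecs[word], word_vecs[s]) for s in selected), default=0.0
        )
        return (1 - diversity) * relevance - diversity * redundancy

    while remaining and len(selected) < top_n:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): two topics sharing one uninformative word, "good".
docs_per_topic = {
    0: [["cat", "dog", "good"], ["dog", "pet", "good"]],
    1: [["stock", "price", "good"], ["price", "trade", "good"]],
}
weights = c_tf_idf(docs_per_topic)
ranked = sorted(weights[0], key=weights[0].get, reverse=True)
print(ranked)  # "dog" ranks first; the shared word "good" ranks last

# Hypothetical 2-D vectors: "pet" is a near-duplicate of "dog".
word_vecs = {"dog": [1.0, 0.1], "pet": [1.0, 0.12], "cat": [0.6, 0.8]}
topic_vec = [1.0, 0.0]
print(mmr(topic_vec, ranked[:3], word_vecs, top_n=2, diversity=0.0))  # ['dog', 'pet']
print(mmr(topic_vec, ranked[:3], word_vecs, top_n=2, diversity=0.8))  # ['dog', 'cat']
```

Raising `diversity` trades a little relevance for variety: because "pet" nearly duplicates "dog", at 0.8 MMR prefers "cat" as the second keyword.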

Installation

Installation, with sentence-transformers as the default embedding backend, can be done using PyPI:

```bash
pip install bertopic
```

To install BERTopic with other embedding backends, add one or more of the following extras (quoting the argument keeps shells such as zsh from interpreting the brackets):

```bash
# Choose an embedding backend (no spaces between extras)
pip install "bertopic[flair,gensim,spacy,use]"

# Topic modeling with images
pip install "bertopic[vision]"
```

Supported Topic Modeling Techniques

BERTopic supports a wide range of topic modeling variants:

  • Guided
  • Supervised
  • Semi-supervised
  • Manual
  • Multi-topic distributions
  • Hierarchical
  • Class-based
  • Dynamic
  • Online/Incremental
  • Multimodal
  • Multi-aspect
  • Text Generation/LLM
  • Merge Models
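The Dynamic variant, like the "track topics over time" step in the key-steps list, boils down to counting topic assignments per time bin. The sketch below assumes topic labels have already been produced by a fitted model; the helper names `topic_counts_by_bin` and `topic_share` and the sample data are hypothetical. It is not BERTopic's own `topics_over_time` method, which, as we understand its documentation, also recomputes each topic's keywords per time bin.

```python
from collections import Counter, defaultdict
from datetime import date


def topic_counts_by_bin(topics, timestamps, bin_fn):
    """Count documents per topic within each time bin.

    topics: the topic id assigned to each document.
    timestamps: one timestamp per document, aligned with `topics`.
    bin_fn: maps a timestamp to a sortable bin label, e.g. its year.
    """
    counts = defaultdict(Counter)
    for topic, ts in zip(topics, timestamps):
        counts[bin_fn(ts)][topic] += 1
    return dict(counts)


def topic_share(counts, topic):
    """Fraction of each bin's documents assigned to `topic`, in bin order."""
    # Counter returns 0 for a topic absent from a bin.
    return [counts[b][topic] / sum(counts[b].values()) for b in sorted(counts)]


# Hypothetical topic assignments (e.g. from a fitted model) and publication dates.
topics = [0, 0, 1, 0, 1, 1, 1]
dates = [date(2020, 3, 1), date(2020, 7, 9), date(2020, 11, 2),
         date(2021, 5, 4), date(2021, 8, 8), date(2022, 1, 15), date(2022, 6, 30)]

counts = topic_counts_by_bin(topics, dates, lambda d: d.year)
print(topic_share(counts, 0))  # topic 0's share shrinks year over year
print(topic_share(counts, 1))  # topic 1 is gaining traction
```

Normalizing by each bin's total, rather than plotting raw counts, keeps a topic's trend readable even when the number of documents per period varies.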

Related Resources

Tools in BERTopic

[Figure: Tools in BERTopic]

Best Topic Modeling Tool in BERTopic

[Figure: Best topic modeling tools in BERTopic]

BERTopic Model Building

[Figure: BERTopic model building]

Applications

  • arXiv dataset (1.7M+ STEM papers)
  • Images/photographs
  • Historical Documents
  • News articles
