Topic Modeling for Space Travel

NLP Project by Janet Lin

1. Install Required Packages

2. Loading Data

3. Data Processing

3.1 Deduplication
3.2 Remove Stopwords and Punctuation Then Lemmatize
3.3 Create Dictionary and Corpus
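
The three processing steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the stopword set, the lemma lookup table, the sample documents, and all function names are hypothetical stand-ins. A real pipeline would likely rely on an NLP library for stopwords and lemmatization.

```python
import string
from collections import Counter

# Hypothetical stand-ins: a real pipeline would use a full stopword list and a
# proper lemmatizer from an NLP library.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "are", "in", "on"}
LEMMAS = {"rockets": "rocket", "launched": "launch", "planets": "planet"}


def deduplicate(docs):
    """Drop duplicates (compared after trimming and lowercasing), keeping the first copy."""
    seen = set()
    unique = []
    for doc in docs:
        key = doc.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def preprocess(doc):
    """Lowercase, strip punctuation, drop stopwords, then map tokens to lemmas."""
    table = str.maketrans("", "", string.punctuation)
    tokens = doc.lower().translate(table).split()
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def build_dictionary(token_lists):
    """Assign an integer id to each distinct token, in order of first appearance."""
    token2id = {}
    for tokens in token_lists:
        for tok in tokens:
            token2id.setdefault(tok, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Convert a token list to a sorted list of (token_id, count) pairs."""
    counts = Counter(token2id[tok] for tok in tokens if tok in token2id)
    return sorted(counts.items())


# Made-up sample documents, for illustration only.
docs = [
    "Rockets launched to the planets.",
    "rockets launched to the planets.",
    "The rocket is on the launch pad!",
]
unique_docs = deduplicate(docs)
texts = [preprocess(d) for d in unique_docs]
token2id = build_dictionary(texts)
corpus = [to_bow(t, token2id) for t in texts]
```

Each entry of `corpus` is a bag-of-words vector for one document, which is the input shape most LDA implementations expect.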

4. Training an LDA Base Model

5. Hyperparameter Tuning

5.1 Number of Topics

Based on the tuning results, a model with 18 topics produces the highest coherence score.
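
A sweep of this kind can be sketched as a small helper that scores each candidate and keeps the best one. The names `best_by_score` and `demo_score` are hypothetical. In the real notebook the scoring function would train an LDA model with the given number of topics and return its coherence; here a stand-in function is used so the example runs on its own, and its values are not real coherence scores.

```python
def best_by_score(candidates, score_fn):
    """Score each candidate and return (best_candidate, scores_by_candidate)."""
    scores = {k: score_fn(k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def demo_score(num_topics):
    """Hypothetical stand-in for 'train LDA with num_topics, return coherence'.

    Shaped to peak at 18 only so the example mirrors the result reported above;
    the values are not real coherence scores.
    """
    return 1.0 / (1.0 + abs(num_topics - 18))


# Candidate topic counts are an assumption: even values from 2 to 30.
best_k, scores = best_by_score(range(2, 31, 2), demo_score)
print(best_k)
```

The same helper could be reused for other hyperparameters, such as the number of passes, by swapping in a different scoring function.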

5.2 Number of Passes

6. Train LDA Model with Parameters from the Elbow Method
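
The outline does not say how the elbow was located. One common heuristic, sketched below under that caveat, picks the point on the score curve that lies farthest from the straight line joining the first and last points, after rescaling both axes to [0, 1]. The function name `find_elbow` and the score values are made up for illustration.

```python
def find_elbow(xs, ys):
    """Return the x whose point is farthest from the line through the endpoints.

    Both axes are rescaled to [0, 1] first so the result does not depend on units.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) points")
    x_min, y_min = min(xs), min(ys)
    x_span, y_span = max(xs) - x_min, max(ys) - y_min
    if x_span == 0 or y_span == 0:
        raise ValueError("curve is flat along one axis; no elbow to find")
    xn = [(x - x_min) / x_span for x in xs]
    yn = [(y - y_min) / y_span for y in ys]

    # Direction of the line joining the first and last points.
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    norm = (dx * dx + dy * dy) ** 0.5
    if norm == 0:
        raise ValueError("first and last points coincide")

    # Perpendicular distance of every point from that line.
    distances = [
        abs(dy * (a - xn[0]) - dx * (b - yn[0])) / norm for a, b in zip(xn, yn)
    ]
    return xs[distances.index(max(distances))]


# Made-up curve for illustration only (not the notebook's coherence scores).
topic_counts = [2, 4, 6, 8, 10, 12]
made_up_scores = [0.30, 0.42, 0.48, 0.49, 0.50, 0.50]
elbow_k = find_elbow(topic_counts, made_up_scores)
print(elbow_k)
```

The value returned would then be passed as the number of topics when retraining the model.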

7. Final Model

The elbow method yields an improved coherence score; however, the visualization shows that several clusters share overlapping words. To obtain more clearly separated topics for this data set, we refine the model further by using 4 topic clusters, which avoids the overlaps.