Analysis of PureGym Reviews: Data Cleaning, Sentiment Analysis, and Topic Modeling

With over 2 million members and 600 gyms across the UK, Denmark, and Switzerland, PureGym has established itself as one of the world’s leading value fitness operators. Since its founding in 2008, the company has appealed to a broad customer base by offering flexible, affordable, and high-quality fitness facilities. Key to its success is a customer-centric approach—emphasizing no fixed-term contracts, 24/7 access, and a welcoming environment that prioritizes the member experience.

To stay ahead in a competitive market increasingly focused on value-for-money memberships, PureGym continues to invest in understanding what drives member engagement and satisfaction. By leveraging innovative technologies and listening to customer feedback, the company aims to enhance the gym experience while staying true to its mission: inspiring a healthier world by providing affordable access to the benefits of being healthy.

In this post, I analyze PureGym reviews to uncover insights about member sentiment, key topics of discussion, and areas for potential improvement. Using data cleaning techniques, sentiment analysis, and topic modeling, I aim to better understand how members perceive the PureGym experience and how that feedback can shape the future of fitness.

Note: I can’t make the data available due to a non-disclosure agreement (NDA).

  • Business Context
  • Jupyter Notebook
  • Report

📊 PureGym Customer Review Analysis

Natural Language Processing | Data Science | Business Insights

Overview:

As part of a data-driven initiative to understand customer experience, I analyzed over 39,000 real-world reviews from Google and Trustpilot for PureGym, one of the largest value fitness operators globally. The goal was to uncover actionable insights using advanced NLP techniques, sentiment and emotion analysis, and topic modeling.

🔧 Tools & Technologies

Python · Pandas · BERTopic · Gensim LDA · Regex · Matplotlib · Seaborn · LLMs (Phi model) · WordCloud

🧠 Key Features & Deliverables

  • Data Cleaning & Preparation: Processed and cleaned two large datasets, standardizing text, handling nulls, and preparing structured inputs for NLP pipelines.
  • Exploratory Data Analysis: Identified frequently used terms and visualized themes using word clouds and frequency plots.
  • Topic Modeling: Applied BERTopic and LDA to extract dominant themes from both positive and negative reviews, helping to surface customer pain points.
  • Emotion Detection: Filtered reviews expressing strong negative emotions (e.g., anger) and ran separate topic modeling on that subset to understand root causes of dissatisfaction.
  • Location-Based Review Mapping: Tracked negative reviews by gym location to help identify operational hotspots.
  • LLM Integration: Used the Phi model to:
    • Automatically summarize and extract topics from individual reviews
    • Generate actionable suggestions for improving customer experience
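
Since the data is under NDA, here is a minimal sketch of what the cleaning and location-mapping steps above might look like, using only the Python standard library so it runs anywhere (the project itself used Pandas). Everything in it is an assumption for illustration: the field names `location`, `rating`, and `text`, the sample reviews and gym locations, the cleaning rules, and the choice to treat ratings of 2 or lower as negative. None of it is taken from the actual notebook.

```python
import re
from collections import Counter


def clean_review(text):
    """Lowercase, drop URLs and non-letter characters, collapse whitespace.

    Returns an empty string for missing (None) reviews.
    """
    if text is None:
        return ""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def negative_counts_by_location(reviews, max_rating=2):
    """Count non-empty reviews rated at or below max_rating, per location.

    The rating threshold is a hypothetical stand-in for whatever
    definition of "negative" the real analysis used.
    """
    counts = Counter()
    for review in reviews:
        has_text = bool(clean_review(review.get("text")))
        if has_text and review["rating"] <= max_rating:
            counts[review["location"]] += 1
    return counts.most_common()


# Made-up sample data; locations and ratings are purely illustrative.
sample_reviews = [
    {"location": "Leeds", "rating": 1, "text": "Broken showers!! See https://example.com"},
    {"location": "Leeds", "rating": 2, "text": "Too crowded at 6pm."},
    {"location": "Bristol", "rating": 5, "text": "Great staff, clean gym."},
    {"location": "Bristol", "rating": 1, "text": None},
]

print(clean_review(sample_reviews[0]["text"]))
print(negative_counts_by_location(sample_reviews))
```

The cleaned text is the kind of input that would then be passed to the topic models, while the per-location counts give a first rough view of where negative feedback clusters.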

✅ Impact

  • Delivered a detailed insights report summarizing patterns in customer sentiment, emotional trends, and location-based feedback.
  • Provided strategic recommendations to improve service quality, member engagement, and operational efficiency based on real customer voices.
  • Demonstrated the business value of combining NLP with LLMs for scalable, AI-powered customer insight.
