Shopify Competitor Review Scraper — Complete Setup Guide

What's Inside

How Shopify reviews actually work — the five platforms, what's open, what's not
Prerequisites — Python, dependencies, five-minute setup
The detection algorithm — how to identify which review app a store uses, including Loox's new API
Platform-by-platform scraping — Judge.me, Loox (corrected), Yotpo, Okendo, Stamped
The universal scraper — complete copy-paste Python script, all platforms
CSV + JSON export — structured output ready for Claude or Sheets
Worked example — real URL → real output → real ad angles
Claude analysis prompts — turn raw reviews into ad angles in one shot
Rate limits, ethics, and what not to do

Shopify's native review system was sunset in May 2024. Every store now runs a third-party review app — and most of those apps expose a public widget API. That means competitor reviews are sitting in open JSON endpoints, no auth required, waiting to be queried. This guide is the complete, tested technical walkthrough to pull them.

Section 01

How Shopify Reviews Actually Work

There are five dominant review platforms in the Shopify ecosystem. Each exposes reviews differently — three are fully open public APIs, one (Stamped) requires auth at the API level but exposes data in the page HTML, and one (Yotpo) uses a public CDN with a key embedded in the page source.

⚡

The core insight

Review widgets must load publicly for every site visitor. That means every API call the widget makes is visible in browser DevTools — Network tab → XHR/Fetch. The scraper replicates exactly those calls. No credentials needed for the four main platforms.

Judge.me

No Auth

Signature:
cdn.judge.me
judgeme

Fully public widget API. Needs only myshopify domain + product ID, both extractable from page HTML. Most common platform at mid-market brands.

Loox

Public Storefront API

Signature:
loox.io
useloox.com

New base URL as of 2025: storefront-api.loox.io/storefront/v1/store/{publicStoreId}. Requires publicStoreId from page source. No API key needed per official Loox documentation. CORS-restricted to browser origin — use server-side requests from Python directly.

Yotpo

Public CDN API

Signature:
staticw2.yotpo.com

Public CDN API at api-cdn.yotpo.com. Requires app_key from the widget script tag in page source (e.g. staticw2.yotpo.com/{KEY}/widget.js). Enterprise platform used by Allbirds, Steve Madden, etc.

Okendo

No Auth (Official)

Signature:
okendo.io
okendo-reviews

Official public Storefront REST API. Per Okendo's own docs: "No authentication is required." Needs subscriberId from page source and Shopify product_id. Premium DTC platform.

Stamped

Auth Required

Signature:
cdn1.stamped.io
StampedFn

Full API requires auth (Professional/Enterprise plan + Basic Auth). However, the widget embeds reviews directly in the page HTML — use BeautifulSoup to parse div[itemprop="review"] elements. Slower but works without credentials.

Section 02

Prerequisites — Five-Minute Setup

1. Install Dependencies

pip install requests beautifulsoup4 lxml

pip install playwright

playwright install chromium

Playwright is only needed for full Stamped pagination (clicking "Load More") and for Cloudflare-protected stores. Judge.me, Loox, Yotpo, and Okendo work on requests alone, and so does the first-page Stamped HTML parse.

2. Python Version

python --version

Requires Python 3.9+. Confirmed working: macOS 12+, Ubuntu 22.04, Windows 11. Run in any terminal, VS Code, Cursor, or Claude Cowork.

3. Save & Run

python scraper.py

Save the universal script from Section 5. Enter a product URL when prompted. Done.

Section 03

The Detection Algorithm

Given any Shopify product URL, the scraper identifies the review platform by scanning the page HTML for known JavaScript signatures, then extracts the platform-specific keys (app key, store ID, subscriber ID) needed to hit the right API.

⚠

Loox: important update (2025)

Loox migrated their Storefront API base URL. The correct current endpoint is storefront-api.loox.io, not the older api.loox.io path. The publicStoreId appears in the page HTML in Loox's core script init call — the regex patterns below extract it reliably across both old and new theme installations.

Python

detect_platform() — full function

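import csv, json, re, time          # csv/json/time are used by the scrapers and export step
from datetime import datetime       # used to timestamp the output filename
from urllib.parse import urlparse
import requests
from bs4 import BeautifulSoup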

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
               'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/json'
}

def detect_platform(product_url):
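    r = requests.get(product_url, headers=HEADERS, timeout=20)
    if r.status_code == 403:
        # Blocked pages contain no widget signatures, so detection will return 'unknown'
        print("  ✗ HTTP 403: store may be Cloudflare-protected (see Section 09)")
    html = r.text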

    result = {
        'platform': 'unknown',
        'keys': {},
        'product_id': None,
        'shop_domain': None,
        'raw_html': html  # kept for Stamped HTML parsing
    }

    # ── Extract Shopify product ID ──────────────────────────
    for pat in [
        r'"product":{"id":(\d+)',
        r'"productId":(\d+)',
        r'data-product-id=["\'](\d+)["\']',
        r'"id":(\d{10,})'          # fallback: Shopify IDs are 10+ digits
    ]:
        m = re.search(pat, html)
        if m:
            result['product_id'] = m.group(1)
            break

    # ── Extract myshopify domain ────────────────────────────
    parsed = urlparse(product_url)
    result['shop_domain'] = parsed.netloc
    myshop = re.search(r'([\w-]+\.myshopify\.com)', html)
    if myshop:
        result['shop_domain'] = myshop.group(1)

    h = html.lower()

    # ── JUDGE.ME ────────────────────────────────────────────
    if 'cdn.judge.me' in h or 'judgeme' in h:
        result['platform'] = 'judgeme'
        # No extra keys needed — uses shop_domain + product_id
        return result

    # ── LOOX ────────────────────────────────────────────────
    elif 'loox.io' in h or 'useloox.com' in h:
        result['platform'] = 'loox'
        # Pattern 1: explicit publicStoreId key (modern installs)
        # Pattern 2: storeId in loox widget init object
        # Pattern 3: storeId embedded in loox script src URL
        # Pattern 4: data-store-id attribute on widget div
        # Pattern 5: store ID embedded in a storefront-api.loox.io URL
        for pat in [
            r'publicStoreId["\s:]+["\']([a-zA-Z0-9_-]{8,})["\']',
            r'storeId["\s:=]+["\']?([a-zA-Z0-9_-]{8,})["\']?',
            r'loox\.io/api[^"]*?[?&]storeId=([a-zA-Z0-9_-]{8,})',
            r'data-store-id=["\']([a-zA-Z0-9_-]{8,})["\']',
            r'storefront-api\.loox\.io/storefront/v1/store/([a-zA-Z0-9_-]{8,})'
        ]:
            m = re.search(pat, html, re.IGNORECASE)
            if m:
                result['keys']['store_id'] = m.group(1)
                break
        return result

    # ── YOTPO ───────────────────────────────────────────────
    elif 'staticw2.yotpo.com' in h or 'api-cdn.yotpo' in h:
        result['platform'] = 'yotpo'
        # app_key appears in the widget.js script src URL
        for pat in [
            r'staticw2\.yotpo\.com/([a-zA-Z0-9]{10,})/widget',
            r'"appKey":\s*"([a-zA-Z0-9]{10,})"',
            r"appKey:\s*'([a-zA-Z0-9]{10,})'",
            r'data-yotpo-app-key=["\']([a-zA-Z0-9]{10,})["\']'
        ]:
            m = re.search(pat, html)
            if m:
                result['keys']['app_key'] = m.group(1)
                break
        return result

    # ── OKENDO ──────────────────────────────────────────────
    elif 'okendo.io' in h or 'okendo-reviews' in h:
        result['platform'] = 'okendo'
        for pat in [
            r'subscriberId["\s:]+["\']([a-zA-Z0-9-]{8,})["\']',
            r'okendo\.io/v1/stores/([a-zA-Z0-9-]{8,})',
            r'data-subscriber-id=["\']([a-zA-Z0-9-]{8,})["\']'
        ]:
            m = re.search(pat, html, re.IGNORECASE)
            if m:
                result['keys']['subscriber_id'] = m.group(1)
                break
        return result

    # ── STAMPED ─────────────────────────────────────────────
    elif 'stamped.io' in h or 'stampedfn' in h:
        result['platform'] = 'stamped'
        # Extract public keys from StampedFn.init() call in HTML
        m = re.search(r"apiKey:\s*['\"]([^'\"]+)['\"]", html)
        if m:
            result['keys']['public_key'] = m.group(1)
        m2 = re.search(r"sId:\s*['\"]([^'\"]+)['\"]", html)
        if m2:
            result['keys']['store_hash'] = m2.group(1)
        return result

    return result
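
To check the signature matching without hitting a live store, the same substrings can be run against saved HTML. The following is a minimal sketch: match_platform is a hypothetical helper name and SAMPLE_HTML is a made-up snippet, while the signature strings and the Yotpo regex are the ones used in detect_platform() above.

```python
import re

# Signature substrings from the guide, checked in the same order as detect_platform().
SIGNATURES = [
    ("judgeme", ("cdn.judge.me", "judgeme")),
    ("loox", ("loox.io", "useloox.com")),
    ("yotpo", ("staticw2.yotpo.com", "api-cdn.yotpo")),
    ("okendo", ("okendo.io", "okendo-reviews")),
    ("stamped", ("stamped.io", "stampedfn")),
]

def match_platform(html):
    """Return the first platform whose signature appears in the HTML, else 'unknown'."""
    lowered = html.lower()
    for platform, needles in SIGNATURES:
        if any(needle in lowered for needle in needles):
            return platform
    return "unknown"

# Hypothetical page fragment; the app key is invented for illustration.
SAMPLE_HTML = '<script src="https://staticw2.yotpo.com/AbCdEfGhIjKlMnOp/widget.js"></script>'

platform = match_platform(SAMPLE_HTML)
key_match = re.search(r'staticw2\.yotpo\.com/([a-zA-Z0-9]{10,})/widget', SAMPLE_HTML)
app_key = key_match.group(1) if key_match else None
print(platform, app_key)
```

Because the first match wins, a page that loads two review apps is reported as whichever appears earlier in the list, which mirrors the if/elif order in detect_platform().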

Section 04

Platform-by-Platform Scraping

4A · Judge.me

The most widely installed review app at mid-market Shopify brands. Fully public — no auth of any kind. Pagination via total field in the response.

Python

judge.me scraper

def scrape_judgeme(info, max_reviews=500):
    BASE = "https://judge.me/api/v1/reviews"
    reviews, page = [], 1
    per = 100
    h = {**HEADERS, 'Referer': f'https://{info["shop_domain"]}'}

    while True:
        r = requests.get(BASE, params={
            'shop_domain': info['shop_domain'],
            'product_external_id': info['product_id'],
            'page': page, 'per_page': per
        }, headers=h, timeout=15)
        data = r.json()
        page_reviews = data.get('reviews', [])
        if not page_reviews: break

        for rv in page_reviews:
            reviews.append({
                'id': rv.get('id'),
                'rating': rv.get('rating'),
                'title': rv.get('title', ''),
                'body': rv.get('body', ''),
                'author': rv.get('reviewer', {}).get('name', ''),
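                'date': (rv.get('created_at') or '')[:10],
                # 'nothing' marks an unverified review; a missing field is treated as unverified too
                'verified': rv.get('verified') not in (None, '', 'nothing'),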
                'platform': 'judgeme'
            })

        total = data.get('total', 0)
        if len(reviews) >= max_reviews or page * per >= total: break
        page += 1; time.sleep(0.4)

    return reviews[:max_reviews]
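
The stop condition in that loop (page * per >= total, or the max_reviews cap) comes down to simple arithmetic. Below is a small sketch of that calculation, using a hypothetical helper name, pages_needed. It counts the pages that actually contain reviews; the loop itself always sends at least one request, even for a product with no reviews.

```python
import math

def pages_needed(total, per_page=100, max_reviews=500):
    """Number of pages the loop fetches before either stop condition triggers."""
    wanted = min(total, max_reviews)
    return math.ceil(wanted / per_page) if wanted > 0 else 0

print(pages_needed(347))    # the 347-review example from Section 07
print(pages_needed(1200))   # capped by max_reviews=500
```

For the 347-review product this gives 4 requests; for a 1,200-review product the default cap stops the loop after 5.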

4B · Loox

Loox updated their Storefront API in 2025. The correct base URL is now storefront-api.loox.io. The API is explicitly public with no auth required — Loox's own docs confirm: "The Storefront API does not require an API key." The CORS restriction applies to browser requests only — Python's requests library bypasses it entirely since it's not a browser origin.

✓

Why server-side requests work despite CORS

CORS is enforced by browsers, not servers. When Python sends a request, there's no browser enforcing CORS headers. The API responds normally. This is standard behaviour for any server-side HTTP client.

Python

Loox scraper — corrected 2025 API

def scrape_loox(info, max_reviews=500):
    store_id = info['keys'].get('store_id')
    if not store_id:
        print("  ✗ Could not extract Loox publicStoreId from page source")
        print("    Try: Ctrl+F in DevTools → 'publicStoreId' or 'storeId'")
        return []

    # Updated 2025 base URL — storefront-api.loox.io, not api.loox.io
    BASE = f"https://storefront-api.loox.io/storefront/v1/store/{store_id}/product-reviews"
    reviews, page = [], 1

    while True:
        r = requests.get(BASE, params={
            'productId': info['product_id'],
            'page': page,
            'perPage': 100
        }, headers={'Accept': 'application/json'}, timeout=15)

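        if r.status_code != 200:
            # Fallback: try older api.loox.io endpoint for legacy installs
            BASE_LEGACY = f"https://api.loox.io/api/v1/store/{store_id}/product-reviews"
            r = requests.get(BASE_LEGACY, params={
                'productId': info['product_id'], 'page': page, 'perPage': 100
            }, headers={'Accept': 'application/json'}, timeout=15)
            if r.status_code != 200:
                # Neither endpoint answered; stop instead of calling .json() on an error page
                print(f"  ✗ Loox returned HTTP {r.status_code} on both endpoints")
                break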

        data = r.json()
        page_reviews = data.get('reviews', [])
        if not page_reviews: break

        for rv in page_reviews:
            # Loox uses 'text' for review body, with 'body' as a fallback key
            body = rv.get('text') or rv.get('body') or rv.get('comment', '')
            reviews.append({
                'id': rv.get('id'),
                'rating': rv.get('rating'),
                'title': rv.get('title', ''),
                'body': body,
                'author': rv.get('name', rv.get('author', '')),
                'date': rv.get('publishedAt', rv.get('createdAt', ''))[:10],
                'verified': rv.get('verified', False),
                'platform': 'loox'
            })

        if len(reviews) >= max_reviews or not data.get('hasNextPage', False): break
        page += 1; time.sleep(0.4)

    return reviews[:max_reviews]

4C · Yotpo

Python

Yotpo scraper

def scrape_yotpo(info, max_reviews=500):
    app_key = info['keys'].get('app_key')
    if not app_key:
        print("  ✗ Could not extract Yotpo app_key — look for staticw2.yotpo.com/{KEY}/widget.js in page source")
        return []
    BASE = f"https://api-cdn.yotpo.com/v1/widget/{app_key}/products/{info['product_id']}/reviews.json"
    reviews, page = [], 1; per = 150

    while True:
        r = requests.get(BASE, params={'per_page': per, 'page': page},
                         headers={'Accept': 'application/json'}, timeout=15)
        resp = r.json().get('response', {})
        page_reviews = resp.get('reviews', [])
        if not page_reviews: break

        for rv in page_reviews:
            reviews.append({
                'id': rv.get('id'),
                'rating': rv.get('score'),
                'title': rv.get('title', ''),
                'body': rv.get('content', ''),
                'author': rv.get('user', {}).get('display_name', ''),
                'date': rv.get('created_at', '')[:10],
                'verified': rv.get('verified_buyer', False),
                'platform': 'yotpo'
            })

        total = resp.get('pagination', {}).get('total', 0)
        if len(reviews) >= max_reviews or page * per >= total: break
        page += 1; time.sleep(0.5)

    return reviews[:max_reviews]

4D · Okendo

Python

Okendo scraper — cursor pagination

def scrape_okendo(info, max_reviews=500):
    sub_id = info['keys'].get('subscriber_id')
    if not sub_id:
        print("  ✗ Could not extract Okendo subscriber ID")
        return []
    BASE = f"https://api.okendo.io/v1/stores/{sub_id}/products/shopify-{info['product_id']}/reviews"
    reviews, cursor = [], None

    while True:
        params = {'limit': 100}
        if cursor: params['after'] = cursor
        r = requests.get(BASE, params=params, headers={'Accept': 'application/json'}, timeout=15)
        data = r.json()
        page_reviews = data.get('reviews', [])
        if not page_reviews: break

        for rv in page_reviews:
            reviews.append({
                'id': rv.get('reviewId'),
                'rating': rv.get('rating'),
                'title': rv.get('headline', ''),
                'body': rv.get('body', ''),
                'author': rv.get('reviewer', {}).get('displayName', ''),
                'date': rv.get('dateCreated', '')[:10],
                'verified': rv.get('verificationStatus') == 'verified',
                'platform': 'okendo'
            })

        if len(reviews) >= max_reviews: break
        next_link = data.get('pagination', {}).get('nextPage')
        if not next_link: break
        m = re.search(r'after=([^&]+)', next_link)
        cursor = m.group(1) if m else None
        if not cursor: break
        time.sleep(0.3)

    return reviews[:max_reviews]
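
The regex in scrape_okendo() keeps the cursor exactly as it appears in the nextPage link. If that value is percent-encoded, requests will encode it a second time when it is passed back through params. A hedged alternative is to parse the query string instead. In this sketch, cursor_from_next_link is a hypothetical helper name and the sample link is invented; the real link format should be confirmed in DevTools.

```python
from urllib.parse import urlparse, parse_qs

def cursor_from_next_link(next_link):
    """Return the decoded 'after' cursor from a nextPage link, or None if absent."""
    if not next_link:
        return None
    values = parse_qs(urlparse(next_link).query).get("after")
    return values[0] if values else None

# Invented example link; note the percent-encoded '=' at the end of the cursor.
sample = "/stores/abc123/products/shopify-456/reviews?limit=100&after=MjAyNC0wOS0xMg%3D%3D"
print(cursor_from_next_link(sample))
```

The decoded value can then go straight into params['after'], and requests handles the encoding once.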

4E · Stamped — HTML Fallback

Stamped's REST API requires a Professional plan + Basic Auth. But Stamped renders reviews directly in the page HTML as div[itemprop="review"] Schema.org elements. BeautifulSoup extracts them without any API calls.

ℹ

The Stamped HTML approach — what you get and what you don't

You get the first page of rendered reviews (typically 10–20). You do NOT get paginated access to all reviews this way — Stamped loads subsequent pages dynamically via JS. For full pagination on Stamped stores, use Playwright to scroll and trigger the "Load More" button. For most competitive research purposes, the first 10–20 reviews are sufficient to extract the language patterns needed for angles.

Python

Stamped — HTML Schema.org parsing

def scrape_stamped_html(info):
    soup = BeautifulSoup(info.get('raw_html',''), 'lxml'); reviews = []
    for el in soup.find_all(attrs={'itemprop':'review'}):
        r_el = el.find(attrs={'itemprop':'ratingValue'})
        rating = None
        try: rating = int(float(r_el.get('content') or r_el.text)) if r_el else None
        except (TypeError, ValueError): pass  # leave rating as None when the value is not numeric
        t = el.find(attrs={'itemprop':'name'}); title = t.text.strip() if t else ''
        b = el.find(attrs={'itemprop':'reviewBody'}) or el.find(attrs={'itemprop':'description'})
        body = b.text.strip() if b else ''
        a = el.find(attrs={'itemprop':'author'})
        author = a.text.strip() if a else ''
        d = el.find(attrs={'itemprop':'datePublished'})
        date = (d.get('content','') or d.text.strip())[:10] if d else ''
        if body: reviews.append({'id':None,'rating':rating,'title':title,'body':body,'author':author,'date':date,'verified':False,'platform':'stamped'})
    print(f"  ℹ Stamped: {len(reviews)} reviews from first-page HTML only.")
    return reviews

Section 05

The Universal Scraper

The functions from Sections 03 and 04, together with the export and entry-point code below, make up the complete script. Save them in one file as scraper.py.

Python

Universal scraper: export + main

# -- EXPORT + MAIN -----------------------------------------------
def export(reviews, slug):
    fields = ['id','rating','title','body','author','date','verified','platform']
    with open(f"{slug}.csv",'w',newline='',encoding='utf-8') as f:
        w = csv.DictWriter(f,fieldnames=fields,extrasaction='ignore'); w.writeheader(); w.writerows(reviews)
    with open(f"{slug}.json",'w',encoding='utf-8') as f: json.dump(reviews,f,indent=2,ensure_ascii=False)
    print(f"  Saved: {slug}.csv + {slug}.json")

def main():
    url = input("  Product URL: ").strip()
    mx = input("  Max reviews [500]: ").strip()
    max_r = int(mx) if mx.isdigit() else 500
    info = detect_platform(url)
    plat = info['platform']
    print(f"  Platform: {plat.upper()} | Product ID: {info['product_id']}")
    if plat == 'unknown': print("No supported platform detected."); return
    scrapers = {'judgeme': scrape_judgeme, 'yotpo': scrape_yotpo, 'okendo': scrape_okendo, 'loox': scrape_loox}
    reviews = scrapers[plat](info, max_r) if plat in scrapers else scrape_stamped_html(info)
    if not reviews: print("No reviews found."); return
    slug = urlparse(url).netloc.replace('.','-') + f"_{plat}_" + datetime.now().strftime("%Y%m%d_%H%M")
    export(reviews, slug)
    print(f"Done. Feed {slug}.csv into Claude.")

if __name__ == '__main__': main()

Section 06

Output Structure

Every review is normalised to the same 8-field schema, regardless of platform.

Field	Type	Description	Example
id	Integer (None for Stamped)	Platform review ID	309836503
rating	Integer 1–5	Star rating	4
title	String	Review headline	"Changed how I sleep"
body	String	Full review text — the gold	"I've tried 6 magnesium products and this is the first one that didn't upset my stomach…"
author	String	Reviewer display name	"Sarah M."
date	YYYY-MM-DD	Submission date	2024-09-12
verified	Boolean	Verified purchaser	True
platform	String	Source platform	judgeme
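
Once the CSV exists, the low-rated reviews can be separated before analysis, which is what Prompt 2 in Section 08 works from. Here is a minimal sketch using only the standard library. split_by_rating is a hypothetical helper name and the three sample rows are invented; only the column names come from the schema above.

```python
import csv
import io

def split_by_rating(rows, low_max=2):
    """Split rows into (low_rated, others); rows with no usable rating are skipped."""
    low, others = [], []
    for row in rows:
        raw = (row.get("rating") or "").strip()
        if not raw:
            continue  # e.g. a Stamped row where no rating could be parsed
        try:
            rating = int(float(raw))
        except ValueError:
            continue
        (low if 1 <= rating <= low_max else others).append(row)
    return low, others

# Invented rows in the exported 8-field layout. CSV stores every value as text,
# so 'verified' comes back as the string "True" or "False".
SAMPLE_CSV = """id,rating,title,body,author,date,verified,platform
1,5,Great,"Fell asleep faster, no 3am wake-ups",Sarah M.,2024-09-12,True,judgeme
2,2,Too big,Capsule size is enormous,Alex P.,2024-09-14,False,judgeme
3,,No rating,Parsed from HTML,Jo,2024-09-15,False,stamped
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
low_rated, others = split_by_rating(rows)
print(len(low_rated), len(others))
```

To run it on a real export, replace io.StringIO(SAMPLE_CSV) with open("your-file.csv", newline="", encoding="utf-8").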

Section 07 — New

Worked Example — URL to Ad Angles

Here’s what a real output looks like, and what it produces when fed into Claude. Representative example: health supplement on Judge.me with 347 reviews.

→

Input

URL: https://[brand].com/products/magnesium-glycinate
Platform: judge.me · Reviews: 347 · Avg: 4.6★ · Low-rated (1–2★): 28

Sample rows from the CSV — raw language that becomes ad copy inputs:

Sample rows — reviews.csv

judgeme · 347 rows total

5★ verified

"I have tried every magnesium supplement on the market. Most gave me digestive issues or just didn’t do anything. This is the first one where I actually noticed a difference within a week — I fall asleep faster and I’m not waking up at 3am anymore. My husband now takes it too."

2★

"Tastes fine and probably works but took 3 weeks to arrive and the capsule size is enormous — I struggle to swallow it. Wish they offered a powder version. Otherwise seems good."

Run these through Prompt 3 and Claude extracts this language bank:

Claude Output — Buyer Language Extracted

Before language (pain):
- "I have tried every magnesium supplement on the market"
- "most gave me digestive issues"
- "waking up at 3am"

Result language (after):
- "I actually noticed a difference within a week"
- "fall asleep faster"
- "not waking up at 3am anymore"
- "my husband now takes it too"

Comparison language:
- "switched from Doctor's Best"
- "had been using melatonin for years"

Recommendation language:
- "bought a second bottle for my mum"
- "told my whole book club about it"

That language directly surfaces three testable angles:

Ad angles generated from real review data

3 of 8

Angle 1

The Skeptic Converted. Targets people who’ve burned money on supplements that didn’t work. Headline: "I’ve tried every magnesium brand. This is different." Mirrors review language directly. Awareness: problem-aware, solution-sceptical.

Angle 2

The 3am Problem. Most magnesium ads lead with "better sleep" — generic. This leads with the specific failure: waking in the night. Headline: "Stop waking up at 3am." More specific = more scroll-stopping.

Angle 3

No Stomach Issues. Surfaced entirely from the low-rated reviews. The 2-star complaints are about capsule size and digestion — an objection this brand’s own ads completely ignore, and a gap a competing brand in powder format could own.

↑

What just happened

URL in → 347 reviews out → Claude analysis → three differentiated angles with real buyer language, zero guessing, under 5 minutes. The 28 low-rated reviews alone surfaced a product-format gap the brand’s own marketing ignores. That’s why you mine the full set, not just the 5-star pile.

Section 08

Claude Analysis Prompts

Prompt 1 — Full Sentiment

Here are [N] customer reviews for [PRODUCT] from a competitor's Shopify store.

Return:
1. Top 5 things customers love (exact language)
2. Top 5 complaints or unmet expectations
3. Phrases that repeat across multiple reviews
4. Emotional language describing before/after
5. What they tried before this product
6. What tipped them into buying

Do not summarise. Exact phrases only.

Prompt 2 — Low-Rating Deep Dive

Isolate only the 1-star and 2-star reviews. For each: what specific failure? Is it product, delivery, price, results timeline, or expectation mismatch?

Summary:
- 3 most common failure modes
- What customers actually wanted but didn't get
- 3 ad angles for a competing product that directly addresses these gaps

Prompt 3 — Buyer Language Extraction

Extract all instances of:
1. Before language: how customers described their problem before buying
2. Result language: specific outcomes, not generic "love it"
3. Comparison language: what they tried before or switched from
4. Recommendation language: exact words used to tell others

Four separate lists. Exact phrases only.

Prompt 4 — Ad Angle Bank

Generate 8 distinct static ad angles for a competing product based on these reviews.

For each:
- Angle name
- Core message in one sentence
- Headline (under 8 words)
- Supporting line from actual customer language
- Target awareness level and pain point

Angles must emerge from the review language, not generic category claims.
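
When working from the JSON export rather than pasting the CSV, the [N] and [PRODUCT] placeholders in Prompt 1 can be filled programmatically. This is only a sketch: build_prompt and format_reviews are hypothetical helper names, the numbered-line layout is one arbitrary choice, and the limit of 200 reviews is an assumption to keep the prompt manageable.

```python
def format_reviews(reviews):
    """Render each review as one numbered line: rating, optional title, body."""
    lines = []
    for i, rv in enumerate(reviews, 1):
        title = (rv.get("title") or "").strip()
        body = " ".join((rv.get("body") or "").split())  # collapse whitespace and newlines
        head = f"[{i}] {rv.get('rating') or '?'}/5"
        if title:
            head += f" {title}:"
        lines.append(f"{head} {body}")
    return "\n".join(lines)

def build_prompt(instruction, reviews, product, limit=200):
    """Fill the [N] and [PRODUCT] placeholders and append the review lines."""
    selected = reviews[:limit]
    header = instruction.replace("[N]", str(len(selected))).replace("[PRODUCT]", product)
    return f"{header}\n\nReviews:\n{format_reviews(selected)}"

# Invented reviews in the exported schema.
sample_reviews = [
    {"rating": 5, "title": "Great", "body": "Works  well"},
    {"rating": 2, "title": "", "body": "Capsule too big"},
]
prompt = build_prompt("Here are [N] customer reviews for [PRODUCT].", sample_reviews, "magnesium glycinate")
print(prompt)
```

The review list is sliced before [N] is filled in, so the count in the prompt always matches the number of reviews actually included.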

Section 09

Rate Limits, Ethics, What Not To Do

⚠

Before running at scale

These are public widget APIs serving data to any site visitor. Keep the sleep delays, cap volume sensibly, never attempt authenticated or private endpoints. The time.sleep() calls in the code exist for a reason.

Do: scrape publicly visible review data for VOC research and competitive intelligence
Do: keep sleep delays between requests — already in the code
Do: use the insights, not the content — extract language patterns, don’t reproduce reviews verbatim

Don’t: access private/authenticated endpoints — order data, customer emails, merchant dashboards
Don’t: run hundreds of concurrent requests — single-threaded is sufficient and responsible
Don’t: republish scraped reviews directly
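
One way to stay polite when an endpoint starts pushing back is to retry with growing delays instead of hammering it. The guide's code does not do this itself, so treat the following as an optional extension. fetch_with_backoff is a hypothetical helper, and the set of retryable status codes is an assumption. The fetch argument is any zero-argument callable returning (status_code, payload), which keeps the sketch testable without network access.

```python
import time

RETRYABLE = (429, 500, 502, 503, 504)  # assumed set of "try again later" statuses

def fetch_with_backoff(fetch, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns 200, doubling the wait after each retryable failure."""
    for attempt in range(max_retries + 1):
        status, payload = fetch()
        if status == 200:
            return payload
        if status not in RETRYABLE or attempt == max_retries:
            return None  # give up on hard errors or once retries are exhausted
        sleep(base_delay * (2 ** attempt))
    return None

# Offline demonstration: a fake endpoint that rate-limits once, then succeeds.
responses = iter([(429, None), (200, {"reviews": []})])
waits = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

In the scrapers, fetch could wrap a requests.get call and return (r.status_code, r.json() if r.ok else None).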

ℹ

Cloudflare-protected stores

Some larger stores block raw HTTP requests. If you get a 403, use Playwright: launch Chromium → navigate to the product page → intercept the XHR requests the review widget fires. Same API, same JSON — you just trigger the widget load via a real browser rather than Python's HTTP client.
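
A rough sketch of that Playwright route is below. The host list is taken from the endpoints in this guide and is an assumption about what a given store's widget actually calls, so confirm it in DevTools first. capture_review_json and is_review_api_url are hypothetical helper names, and the fixed 5-second wait is arbitrary. Playwright is imported inside the function so the URL filter can be used and tested without it installed.

```python
from urllib.parse import urlparse

# Hosts from this guide's endpoints; subdomains such as cdn.judge.me also match.
REVIEW_API_HOSTS = ("judge.me", "storefront-api.loox.io", "api-cdn.yotpo.com", "api.okendo.io")

def is_review_api_url(url):
    """True when the URL's host is one of the review API hosts or a subdomain of one."""
    host = urlparse(url).netloc.lower()
    return any(host == h or host.endswith("." + h) for h in REVIEW_API_HOSTS)

def capture_review_json(product_url, wait_ms=5000):
    """Load the page in Chromium and return JSON bodies of matching widget responses."""
    from playwright.sync_api import sync_playwright  # lazy import: optional dependency

    matched, captured = [], []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Only record matching responses here; bodies are read after the wait.
        page.on("response", lambda resp: matched.append(resp) if is_review_api_url(resp.url) else None)
        page.goto(product_url, wait_until="domcontentloaded")
        page.wait_for_timeout(wait_ms)  # give the widget time to fire its requests
        for resp in matched:
            try:
                captured.append({"url": resp.url, "data": resp.json()})
            except Exception:
                continue  # non-JSON response (script, image, error page)
        browser.close()
    return captured

print(is_review_api_url("https://api-cdn.yotpo.com/v1/widget/KEY/products/1/reviews.json"))
```

Some widgets only load when scrolled into view, in which case a scroll step would be needed before the wait.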

→

Quick reference — all endpoints

Judge.me: judge.me/api/v1/reviews?shop_domain=X&product_external_id=Y&page=N&per_page=100

Loox (2025): storefront-api.loox.io/storefront/v1/store/{STORE_ID}/product-reviews?productId=Y&page=N

Yotpo: api-cdn.yotpo.com/v1/widget/{APP_KEY}/products/{ID}/reviews.json?page=N&per_page=150

Okendo: api.okendo.io/v1/stores/{SUB_ID}/products/shopify-{ID}/reviews?limit=100

Stamped: HTML parse — div[itemprop="review"] Schema.org elements (first page only)
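
For quick manual checks in a browser or with curl, the four API URLs above can be assembled from the extracted keys. This is a sketch only: review_endpoint is a hypothetical helper, and the paths and parameter names are copied from the quick reference, not independently verified.

```python
from urllib.parse import urlencode

def review_endpoint(platform, product_id, key, page=1):
    """Build a review URL. key is the shop domain (Judge.me), store ID (Loox),
    app key (Yotpo) or subscriber ID (Okendo)."""
    if platform == "judgeme":
        query = {"shop_domain": key, "product_external_id": product_id,
                 "page": page, "per_page": 100}
        return "https://judge.me/api/v1/reviews?" + urlencode(query)
    if platform == "loox":
        query = {"productId": product_id, "page": page}
        return f"https://storefront-api.loox.io/storefront/v1/store/{key}/product-reviews?" + urlencode(query)
    if platform == "yotpo":
        query = {"page": page, "per_page": 150}
        return f"https://api-cdn.yotpo.com/v1/widget/{key}/products/{product_id}/reviews.json?" + urlencode(query)
    if platform == "okendo":
        # Okendo is cursor-based, so the page argument is ignored here.
        return f"https://api.okendo.io/v1/stores/{key}/products/shopify-{product_id}/reviews?" + urlencode({"limit": 100})
    raise ValueError(f"No public endpoint listed for platform: {platform}")

print(review_endpoint("judgeme", "123", "example.myshopify.com"))
```

Stamped is deliberately absent because the guide reads it from page HTML and does not list a public endpoint for it.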

Dreamerce — Static Ads Built for Scale
