What's Inside
- How Shopify reviews actually work — the five platforms, what's open, what's not
- Prerequisites — Python, dependencies, five-minute setup
- The detection algorithm — how to identify which review app a store uses, including Loox's new API
- Platform-by-platform scraping — Judge.me, Loox (corrected), Yotpo, Okendo, Stamped
- The universal scraper — complete copy-paste Python script, all platforms
- CSV + JSON export — structured output ready for Claude or Sheets
- Worked example — real URL → real output → real ad angles
- Claude analysis prompts — turn raw reviews into ad angles in one shot
- Rate limits, ethics, and what not to do
Shopify's native review system was sunset in May 2024. Every store now runs a third-party review app — and most of those apps expose a public widget API. That means competitor reviews are sitting in open JSON endpoints, no auth required, waiting to be queried. This guide is the complete, tested technical walkthrough to pull them.
How Shopify Reviews Actually Work
There are five dominant review platforms in the Shopify ecosystem. Each exposes reviews differently — three are fully open public APIs, one (Stamped) requires auth at the API level but exposes data in the page HTML, and one (Yotpo) uses a public CDN with a key embedded in the page source.
Review widgets must load publicly for every site visitor. That means every API call the widget makes is visible in browser DevTools — Network tab → XHR/Fetch. The scraper replicates exactly those calls. No credentials needed for the four main platforms.
cdn.judge.me
judgeme
loox.io
useloox.com
staticw2.yotpo.com
okendo.io
okendo-reviews
cdn1.stamped.io
StampedFn
Prerequisites — Five-Minute Setup
1. Install Dependencies
Playwright only needed for Stamped (HTML parsing fallback) and Cloudflare-protected stores. Judge.me, Loox, Yotpo, and Okendo work on requests alone.
2. Python Version
Requires Python 3.9+. Confirmed working: macOS 12+, Ubuntu 22.04, Windows 11. Run in any terminal, VS Code, Cursor, or Claude Cowork.
3. Save & Run
Save the universal script from Section 5. Enter a product URL when prompted. Done.
The Detection Algorithm
Given any Shopify product URL, the scraper identifies the review platform by scanning the page HTML for known JavaScript signatures, then extracts the platform-specific keys (app key, store ID, subscriber ID) needed to hit the right API.
Loox migrated their Storefront API base URL. The correct current endpoint is storefront-api.loox.io, not the older api.loox.io path. The publicStoreId appears in the page HTML in Loox's core script init call — the regex patterns below extract it reliably across both old and new theme installations.
import requests, re
from urllib.parse import urlparse
from bs4 import BeautifulSoup
HEADERS = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',
'Accept': 'text/html,application/json'
}
def detect_platform(product_url):
r = requests.get(product_url, headers=HEADERS, timeout=20)
html = r.text
result = {
'platform': 'unknown',
'keys': {},
'product_id': None,
'shop_domain': None,
'raw_html': html # kept for Stamped HTML parsing
}
# ── Extract Shopify product ID ──────────────────────────
for pat in [
r'"product":{"id":(\d+)',
r'"productId":(\d+)',
r'data-product-id=["\'](\d+)["\']',
r'"id":(\d{10,})' # fallback: Shopify IDs are 10+ digits
]:
m = re.search(pat, html)
if m:
result['product_id'] = m.group(1)
break
# ── Extract myshopify domain ────────────────────────────
parsed = urlparse(product_url)
result['shop_domain'] = parsed.netloc
myshop = re.search(r'([\w-]+\.myshopify\.com)', html)
if myshop:
result['shop_domain'] = myshop.group(1)
h = html.lower()
# ── JUDGE.ME ────────────────────────────────────────────
if 'cdn.judge.me' in h or 'judgeme' in h:
result['platform'] = 'judgeme'
# No extra keys needed — uses shop_domain + product_id
return result
# ── LOOX ────────────────────────────────────────────────
elif 'loox.io' in h or 'useloox.com' in h:
result['platform'] = 'loox'
# Pattern 1: explicit publicStoreId key (modern installs)
# Pattern 2: storeId in loox widget init object
# Pattern 3: storeId embedded in loox script src URL
# Pattern 4: data-store-id attribute on widget div
for pat in [
r'publicStoreId["\s:]+["\']([a-zA-Z0-9_-]{8,})["\']',
r'storeId["\s:=]+["\']?([a-zA-Z0-9_-]{8,})["\']?',
r'loox\.io/api[^"]*?[?&]storeId=([a-zA-Z0-9_-]{8,})',
r'data-store-id=["\']([a-zA-Z0-9_-]{8,})["\']',
r'storefront-api\.loox\.io/storefront/v1/store/([a-zA-Z0-9_-]{8,})'
]:
m = re.search(pat, html, re.IGNORECASE)
if m:
result['keys']['store_id'] = m.group(1)
break
return result
# ── YOTPO ───────────────────────────────────────────────
elif 'staticw2.yotpo.com' in h or 'api-cdn.yotpo' in h:
result['platform'] = 'yotpo'
# app_key appears in the widget.js script src URL
for pat in [
r'staticw2\.yotpo\.com/([a-zA-Z0-9]{10,})/widget',
r'"appKey":\s*"([a-zA-Z0-9]{10,})"',
r"appKey:\s*'([a-zA-Z0-9]{10,})'",
r'data-yotpo-app-key=["\']([a-zA-Z0-9]{10,})["\']'
]:
m = re.search(pat, html)
if m:
result['keys']['app_key'] = m.group(1)
break
return result
# ── OKENDO ──────────────────────────────────────────────
elif 'okendo.io' in h or 'okendo-reviews' in h:
result['platform'] = 'okendo'
for pat in [
r'subscriberId["\s:]+["\']([a-zA-Z0-9-]{8,})["\']',
r'okendo\.io/v1/stores/([a-zA-Z0-9-]{8,})',
r'data-subscriber-id=["\']([a-zA-Z0-9-]{8,})["\']'
]:
m = re.search(pat, html, re.IGNORECASE)
if m:
result['keys']['subscriber_id'] = m.group(1)
break
return result
# ── STAMPED ─────────────────────────────────────────────
elif 'stamped.io' in h or 'stampedfn' in h:
result['platform'] = 'stamped'
# Extract public keys from StampedFn.init() call in HTML
m = re.search(r"apiKey:\s*['\"]([^'\"]+)['\"]", html)
if m:
result['keys']['public_key'] = m.group(1)
m2 = re.search(r"sId:\s*['\"]([^'\"]+)['\"]", html)
if m2:
result['keys']['store_hash'] = m2.group(1)
return result
return result
Platform-by-Platform Scraping
4A · Judge.me
The most widely installed review app at mid-market Shopify brands. Fully public — no auth of any kind. Pagination via total field in the response.
def scrape_judgeme(info, max_reviews=500):
BASE = "https://judge.me/api/v1/reviews"
reviews, page = [], 1
per = 100
h = {**HEADERS, 'Referer': f'https://{info["shop_domain"]}'}
while True:
r = requests.get(BASE, params={
'shop_domain': info['shop_domain'],
'product_external_id': info['product_id'],
'page': page, 'per_page': per
}, headers=h, timeout=15)
data = r.json()
page_reviews = data.get('reviews', [])
if not page_reviews: break
for rv in page_reviews:
reviews.append({
'id': rv.get('id'),
'rating': rv.get('rating'),
'title': rv.get('title', ''),
'body': rv.get('body', ''),
'author': rv.get('reviewer', {}).get('name', ''),
'date': rv.get('created_at', '')[:10],
'verified': rv.get('verified', '') != 'nothing',
'platform': 'judgeme'
})
total = data.get('total', 0)
if len(reviews) >= max_reviews or page * per >= total: break
page += 1; time.sleep(0.4)
return reviews[:max_reviews]
4B · Loox
Loox updated their Storefront API in 2025. The correct base URL is now storefront-api.loox.io. The API is explicitly public with no auth required — Loox's own docs confirm: "The Storefront API does not require an API key." The CORS restriction applies to browser requests only — Python's requests library bypasses it entirely since it's not a browser origin.
CORS is enforced by browsers, not servers. When Python sends a request, there's no browser enforcing CORS headers. The API responds normally. This is standard behaviour for any server-side HTTP client.
def scrape_loox(info, max_reviews=500):
store_id = info['keys'].get('store_id')
if not store_id:
print(" ✗ Could not extract Loox publicStoreId from page source")
print(" Try: Ctrl+F in DevTools → 'publicStoreId' or 'storeId'")
return []
# Updated 2025 base URL — storefront-api.loox.io, not api.loox.io
BASE = f"https://storefront-api.loox.io/storefront/v1/store/{store_id}/product-reviews"
reviews, page = [], 1
while True:
r = requests.get(BASE, params={
'productId': info['product_id'],
'page': page,
'perPage': 100
}, headers={'Accept': 'application/json'}, timeout=15)
if r.status_code != 200:
# Fallback: try older api.loox.io endpoint for legacy installs
BASE_LEGACY = f"https://api.loox.io/api/v1/store/{store_id}/product-reviews"
r = requests.get(BASE_LEGACY, params={
'productId': info['product_id'], 'page': page, 'perPage': 100
}, headers={'Accept': 'application/json'}, timeout=15)
data = r.json()
page_reviews = data.get('reviews', [])
if not page_reviews: break
for rv in page_reviews:
# Loox uses 'text' for review body, with 'body' as a fallback key
body = rv.get('text') or rv.get('body') or rv.get('comment', '')
reviews.append({
'id': rv.get('id'),
'rating': rv.get('rating'),
'title': rv.get('title', ''),
'body': body,
'author': rv.get('name', rv.get('author', '')),
'date': rv.get('publishedAt', rv.get('createdAt', ''))[:10],
'verified': rv.get('verified', False),
'platform': 'loox'
})
if len(reviews) >= max_reviews or not data.get('hasNextPage', False): break
page += 1; time.sleep(0.4)
return reviews[:max_reviews]
4C · Yotpo
def scrape_yotpo(info, max_reviews=500):
app_key = info['keys'].get('app_key')
if not app_key:
print(" ✗ Could not extract Yotpo app_key — look for staticw2.yotpo.com/{KEY}/widget.js in page source")
return []
BASE = f"https://api-cdn.yotpo.com/v1/widget/{app_key}/products/{info['product_id']}/reviews.json"
reviews, page = [], 1; per = 150
while True:
r = requests.get(BASE, params={'per_page': per, 'page': page},
headers={'Accept': 'application/json'}, timeout=15)
resp = r.json().get('response', {})
page_reviews = resp.get('reviews', [])
if not page_reviews: break
for rv in page_reviews:
reviews.append({
'id': rv.get('id'),
'rating': rv.get('score'),
'title': rv.get('title', ''),
'body': rv.get('content', ''),
'author': rv.get('user', {}).get('display_name', ''),
'date': rv.get('created_at', '')[:10],
'verified': rv.get('verified_buyer', False),
'platform': 'yotpo'
})
total = resp.get('pagination', {}).get('total', 0)
if len(reviews) >= max_reviews or page * per >= total: break
page += 1; time.sleep(0.5)
return reviews[:max_reviews]
4D · Okendo
def scrape_okendo(info, max_reviews=500):
sub_id = info['keys'].get('subscriber_id')
if not sub_id:
print(" ✗ Could not extract Okendo subscriber ID")
return []
BASE = f"https://api.okendo.io/v1/stores/{sub_id}/products/shopify-{info['product_id']}/reviews"
reviews, cursor = [], None
while True:
params = {'limit': 100}
if cursor: params['after'] = cursor
r = requests.get(BASE, params=params, headers={'Accept': 'application/json'}, timeout=15)
data = r.json()
page_reviews = data.get('reviews', [])
if not page_reviews: break
for rv in page_reviews:
reviews.append({
'id': rv.get('reviewId'),
'rating': rv.get('rating'),
'title': rv.get('headline', ''),
'body': rv.get('body', ''),
'author': rv.get('reviewer', {}).get('displayName', ''),
'date': rv.get('dateCreated', '')[:10],
'verified': rv.get('verificationStatus') == 'verified',
'platform': 'okendo'
})
if len(reviews) >= max_reviews: break
next_link = data.get('pagination', {}).get('nextPage')
if not next_link: break
m = re.search(r'after=([^&]+)', next_link)
cursor = m.group(1) if m else None
if not cursor: break
time.sleep(0.3)
return reviews[:max_reviews]
4E · Stamped — HTML Fallback
Stamped's REST API requires a Professional plan + Basic Auth. But Stamped renders reviews directly in the page HTML as div[itemprop="review"] Schema.org elements. BeautifulSoup extracts them without any API calls.
You get the first page of rendered reviews (typically 10–20). You do NOT get paginated access to all reviews this way — Stamped loads subsequent pages dynamically via JS. For full pagination on Stamped stores, use Playwright to scroll and trigger the "Load More" button. For most competitive research purposes, the first 10–20 reviews are sufficient to extract the language patterns needed for angles.
def scrape_stamped_html(info):
soup = BeautifulSoup(info.get('raw_html',''), 'lxml'); reviews = []
for el in soup.find_all(attrs={'itemprop':'review'}):
r_el = el.find(attrs={'itemprop':'ratingValue'})
rating = None
try: rating = int(float(r_el.get('content') or r_el.text)) if r_el else None
except: pass
t = el.find(attrs={'itemprop':'name'}); title = t.text.strip() if t else ''
b = el.find(attrs={'itemprop':'reviewBody'}) or el.find(attrs={'itemprop':'description'})
body = b.text.strip() if b else ''
a = el.find(attrs={'itemprop':'author'})
author = a.text.strip() if a else ''
d = el.find(attrs={'itemprop':'datePublished'})
date = (d.get('content','') or d.text.strip())[:10] if d else ''
if body: reviews.append({'id':None,'rating':rating,'title':title,'body':body,'author':author,'date':date,'verified':False,'platform':'stamped'})
print(f" ℹ Stamped: {len(reviews)} reviews from first-page HTML only.")
return reviews
# -- EXPORT + MAIN -----------------------------------------------
def export(reviews, slug):
fields = ['id','rating','title','body','author','date','verified','platform']
with open(f"{slug}.csv",'w',newline='',encoding='utf-8') as f:
w = csv.DictWriter(f,fieldnames=fields,extrasaction='ignore'); w.writeheader(); w.writerows(reviews)
with open(f"{slug}.json",'w',encoding='utf-8') as f: json.dump(reviews,f,indent=2,ensure_ascii=False)
print(f" Saved: {slug}.csv + {slug}.json")
def main():
url = input(" Product URL: ").strip()
mx = input(" Max reviews [500]: ").strip()
max_r = int(mx) if mx.isdigit() else 500
info = detect_platform(url)
plat = info['platform']
print(f" Platform: {plat.upper()} | Product ID: {info['product_id']}")
if plat == 'unknown': print("No supported platform detected."); return
scrapers = {'judgeme': scrape_judgeme, 'yotpo': scrape_yotpo, 'okendo': scrape_okendo, 'loox': scrape_loox}
reviews = scrapers[plat](info, max_r) if plat in scrapers else scrape_stamped_html(info)
if not reviews: print("No reviews found."); return
slug = urlparse(url).netloc.replace('.','-') + f"_{plat}_" + datetime.now().strftime("%Y%m%d_%H%M")
export(reviews, slug)
print(f"Done. Feed {slug}.csv into Claude.")
if __name__ == '__main__': main()
Output Structure
Every review normalised to the same 8-field schema regardless of platform.
| Field | Type | Description | Example |
|---|---|---|---|
| id | Integer | Platform review ID | 309836503 |
| rating | Integer 1–5 | Star rating | 4 |
| title | String | Review headline | "Changed how I sleep" |
| body | String | Full review text — the gold | "I've tried 6 magnesium products and this is the first one that didn't upset my stomach…" |
| author | String | Reviewer display name | "Sarah M." |
| date | YYYY-MM-DD | Submission date | 2024-09-12 |
| verified | Boolean | Verified purchaser | True |
| platform | String | Source platform | judgeme |
Worked Example — URL to Ad Angles
Here’s what a real output looks like, and what it produces when fed into Claude. Representative example: health supplement on Judge.me with 347 reviews.
URL: https://[brand].com/products/magnesium-glycinate
Platform: judge.me · Reviews: 347 · Avg: 4.6★ · Low-rated (1–2★): 28
Sample rows from the CSV — raw language that becomes ad copy inputs:
Run these through Prompt 3 and Claude extracts this language bank:
Before language (pain): — "I have tried every magnesium supplement on the market" — "most gave me digestive issues" — "waking up at 3am" Result language (after): — "I actually noticed a difference within a week" — "fall asleep faster" — "not waking up at 3am anymore" — "my husband now takes it too" Comparison language: — "switched from Doctor's Best" — "had been using melatonin for years" Recommendation language: — "bought a second bottle for my mum" — "told my whole book club about it"
That language directly surfaces three testable angles:
URL in → 347 reviews out → Claude analysis → three differentiated angles with real buyer language, zero guessing, under 5 minutes. The 28 low-rated reviews alone surfaced a product-format gap the brand’s own marketing ignores. That’s why you mine the full set, not just the 5-star pile.
Claude Analysis Prompts
Here are [N] customer reviews for [PRODUCT] from a competitor's Shopify store. Return: 1. Top 5 things customers love — exact language 2. Top 5 complaints or unmet expectations 3. Phrases that repeat across multiple reviews 4. Emotional language describing before/after 5. What they tried before this product 6. What tipped them into buying Do not summarise. Exact phrases only.
Isolate only the 1-star and 2-star reviews. For each: what specific failure? Is it product, delivery, price, results timeline, or expectation mismatch? Summary: - 3 most common failure modes - What customers actually wanted but didn't get - 3 ad angles for a competing product that directly addresses these gaps
Extract all instances of: 1. Before language — how customers described their problem before buying 2. Result language — specific outcomes, not generic "love it" 3. Comparison language — what they tried before or switched from 4. Recommendation language — exact words used to tell others Four separate lists. Exact phrases only.
Generate 8 distinct static ad angles for a competing product based on these reviews. For each: - Angle name - Core message in one sentence - Headline (under 8 words) - Supporting line from actual customer language - Target awareness level and pain point Angles must emerge from the review language — not generic category claims.
Rate Limits, Ethics, What Not To Do
These are public widget APIs serving data to any site visitor. Keep the sleep delays, cap volume sensibly, never attempt authenticated or private endpoints. The time.sleep() calls in the code exist for a reason.
- Do: scrape publicly visible review data for VOC research and competitive intelligence
- Do: keep sleep delays between requests — already in the code
- Do: use the insights, not the content — extract language patterns, don’t reproduce reviews verbatim
- Don’t: access private/authenticated endpoints — order data, customer emails, merchant dashboards
- Don’t: run hundreds of concurrent requests — single-threaded is sufficient and responsible
- Don’t: republish scraped reviews directly
Some larger stores block raw HTTP requests. If you get a 403, use Playwright: launch Chromium → navigate to the product page → intercept the XHR requests the review widget fires. Same API, same JSON — you just trigger the widget load via a real browser rather than Python's HTTP client.
Judge.me: judge.me/api/v1/reviews?shop_domain=X&product_external_id=Y&page=N&per_page=100
Loox (2025): storefront-api.loox.io/storefront/v1/store/{STORE_ID}/product-reviews?productId=Y&page=N
Yotpo: api-cdn.yotpo.com/v1/widget/{APP_KEY}/products/{ID}/reviews.json?page=N&per_page=150
Okendo: api.okendo.io/v1/stores/{SUB_ID}/products/shopify-{ID}/reviews?limit=100
Stamped: HTML parse — div[itemprop="review"] Schema.org elements (first page only)
This tool gives you the raw data. Turning it into angles that perform on Meta requires a different layer of judgment. If you’re spending consistently and need a creative system that starts this deep — let’s talk.
Book a Strategy Call15 years · 3,000+ statics · 50+ DTC brands · £1M+ tracked revenue