What Is Scrapy Fingerprinting?
Scrapy fingerprinting refers to the techniques websites use to identify Scrapy-based web scrapers —
and the counter-techniques scrapers use to avoid detection. Every HTTP request your Scrapy spider makes carries
fingerprint signals: headers, TLS handshake characteristics, request patterns, and behavioural markers that
distinguish bot traffic from human browsers.
In 2026, anti-bot systems like Cloudflare, DataDome, PerimeterX, and Akamai Bot Manager are sophisticated enough to
detect Scrapy requests within milliseconds. Understanding and managing your scraper’s fingerprint is the difference
between successful data collection and endless 403 responses.
How Websites Fingerprint Scrapy Requests
Layer 1: HTTP Headers
The simplest fingerprinting layer. Scrapy’s default headers are a dead giveaway:
# Default Scrapy User-Agent (immediately detected)
User-Agent: Scrapy/2.11.0 (+https://scrapy.org)
# Missing headers that real browsers always send
Accept: text/html,application/xhtml+xml,...
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
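Before changing anything, it helps to confirm what your spider actually sends. A minimal sketch (the middleware name and priority are illustrative) that logs each outgoing request's final headers:
# middlewares.py -- log outgoing headers for inspection (illustrative sketch)
class HeaderLoggingMiddleware:
    def process_request(self, request, spider):
        for name, values in request.headers.items():
            for value in values:
                spider.logger.debug(f'{name.decode()}: {value.decode()}')
        return None  # let the request continue unchanged

# settings.py -- a high priority (e.g. 900) runs this after the default
# header and user-agent middlewares have applied theirs:
# DOWNLOADER_MIDDLEWARES = {'myproject.middlewares.HeaderLoggingMiddleware': 900}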
Layer 2: TLS Fingerprinting (JA3/JA4)
This is the layer most Scrapy users don’t know about. When your spider connects via HTTPS, the TLS handshake reveals:
- JA3 hash: a fingerprint of the cipher suites, TLS extensions, and elliptic curves your client supports
- HTTP/2 fingerprint: SETTINGS frame values, window size, and priority tree
- Python’s default TLS stack produces a JA3 hash that doesn’t match any real browser
| Client | JA3 Hash | Detection Result |
|---|---|---|
| Chrome 120 | cd08e31494f9531f560d64c695473da9 | ✅ Looks like a browser |
| Firefox 121 | b32309a26951912be7dba376398abc3b | ✅ Looks like a browser |
| Python requests | bf09a91d7f0e10a4a0b69e8f5c2d7a32 | ❌ Instantly flagged as bot |
| Scrapy (default) | Similar to Python requests | ❌ Instantly flagged as bot |
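You can observe the difference yourself. A quick sketch, assuming the browserleaks TLS endpoint (https://tls.browserleaks.com/json) still returns a JSON body containing a ja3_hash field (pip install requests curl_cffi):
# Compare the JA3 of Python's stock TLS stack with curl_cffi impersonation
import requests                    # stock OpenSSL-based TLS stack
from curl_cffi import requests as curl_requests

URL = 'https://tls.browserleaks.com/json'  # assumed to report 'ja3_hash'

stock = requests.get(URL, timeout=30).json()
print('Stock Python JA3:', stock.get('ja3_hash'))

spoofed = curl_requests.get(URL, impersonate='chrome120', timeout=30).json()
print('Impersonated JA3:', spoofed.get('ja3_hash'))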
Layer 3: Behavioural Fingerprinting
- Request timing: Scrapy makes requests at machine speed — real humans have variable delays
- Navigation patterns: Scrapy typically crawls breadth-first or depth-first systematically — humans browse randomly (see the sketch after this list)
- Missing assets: Scrapy doesn’t load CSS, images, JS, or fonts — real browsers do
- Cookie handling: Scrapy’s default cookie jar doesn’t replicate browser cookie behaviour
- Referrer chains: Scrapy often visits pages without appropriate referrer headers
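Of these, navigation order is the easiest to soften from inside Scrapy itself: shuffle discovered links and jitter scheduler priorities so requests don't leave in strict discovery order. A minimal sketch (spider name, URL, and selector are placeholders):
# Randomize crawl order so requests don't follow a strict BFS/DFS pattern
import random
import scrapy

class RandomOrderSpider(scrapy.Spider):
    name = 'random_order'  # illustrative
    start_urls = ['https://example.com']

    def parse(self, response):
        links = response.css('a::attr(href)').getall()
        random.shuffle(links)  # don't visit links in DOM order
        for href in links:
            yield response.follow(
                href,
                callback=self.parse,
                # Jitter the scheduler priority to break systematic ordering
                priority=random.randint(0, 10),
            )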
Layer 4: JavaScript Fingerprinting
Anti-bot systems inject JavaScript challenges that check:
- Whether JavaScript executes at all (Scrapy doesn’t render JS)
- Browser APIs like navigator.webdriver and window.chrome
- Canvas and WebGL fingerprints
- Automation framework signatures
Scrapy Anti-Fingerprinting Techniques
1. Realistic Headers
# settings.py — Set realistic default headers
DEFAULT_REQUEST_HEADERS = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Cache-Control': 'max-age=0',
'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
'Sec-Ch-Ua-Mobile': '?0',
'Sec-Ch-Ua-Platform': '"Windows"',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-User': '?1',
'Upgrade-Insecure-Requests': '1',
}
# Rotate User-Agents with scrapy-fake-useragent
DOWNLOADER_MIDDLEWARES = {
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
FAKEUSERAGENT_PROVIDERS = [
'scrapy_fake_useragent.providers.FakeUserAgentProvider',
]
2. TLS Fingerprint Spoofing
# Use curl_cffi or tls-client to spoof TLS fingerprint
# Install: pip install curl_cffi
from curl_cffi.requests import Session
from scrapy.http import HtmlResponse

class TLSSpoofMiddleware:
    def __init__(self):
        # One session per crawl so cookies persist across requests
        self.session = Session(impersonate="chrome120")

    def process_request(self, request, spider):
        # Note: this call is synchronous and blocks the Twisted reactor,
        # so keep concurrency low when routing requests through it
        response = self.session.get(
            request.url,
            headers=request.headers.to_unicode_dict(),
            timeout=30,
        )
        # Convert to a Scrapy response so parsing works as usual
        return HtmlResponse(
            url=request.url,
            status=response.status_code,
            headers=dict(response.headers),
            body=response.content,
            request=request,
        )
3. Human-Like Request Timing
# settings.py — Add realistic delays
DOWNLOAD_DELAY = 2 # Minimum 2 seconds between requests
RANDOMIZE_DOWNLOAD_DELAY = True # Random delay 0.5x-1.5x
# AutoThrottle for adaptive speed
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3
AUTOTHROTTLE_MAX_DELAY = 10
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Custom random delays in spider: use a non-blocking sleep, because
# time.sleep() stalls the whole Twisted reactor and every other request
import asyncio
import random
import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'

    # Async callbacks need a recent Scrapy and the asyncio reactor
    async def parse(self, response):
        # Random human-like delay before following the next link
        await asyncio.sleep(random.uniform(1.5, 4.5))
        next_url = response.css('a.next::attr(href)').get()
        if next_url:
            yield response.follow(next_url, callback=self.parse_detail)
4. Proxy Rotation
# settings.py — Use rotating residential proxies
DOWNLOADER_MIDDLEWARES = {
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
'myproject.middlewares.RotatingProxyMiddleware': 100,
}
# Custom rotating proxy middleware
import random
class RotatingProxyMiddleware:
    def __init__(self):
        # Pool of authenticated proxies; one is picked at random per request
        self.proxies = [
            'http://user:pass@proxy1.example.com:8080',
            'http://user:pass@proxy2.example.com:8080',
            'http://user:pass@proxy3.example.com:8080',
        ]

    def process_request(self, request, spider):
        request.meta['proxy'] = random.choice(self.proxies)
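Hard-coding credentials is brittle. A variant sketch that reads the pool from a custom ROTATING_PROXIES setting (the setting name is an assumption, not a Scrapy built-in):
# middlewares.py -- same idea, but the pool comes from settings.py, e.g.
# ROTATING_PROXIES = ['http://user:pass@proxy1.example.com:8080', ...]
import random
from scrapy.exceptions import NotConfigured

class SettingsProxyMiddleware:
    def __init__(self, proxies):
        if not proxies:
            raise NotConfigured('ROTATING_PROXIES is not set')
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # Reads the pool from the custom ROTATING_PROXIES setting
        return cls(crawler.settings.getlist('ROTATING_PROXIES'))

    def process_request(self, request, spider):
        request.meta['proxy'] = random.choice(self.proxies)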
5. JavaScript Rendering
# Use Scrapy-Playwright for JS-heavy sites
# Install: pip install scrapy-playwright
# settings.py
DOWNLOAD_HANDLERS = {
"http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
"https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
# In spider
import scrapy
from scrapy_playwright.page import PageMethod

class JSSpider(scrapy.Spider):
    name = 'js_spider'

    def start_requests(self):
        yield scrapy.Request(
            url='https://target-site.com',
            meta={
                'playwright': True,
                'playwright_include_page': True,
                'playwright_page_methods': [
                    PageMethod('wait_for_selector', 'div.content'),
                ],
            },
            # Close the page in an errback too, or failed requests leak pages
            errback=self.errback_close_page,
        )

    async def parse(self, response):
        page = response.meta['playwright_page']
        await page.close()
        yield {'title': response.css('h1::text').get()}

    async def errback_close_page(self, failure):
        page = failure.request.meta['playwright_page']
        await page.close()
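scrapy-playwright also supports a playwright_page_init_callback meta key that runs before navigation, which is a natural place to patch obvious automation markers. A sketch that masks navigator.webdriver (the callback and spider names are arbitrary):
# Patch obvious automation markers before any page script runs
import scrapy

async def mask_webdriver(page, request):
    # Runs before navigation; hides the headless automation flag
    await page.add_init_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
    )

class StealthSpider(scrapy.Spider):
    name = 'stealth'  # illustrative

    def start_requests(self):
        yield scrapy.Request(
            url='https://target-site.com',
            meta={
                'playwright': True,
                'playwright_page_init_callback': mask_webdriver,
            },
        )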
Advanced Anti-Detection Strategies
Fingerprint Consistency
A common mistake is rotating fingerprint elements independently (random User-Agent but consistent JA3). Anti-bot
systems check for consistency:
| Check | What Must Match | Inconsistency Detection |
|---|---|---|
| UA + TLS | Chrome UA must have Chrome JA3 | Python JA3 with Chrome UA → banned |
| UA + Headers | Sec-Ch-Ua must match UA version | Chrome/120 UA with Chrome/118 hints → flagged |
| Language + Timezone + IP | A US proxy should pair with en-US language and a US timezone | JP proxy with en-US locale and a US timezone → suspicious |
| Platform + Screen | A mobile UA must report mobile screen metrics | Mobile UA with desktop screen → flagged |
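One way to enforce this is to rotate whole profiles rather than individual fields. A sketch; the profile values below are illustrative, and real deployments need complete, verified header sets:
# middlewares.py -- rotate coherent profiles, never individual fields
import random

# Illustrative profiles: each bundles a UA with matching client hints,
# platform, language, and a proxy whose exit locale fits the language
PROFILES = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/120.0.0.0 Safari/537.36',
        'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
        'Sec-Ch-Ua-Platform': '"Windows"',
        'Accept-Language': 'en-US,en;q=0.9',
        'proxy': 'http://user:pass@us-proxy.example.com:8080',  # US exit for en-US
    },
    # ...more internally consistent profiles
]

class ProfileConsistencyMiddleware:
    def __init__(self):
        # One profile per crawl; rotating per request breaks session consistency
        self.profile = random.choice(PROFILES)

    def process_request(self, request, spider):
        for header, value in self.profile.items():
            if header != 'proxy':
                request.headers[header] = value
        request.meta['proxy'] = self.profile['proxy']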
Session-Based Scraping
Instead of stateless requests, maintain sessions that behave like real browsing; a sketch follows this list:
- Visit the homepage before hitting product pages
- Load the search page before accessing search results
- Include referer headers that match the navigation flow
- Accept and send cookies naturally through the session
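A sketch of that flow (URLs and selectors are placeholders):
# Warm up a session like a human: homepage -> search -> results
import scrapy

class SessionFlowSpider(scrapy.Spider):
    name = 'session_flow'  # illustrative

    def start_requests(self):
        # Step 1: land on the homepage first, like a real visitor
        yield scrapy.Request('https://target-site.com/',
                             callback=self.visit_search)

    def visit_search(self, response):
        # Step 2: move to search with the homepage as referer
        yield scrapy.Request(
            'https://target-site.com/search?q=widgets',
            headers={'Referer': response.url},
            callback=self.parse_results,
        )

    def parse_results(self, response):
        # Step 3: follow results; cookies collected so far travel automatically
        for href in response.css('a.result::attr(href)').getall():
            yield response.follow(href,
                                  headers={'Referer': response.url},
                                  callback=self.parse_item)

    def parse_item(self, response):
        yield {'url': response.url, 'title': response.css('h1::text').get()}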
Scrapy vs Headless Browser for Anti-Detection
| Factor | Scrapy (HTTP-only) | Scrapy + Playwright | Full Browser Profile |
|---|---|---|---|
| Speed | ⭐⭐⭐⭐⭐ Very fast | ⭐⭐⭐ Moderate | ⭐⭐ Slower |
| Resource usage | Minimal | Moderate (headless browser) | High (full browser) |
| Anti-detection | ⭐⭐ Basic (headers only) | ⭐⭐⭐⭐ Good (real browser) | ⭐⭐⭐⭐⭐ Best (real fingerprint) |
| JavaScript rendering | ❌ None | ✅ Full | ✅ Full |
| TLS fingerprint | ❌ Python (detectable) | ✅ Real browser | ✅ Real browser |
| Scale | Thousands of concurrent requests | Tens to hundreds | Ones to tens |
For sites with strong anti-bot protection, consider running isolated browser profiles on a cloud browser platform; these provide real browser fingerprints that anti-bot systems can’t distinguish from genuine users.
Testing Your Scraper’s Fingerprint
Fingerprint Testing Sites
- bot.sannysoft.com — Checks common automation indicators
- browserleaks.com — Comprehensive fingerprint display
- ja3er.com — Shows your JA3 TLS fingerprint
- httpbin.org/headers — Shows your request headers
- pixelscan.net — Full fingerprint consistency analysis
How Send.win Helps You Master Scrapy Fingerprinting
Send.win makes Scrapy fingerprinting simple and secure with powerful browser isolation technology:
- Browser Isolation – Every tab runs in a sandboxed environment
- Cloud Sync – Access your sessions from any device
- Multi-Account Management – Manage unlimited accounts safely
- No Installation Required – Works instantly in your browser
- Affordable Pricing – Enterprise features without enterprise costs
Try Send.win Free – No Credit Card Required
Experience the power of browser isolation with our free demo:
- Instant Access – Start testing in seconds
- Full Features – Try all capabilities
- Secure – Bank-level encryption
- Cross-Platform – Works on desktop, mobile, tablet
- 14-Day Money-Back Guarantee
Ready to upgrade? View pricing plans starting at just $9/month.
Quick Self-Test Script
import json
import scrapy

class FingerprintTestSpider(scrapy.Spider):
    name = 'fingerprint_test'
    start_urls = ['https://httpbin.org/headers']

    def parse(self, response):
        headers = json.loads(response.text)['headers']
        for key, value in headers.items():
            self.logger.info(f'{key}: {value}')
        # Check for missing browser headers
        expected = ['Accept', 'Accept-Language', 'Sec-Fetch-Dest',
                    'Sec-Ch-Ua', 'Sec-Fetch-Mode']
        for h in expected:
            if h not in headers:
                self.logger.warning(f'⚠️ Missing header: {h}')
Frequently Asked Questions
Why does my Scrapy spider get blocked even with rotating proxies?
Proxies only change your IP fingerprint. If your TLS fingerprint (JA3), header fingerprint, and behavioural patterns
still look like a bot, anti-bot systems will block you regardless of IP. You need to address all fingerprint layers
— headers, TLS, timing, and optionally JavaScript rendering.
Is Scrapy good enough for heavily protected sites?
For sites with Cloudflare, DataDome, or PerimeterX, plain Scrapy (HTTP requests only) is usually insufficient. You
need either Scrapy + Playwright for JS rendering and real browser fingerprints, or a TLS-spoofing library like
curl_cffi. For the most protected sites, isolated browser profiles with genuinely unique fingerprints are the most reliable approach.
Can I change Scrapy’s JA3 fingerprint?
Not directly — JA3 is determined by the TLS library (OpenSSL via Twisted in Scrapy’s case). To change it, use
curl_cffi with browser impersonation, or switch to Scrapy-Playwright which uses a real browser for the TLS
handshake.
How do I know if I’m being fingerprinted?
Signs include: consistent 403/429 errors despite proxy rotation, CAPTCHAs appearing on every request, responses
containing JavaScript challenges but no content, and receiving different (simpler) page content than what a browser
sees.
Should I use headless Chrome instead of Scrapy?
For heavily protected sites, yes — a headless browser has a real TLS fingerprint, executes JavaScript, and looks much
more like a genuine browser. But Scrapy is 10-100x faster and lighter for sites without aggressive anti-bot
protection. The best approach is often Scrapy for simple sites and Scrapy + Playwright for protected ones.
Conclusion
Mastering Scrapy fingerprinting requires addressing every detection layer: HTTP headers, TLS
fingerprints, request timing, and JavaScript execution. The strongest scrapers in 2026 combine Scrapy’s efficiency
with browser-grade fingerprints — either through TLS spoofing libraries, Scrapy-Playwright integration, or cloud
browser profiles that provide genuine browser sessions.
Start with realistic headers and timing, add TLS fingerprint spoofing for moderately protected sites, and use full
browser rendering (Scrapy-Playwright or cloud browser profiles via Send.win) for the most heavily
protected targets. The right fingerprint strategy depends on the target site’s defences — there’s no
one-size-fits-all solution.
