eBay Price Monitoring in Practice: How to Scrape eBay Product Data with Python

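As cross-border e-commerce competition intensifies, tracking prices in real time has become a key advantage for sellers. eBay is one of the world's largest C2C and B2C marketplaces and carries trade in hundreds of millions of items. For operations staff, tracking competitor prices by manually refreshing pages is slow, and it makes it easy to miss the best moment to reprice.

This article walks you through building a complete eBay price monitoring tool in Python, covering the whole scraping workflow: environment setup, proxy configuration, data extraction, automatic storage, and email alerts. It focuses on the access restrictions and blocks you are most likely to hit in practice, so you can run stable, continuous eBay price monitoring at low cost.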

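I. Why Scrape eBay Data?

For cross-border sellers, e-commerce operators, and data analysts, scraping eBay data is worth far more than a price lookup. It lets you:

Catch price changes on competing items as soon as they happen, so slow repricing does not cost you orders.
Spot trending categories and potential bestsellers by collecting sales volume, review counts, and similar metrics in bulk.
Replace manual data entry by collecting titles, images, specifications, and other structured data in bulk, which speeds up product selection.
Combine the results with historical data to model price fluctuations and back a dynamic pricing strategy.

With these scenarios in mind, the goal of this article is to build a lightweight eBay price monitoring tool in Python and to use proxy IPs to work around the access restrictions that commonly interrupt scraping, so your eBay scraping jobs run more reliably and efficiently.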

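II. eBay Price Monitoring: Scraping Product Data with Python

The sections below build the system step by step. The steps come with Python code, so you can follow along in order.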

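1. Tool Setup

Before you start scraping eBay, install the following dependencies:

pip install requests beautifulsoup4 lxml schedule

requests sends the HTTP requests; beautifulsoup4 and lxml parse the HTML; schedule runs the timed jobs. The smtplib module used for the price alert emails is part of the Python standard library, so it must not be passed to pip and needs no installation.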
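As a quick sanity check after installation, the sketch below reports which of the modules used in this article cannot be imported. The helper name missing_modules is illustrative rather than part of any library; note that beautifulsoup4 is imported under the name bs4.

```python
import importlib.util

# Import names for the packages used in this article (beautifulsoup4 -> bs4)
REQUIRED_MODULES = ["requests", "bs4", "lxml", "schedule", "smtplib"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are available.")
```

An empty result means the environment is ready for the steps that follow.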

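2. Fetching an eBay Product Page

eBay applies some User-Agent checks to crawlers, so requests sent with the default headers can easily trigger a 403 or a CAPTCHA. The code below fetches the page with browser-like request headers: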

import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util import Retry


def get_ebay_page(url, proxy=None):
    # 1. Create a Session so cookies are managed automatically
    session = requests.Session()

    # 2. Retry transient failures (network blips, 5xx responses)
    retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
    session.mount("http://", HTTPAdapter(max_retries=retries))
    session.mount("https://", HTTPAdapter(max_retries=retries))

    # 3. Browser-like request headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        # "br" is left out: requests only decodes Brotli when an optional
        # Brotli package is installed, otherwise response.text is unreadable
        "Accept-Encoding": "gzip, deflate",
        "Cache-Control": "max-age=0",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
    }
    session.headers.update(headers)

    # 4. Normalize the proxy format
    proxies = None
    if proxy:
        # Add a scheme if the proxy string has none
        if not proxy.startswith(("http://", "https://")):
            proxy = f"http://{proxy}"
        proxies = {"http": proxy, "https": proxy}

    try:
        # 5. Optional: visit the home page first to pick up baseline cookies
        # session.get("https://www.ebay.com", proxies=proxies, timeout=5)

        # 6. Request the product page
        response = session.get(url, proxies=proxies, timeout=10)
        response.raise_for_status()

        # eBay sometimes returns 200 with a security check page instead of the product
        if "captcha" in response.url or "Robot Check" in response.text:
            print("[WARN] Blocked by eBay's bot check. Switch proxy IP or refresh cookies.")
            return None
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"[ERROR] Failed to fetch page: {e}")
        return None


# ----- Test call -----
if __name__ == "__main__":
    test_url = "https://www.ebay.com/itm/225643445557"  # replace with a real eBay item URL
    html_content = get_ebay_page(test_url)
    if html_content:
        soup = BeautifulSoup(html_content, "html.parser")
        # Print the page title to confirm the fetch worked
        print("Page title:", soup.title.text.strip() if soup.title else "title not found")

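Note that eBay's page structure varies by product category, so confirm the CSS selectors for your target elements in the browser developer tools before scraping.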
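Besides checking in the browser, you can verify selectors against the HTML your script actually received. The sketch below uses only the standard library to report which candidate class names appear in a page. The helper find_present_classes and the sample markup are illustrative assumptions, not eBay's real page.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def find_present_classes(html, candidates):
    """Return the candidate class names that occur somewhere in html."""
    collector = ClassCollector()
    collector.feed(html)
    return [c for c in candidates if c in collector.classes]


# Made-up fragment for the demo; pass the text returned by get_ebay_page instead
sample_html = '<h1 class="x-item-title__mainTitle"><span class="ux-textspans">Demo</span></h1>'
print(find_present_classes(sample_html, ["x-item-title__mainTitle", "x-price-primary"]))
```

If a class you rely on is missing from the result, the selector needs updating for that category or page version.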

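3. Configuring a Proxy

The most common obstacle when scraping eBay is the platform blocking IPs that send frequent requests. Rotating residential proxies let you cycle through multiple IPs, so a single block does not halt the job. That suits eBay price monitoring, which has to run continuously over long periods. IPFoxy's rotating residential IPs are one such option for data collection teams.

The example below uses IPFoxy rotating residential proxies to show how to configure and use a proxy in Python.

Getting the proxy details

In IPFoxy, open the rotating residential proxy product and set the state and city, protocol type, session rotation type, and proxy format to generate the connection string used to connect through the proxy.

Configuring the proxy in Python

Paste the connection string you copied from IPFoxy into the sample code below. For example, if the string is username:password@gate-us-ipfoxy.io:58688, the configuration looks like this: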

import urllib.request

if __name__ == "__main__":
    # Connection string copied from IPFoxy, with an explicit http:// scheme
    proxy_url = "http://username:password@gate-us-ipfoxy.io:58688"
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    urllib.request.install_opener(opener)

    # The response reports the exit IP, which shows whether the proxy is in effect
    content = urllib.request.urlopen("http://www.ip-api.com/json").read()
    print(content.decode("utf-8"))

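Run the code and check the output. If the reported exit IP has changed, the proxy is configured correctly and you can move on to the next step.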
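The same connection details can be turned into the proxies dictionary that requests expects. The following is a minimal sketch, assuming an HTTP proxy; build_proxies is a hypothetical helper, and the password shown is a made-up value chosen to demonstrate why credentials should be percent-encoded.

```python
from urllib.parse import quote


def build_proxies(username, password, host, port, scheme="http"):
    """Build a requests-style proxies dict from separate connection fields."""
    # Percent-encode credentials so characters such as "@" or ":" do not break the URL
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"{scheme}://{user}:{pwd}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# Placeholder credentials; substitute the values from your own proxy account
proxies = build_proxies("username", "p@ss:word", "gate-us-ipfoxy.io", 58688)
print(proxies["https"])
```

The resulting URL string can also be passed as the proxy argument of get_ebay_page from step 2.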

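4. Extracting Product Information

Once the page has been fetched, parse the HTML with BeautifulSoup and extract the core fields: product title, price, and stock status.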

import re

from bs4 import BeautifulSoup


def extract_product_info(html):
    # html.parser ships with Python; pass "lxml" instead if you installed it
    soup = BeautifulSoup(html, "html.parser")

    # 1. Title: prefer the inner span, fall back to the whole h1
    title_element = soup.find("h1", class_="x-item-title__mainTitle")
    if title_element:
        # eBay often wraps the title text in a span with class ux-textspans
        sub_span = title_element.find("span", class_="ux-textspans")
        title_text = sub_span.get_text(strip=True) if sub_span else title_element.get_text(strip=True)
    else:
        title_text = "N/A"

    # 2. Current price: keep the raw text and a numeric-only version for comparisons
    price_element = soup.find("div", class_="x-price-primary")
    price_text = "N/A"
    raw_price = "N/A"
    if price_element:
        raw_price = price_element.get_text(strip=True)  # e.g. "US $24.99" or "C $35.00/ea"
        # Pull out the digits and decimal point (e.g. 24.99) after dropping thousands separators
        price_match = re.search(r"\d+(?:\.\d+)?", raw_price.replace(",", ""))
        if price_match:
            price_text = price_match.group(0)  # numeric string such as "24.99", ready for float()

    # 3. Stock status: try several selectors for robustness
    stock_text = "Available"  # assume in stock unless the page says otherwise
    # Primary selector
    stock_element = soup.find("div", class_="d-quantity__availability")
    # Fallback: eBay sometimes uses a variant such as "x-quantity__availability" or a span tag
    if not stock_element:
        stock_element = soup.find(class_=re.compile(r".*quantity__availability.*"))
    if stock_element:
        stock_text = stock_element.get_text(strip=True)
    else:
        # Last resort: look for out-of-stock wording anywhere on the page
        page_text = soup.get_text()
        if "Out of stock" in page_text or "This item is out of stock" in page_text:
            stock_text = "Out of Stock"

    return {
        "title": title_text,
        "raw_price": raw_price,     # original text with currency, e.g. "US $24.99"
        "clean_price": price_text,  # numeric string, e.g. "24.99"; use this for comparisons
        "stock": stock_text,
    }
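When prices feed into comparisons or accounting, it can help to keep the currency prefix and to use Decimal rather than float. The sketch below is one possible variant of the price-cleaning step, not the article's method: parse_price is a hypothetical helper, and it assumes "." is the decimal separator and "," the thousands separator, as in the "US $24.99" examples above.

```python
import re
from decimal import Decimal

# Leading non-digit text is treated as the currency prefix, then the amount follows
PRICE_RE = re.compile(r"^(?P<currency>[^\d]*?)\s*(?P<amount>\d[\d,]*(?:\.\d+)?)")


def parse_price(raw):
    """Split text such as 'US $1,234.50/ea' into (currency prefix, Decimal amount)."""
    match = PRICE_RE.search(raw.strip())
    if not match:
        return None  # no digits at all, e.g. "N/A"
    currency = match.group("currency").strip() or None
    amount = Decimal(match.group("amount").replace(",", ""))
    return currency, amount


print(parse_price("US $1,234.50/ea"))
print(parse_price("N/A"))
```

Listings that use a different number format (for example "1.234,50") would need their own handling before this helper is applied.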

5. Saving Price Data

Append each scraped price record to a CSV file, so you can analyze price trends later or import the data into Excel or a database:

import csv
import os
from datetime import datetime


def save_to_csv(data: dict, product_url="N/A", filename="ebay_prices.csv"):
    # 1. Fixed set of CSV columns
    fieldnames = ["timestamp", "title", "price", "stock", "url"]

    # 2. Check whether the file already exists (decides whether to write the header)
    file_exists = os.path.exists(filename)

    # 3. Build the row explicitly so missing or extra keys in data cannot break the write
    row_to_write = {
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        # .get(key, default) tolerates keys that are missing after a failed extraction
        "title": data.get("title", "N/A"),
        # Accept either the clean_price from the previous step or a plain price key
        "price": data.get("clean_price") or data.get("price") or "N/A",
        "stock": data.get("stock", "N/A"),
        # Take the URL from data if present, otherwise from the argument
        "url": data.get("url") or product_url,
    }

    # 4. Append to the file
    try:
        # utf-8-sig writes a BOM so Excel on Windows displays non-ASCII text
        # correctly when the file is opened by double-click
        with open(filename, "a", newline="", encoding="utf-8-sig") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            # Write the header only when the file is new
            if not file_exists:
                writer.writeheader()
            writer.writerow(row_to_write)
        print(f"[OK] Row appended to {filename}")
    except Exception as e:
        print(f"[ERROR] Failed to write CSV file: {e}")


# ----- Example call with mock data -----
if __name__ == "__main__":
    # Mock of the dict returned by extract_product_info (no url key)
    mock_parsed_data = {
        "title": "iPhone 15 Pro Max 256GB - Unlocked",
        "clean_price": "999.99",
        "raw_price": "US $999.99",
        "stock": "Limited quantity available",
    }
    target_url = "https://www.ebay.com/itm/123456789"

    # Test the save
    save_to_csv(mock_parsed_data, product_url=target_url)

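The timestamp field records when each sample was taken. Together with the price field, it lets you reconstruct the full price history, which gives eBay price monitoring a reliable data foundation.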
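To illustrate how that history might be used, the sketch below reads rows in the same column layout and computes a few summary figures. The function summarize_history and the sample rows are made up for demonstration; they are not data from a real listing.

```python
import csv
import io


def summarize_history(csv_file):
    """Summarize the price column of an open CSV file in the article's layout."""
    prices = []
    for row in csv.DictReader(csv_file):
        try:
            prices.append(float(row["price"]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows where the price is "N/A" or missing
    if not prices:
        return None
    first, last = prices[0], prices[-1]
    change_pct = (last - first) / first * 100 if first else 0.0
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "latest": last,
        "change_pct": round(change_pct, 2),  # change from first to latest sample
    }


# Invented sample rows standing in for ebay_prices.csv
sample = io.StringIO(
    "timestamp,title,price,stock,url\n"
    "2024-05-01 09:00:00,Demo item,100.00,Available,https://www.ebay.com/itm/123456789\n"
    "2024-05-02 09:00:00,Demo item,N/A,Available,https://www.ebay.com/itm/123456789\n"
    "2024-05-03 09:00:00,Demo item,90.00,Available,https://www.ebay.com/itm/123456789\n"
)
summary = summarize_history(sample)
print(summary)
```

For the real file, open it with encoding="utf-8-sig" and newline="" so the byte order mark written during saving does not end up in the first column name.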

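6. Monitoring Price Changes

On top of the saved data, add a price threshold check. When the item's price falls to or below your target value, the alert flow is triggered automatically: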

import csv
import os
import re
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# ========== Simplified versions of the functions from the earlier steps ==========
# (Shortened here for context; in a real run, use the full versions shown earlier.)


def get_ebay_page(url, proxy=None):
    # Simplified; use the requests.Session version from step 2 in practice
    headers = {"User-Agent": "Mozilla/5.0"}
    proxies = {"http": proxy, "https": proxy} if proxy else None
    try:
        res = requests.get(url, headers=headers, proxies=proxies, timeout=10)
        res.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"[ERROR] Failed to fetch page: {e}")
        return None
    return res.text


def extract_product_info(html):
    soup = BeautifulSoup(html, "html.parser")
    title_el = soup.find("h1", class_="x-item-title__mainTitle")
    price_el = soup.find("div", class_="x-price-primary")
    title = title_el.get_text(strip=True) if title_el else "Unknown Item"
    raw_price = price_el.get_text(strip=True) if price_el else "N/A"
    # Extract the numeric part and ignore the currency prefix
    # (assumes "." is the decimal separator)
    price_match = re.search(r"\d+(?:\.\d+)?", raw_price.replace(",", ""))
    clean_price = price_match.group(0) if price_match else "N/A"
    return {"title": title, "raw_price": raw_price, "clean_price": clean_price, "stock": "Available"}


def save_to_csv(data: dict, product_url, filename):
    file_exists = os.path.exists(filename)
    with open(filename, "a", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "title", "price", "stock", "url"])
        if not file_exists:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "title": data["title"],
            "price": data["clean_price"],
            "stock": data["stock"],
            "url": product_url,
        })


# ========== Monitoring and alert logic ==========


def send_alert(title, current_price, url):
    """Trigger the alert flow (extend here with WeChat, DingTalk, email, and so on)."""
    print("\n" + "=" * 40)
    print(" [PRICE ALERT] Low price found!")
    print(f" Item: {title}")
    print(f" Current price: {current_price}")
    print(f" Link: {url}")
    print("=" * 40 + "\n")


def monitor_price(url, threshold_price, history_file="ebay_prices.csv", proxy=None):
    """Check one item once: fetch, parse, record, and alert if at or below the threshold."""
    print(f"[*] Checking item price... target threshold: {threshold_price}")

    # 1. Fetch the page (proxy is an optional argument rather than hard-coded)
    html = get_ebay_page(url, proxy=proxy)
    if not html:
        print("[ERROR] Could not fetch the product page; skipping this check.")
        return

    # 2. Parse the product information
    info = extract_product_info(html)

    # 3. Record the sample in the CSV history
    save_to_csv(info, product_url=url, filename=history_file)

    # 4. Convert the cleaned price safely
    try:
        current_price = float(info["clean_price"])
    except (ValueError, TypeError):
        print(f"[WARN] Could not parse a numeric price. Raw text: {info.get('raw_price')}")
        return

    print(f" Fetched -> [{info['title']}] current price: {current_price} (raw: {info['raw_price']})")

    # 5. Threshold check
    if current_price <= threshold_price:
        send_alert(info["title"], current_price, url)
    else:
        print("[-] Price is still above the target; monitoring continues...")


# ----- Example call -----
if __name__ == "__main__":
    # Configure the proxy once at the entry point
    MY_PROXY = None  # e.g. "http://127.0.0.1:7890"
    target_item_url = "https://www.ebay.com/itm/225643445557"

    # Alert when the price is 500 or lower
    monitor_price(target_item_url, threshold_price=500.0, proxy=MY_PROXY)
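As written, monitor_price calls send_alert on every check for as long as the price stays at or below the threshold. If repeated notifications are unwanted, one option is to alert only when the price drops further than the last alerted value. The following is a sketch of that rule; should_alert and the last_alert_price argument are assumptions added here, and storing the last alerted price between runs is left to the caller.

```python
def should_alert(current_price, threshold_price, last_alert_price=None):
    """Decide whether a new alert is worth sending.

    Alert when the price is at or below the threshold, but stay quiet if an
    alert was already sent at this price or a lower one.
    """
    if current_price > threshold_price:
        return False
    if last_alert_price is not None and current_price >= last_alert_price:
        return False
    return True


print(should_alert(450.0, 500.0))         # first time below the threshold
print(should_alert(450.0, 500.0, 450.0))  # same price as the last alert
print(should_alert(430.0, 500.0, 450.0))  # dropped further since the last alert
```

In monitor_price, the threshold branch could call this helper before send_alert and remember current_price whenever an alert goes out.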

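7. Sending Price Alerts Automatically

To go with the monitor_price function from the previous step, the code below sends the alert as an email through Gmail SMTP. You can swap it for a WeCom (企业微信) or Slack webhook instead: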

import smtplib
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formataddr
from html import escape


def send_alert(title, price, url,
               sender="your@gmail.com",        # replace with your sending Gmail address
               receiver="your@gmail.com",      # replace with the receiving address
               password="xxxx xxxx xxxx xxxx"  # must be a 16-character Google app password
               ):
    # 1. Build an HTML body (a clickable link is easier to use than plain text)
    html_body = f"""
    <html>
      <body>
        <h2 style="color: #e53238;">[eBay Price Alert]</h2>
        <p>Item: {escape(str(title))}</p>
        <p>Current price: {escape(str(price))}</p>
        <p>Link: <a href="{escape(url)}" target="_blank">Open the listing on eBay</a></p>
        <hr style="border:0; border-top:1px solid #eee;">
        <p>This email was sent automatically by the eBay price monitoring script. Please do not reply.</p>
      </body>
    </html>
    """

    # 2. Declare the body as HTML with UTF-8 encoding
    msg = MIMEText(html_body, "html", "utf-8")

    # 3. Encode the subject per RFC 2047 so non-ASCII titles survive.
    #    Addresses stay unencoded; formataddr only encodes the display name if needed.
    msg["Subject"] = Header(f"[eBay Price Alert] {title} is now {price}", "utf-8")
    msg["From"] = formataddr(("eBay Monitor Bot", sender))
    msg["To"] = receiver

    # 4. Catch errors so a network or login failure does not crash the monitor
    try:
        print("[*] Connecting to the Gmail SMTP server...")
        with smtplib.SMTP_SSL("smtp.gmail.com", 465, timeout=15) as server:
            server.login(sender, password)
            server.sendmail(sender, [receiver], msg.as_string())
        print(" [OK] Price alert email sent.")
        return True
    except smtplib.SMTPAuthenticationError:
        print("[ERROR] Email not sent: login failed. Check that you used an app password, not your regular Google account password.")
    except smtplib.SMTPConnectError:
        print("[ERROR] Email not sent: could not connect to the Gmail SMTP server. Check your network or proxy settings.")
    except Exception as e:
        print(f"[ERROR] Unexpected error while sending email: {e}")
    return False


# ----- Test call -----
if __name__ == "__main__":
    # Simulates the call made by monitor_price once the threshold is reached
    test_title = "iPhone 15 Pro Max 256GB Unlocked"
    test_price = "$899.00"
    test_url = "https://www.ebay.com/itm/123456789"

    # Before running, replace the default sender, receiver, and password above
    # send_alert(test_title, test_price, test_url)

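Note that Gmail only accepts this SMTP login after you enable 2-Step Verification and generate an app password. Logging in with your regular account password raises an authentication error.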
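Because the app password grants access to the mailbox, you may prefer not to hard-code it in the script. Here is a minimal sketch that reads the three values from environment variables instead; the variable names ALERT_SENDER, ALERT_RECEIVER, and ALERT_APP_PASSWORD and the helper load_smtp_settings are assumptions made for this example.

```python
import os


def load_smtp_settings(env=None):
    """Read sender, receiver, and app password from environment variables."""
    env = os.environ if env is None else env
    required = ("ALERT_SENDER", "ALERT_RECEIVER", "ALERT_APP_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "sender": env["ALERT_SENDER"],
        "receiver": env["ALERT_RECEIVER"],
        "password": env["ALERT_APP_PASSWORD"],
    }


# Demo with a plain dict; call load_smtp_settings() with no argument to read os.environ
demo_env = {
    "ALERT_SENDER": "sender@example.com",
    "ALERT_RECEIVER": "receiver@example.com",
    "ALERT_APP_PASSWORD": "placeholder-app-password",
}
settings = load_smtp_settings(demo_env)
print(settings["sender"], "->", settings["receiver"])
```

The returned keys match the keyword arguments of send_alert, so the call can be written as send_alert(title, price, url, **load_smtp_settings()).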

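8. Scheduling the Monitoring Job

Finally, use the schedule library to combine all the steps above into an automated loop, so the eBay price monitoring system keeps running in the background without manual intervention.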
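The loop itself can be sketched as follows. To keep the example free of third-party imports, it uses a plain time.sleep loop from the standard library rather than schedule. The names run_every, check_once, CHECK_INTERVAL_SECONDS, and TARGET_URL are illustrative, and the 30-minute interval is an assumed value, not a recommendation from the article.

```python
import time

THRESHOLD_PRICE = 500.0            # alert when the price is at or below this value
CHECK_INTERVAL_SECONDS = 30 * 60   # assumed interval: one check every 30 minutes
TARGET_URL = "https://www.ebay.com/itm/225643445557"


def run_every(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, waiting interval_seconds between runs.

    max_runs=None loops forever; a number stops after that many runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # one failed check should not stop the loop
            print(f"[ERROR] Check failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


def check_once():
    # Placeholder job. In the full script this would call:
    # monitor_price(TARGET_URL, threshold_price=THRESHOLD_PRICE)
    print(f"Checking {TARGET_URL} against threshold {THRESHOLD_PRICE}")


# Short demo: two runs with no waiting. For real use, call
# run_every(check_once, CHECK_INTERVAL_SECONDS) to loop indefinitely.
run_every(check_once, interval_seconds=0, max_runs=2)
```

With the schedule library installed, the same job can be registered with schedule.every(30).minutes.do(check_once) and driven by a loop that calls schedule.run_pending() followed by a short time.sleep.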

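Final tips for running it

When testing: set THRESHOLD_PRICE in the code slightly above the item's current price (for example, 400 if the item currently costs $300). The first check then triggers an alert email right after startup, which makes it easy to verify that email is configured correctly.
Long-running use: if you plan to leave the script running on a local computer, make sure automatic sleep and hibernation are turned off.

That completes the eBay scraping and price monitoring system.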

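III. FAQ

Q1: Why do I keep hitting 403 errors or CAPTCHAs when scraping?

eBay identifies crawlers through per-IP rate checks, User-Agent inspection, and behavioral analysis. Ways to reduce blocks include rotating request IPs with residential proxies (such as IPFoxy), adding a random delay of 1 to 5 seconds between requests, and sending a complete set of browser request headers. In severe cases, consider using Selenium to simulate real browser behavior.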
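The random delay mentioned above takes only a few lines. This is a sketch using the 1 to 5 second range from the answer; polite_pause is a hypothetical helper, and the sleep parameter exists only so the demo and tests can run without waiting.

```python
import random
import time


def polite_pause(min_seconds=1.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Demo with a stubbed sleep so nothing actually waits
delay = polite_pause(sleep=lambda seconds: None)
print(f"Would have waited {delay:.2f} seconds")
```

Call polite_pause() with no arguments between consecutive requests to space them out irregularly.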

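Q2: What should I do when eBay changes its page structure and the scraper stops working?

eBay updates its HTML structure and CSS class names from time to time, which breaks existing selectors. Keep all selectors together as constants in one place so they are quick to find and change, give each key field several fallback selectors to make the code more tolerant, and set up error log alerts so you notice scraping failures as soon as they occur.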

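Q3: How can I monitor several items at once?

Put the target item URLs and their price thresholds into a list or a configuration file, then loop over them in the scheduled job and call monitor_price for each. Add a random delay between items and combine it with proxy IP rotation to lower the risk of being blocked for batch requests.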
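The loop described above might look like the following sketch. WATCHLIST and check_watchlist are illustrative names, the thresholds are arbitrary, and the monitor argument is expected to accept a URL and a threshold the way monitor_price from step 6 does. A stub monitor and a stubbed sleep keep the demo self-contained.

```python
import random
import time

# Illustrative watchlist; thresholds are arbitrary example values
WATCHLIST = [
    {"url": "https://www.ebay.com/itm/225643445557", "threshold": 500.0},
    {"url": "https://www.ebay.com/itm/123456789", "threshold": 899.0},
]


def check_watchlist(watchlist, monitor, min_delay=1.0, max_delay=5.0, sleep=time.sleep):
    """Run monitor(url, threshold) for each item, pausing randomly between items."""
    checked = 0
    for index, item in enumerate(watchlist):
        try:
            monitor(item["url"], item["threshold"])
        except Exception as exc:  # one bad item should not stop the rest
            print(f"[ERROR] {item['url']}: {exc}")
        checked += 1
        if index < len(watchlist) - 1:
            sleep(random.uniform(min_delay, max_delay))
    return checked


def _demo_monitor(url, threshold):
    # Stand-in for monitor_price so the demo makes no network requests
    print(f"Would check {url} with threshold {threshold}")


check_watchlist(WATCHLIST, _demo_monitor, sleep=lambda seconds: None)
```

In the real script, pass monitor_price as the monitor argument and call check_watchlist from the scheduled job.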

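IV. Summary

For cross-border sellers and e-commerce operations teams, a stable eBay price monitoring setup means being able to react to market changes right away and set more competitive prices. Using rotating residential proxy IPs to avoid blocks is likewise key to keeping eBay scraping jobs running reliably over the long term.

To take this further, you could switch to the official eBay API for a more stable data source, add a machine learning model to forecast price trends, or extend monitoring to other platforms such as Amazon and AliExpress to build a multi-platform price intelligence system.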
