Trigger Data Collection API - Bright Data Docs

cURL

curl --request POST \
  --url https://api.brightdata.com/datasets/v3/trigger \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
[
  {
    "url": "https://il.linkedin.com/company/bright-data"
  }
]
'

{
  "snapshot_id": "s_m4x7enmven8djfqak"
}

POST https://api.brightdata.com/datasets/v3/trigger

Body

The input for the scraper. It can be provided as JSON or as a CSV file:

Content-Type

string

Content-Type: application/json (a JSON array of inputs)

Example: [{"url":"https://www.airbnb.com/rooms/50122531"}]

Content-Type: multipart/form-data (a CSV file in a field named data)

Example (curl): data=@path/to/your/file.csv
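
The same list of inputs can be serialized for either content type. The sketch below is illustrative only: `to_json_body` and `to_csv_body` are hypothetical helper names, not part of any Bright Data SDK, and the CSV layout (a header row with one column per input key) is an assumption.

```python
import csv
import io
import json


def to_json_body(inputs):
    """Serialize inputs as the JSON array sent with Content-Type: application/json."""
    return json.dumps(inputs)


def to_csv_body(inputs):
    """Serialize inputs as CSV text for upload in the multipart field named `data`.

    Assumption: a header row followed by one row per input.
    """
    fieldnames = list(inputs[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(inputs)
    return buffer.getvalue()


inputs = [{"url": "https://www.airbnb.com/rooms/50122531"}]
print(to_json_body(inputs))
print(to_csv_body(inputs))
```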

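Web Scraper types

Each scraper may require different inputs. There are two main types of scrapers:

1. PDP

These scrapers require URLs as input. A PDP scraper extracts detailed product information, such as specifications, pricing, and features, from a web page.

2. Discovery

Discovery scrapers let you explore and discover new entities or products through search, categories, keywords, and more.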

Request examples

`PDP` with URL input

The input for a PDP scraper is always a URL pointing to the page to be scraped.

Sample Request

curl -H "Authorization: Bearer API_TOKEN" -H "Content-Type: application/json" -d '[{"url":"https://www.airbnb.com/rooms/50122531"},{"url":"https://www.airbnb.com/rooms/50127677"}]' "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_ld7ll037kqy322v05&format=json&uncompressed_webhook=true"
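
For readers who prefer Python, the following sketch builds an equivalent request object with the standard library. It does not send anything. `build_trigger_request` is a hypothetical helper introduced here for illustration, and `API_TOKEN` is a placeholder, as in the curl sample.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(api_token, dataset_id, inputs, **params):
    """Build (but do not send) the POST request that triggers a collection."""
    query = urlencode({"dataset_id": dataset_id, **params})
    return Request(
        f"{TRIGGER_URL}?{query}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_trigger_request(
    "API_TOKEN",
    "gd_ld7ll037kqy322v05",
    [
        {"url": "https://www.airbnb.com/rooms/50122531"},
        {"url": "https://www.airbnb.com/rooms/50127677"},
    ],
    format="json",
    uncompressed_webhook="true",
)
print(request.get_method(), request.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(request)`.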

Discovery input based on the `discovery` method

Sample Request

curl -H "Authorization: Bearer API_TOKEN" -H "Content-Type: application/json" -d '[{"keyword":"light bulb"},{"keyword":"dog toys"},{"keyword":"home decor"}]' "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l7q7dkf244hwjntr0&endpoint=https://webhook-url.com&auth_header=QWxhZGRpbjpPcGVuU2VzYW1l&notify=https://notify-me.com/&format=ndjson&uncompressed_webhook=true&type=discover_new&discover_by=keyword&limit_per_input=10"

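The discovery input format can vary by scraper. For example, the input can be:

[{"keyword": "light bulb"},{"keyword": "dog toys"},{"keyword": "home decor"}]

and more. To find out what input each scraper requires, see the documentation for that specific scraper.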
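
A discovery request carries several URL-valued query parameters (`endpoint`, `notify`), so it can be safer to percent-encode the query string than to paste the values raw. A minimal sketch using only the standard library, with the values taken from the sample request:

```python
from urllib.parse import parse_qs, urlencode

# Query parameters from the discovery sample request.
discovery_params = {
    "dataset_id": "gd_l7q7dkf244hwjntr0",
    "type": "discover_new",
    "discover_by": "keyword",
    "limit_per_input": 10,
    "format": "ndjson",
    "endpoint": "https://webhook-url.com",
    "notify": "https://notify-me.com/",
}

# urlencode percent-encodes reserved characters such as ':' and '/'.
query = urlencode(discovery_params)
print(query)

# Decoding the query string recovers the original values (as strings).
decoded = parse_qs(query)
print(decoded["endpoint"][0])
```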

Authorization

Authorization

string

header

Required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query parameters

dataset_id

string

Required

Dataset ID for which data collection is triggered.

Example:

"gd_l1vikfnt1wgvvqz95w"

type

enum<string>

Set it to "discover_new" to trigger a collection that includes a discovery phase.

Available options:

discover_new

discover_by

string

Specifies which discovery method to use. Available options: "keyword", "best_sellers_url", "category_url", "location" and more (according to the specific API). Relevant only for collections that include a discovery phase.

include_errors

boolean

Include errors report with the results.

limit_per_input

number

Limit the number of results per input. Relevant only for collections that include a discovery phase.

Required range: x >= 1

limit_multiple_results

number

Limit the total number of results.

Required range: x >= 1

notify

string

URL where the notification will be sent once the collection is finished. Notification will contain snapshot_id and status.

endpoint

string

Webhook URL where data will be delivered.

format

enum<string>

Specifies the format of the data to be delivered to the webhook endpoint.

Available options: json, ndjson, jsonl, csv
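
The constraints listed so far (required `dataset_id`, the single allowed `type` value, limits of at least 1, and the `format` options) can be checked on the client before a request is sent. The sketch below is one possible check: `validate_trigger_params` is a hypothetical helper, and the API may enforce further rules not covered here.

```python
ALLOWED_FORMATS = {"json", "ndjson", "jsonl", "csv"}


def validate_trigger_params(params):
    """Return a list of problems found in a dict of query parameters."""
    problems = []
    if not params.get("dataset_id"):
        problems.append("dataset_id is required")
    if "type" in params and params["type"] != "discover_new":
        problems.append("type must be 'discover_new' when set")
    for name in ("limit_per_input", "limit_multiple_results"):
        if name in params and params[name] < 1:
            problems.append(f"{name} must be >= 1")
    if "format" in params and params["format"] not in ALLOWED_FORMATS:
        problems.append("format must be one of json, ndjson, jsonl, csv")
    return problems


valid = validate_trigger_params({"dataset_id": "gd_l1vikfnt1wgvvqz95w", "format": "csv"})
invalid = validate_trigger_params({"limit_per_input": 0, "format": "xml"})
print(valid)
print(invalid)
```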

auth_header

string

Authorization header to be used when sending notification to notify URL or delivering data via webhook endpoint.

uncompressed_webhook

boolean

By default, the data will be sent to the webhook compressed. Pass true to send it uncompressed.
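
A webhook receiver therefore has to cope with both compressed and uncompressed bodies. The page does not say which compression is used. The sketch below assumes gzip and a JSON payload (`format=json`); both are assumptions, and `read_webhook_body` is a hypothetical helper.

```python
import gzip
import json

# Assumption: compressed deliveries are gzip streams, which start with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_webhook_body(raw):
    """Decode a webhook body, decompressing it first if it looks like gzip."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


sample = json.dumps([{"id": 1}]).encode("utf-8")
print(read_webhook_body(sample))
print(read_webhook_body(gzip.compress(sample)))
```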

Request body

Only inputs · object[]
Deliver config and inputs · object

{key}

any

Response

200 - application/json

Collection job successfully started

snapshot_id

string

ID of your request that can be used in the next APIs

Example:

"s_m4x7enmven8djfqak"
