Bricks for Scraping Data
When building automations with PixieBrix, you'll often want to scrape information from a webpage. This could be as simple as pulling the current page's title or URL, or as complex as scraping the same field from multiple pages, like the price of an Airbnb listing, a Salesforce case number, or something else entirely.
If you want some metadata about the current page, you can always access that out of the box wherever you trigger a mod.
For instance, in a newly added Quick Bar Action brick, the Preview panel on the right side shows an object called @input. Click the caret icon (‣) in front of it, and you'll see everything you have access to.
Click the copy icon next to a field, and it will copy the path to your clipboard, allowing you to reference it later in other bricks!
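To make the idea of a copied path more concrete, here is a minimal Python sketch of how a dotted path picks one value out of a nested object. The field names (title, url, page.language) and the resolve_path helper are hypothetical illustrations. They are not the actual contents of @input or how PixieBrix resolves paths internally.

```python
from typing import Any

# Hypothetical stand-in for the @input object shown in the Preview panel.
# The real fields depend on the trigger; these names are assumptions.
sample_input = {
    "title": "Example listing",
    "url": "https://example.com/listing/42",
    "page": {"language": "en"},
}


def resolve_path(data: dict, path: str) -> Any:
    """Follow a dotted path such as '@input.page.language' through nested dicts."""
    parts = path.split(".")
    # Drop the leading variable name (for example '@input') if present.
    if parts and parts[0].startswith("@"):
        parts = parts[1:]
    current: Any = data
    for part in parts:
        current = current[part]  # raises KeyError if the path does not exist
    return current


print(resolve_path(sample_input, "@input.title"))
print(resolve_path(sample_input, "@input.page.language"))
```

The point of the sketch is only that a copied path is an address into the object: pasting it into a later brick refers to that one field rather than to the whole object.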
If you want to scrape information from elements that are on the page, you'll need to use our Extract from Page brick.
With this brick, you can read data from specific elements on the page. You just need to give it the right jQuery selectors, and PixieBrix can even help you with that!
🚨Pro tip: PixieBrix can often guess the right selector, but sometimes it can't. You might need to use the Elements tab of Chrome DevTools to comb through an element and find the best class that works consistently across multiple pages. For more information, read this article or watch this video.
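As a rough illustration of why a stable class matters, the Python sketch below extracts text by class name from two made-up pages. The class names (listing-price, css-1x2y, css-9z8w) and the HTML are invented for this example, and the parser is a simplified stand-in, not the Extract from Page brick itself. In jQuery terms, the selector being imitated is .listing-price.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class.

    Simplification: assumes well-formed HTML and no unclosed void tags
    (such as a bare <br>) inside the matched element.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.results = []   # extracted text, one entry per match
        self._buffer = []   # text pieces of the current match

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract_by_class(html, css_class):
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.results


# Two hypothetical pages: the generated class differs, the stable one does not.
page_a = '<div class="css-1x2y listing-price">$120</div><span class="title">Loft</span>'
page_b = '<section><div class="css-9z8w listing-price"><b>$95</b> per night</div></section>'

print(extract_by_class(page_a, "listing-price"))
print(extract_by_class(page_b, "listing-price"))
print(extract_by_class(page_b, "css-1x2y"))  # the generated class finds nothing here
```

A selector built on the generated class would work on the first page and silently return nothing on the second, which is the situation the tip above is warning about.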
If you want to extract data from a table on a page, use the Table Reader brick. Provide a selector that contains the table, and the output will contain the records from the table.
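To picture what "records from the table" can look like, here is a small Python sketch that turns an HTML table into a list of dictionaries keyed by the header cells. The table contents and the TableToRecords class are made up for illustration, and the exact shape of the Table Reader output is an assumption here, so check the brick's actual output in the Preview panel.

```python
from html.parser import HTMLParser


class TableToRecords(HTMLParser):
    """Convert a simple HTML table into one dict per data row."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of the <th> cells
        self.records = []   # one dict per row of <td> cells
        self._row = []      # cell texts of the current <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.headers.append(text)
            else:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row:
            # Pair each cell with its column header.
            self.records.append(dict(zip(self.headers, self._row)))
            self._row = []


def table_to_records(html):
    parser = TableToRecords()
    parser.feed(html)
    return parser.records


# Hypothetical table, loosely modeled on a list of support cases.
sample_table = """
<table>
  <tr><th>Case</th><th>Status</th></tr>
  <tr><td>00123</td><td>Open</td></tr>
  <tr><td>00124</td><td>Closed</td></tr>
</table>
"""

for record in table_to_records(sample_table):
    print(record)
```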
To send each item to another source, add the For-Each Loop brick after the Table Reader brick and specify the array of records from the Table Reader output.
If you have many items in your records, you may need to add a Wait/Sleep brick inside the loop to prevent rate limiting from the API receiving the data.
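The loop-with-pause pattern described above can be sketched in Python as follows. The send_all helper, the fake_send function, and the delay value are all hypothetical. In PixieBrix you would build this with the For-Each Loop and Wait/Sleep bricks rather than write code, and the right delay depends on the limits of the API you are sending to.

```python
import time


def send_all(records, send, delay_seconds=0.5, sleep=time.sleep):
    """Send each record in turn, pausing between requests.

    `send` is whatever delivers one record (an assumption for this sketch).
    `sleep` is injectable so the pause can be replaced when testing.
    """
    results = []
    for index, record in enumerate(records):
        if index > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results.append(send(record))
    return results


def fake_send(record):
    # Stand-in for a real API call; it only reports what it would send.
    return f"sent case {record['Case']}"


records = [
    {"Case": "00123", "Status": "Open"},
    {"Case": "00124", "Status": "Closed"},
]

# A short delay keeps the example quick; a real API may need a longer one.
for line in send_all(records, fake_send, delay_seconds=0.01):
    print(line)
```

Pausing between items trades speed for reliability: the run takes longer, but the receiving service is less likely to reject requests partway through the list.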
If you're unsure how to pick selectors, we've got another brick you can use. With the Extract from Page using AI brick, you can pass a section of a page to ChatGPT and ask it to find the right property for you rather than figure it out yourself.
Here are a few things to keep in mind before trying this out.
1) You can't pass the whole body of a page or extremely large containers. It's too much data for ChatGPT to handle in one request. Try selecting the smallest element you can.
2) Click Add Item in the Properties field, type the property you're searching for, and see if AI can find it for you.
If you're interested in learning more, read the docs for the Extract from Page with AI brick.