Retrieving Attributes from Elements
When scraping a page, you may sometimes want something other than an element's text, such as the value of a specific attribute like href, aria-label, or id.
Start with the Extract from Page brick, and if that doesn't work for your element, explore the Traverse Elements route.
Before adding either of these bricks, you'll need a starter brick for triggering your mod. Learn more about starter bricks in Types of Mods.
Click the + button in the Brick Actions panel below your starter brick. Search for the Extract from Page brick, and hover over the brick to click the blue Add button.
In the Selector field, use jQuery or CSS selectors to specify the element you'd like to target. If you don't know the selector, click the green mouse button and then click the element on the screen, and PixieBrix will fill in a selector for you.
Below the Selector field, you'll find a dropdown for Extract. If PixieBrix has successfully identified an element based on the Selector provided, you'll be able to choose from Text, Element, or specific attributes from the elements.
Select Element if you want to extract nested properties.
Run your mod and go to the Output tab of the Data Panel while the Extract from Page brick is selected.
You should see a data object with your targeted attribute. Click the copy icon to copy the path and reference the value in another brick.
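To make the idea concrete outside the Page Editor, here is a minimal Python sketch of what reading an attribute instead of the text amounts to. It is not how PixieBrix implements the Extract from Page brick. It uses only the standard library's html.parser and matches by tag name rather than by a CSS or jQuery selector. The SAMPLE_HTML fragment, the FirstElementAttrs class, and the extract_attribute helper are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class FirstElementAttrs(HTMLParser):
    """Record the attributes of the first start tag with the given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.attrs = None  # stays None if no matching element is found

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if self.attrs is None and tag == self.tag:
            self.attrs = dict(attrs)


def extract_attribute(html, tag, attribute):
    """Return one attribute of the first <tag> element, or None if absent."""
    parser = FirstElementAttrs(tag)
    parser.feed(html)
    parser.close()
    if parser.attrs is None:
        return None
    return parser.attrs.get(attribute)


# Hypothetical page fragment, used only for illustration.
SAMPLE_HTML = '<a id="docs-link" href="/docs/start" aria-label="Open docs">Docs</a>'

href = extract_attribute(SAMPLE_HTML, "a", "href")
label = extract_attribute(SAMPLE_HTML, "a", "aria-label")
```

The helper returns None both when no element matches and when the element lacks the requested attribute, which roughly mirrors the case where the Extract dropdown has nothing to offer for a selector.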
You'd likely only use the Traverse Elements approach if you are unable to find the attribute via Extract from Page, or if you're unable to find the specific element via CSS or jQuery selectors.
Click the + button in the Brick Actions panel to search for the Traverse Elements brick. Hover over the brick and click the blue Add button.
In the Selector field, you can manually type the CSS or jQuery selectors for the element, or click the green mouse button to select an area on the screen.
If you are having trouble finding the element, you may need to use the traversal property: select an element that you can find reliably, then traverse from it to the related element you want.
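The traversal idea, starting from an element you can locate and moving to a related one, can be sketched in plain Python. This is only an illustration under assumptions: the markup, the names in it, and the related_link_href helper are hypothetical, and the fragment is parsed as well-formed XML with xml.etree.ElementTree. Nothing here reflects the actual options of the Traverse Elements brick.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment: the links have no distinguishing
# attributes, but the name next to each link is easy to find.
SAMPLE = """
<ul>
  <li><span class="name">Alice</span><a href="/users/1">Profile</a></li>
  <li><span class="name">Bob</span><a href="/users/2">Profile</a></li>
</ul>
""".strip()


def related_link_href(root, name):
    """Find the <span> containing `name`, then read its sibling link's href."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for span in root.iter("span"):
        if span.text != name:
            continue
        parent = parent_of.get(span)
        if parent is None:
            return None
        link = parent.find("a")  # step up to the parent, then down to the link
        return None if link is None else link.get("href")
    return None


root = ET.fromstring(SAMPLE)
bob_href = related_link_href(root, "Bob")
```

The point of the sketch is the two-step lookup: the easy-to-find element acts as an anchor, and the target is reached relative to it.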
Run the mod to generate an output for the Traverse Elements brick. Check the output in the Data Panel on the far right, open the @transformed object, then the elements array, and copy the path of the element reference. By default, the path will be @transformed.elements[0].
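If it helps to see how a path such as @transformed.elements[0] maps onto the output, the following Python sketch walks the same path through a nested structure. The resolve_path helper and the placeholder values are assumptions for illustration only; real element references will look different, and PixieBrix resolves these paths for you.

```python
import re


def resolve_path(context, path):
    """Follow a path like '@transformed.elements[0]' through dicts and lists."""
    value = context
    # Tokens are either keys ('@transformed', 'elements') or indexes ('[0]').
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        value = value[key] if key else value[int(index)]
    return value


# Placeholder output shaped like the description above: an object that
# holds an "elements" array of element references.
context = {
    "@transformed": {
        "elements": ["element-reference-1", "element-reference-2"],
    }
}

first = resolve_path(context, "@transformed.elements[0]")
```

Changing the index in the path selects a different entry of the elements array, which is what you would do if the first match is not the element you need.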
Click the + button in the Brick Actions panel below the Traverse Elements brick and search for the HTML element reader brick. Hover over the brick, and click the blue Add button.
Expand Advanced Options and set the Target Root Mode field to Element.
Once you select Element, the Target Element field appears just below. Paste the element path that you copied from the Traverse Elements brick output (@transformed.elements[0] by default).
Run the mod once more and view the output from the HTML element reader brick. You should have access to all attributes of that element.
Copy the path of the desired attribute, and you can reference the attribute value in another brick.
If you wanted to reference the href attribute and open that link in another tab, you could add an Open a tab brick and pass the @element.attrs.href value. In this case, the href holds only a path, so you'd need to put the domain name before it: use Text Templating to preset the domain and then append the path.
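The sketch below is not PixieBrix Text Templating syntax. It is a small Python illustration of the same prefixing step, resolving a path-only href against a domain with urllib.parse.urljoin from the standard library. The domain, the sample href, and the absolute_url helper are hypothetical; only the attrs.href shape comes from the text above.

```python
from urllib.parse import urljoin

# Hypothetical reader output; only the attrs.href shape comes from the text.
element = {"attrs": {"href": "/docs/start"}}

# Hypothetical domain, standing in for the site the link belongs to.
DOMAIN = "https://www.example.com"


def absolute_url(domain, href):
    """Resolve href against domain; an already absolute href is returned as-is."""
    return urljoin(domain, href)


url = absolute_url(DOMAIN, element["attrs"]["href"])
```

Plain concatenation, which is what presetting the domain in a text template amounts to, gives the same result when the href starts with a slash. urljoin additionally leaves an already absolute href untouched.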