Retrieving Attributes from Elements
When scraping elements on a page, you may sometimes want something other than an element's text, such as the value of a specific attribute like href, aria-label, or id.
Start with the Extract from Page brick, and if that doesn't work for your element, explore the Traverse Elements route.
In the Selector field, use jQuery or CSS selectors to specify the element you'd like to target. If you don't know the selector, you can use the green mouse to click the element on the screen and PixieBrix will apply selectors.
Below the Selector field, you'll find a dropdown for Extract. If PixieBrix has successfully identified an element based on the Selector provided, you'll be able to choose from Text, Element, or specific attributes from the elements.
You should see a data object with your targeted attribute. Click the copy icon to copy the path and reference the value in another brick.
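The idea behind the Selector and Extract fields can be sketched outside PixieBrix. The snippet below is only an illustration, not how the brick is implemented: it uses Python's standard `xml.etree.ElementTree` and its limited XPath syntax as a stand-in for jQuery/CSS selectors, and the page fragment, the `extract` helper, and the `docs-link` id are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; kept well-formed so ElementTree can parse it.
PAGE = """
<div>
  <a id="docs-link" href="/docs/start" aria-label="Open the docs">Docs</a>
</div>
"""


def extract(root, path, what="text"):
    """Return the text of the first match, or the value of the named attribute."""
    element = root.find(path)
    if element is None:
        return None  # nothing matched the selector
    if what == "text":
        return (element.text or "").strip()
    return element.get(what)  # None if the attribute is absent


root = ET.fromstring(PAGE)
# Rough analogue of the CSS selector a#docs-link (assumed for illustration).
selector = ".//a[@id='docs-link']"

link_text = extract(root, selector)                 # like choosing Text
link_href = extract(root, selector, "href")         # like choosing the href attribute
link_label = extract(root, selector, "aria-label")  # like choosing aria-label
print(link_text, link_href, link_label)
```

The same selector yields different values depending on what you ask to extract, which is what the Extract dropdown controls.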
In the selector field, you can manually type the CSS or jQuery selectors for the element, or click the green mouse button to select an area on the screen.
Go to Advanced Options and update the Target Root Mode field to Element.
Once you select Element, the Target Element field appears just below. Paste the element path that you copied from the Traverse Elements brick output in Step 2.
Run the mod once more and view the output from the HTML element reader brick. You should have access to all attributes of that element.
Copy the path of the desired attribute, and you can reference the attribute value in another brick.
Click the + button below your starter brick. Search for the Extract from Page brick, and hover over the brick to click the blue Add button.
Run your mod and go to the Output tab while the Extract from Page brick is selected.
Click the + button and search for the Traverse Elements brick. Hover over the brick and click the blue Add button.
Run the mod to generate an output for the Traverse Elements brick. Check the output in the far right panel: open the @transformed object, then the elements array, and copy the path of the element reference. By default, the path will be @transformed.elements[0].
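To make the shape of a path like `@transformed.elements[0]` concrete, here is a minimal sketch of how such a path addresses nested data. It is an assumption-laden illustration, not PixieBrix's own resolution logic: the `resolve` helper and the placeholder element references are hypothetical.

```python
import re


def resolve(path, context):
    """Follow a path such as '@transformed.elements[0]' through nested data."""
    value = context
    # Split the path into keys and [index] parts:
    # '@transformed', 'elements', '[0]'
    for part in re.findall(r"[^.\[\]]+|\[\d+\]", path):
        if part.startswith("["):
            value = value[int(part[1:-1])]  # list index
        else:
            value = value[part]  # object key
    return value


# Hypothetical brick output: an object holding an array of element references.
context = {
    "@transformed": {
        "elements": ["<first element ref>", "<second element ref>"],
    }
}

first = resolve("@transformed.elements[0]", context)
print(first)
```

Changing the index in the path would pick a different entry from the elements array when the selector matches more than one element.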
Click the + button below the Traverse Elements brick and search for the HTML element reader brick. Hover over the brick, and click the blue Add button.
If you wanted to reference the href attribute and open that link in another tab, you could add an Open a tab brick and pass the @element.attrs.href value. In this case, you'd need to set the domain name before the path: preset the domain and then append the path.
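The domain-plus-path step can be sketched in a few lines. This is an illustration only, using Python's standard `urllib.parse.urljoin` in place of whatever text composition you use inside PixieBrix; the `absolute_url` helper, the example domain, and the example paths are hypothetical.

```python
from urllib.parse import urljoin


def absolute_url(domain, href):
    """Prefix a relative href with the domain; leave absolute hrefs unchanged."""
    return urljoin(domain, href)


# Hypothetical values: a preset domain and an href read from an element.
domain = "https://www.example.com"
relative = absolute_url(domain, "/docs/start")
already_absolute = absolute_url(domain, "https://other.example.org/page")

print(relative)
print(already_absolute)
```

Plain concatenation of domain and path gives the same result for a relative href; `urljoin` is used here because it also leaves an href that is already a full URL untouched.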