Retrieving Attributes from Elements
When scraping a page, you may sometimes want something other than an element's text, such as the value of a specific attribute like href, aria-label, or id.
Start with the Extract from Page brick, and if that doesn't work for your element, explore the Traverse Elements route.
In the Selector field, use a jQuery or CSS selector to specify the element you'd like to target. If you don't know the selector, click the green mouse button, then click the element on the screen, and PixieBrix will fill in a selector for you.
Below the Selector field, you'll find a dropdown for Extract. If PixieBrix has successfully identified an element based on the Selector provided, you'll be able to choose from Text, Element, or specific attributes of the element.
Select Element if you want to extract nested properties.
You should see a data object with your targeted attribute. Click the copy icon to copy the path and reference the value in another brick.
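Conceptually, the Extract options correspond to reading different things from the matched element. The following is a minimal TypeScript sketch of that idea, not PixieBrix's implementation: `ElementLike` and `extract` are hypothetical names, and the interface mirrors the standard DOM members `textContent` and `getAttribute` so the sketch can run outside a browser.

```typescript
// Hypothetical minimal shape of a matched element. In a browser, a real
// DOM Element satisfies it through textContent and getAttribute().
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}

// "text" reads the element's trimmed text; any other value is treated
// as an attribute name. A missing attribute yields null.
function extract(el: ElementLike, what: string): string | null {
  if (what === "text") {
    return el.textContent === null ? null : el.textContent.trim();
  }
  return el.getAttribute(what);
}

// Stand-in for: <a href="/docs" aria-label="Documentation"> Docs </a>
const linkAttrs = new Map<string, string>([
  ["href", "/docs"],
  ["aria-label", "Documentation"],
]);
const link: ElementLike = {
  textContent: " Docs ",
  getAttribute: (name: string) => linkAttrs.get(name) ?? null,
};

const linkText = extract(link, "text");
const linkHref = extract(link, "href");
const linkLabel = extract(link, "aria-label");
const linkId = extract(link, "id"); // the stand-in element has no id attribute
```

In this sketch, asking for an attribute the element does not have returns null, which suggests checking the selector first when an expected attribute is missing from the output.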
You'd likely only use the Traverse Elements approach if you are unable to find the attribute via Extract from Page, or if you're unable to find the specific element via CSS or jQuery selectors.
In the Selector field, you can manually type the CSS or jQuery selector for the element, or click the green mouse button to select an area on the screen.
If you are having trouble finding the element, you may need to use the traversal property to reach it from a related element that you can successfully select.
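The idea behind traversal is to start from an element that is easy to select and walk the page structure to the one you actually want. A rough TypeScript sketch of that idea, under stated assumptions: `NodeLike`, `node`, `closest`, and `findDescendant` are hypothetical helpers operating on a plain object tree, not the Traverse Elements brick's actual options or code.

```typescript
// Hypothetical tree node standing in for a DOM element.
interface NodeLike {
  tag: string;
  attrs: Record<string, string>;
  children: NodeLike[];
  parent?: NodeLike;
}

// Build a node and wire up the parent link of each child.
function node(tag: string, attrs: Record<string, string>, children: NodeLike[] = []): NodeLike {
  const n: NodeLike = { tag, attrs, children };
  for (const child of children) child.parent = n;
  return n;
}

// Walk up from `start` (inclusive) to the nearest node with the given tag.
function closest(start: NodeLike, tag: string): NodeLike | undefined {
  let current: NodeLike | undefined = start;
  while (current !== undefined && current.tag !== tag) current = current.parent;
  return current;
}

// Depth-first search of the descendants of `root` (excluding `root` itself).
function findDescendant(root: NodeLike, tag: string): NodeLike | undefined {
  for (const child of root.children) {
    if (child.tag === tag) return child;
    const hit = findDescendant(child, tag);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Stand-in for:
// <li><span class="title">Report</span><div><a href="/reports/42">Open</a></div></li>
const title = node("span", { class: "title" });
const row = node("li", {}, [title, node("div", {}, [node("a", { href: "/reports/42" })])]);

// The title is easy to select; the link is reached by going up to the row, then down.
const container = closest(title, "li"); // resolves to `row`
const related = container ? findDescendant(container, "a") : undefined;
const relatedHref = related?.attrs["href"];
```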
Run the mod to generate an output for the Traverse Elements brick. Check the output in the Data Panel on the far right: open the @transformed object, then the elements array, and copy the path of the element reference. By default, the path will be @transformed.elements.
Go to Advanced Options and update the Target Root Mode field to Element.
Once you select Element, the Target Element field appears just below. Paste the element path that you copied from the Traverse Elements brick output in Step 2.
Run the mod once more and view the output from the HTML element reader brick. You should have access to all attributes of that element.
Copy the path of the desired attribute, and you can reference the attribute value in another brick.
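A copied path is essentially a chain of keys into the brick's output object. A small TypeScript sketch of how such a path could be resolved, purely as an illustration: `getByPath` and the `outputs` object are hypothetical, the leading `@` is dropped for simplicity, and the attribute values are made up.

```typescript
// Resolve a dotted path such as "element.attrs.href" against a nested object.
// Returns undefined as soon as a segment cannot be followed.
function getByPath(data: unknown, path: string): unknown {
  let current: unknown = data;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Hypothetical brick outputs, keyed by output name without the leading "@".
const outputs = {
  element: { attrs: { href: "/reports/42", id: "report-link" } },
};

const hrefValue = getByPath(outputs, "element.attrs.href");
const missingValue = getByPath(outputs, "element.attrs.title"); // no such attribute
```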
If you wanted to reference the href attribute and open that link in another tab, you could add an Open a tab brick and pass the @element.attrs.href value. In this case, you'd need to set the domain name before the path, so you would use Text Templating to preset the domain and then append the path, like this:
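One way to picture that join, sketched in TypeScript rather than in PixieBrix template syntax: `prependDomain` is a hypothetical helper, and `https://example.com` and `/reports/42` are placeholder values standing in for your site's domain and the extracted href.

```typescript
// Join a domain and a relative path without doubling or dropping the slash.
function prependDomain(domain: string, path: string): string {
  const base = domain.replace(/\/+$/, ""); // strip trailing slashes from the domain
  const rest = path.replace(/^\/+/, ""); // strip leading slashes from the path
  return `${base}/${rest}`;
}

// Placeholder values; in a mod, the path would come from @element.attrs.href.
const fullUrl = prependDomain("https://example.com", "/reports/42");
const sameUrl = prependDomain("https://example.com/", "reports/42");
```

This only applies when the href is a relative path. If the attribute already holds a full URL, it can be passed to the Open a tab brick unchanged.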