Retrieving Attributes from Elements

When scraping elements on a page, you may sometimes want something other than an element's text, such as the value of a specific attribute like href, aria-label, or id.
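To make the difference between an element's text and its attributes concrete, here is a minimal sketch outside PixieBrix using Python's standard-library `html.parser`. PixieBrix does not run this code; the markup, the `profile-link` id, and the `LinkReader` class are hypothetical. It reads the element whose id matches and keeps its text and its attributes separately.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class LinkReader(HTMLParser):
    """Hypothetical helper: collects the text and the attributes of the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.target_attrs = {}  # attribute name -> value for the target element
        self.text_parts = []    # text found inside the target element
        self._depth = 0         # > 0 while the parser is inside the target element

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self._depth:
            # A nested element inside the target: track depth so we know when the target closes.
            if tag not in VOID_TAGS:
                self._depth += 1
        elif attr_map.get("id") == self.target_id:
            self.target_attrs = attr_map
            if tag not in VOID_TAGS:  # void elements such as <img> have no text to collect
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text_parts.append(data)


html = '<p>See <a id="profile-link" href="/in/jane-doe" aria-label="Jane Doe profile">Jane Doe</a>.</p>'
reader = LinkReader("profile-link")
reader.feed(html)
reader.close()

text = "".join(reader.text_parts)  # the element's text
href = reader.target_attrs["href"]  # the value of its href attribute
print(text, "|", href)              # Jane Doe | /in/jane-doe
```

The text ("Jane Doe") and the href value ("/in/jane-doe") are unrelated strings, which is why extracting text alone would not give you the link.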

Start with the Extract from Page brick, and if that doesn't work for your element, explore the Traverse Elements route.

Before adding either of these bricks, you'll need a starter brick for triggering your mod. Learn more about starter bricks in Types of Mods.

Prefer to watch? The video below covers the content on this page:

Using the Extract from Page brick

1. Add the Extract from Page brick.

Click the + button in the Brick Actions panel below your starter brick. Search for the Extract from Page brick, hover over it, and click the blue Add button.

2. Specify the selector of the element you want to scrape.

In the Selector field, enter a jQuery or CSS selector for the element you'd like to target. If you don't know the selector, click the green mouse icon, then click the element on the screen, and PixieBrix will fill in a selector for you.

3. Use the Extract field to specify what you'd like to scrape.

Below the Selector field, you'll find the Extract dropdown. If PixieBrix has successfully identified an element from the selector you provided, you can choose Text, Element, or one of the element's specific attributes.

Select Element if you want to extract nested properties.

4. Run the mod to access the value in the output.

Run your mod and go to the Output tab of the Data Panel while the Extract from Page brick is selected.

You should see a data object containing your targeted attribute. Click the copy icon to copy its path so you can reference the value in another brick.

Using the Traverse Elements + HTML Element Reader bricks

You'd likely use this approach only if you can't find the attribute via Extract from Page, or if you can't target the specific element with CSS or jQuery selectors.

1. Use the Traverse Elements brick to specify the element you'd like to access.

Click the + button in the Brick Actions panel to search for the Traverse Elements brick. Hover over the brick and click the blue Add button.

In the Selector field, you can type the CSS or jQuery selector for the element manually, or click the green mouse icon to select an area on the screen.

If you have trouble finding the element, you may need to use the traversal property: select an element you can reliably find, then traverse from it to the related element you actually want.
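Traversal means starting from an element you can select and moving to a related one, such as its parent or its next sibling. As a rough illustration only (this is not how the Traverse Elements brick is implemented), the Python sketch below builds a small element tree with the standard-library `html.parser` and walks from an element with a known id to its container and to the link beside it. The markup and the `Node`, `TreeBuilder`, and `find_by_id` names are hypothetical, and the builder assumes every non-void tag is closed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) field values
class Node:
    tag: str
    attrs: dict[str, Optional[str]]
    parent: Optional[Node] = None
    children: list[Node] = field(default_factory=list)

    def next_sibling(self) -> Optional[Node]:
        """Return the element right after this one under the same parent, if any."""
        if self.parent is None:
            return None
        siblings = self.parent.children
        position = siblings.index(self)
        return siblings[position + 1] if position + 1 < len(siblings) else None


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every non-void tag in the markup is closed."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root", {})
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs), parent=self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:
            self._current = node  # stay inside this element until its end tag

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._current.parent is not None:
            self._current = self._current.parent


def find_by_id(node: Node, element_id: str) -> Optional[Node]:
    """Depth-first search for the first element whose id attribute matches."""
    if node.attrs.get("id") == element_id:
        return node
    for child in node.children:
        found = find_by_id(child, element_id)
        if found is not None:
            return found
    return None


html = (
    '<div class="card">'
    '<span id="name">Jane Doe</span>'
    '<a href="/in/jane-doe">View profile</a>'
    "</div>"
)
builder = TreeBuilder()
builder.feed(html)
builder.close()

start = find_by_id(builder.root, "name")  # an element we can select directly
card = start.parent                        # traverse up to the container
link = start.next_sibling()                # traverse sideways to the related link
print(card.attrs["class"], link.attrs["href"])  # card /in/jane-doe
```

The link here has no id of its own, but it is easy to reach once you start from the span that does.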

2. Access the output

Run the mod to generate output for the Traverse Elements brick. In the Data Panel on the far right, open the @transformed object, then the elements array, and copy the path of the element reference. By default, the path is @transformed.elements[0].
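A copied path such as @transformed.elements[0] is read left to right: the variable name, then each property name or array index. The hypothetical Python helper `resolve_path` below sketches that lookup over nested dictionaries and lists. The sample data is made up (real element references will look different), and PixieBrix's own path handling may support more syntax than this sketch.

```python
from __future__ import annotations

import re
from typing import Any

# One path segment: an optional leading dot plus a property name, or a [number] index.
_SEGMENT = re.compile(r"\.?([A-Za-z_$][\w$-]*)|\[(\d+)\]")


def resolve_path(path: str, context: dict[str, Any]) -> Any:
    """Hypothetical helper: look up a path such as '@transformed.elements[0]' in nested data."""
    if not path.startswith("@"):
        raise ValueError(f"expected a path starting with '@': {path!r}")
    value: Any = context
    position = 1  # skip the leading '@'
    while position < len(path):
        match = _SEGMENT.match(path, position)
        if match is None:
            raise ValueError(f"cannot parse {path!r} at position {position}")
        key, index = match.groups()
        # A [number] segment indexes into a list; a name segment reads a property.
        value = value[int(index)] if index is not None else value[key]
        position = match.end()
    return value


# Made-up data shaped like the paths used in this guide; real output will differ.
context = {
    "transformed": {"elements": ["element-ref-1", "element-ref-2"]},
    "element": {"attrs": {"href": "/in/jane-doe", "aria-label": "Jane Doe profile"}},
}
print(resolve_path("@transformed.elements[0]", context))  # element-ref-1
print(resolve_path("@element.attrs.href", context))       # /in/jane-doe
```

Changing [0] to [1] would select the second matched element, which is worth checking if your selector matches more than one element.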

3. Use the HTML Element Reader brick to read the selected element.

Click the + button in the Brick Actions panel below the Traverse Elements brick and search for the HTML Element Reader brick. Hover over the brick and click the blue Add button.

Expand Advanced Options and set the Target Root Mode field to Element.

Once you select Element, the Target Element field appears just below it. Paste the path to the element from the Traverse Elements brick that you copied in step 2.

4. View the element's attributes in the HTML Element Reader brick output.

Run the mod again and view the output of the HTML Element Reader brick. You should have access to all attributes of that element.

Copy the path of the desired attribute, and you can reference the attribute value in another brick.

Referencing the attribute value

For example, to open the link from the href attribute in another tab, you could add an Open a tab brick and pass it the @element.attrs.href value. In this case, the href holds only the path, so you'd need to put the domain name before it: use Text Templating to preset the domain and then append the path, like this:

https://www.linkedin.com/{{@element.attrs.href}}
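Whether you need to prepend the domain depends on what the href contains: a bare path needs it, an absolute URL does not, and a path that already starts with / produces a double slash when appended after "linkedin.com/". Outside PixieBrix, Python's `urllib.parse.urljoin` resolves all three cases. This sketch compares it with plain string appending, which is what the template above does; the href values are hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://www.linkedin.com/"


def template_style(href: str) -> str:
    """Mimics the text template above: the domain, then the href appended as-is."""
    return BASE + href


def joined(href: str) -> str:
    """Resolves href against BASE using standard relative-URL rules."""
    return urljoin(BASE, href)


# Hypothetical href values an element might carry.
for href in ["in/jane-doe", "/in/jane-doe", "https://www.linkedin.com/in/jane-doe"]:
    print(href)
    print("  template:", template_style(href))
    print("  urljoin: ", joined(href))
```

For "/in/jane-doe", appending gives "https://www.linkedin.com//in/jane-doe" while `urljoin` gives "https://www.linkedin.com/in/jane-doe". If the hrefs on your page already start with /, you may want to drop the trailing slash after the domain in your template.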
