Imagine you open the same web page every day. The page has 20 different tables of data, but you only need to look at one.
This data is critical, and it changes every hour.
You have to monitor it constantly and possibly use it for another task.
In this tutorial, we’re going to read data from a table automatically with a click and then show it in a sidebar.
This is great for cutting time off repetitive tasks and for having the data you need available at a glance.
1. Open the Page Editor
Let’s navigate to this page https://en.wikipedia.org/wiki/Table_(information) and open the PixieBrix Page Editor.
2. Add Sidebar Panel brick
The first step after opening the Page Editor is to add the Sidebar Panel brick.
Click the green Add button in the mod menu to add it.
With this brick in place, we can open the sidebar while we browse the page.
We’re going to configure the brick like this:
- Name: Wikipedia Table Parser
- Heading: Parsed data
Leave the other fields at their default values.
3. Add the Table Reader brick
Now for the fun part of this tutorial: we’re going to select the table on the page that we want to use as the source of the data displayed in the sidebar.
Let’s add a new brick called Table Reader: this brick reads data from an HTML table, that is, markup built from <table>, <tr> and <td> elements.
The output of this brick is stored in @data by default. We’ll use this output in step 4. The template there refers to it as @table, so either change the output name to table or use @data in the template instead.
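To make the output concrete, here is a rough Python model of turning table markup into the fieldNames and records shape that the template in step 4 relies on. This is not PixieBrix’s implementation. It assumes that the first row holds the column headers, that every <tr>, <td> and <th> has an explicit closing tag, and that there are no nested tables. All values stay strings. The sample row is hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect cell text row by row from simple table markup.

    Sketch assumptions: every <tr>, <td> and <th> is explicitly closed,
    and tables are not nested.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def read_table(html: str) -> dict:
    """Convert table markup to {"fieldNames": [...], "records": [...]}.

    Assumption: the first row holds the column headers.
    """
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return {"fieldNames": [], "records": []}
    field_names, *body = parser.rows
    records = [dict(zip(field_names, row)) for row in body]
    return {"fieldNames": field_names, "records": records}


# Hypothetical sample table with the same columns as the one we read.
sample = (
    "<table>"
    "<tr><th>First name</th><th>Last name</th><th>Age</th></tr>"
    "<tr><td>Ada</td><td>Example</td><td>36</td></tr>"
    "</table>"
)
print(read_table(sample))
# {'fieldNames': ['First name', 'Last name', 'Age'],
#  'records': [{'First name': 'Ada', 'Last name': 'Example', 'Age': '36'}]}
```

Each record is keyed by the header text. This is why step 4 can look up the First name, Last name and Age properties by name.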
We’re going to pick a selector for this brick. You can use the blue arrow button to select the table visually, or go the manual way and inspect the page by hand.
I chose the latter and found that the selector table:nth-child(15) selects the entire table we need.
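The number 15 can be surprising, because :nth-child() counts every element child of the parent, not only tables. Here is a minimal Python model of how table:nth-child(15) is tested against the children of one parent element. The sibling list is a hypothetical example, not the real Wikipedia markup. In a browser, the selector is tested against every parent on the page, so it can match more than one table. Adding or removing a single sibling also shifts the count. If you want to count only tables, CSS also provides :nth-of-type().

```python
from __future__ import annotations


def nth_child_match(children: list[str], tag: str, n: int) -> int | None:
    """Model `tag:nth-child(n)` against the element children of ONE parent.

    children: tag names of the parent's element children, in document order.
    :nth-child(n) is 1-based and counts every element child regardless of
    its tag; the type selector (`tag`) is then tested on that single child.
    Returns the 0-based index of the matching child, or None if no match.
    """
    if not 1 <= n <= len(children):
        return None
    candidate = n - 1
    return candidate if children[candidate] == tag else None


# Hypothetical parent: 14 non-table elements followed by a table.
siblings = ["h2", "p"] * 7 + ["table"]
print(nth_child_match(siblings, "table", 15))  # 14: the table is the 15th child
```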
The :nth-child() pseudo-class matches elements based on their position among their siblings, so table:nth-child(15) matches a table that is the 15th element child of its parent.
4. Add the HTML Renderer
In this last step, we’re going to render the data we got from the Table Reader with the HTML Renderer brick.
Let’s go ahead and add this brick by clicking the Add a brick button in the mod overview panel.
In this brick, we’re going to:
- Retrieve the data extracted by the previous brick
- Iterate through the records using the Nunjucks templating language
- Display the data as an HTML table in the sidebar
Enter the code below in the html field of this brick.
I will take a moment to explain what the above code does, since it’s not super obvious:
- First, I iterate through the variable @table.fieldNames. fieldNames holds the column names from the table data parsed by the previous brick. I do this because I want a header row, a <tr> element, with one cell for each field name.
- Second, I iterate through every row in @table.records. Each record is an object whose property names are First name, Last name and Age. I extract these three values and place them in a single row <tr>, with one table cell <td> for each value.
- Third, I add an HTML style to the table and its cells.
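The same loop structure can be sketched in Python. The sketch builds a header row from fieldNames and then one <tr> per record with one <td> per field. It is a stand-in for the Nunjucks template, not a reconstruction of it. It uses <th> for the header cells and a hypothetical inline style (CELL_STYLE). Values are escaped with html.escape so that scraped text cannot inject markup.

```python
from html import escape

# Hypothetical inline style; not the styles used in the original template.
CELL_STYLE = "border: 1px solid #ccc; padding: 4px;"


def render_table(table: dict) -> str:
    """Build an HTML table from {"fieldNames": [...], "records": [...]}."""
    field_names = table["fieldNames"]
    parts = ['<table style="border-collapse: collapse;">']

    # Header row: one cell per field name (the loop over @table.fieldNames).
    parts.append("<tr>")
    for name in field_names:
        parts.append(f'<th style="{CELL_STYLE}">{escape(name)}</th>')
    parts.append("</tr>")

    # Body: one row per record, one cell per field (the loop over @table.records).
    for record in table["records"]:
        parts.append("<tr>")
        for name in field_names:
            value = record.get(name, "")  # missing fields become empty cells
            parts.append(f'<td style="{CELL_STYLE}">{escape(str(value))}</td>')
        parts.append("</tr>")

    parts.append("</table>")
    return "".join(parts)


# Hypothetical data in the same shape the Table Reader produces.
table = {
    "fieldNames": ["First name", "Last name", "Age"],
    "records": [{"First name": "Ada", "Last name": "Example", "Age": "36"}],
}
print(render_table(table))
```

Looping over fieldNames for both the header and each row keeps the columns aligned, even if a record lists its properties in a different order.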
Save the mod, refresh the page and click the PixieBrix sidebar button. You should see the data we extracted from the HTML table, all scraped automatically.
Conclusion
In this tutorial we went over how to scrape data automatically from a page, specifically tabular data.
We then rendered the same data in an HTML table, generated by a hand-written template from the data we had scraped.
We showed this data in a sidebar, where we can check it at a glance, formatted with Nunjucks and HTML!
Thank you for reading this far, and I hope you’ve learned something new!