DOM (via jQuery)
The jQuery interface closely matches the jQuery API for selecting elements and extracting information.
Example
reader:
  type: jquery
  selectors:
    # simple selector that extracts text
    propertyName: "#propertyHeader .propertyName"
    # Set the renderedText flag to retain spacing. <br> elements are also
    # replaced with line breaks. This is useful, e.g., for parsing addresses,
    # as the line break distinguishes between the street name and the city name
    anotherProperty:
      selector: "#propertyAddress"
      renderedText: true
    # the contents flag can be either "text" or "comment"
    propertySize:
      selector: ".specList:has(h3:contains('Property Information')) ul li:contains('Units')"
      contents: text
    # you can extract an attribute of the HTML element instead of its text
    propertyCompany:
      selector: "#propertyHeader img.logo"
      # grab the alt attribute of the image, because it contains the name of the company
      attr: alt
    # nested information can be extracted using a selector and a find entry
    agent:
      selector: "#contactSection"
      find:
        name: .agentFullName
        phone: .phoneNumber
    availability:
      selector: tr.rentalGridRow
      # set multi to true to get an array of values
      multi: true
      find:
        maxrent:
          # you can also extract the data attribute of an element
          data: maxrent
        beds:
          data: beds
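To show how `find` and `multi` shape the result, here is a sketch of what the reader output for the example above might look like. Every value is made up for illustration, and the value types (for instance, whether `data` entries come back as strings or numbers) are an assumption rather than documented behavior.

```yaml
# Hypothetical output for the jquery reader example; all values are invented
propertyName: "Maple Court Apartments"
# renderedText keeps the line break between street and city
anotherProperty: "123 Main St\nSpringfield, IL 62701"
propertySize: "120 Units"
propertyCompany: "Acme Property Management"
# selector + find produces a nested object
agent:
  name: "Jane Doe"
  phone: "555-0100"
# multi: true produces one array entry per matched tr.rentalGridRow
availability:
  - maxrent: "1500"
    beds: "1"
  - maxrent: "2100"
    beds: "2"
```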
React
Supports React 16+
Example
reader:
  type: react
  # jQuery selector for the React component
  selector: ".details-page-container #ds-container .ds-home-details-chip"
  # optionally, the prop to read from the component
  rootProp: property
  # optional amount of time in milliseconds to wait for the component
  # to become available on the page
  waitMillis:
  # optional number of HTML elements to traverse upward to find
  # the element with a corresponding React component
  traverseUp:
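The example above leaves `waitMillis` and `traverseUp` blank. A filled-in version could look like the sketch below. The values `1000` and `2` are arbitrary choices for illustration, not recommended defaults.

```yaml
reader:
  type: react
  selector: ".details-page-container #ds-container .ds-home-details-chip"
  rootProp: property
  # assumed value: wait up to one second for the component to appear
  waitMillis: 1000
  # assumed value: check up to two ancestor elements for a React component
  traverseUp: 2
```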
Ember.js
The Ember.js reader reads the state of an Ember.js component:
reader:
  type: emberjs
  # The jQuery selector for the Ember component
  selector: ".pv-contact-info"
Composite Readers
A reader for a foundation (kind=extensionPoint) can be a reader id or a combination of readers. Providing a mapping of readers assigns the output of each reader to the corresponding property:
reader:
  apartment: apartments.com/property-reader
  document: "@pixiebrix/document-context"
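As a rough illustration of the mapping form, the combined output would be keyed by the property names from the configuration. The inner fields shown here are hypothetical placeholders, not the actual output of either reader.

```yaml
# Hypothetical combined output for the mapping form
apartment:
  # invented field standing in for the output of apartments.com/property-reader
  propertyName: "Maple Court Apartments"
document:
  # invented field standing in for the output of @pixiebrix/document-context
  url: "https://example.com/listing"
```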
Providing an array of readers will use all the readers, with readers appearing later overriding properties read by the earlier readers.
reader:
  - apartments.com/property-reader
  - "@pixiebrix/document-context"
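The override rule for the array form can be sketched as follows. The property names and values are hypothetical and chosen only so that both readers produce one overlapping key.

```yaml
# Hypothetical output of apartments.com/property-reader (listed first):
#   propertyName: "Maple Court Apartments"
#   url: "https://example.com/from-property-reader"
#
# Hypothetical output of @pixiebrix/document-context (listed second):
#   url: "https://example.com/from-document-context"
#   timestamp: "2021-01-01T00:00:00Z"
#
# Merged result: the later reader wins for the shared key `url`
propertyName: "Maple Court Apartments"
url: "https://example.com/from-document-context"
timestamp: "2021-01-01T00:00:00Z"
```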
The styles can also be combined, with an array of readers being assigned to a property and vice versa:
reader:
  - apartments.com/property-reader
  - document: "@pixiebrix/document-context"
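The example above shows a mapping nested inside an array. The reverse case, an array of readers assigned to a property, might be written as in this sketch. The property name `page` is a hypothetical choice for illustration.

```yaml
reader:
  # `page` is an invented property name; its value merges both readers,
  # with the later reader overriding properties read by the earlier one
  page:
    - apartments.com/property-reader
    - "@pixiebrix/document-context"
```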