DOM (via jQuery)
The jQuery reader closely follows the jQuery API for selecting elements and extracting information from them.
    reader:
      type: jquery
      selectors:
        # simple selector that extracts text
        propertyName: "#propertyHeader .propertyName"
        # Set the renderedText flag to retain spacing. <br> elements are also
        # replaced with line breaks. This is useful, e.g., for parsing addresses,
        # as the line break distinguishes between the street name and the city name
        anotherProperty:
          selector: "#propertyAddress"
          renderedText: true
        # the contents flag can be either "text" or "comment"
        propertySize:
          selector: ".specList:has(h3:contains('Property Information')) ul li:contains('Units')"
          contents: text
        # you can extract an attribute of the HTML element instead of its text
        propertyCompany:
          selector: "#propertyHeader img.logo"
          # grab the alt attribute of the image, because it contains the name of the company
          attr: alt
        # nested information can be extracted using a selector and a find entry
        agent:
          selector: "#contactSection"
          find:
            name: .agentFullName
            phone: .phoneNumber
        availability:
          selector: tr.rentalGridRow
          # set multi to true to get an array of values
          multi: true
          find:
            maxrent:
              # you can also extract a data attribute of an element
              data: maxrent
            beds:
              data: beds
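To make the selector options more concrete, here is a rough sketch of the kind of object the configuration above might produce. Every value is invented for illustration, and the exact output shape (for example, whether data attributes arrive as strings or numbers) is an assumption, not something this page specifies.

```yaml
# Hypothetical reader output; all values are made up for illustration
propertyName: Maple Court Apartments
# renderedText keeps the line break between street and city
anotherProperty: |-
  123 Example Street
  Springfield, IL 62704
propertySize: 120 Units
# taken from the alt attribute of the logo image
propertyCompany: Example Property Management
# selector + find yields a nested object
agent:
  name: Jane Doe
  phone: 555-0100
# multi: true yields one entry per matching table row
availability:
  - maxrent: "1450"
    beds: "1"
  - maxrent: "1900"
    beds: "2"
```

The point of the sketch is the structure: plain selectors become flat properties, a `find` entry becomes a nested object, and `multi: true` turns the result into an array.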
The React reader reads from a React component on the page. It supports React 16+.
    reader:
      type: react
      # jQuery selector for the React component
      selector: ".details-page-container #ds-container .ds-home-details-chip"
      # optionally, the prop to read from the component
      rootProp: property
      # optional amount of time in milliseconds to wait for the component
      # to become available on the page
      waitMillis:
      # optional number of HTML elements to traverse upward to try to find
      # the element with a corresponding React component
      traverseUp:
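As an illustration of how the optional settings might be filled in, the sketch below uses a made-up selector and prop name and picks arbitrary values for `waitMillis` and `traverseUp`. None of these values come from the original documentation; they are assumptions chosen only to show the expected types.

```yaml
reader:
  type: react
  # hypothetical selector for an element rendered by the component
  selector: "#app .profile-card"
  # hypothetical prop name; omit to read the component's props as a whole
  rootProp: user
  # assumed value: wait up to 2 seconds for the component to appear
  waitMillis: 2000
  # assumed value: check up to 2 ancestor elements for a React component
  traverseUp: 2
```

A non-zero `traverseUp` is mainly useful when the selector matches a child element rather than the element the component itself is attached to.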
The Ember.js reader reads the state of an Ember.js component.
    reader:
      type: emberjs
      # The jQuery selector for the Ember component
      selector: ".pv-contact-info"
A reader for a foundation (kind=extensionPoint) can be a reader id or a combination of readers. Providing a mapping of readers assigns the output of each reader to a property:
    reader:
      apartment: apartments.com/property-reader
      document: "@pixiebrix/document-context"
Providing an array of readers will use all the readers, with readers appearing later overriding properties read by the earlier readers.
    reader:
      - apartments.com/property-reader
      - "@pixiebrix/document-context"
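The override rule for arrays can be pictured with a small worked example. The property names and values below (`title`, `units`, `url`) are hypothetical and do not describe what either reader actually returns; the keys `first`, `second`, and `merged` are only labels for the three stages.

```yaml
# Hypothetical output of apartments.com/property-reader
first:
  title: Maple Court
  units: 120
# Hypothetical output of @pixiebrix/document-context
second:
  title: "Maple Court: Apartments for Rent"
  url: https://example.com/maple-court
# Combined result: title comes from the later reader,
# units and url are kept because only one reader provides each
merged:
  title: "Maple Court: Apartments for Rent"
  units: 120
  url: https://example.com/maple-court
```

If a property from the earlier reader must survive, a mapping avoids the collision by placing each reader's output under its own key.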
The styles can also be combined, with an array of readers being assigned to a property and vice versa:
    reader:
      - apartments.com/property-reader
      - document: "@pixiebrix/document-context"
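Under the rules described above, the combined form would presumably put the first reader's properties at the top level and nest the second reader's output under `document`. The sketch below assumes that behavior and uses invented property names and values.

```yaml
# Hypothetical combined output; property names and values are made up
# top-level properties from apartments.com/property-reader
title: Maple Court
units: 120
# output of @pixiebrix/document-context, assigned to the document property
document:
  url: https://example.com/maple-court
```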