Web Scraper

Updated 12 hours ago by Karan

Introduction

In Byteline, you can add a Web Scraper node to a flow to extract content and data from a website. This guide explains how to extract data from any website by using its underlying HTML to specify which elements to extract.

Create

This document assumes that the flow is initiated with a simple HTTP trigger node, but you can use the Web Scraper node with any trigger. For more details, see How to create your first flow design.

Step 1: Select the Web Scraper node from the Select node window.

Configure

Step 1: Click the Edit button to open the Web Scraper node configuration window.

You can scrape two kinds of data, scalar and array, using the tool with the black background in this window. In both cases, you need the XPath of the HTML element on the web page being scraped.

Step 2: Enter the URL of the website to scrape in the Scraper URL text box.

Open the same URL in your Chrome browser. You will use Chrome's developer tools to copy the XPath of each element you want to scrape.

Scalar Content

A scalar is a single value that you want to scrape from a website, such as a string, integer, boolean, or date.

Step 1: On the website, select the scalar element you want to scrape and right-click it.

Step 2: Select Inspect to open the developer tools.

Confirm that the element you selected on the website is also highlighted in the developer tools.

Step 3: In the developer tools, right-click the highlighted element and select Copy.

Step 4: Select Copy full XPath.

Step 5: Paste the full XPath into the XPath text box.

Step 6: Click the Add Scalar Field button in the tool with the black background.

Step 7: Enter the field name in the text box.

The field name must not contain any spaces.
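To make the scalar steps concrete, here is a minimal Python sketch of what evaluating a full XPath for one scalar field amounts to. It is a hypothetical illustration, not Byteline's implementation. It assumes a small, well-formed page (real-world HTML usually needs a proper HTML parser) and uses only the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. The function name `extract_scalar`, the sample page, and the field name `headline` are all made up for the example.

```python
import xml.etree.ElementTree as ET


def extract_scalar(page: str, field_name: str, full_xpath: str) -> dict:
    """Hypothetical sketch (not Byteline's code): return {field_name: text}
    for the element at full_xpath in a well-formed page."""
    # Field names must not contain spaces (rule from the step above).
    if any(ch.isspace() for ch in field_name):
        raise ValueError(f"field name {field_name!r} must not contain spaces")
    if not full_xpath.startswith("/html/"):
        raise ValueError("expected a full XPath starting with /html/")
    root = ET.fromstring(page)  # the <html> element
    # ElementTree cannot evaluate absolute paths on an Element, so evaluate the
    # rest of the path relative to <html>: "/html/body/..." -> "./body/...".
    node = root.find("." + full_xpath[len("/html"):])
    return {field_name: node.text if node is not None else None}


# Assumed sample page, well-formed so that ElementTree can parse it.
PAGE = "<html><body><div><h1>Spring Sale</h1></div><div><p>Ends Friday</p></div></body></html>"
print(extract_scalar(PAGE, "headline", "/html/body/div[1]/h1"))  # {'headline': 'Spring Sale'}
```

Positional steps such as `div[1]` count from 1 among siblings with the same tag, which matches how the full XPath copied from Chrome addresses elements.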

Array Content

Array content is a collection of homogeneous data items that repeat on the page, usually arranged in rows and columns, such as the rows of a table or the items of a list.

Step 1: On the website, select an element of the array you want to scrape and right-click it.

Step 2: Select Inspect to open the developer tools.

Confirm that the element you selected on the website is also highlighted in the developer tools.

Step 3: In the developer tools, right-click the highlighted element and select Copy.

Step 4: Select Copy full XPath.

Step 5: Paste the full XPath into the XPath text box.

Step 6: Click the Add Array Field button in the tool with the black background. The tool splits the pasted XPath into parts so that it can extract every repeating element that matches it. The Iterable XPath is the repeating part of the XPath and ends with [*]. The tool also shows the field's XPath, which is extracted from each element matched by the iterable path.

If you add more array fields, the tool automatically checks whether each one falls under the same iterable path. If it does, the tool adds it as another field under that iterable path.

Step 7: Enter the array name in the text box. The array name becomes the JSON key for the extracted data, so make sure it contains no spaces.

The Iterable XPath text box is filled in automatically with the element's parent XPath.

Step 8: Enter the field name for the selected element in the text box. This becomes a JSON field name nested under the array's JSON key, so again, do not use spaces.

The field name must not contain any spaces.

Step 9: Click the Save button to save the node configuration.
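The array steps above can be sketched in the same way. The Python below is a hypothetical illustration, not Byteline's code. `split_full_xpath` assumes that the last indexed step of the pasted XPath (for example `li[1]`) is the repeating element and turns it into the iterable XPath ending in `[*]`. `under_iterable` checks whether another field's XPath falls under the same iterable path. `extract_array` builds a JSON object keyed by the array name, with each item keyed by the field names. All function names, the sample page, and the output shape are assumptions; the sample page is assumed to be well-formed.

```python
from __future__ import annotations

import json
import re
import xml.etree.ElementTree as ET

# One XPath step such as "li[3]" or "ul": a tag name plus an optional 1-based index.
STEP = re.compile(r"^([\w-]+)(?:\[(\d+)\])?$")


def split_full_xpath(full_xpath: str) -> tuple[str, str]:
    """Split a full XPath into (iterable XPath, field XPath).

    Hypothetical heuristic: the LAST step that carries an index is treated as
    the repeating element, and its index is replaced with [*].
    """
    steps = full_xpath.strip("/").split("/")
    for i in range(len(steps) - 1, -1, -1):
        match = STEP.match(steps[i])
        if match and match.group(2) is not None:
            iterable = "/" + "/".join(steps[:i] + [f"{match.group(1)}[*]"])
            field = "/".join(steps[i + 1:]) or "."  # "." = the repeating element itself
            return iterable, field
    raise ValueError(f"no indexed (repeating) step in {full_xpath!r}")


def under_iterable(iterable_xpath: str, full_xpath: str) -> bool:
    """True if full_xpath lies under iterable_xpath, with any index at the [*] step."""
    pattern = re.escape(iterable_xpath).replace(r"\[\*\]", r"\[\d+\]")
    return re.match(pattern + r"(/|$)", full_xpath) is not None


def extract_array(page: str, array_name: str, iterable_xpath: str,
                  fields: dict[str, str]) -> dict:
    """Build {array_name: [{field_name: text, ...}, ...]} from a well-formed page."""
    for name in [array_name, *fields]:
        if any(ch.isspace() for ch in name):
            raise ValueError(f"{name!r} must not contain spaces")
    if not iterable_xpath.startswith("/html/"):
        raise ValueError("expected a full XPath starting with /html/")
    root = ET.fromstring(page)  # the <html> element
    # ElementTree cannot evaluate absolute paths on an Element, so drop "/html"
    # and evaluate relative to the root; "li[*]" (every li) is plain "li" here.
    path = "." + iterable_xpath[len("/html"):].replace("[*]", "")
    rows = []
    for item in root.findall(path):
        row = {}
        for name, field_xpath in fields.items():
            node = item.find(field_xpath)  # field XPath is relative to each item
            row[name] = node.text if node is not None else None
        rows.append(row)
    return {array_name: rows}


PAGE = """<html><body><ul>
<li><a>Widget</a><span>9.99</span></li>
<li><a>Gadget</a><span>19.99</span></li>
</ul></body></html>"""

iterable, title_xpath = split_full_xpath("/html/body/ul/li[1]/a")
print(iterable, title_xpath)  # /html/body/ul/li[*] a
print(under_iterable(iterable, "/html/body/ul/li[2]/span"))  # True
print(json.dumps(extract_array(PAGE, "products", iterable,
                               {"title": title_xpath, "price": "span"})))
# {"products": [{"title": "Widget", "price": "9.99"}, {"title": "Gadget", "price": "19.99"}]}
```

The "last indexed step" rule is only a guess at how the repeating part might be chosen. For a table cell such as `/html/body/table/tbody/tr[2]/td[3]`, it would pick the `td` column instead of the `tr` rows, so Byteline's tool may well use a different rule.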

Run

Step 1: Click the Deploy button to deploy the flow.

Step 2: Click the Run button to run the flow.

Step 3: Click the i (more information) button in the top-right corner of the Web Scraper node to view the extracted data.

The output window appears and shows the data extracted by the node.

Your Web Scraper node is now configured. Feel free to contact us if you have any questions. Develop fast!

