Web Scraper

Updated 1 week ago by Karan

Introduction

The Web Scraper node from Byteline extracts content and data from a website. This guide explains how to extract data from any website by using its underlying HTML to specify the elements you want. The Byteline Web Scraper Chrome Extension is used to configure the data to be scraped.

The Web Scraper can extract elements such as text, links, rich text, and images from a website.
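To make the four element types concrete, here is a minimal Python sketch of what each one could correspond to in a page's HTML. It uses only the standard library on a made-up HTML fragment. The ids, paths, and the mapping of each type to a specific attribute are assumptions for illustration, not a description of how Byteline extracts them internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment (not taken from any real site).
html = """
<div>
  <p id="name">Bitcoin</p>
  <a id="more" href="/price/bitcoin">Details</a>
  <div id="about">A <b>digital</b> currency</div>
  <img id="logo" src="/img/btc.png" />
</div>
"""
root = ET.fromstring(html)

# Text: the plain text inside the element.
text = root.find(".//p[@id='name']").text

# Link: assumed here to be the href attribute of an anchor.
link = root.find(".//a[@id='more']").get("href")

# Rich text: assumed here to be the element's content with inline markup kept.
about = root.find(".//div[@id='about']")
rich_text = (about.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in about
)

# Image: assumed here to be the src attribute of an img element.
image = root.find(".//img[@id='logo']").get("src")
```

In the actual workflow you do not write these expressions by hand; the Chrome Extension generates them when you select an element.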

This guide assumes the flow is initiated with a simple Scheduler node, but you can use the Web Scraper node with any trigger. For more details, see How to Create your First Flow Design.

Follow the steps outlined below to extract data from any website. 

Video Tutorial

These instructions are also available as a YouTube video.

Configure

Step 1: Select the Web Scraper node from the Select Node window. 

Step 2: Click on the Edit button to open the Web Scraper node configuration window. 

Step 3: Launch the website you want to scrape in a separate tab of your browser to copy its URL. For this documentation, we are scraping prices of cryptocurrencies from Coinbase.  

Step 4: Enter the Website URL you want to scrape in the Web Scraper URL field in the Byteline console.  

Step 5: Download and install the Byteline Web Scraper Chrome Extension in your browser.

Download the Byteline Web Scraper Chrome Extension from here.

Click the puzzle-piece-shaped Extensions button in the top-right corner of the browser.

After that, click the Pin button next to the Byteline extension to pin it to your browser.

Step 6: Click on the toggle button to enable the Byteline extension. Once enabled, you will see a blue border on the page.  

Step 7: Double-click the element you want to scrape and select the option that matches its content type. In this example, we double-click the cryptocurrency name 'Bitcoin', so we select the 'Text' option.

Step 8: Select 'Single Element' if you want to scrape a single element, or 'Repeating Element' if you want to scrape elements that repeat on the page.

In this case, we are selecting 'Repeating Element' as we want to scrape a table.

Step 9: Switch to the Byteline console and click the 'Paste from the Chrome Extension' button. The console automatically pastes the copied value of the element into the XPath field.

Step 10: The pasted value is generated from the element you selected on the website you want to scrape.

Give a name to the array.

Step 11: Specify the name for the column you want to scrape.

Step 12: Switch back to the Coinbase tab to copy another element you want to scrape. In this example, we select the cryptocurrency ticker symbol 'BTC'.

Step 13: Click on the 'Paste from the Chrome Extension' button and give a name to the field you are scraping.

Step 14: Repeat steps 12 and 13 to scrape the prices column of Coinbase.

Step 15: Click on the Save button.  
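The idea behind steps 8 to 14 (one XPath per column, applied to a repeating element and collected under a named array) can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the table, the placeholder prices, the column names, the array name `crypto_prices`, and the XPath expressions are all hypothetical, and the shape of `result` is not Byteline's documented output format.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for a price table; the prices are placeholders.
page = ET.fromstring("""
<table>
  <tr><td>Bitcoin</td><td>BTC</td><td>100.00</td></tr>
  <tr><td>Ethereum</td><td>ETH</td><td>200.00</td></tr>
</table>
""")

# Hypothetical column names mapped to hypothetical XPath expressions.
fields = {
    "name": "./tr/td[1]",
    "symbol": "./tr/td[2]",
    "price": "./tr/td[3]",
}

# A single-element expression pins one node, e.g. the first row's name.
single = page.find("./tr[1]/td[1]").text

# A repeating-element expression matches the same cell in every row.
columns = {
    field: [cell.text for cell in page.findall(xpath)]
    for field, xpath in fields.items()
}

# Zip the columns row by row and store the rows under a named array.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
result = {"crypto_prices": rows}
```

Here `single` holds only the first name, while each entry in `columns` holds one value per table row, which is why 'Repeating Element' is the right choice for a table.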

Deploy

After configuring the flow, deploy it by clicking the Deploy button in the top-right corner of the interface.

Run

Run the flow by clicking the Test Run button in the top-right corner of the interface.

Now, click the 'i' (more information) button in the top-right corner of the Web Scraper node to check the extracted data.

An output window opens showing the extracted data.

Your Web Scraper node has been configured successfully. If you have any questions, feel free to connect with us. Develop fast!

