Web Scraper

Updated 1 month ago by Karan

Introduction

Byteline's Web Scraper lets you extract content and data from a website. This document explains how to extract data from any website by using its underlying HTML to specify the elements to extract. We will use the Byteline Web Scraper Chrome extension to configure the data to be scraped.

The Web Scraper can extract text, links, and rich text from a website.
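To make the three kinds of output concrete, here is a minimal Python sketch of how text, a link, and rich text differ for the same piece of HTML. It is an illustration only and does not show how Byteline works internally. The HTML fragment, the `id` values, and the example.com URL are all made up, and the standard-library `xml.etree.ElementTree` module stands in for a real HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment used only for illustration.
HTML = """
<div>
  <p id="intro">Prices are <b>live</b> and updated often.</p>
  <a id="more" href="https://example.com/prices">See all prices</a>
</div>
"""

root = ET.fromstring(HTML)

# Text: the visible characters of an element, with markup stripped.
intro = root.find(".//p[@id='intro']")
text = "".join(intro.itertext())

# Link: the href attribute of an anchor element.
link = root.find(".//a[@id='more']").get("href")

# Rich text: the element's content with its inline markup preserved.
rich_text = (intro.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in intro
)

print(text)
print(link)
print(rich_text)
```

The expressions passed to `find` are XPath selectors, the same kind of value that the steps below paste into the XPath text box.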

This document assumes the flow is initiated with a simple HTTP trigger node, but you can use a Web Scraper node with any trigger. For more detail, see How to create your first flow design.

Follow the steps below to extract text from the website. 

Video Tutorial

These instructions are also available as a YouTube video.

Configure

Step 1: Select the Web Scraper node from the select node window. 

Step 2: Click on the Edit button to open the Web Scraper node configuration window. 

Step 3: Launch the website you want to scrape in a separate tab of your browser to copy its URL. For this documentation, we are scraping coinbase.com prices. 

Step 4: Enter the Website URL you want to scrape in the Web Scraper URL box.  

Step 5: Download and install the Byteline Chrome extension in your browser.

Click on the puzzle-piece-shaped Extensions button in the top-right corner of the browser window.

After that, click on the Pin button as shown below to pin the extension to your browser.  

Step 6: Click on the toggle button to enable the Byteline extension. Once enabled, you will see a blue border on the page.  

Step 7: Double-click on the element you want to scrape and click on COPY Text. We have selected the cryptocurrency entry ‘Bitcoin’ to scrape.

Step 8: Switch to the Byteline console and paste the copied value of the element in the XPath text box.  

Step 9: Click on the Add Array Field button. We use the array option here because we are scraping the coins table, which has data for multiple coins. To scrape a single field, use the Add Scalar Field button instead.
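The difference between the two field types can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not Byteline's implementation. The HTML is a made-up stand-in for a coins table (not the real coinbase.com markup), and `scalar_field` and `array_field` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a coins table; not the real coinbase.com markup.
HTML = """
<html><body>
  <h1>Crypto prices</h1>
  <table>
    <tr><td class="name">Bitcoin</td></tr>
    <tr><td class="name">Ethereum</td></tr>
    <tr><td class="name">Litecoin</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(HTML)


def scalar_field(xpath):
    """Return the text of the first matching element, or None if nothing matches."""
    node = root.find(xpath)
    return node.text if node is not None else None


def array_field(xpath):
    """Return the text of every matching element, in document order."""
    return [node.text for node in root.findall(xpath)]


title = scalar_field(".//h1")                 # one value for the whole page
names = array_field(".//td[@class='name']")   # one value per table row

print(title)
print(names)
```

A scalar field suits a value that appears once on the page, while an array field suits a selector that matches a repeated element such as a table cell in every row.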

Step 10: You will see new array fields. Assign a name to the array. Any name without spaces is allowed.

Step 11: Enter the field name. Any name without spaces is allowed.

Step 12: Now, switch back to the coinbase.com tab to copy another element you want to scrape. We have selected the cryptocurrency symbol - ‘BTC’ to scrape. 

Step 13: Paste the copied element in the XPath text box and repeat the instructions outlined under Step 8 to Step 12. 

Step 14: Repeat the instructions outlined under Step 7 to Step 12 to scrape the prices column. 
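After Step 14, the array holds three fields per coin. The Python sketch below shows one way to picture that: each table row becomes one record with a value for every field. The array name `coins`, the field names `name`, `symbol`, and `price`, the HTML markup, and the placeholder prices are all hypothetical, and the actual shape of Byteline's output may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical table markup with placeholder prices; not real coinbase.com data.
HTML = """
<table>
  <tr><td class="name">Bitcoin</td><td class="symbol">BTC</td><td class="price">1.00</td></tr>
  <tr><td class="name">Ethereum</td><td class="symbol">ETH</td><td class="price">2.00</td></tr>
</table>
"""

# Hypothetical field names mapped to XPath selectors relative to each row.
FIELDS = {
    "name": "td[@class='name']",
    "symbol": "td[@class='symbol']",
    "price": "td[@class='price']",
}

root = ET.fromstring(HTML)

coins = []
for row in root.findall(".//tr"):
    # Build one record per row by evaluating every field's selector inside it.
    record = {field: row.find(xpath).text for field, xpath in FIELDS.items()}
    coins.append(record)

# "coins" plays the role of the array name assigned in Step 10.
output = {"coins": coins}
print(output)
```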

As we are adding an array field, we don’t need the scalar field. Before saving the data, delete the scalar field by clicking on the bin button, as shown below.

Step 15: Click on the Save button.  

Deploy

Step 16: After configuring the flow, deploy it by clicking on the Deploy button in the top-right corner of the interface.

Run

Step 17: Now, run the flow by clicking on the Play button in the top-right corner of the interface.

Step 18: Click on the i (more information) button in the top-right corner of the Web Scraper node to check the extracted data.

You will see an output window as illustrated below: 

Your Web Scraper node has been configured successfully. Feel free to contact us if you have any questions. Develop fast!

