Customer Feedback Collection Platform Development: Of API And Data Harvesting Tools

Apiko
Nov 28, 2017
Morpheus offers Neo a choice (from the movie “The Matrix”)

Getting online reviews for your business is no piece of cake. Yet listening to what your customers say has always been the best way to improve the project you’re working on, and that’s why collecting online reviews matters.

Almost all well-known services integrate with other services through APIs. But an API is not always enough. That’s when a technique called data scraping, or data harvesting, is applied.

In this article, we look at a real example of when and why to choose an API or data harvesting for different types of integrations. We’ll also show why it’s better to look for an API and its documentation first when thinking about how to manage online reputation more effectively.

Tough choice

Note that data harvesting is quite a delicate task to handle: you might face legal issues. We’ve mentioned some of these troubles in our previous article on the importance of online reputation management and getting online reviews for your business: Collecting Customer Reviews Data From The Web As A Feature For Your Marketplace Project.

Our team has helped develop a customer feedback collection and review management platform that gathers reviews so that they can be managed properly. For this task, we applied both API integrations and data harvesting tools, which we describe in detail below.

Before deciding which to use, you should understand the difference between a web data extraction tool and an API integration.

A web data extraction tool works by collecting the specific information you need from an HTML web page: you select the data elements to extract by means of selectors (date selectors, for example). This method is used when the service you need to integrate with doesn’t provide developers with a public API, or when that API is incomplete.

An API, by contrast, exposes the public information you would otherwise find on the website, in the form needed to integrate a particular service’s functionality into your product. Our recommendation: if the API contains all the data you need and is well documented, prefer it over web data extraction tools. We’ll explain why later. Note also that some organisations grant complete API access to larger companies only.
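To make the API route more concrete, here is a minimal sketch of requesting reviews over HTTP. The endpoint path, the `Authorization` header scheme, and the `{ reviews: [...] }` response shape are all assumptions made for illustration; a real service defines these in its own documentation. The HTTP client is passed in as `fetchImpl` so that the example runs with a stub instead of a live service.

```javascript
// Hypothetical API client. The URL layout, auth header and response shape
// are assumptions for illustration, not any real service's API.
function fetchReviewsFromApi(fetchImpl, baseUrl, businessId, apiKey) {
  const url = `${baseUrl}/businesses/${encodeURIComponent(businessId)}/reviews`;
  return fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } })
    .then((response) => {
      // Fail loudly on HTTP errors instead of saving a broken payload.
      if (!response.ok) {
        throw new Error(`API responded with status ${response.status}`);
      }
      return response.json();
    })
    .then((body) => body.reviews);
}

// Stub standing in for a fetch-compatible client; it records the requested URL.
const requestedUrls = [];
const stubFetch = (url) => {
  requestedUrls.push(url);
  return Promise.resolve({
    ok: true,
    status: 200,
    json: () =>
      Promise.resolve({
        reviews: [{ author: "A", text: "Great service", date: "2017-11-28" }],
      }),
  });
};

const apiReviews = fetchReviewsFromApi(
  stubFetch,
  "https://api.example.com/v1",
  "shop 42",
  "demo-key"
);

apiReviews.then((reviews) => console.log("reviews received:", reviews.length));
```

With a real service, any `fetch`-compatible function could be passed as `fetchImpl`. The point of the sketch is that the data arrives already structured, so no selectors are involved.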

In case you haven’t checked this resource out yet, grab an extensive collection of API documentation: Devdocs.io

During the development of a customer feedback collection platform, you can come across several significant challenges:

  1. The first and foremost is an ‘under-the-hood’ change in the service. As a result, the web data extraction tool might stop working or return incorrect data. The cause may lie in the page’s structure, the resource’s URL, or any other technical alteration, even the smallest one.
    For instance, while we were developing the integration with one service, its date selectors changed. We received the review data, but the date was stored as “undefined” in the database, meaning that we had received no date data at all.
  2. In theory, we can write the date info to the database as a plain string, exactly as it was scraped. But if the selector changes, whatever text the selector now matches will be written instead. As a result, we’ll get reviews with incorrect posting dates.
  3. Web data extraction tools might simply stop working. One probable reason is that the service ‘bans’ your IP address if you send too many requests to its servers. In this case, you have to think about how to overcome the issue. Possible solutions include managing a pool of proxy servers and routing requests through them.
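One way to catch the date problem described above before it reaches the database is to validate every scraped review first. The following is a minimal sketch under stated assumptions: the review shape (`author`, `text`, `date`) and the function names are hypothetical and do not come from the project itself.

```javascript
// Hypothetical review shape: { author, text, date }, where `date` is the
// raw string taken from the page. The names are illustrative only.

// Parse a scraped date string. Returns null instead of letting
// `undefined` or an unparsable value through.
function parseReviewDate(raw) {
  if (typeof raw !== "string" || raw.trim() === "") return null;
  const timestamp = Date.parse(raw.trim());
  return Number.isNaN(timestamp) ? null : new Date(timestamp);
}

// Split scraped reviews into those that are safe to save and those that
// hint at a changed selector (missing text or unusable date).
function validateReviews(scraped) {
  const valid = [];
  const rejected = [];
  for (const review of scraped) {
    const date = parseReviewDate(review.date);
    if (date === null || !review.text) {
      rejected.push(review);
    } else {
      // Store the date in one normalized format rather than as scraped text.
      valid.push({ ...review, date: date.toISOString() });
    }
  }
  return { valid, rejected };
}

const { valid, rejected } = validateReviews([
  { author: "A", text: "Great service", date: "2017-11-28" },
  { author: "B", text: "Slow delivery", date: undefined }, // selector matched nothing
]);

console.log("valid:", valid.length, "rejected:", rejected.length);
```

A sudden rise in rejected reviews is a useful signal that the page structure has changed and the selectors need attention.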

But why do we insist on using an API in the end?

A public API, supported by the official service you need to integrate with, offers the following advantages:

  • First of all, it’s supported by the service’s official dev team. If anything changes under the hood, even slightly, you’re typically informed about it.
  • The documentation may include best practices and specific details on how to conduct the integration properly.
  • It’s a nice accelerator for the workflow when all the information and data regarding the integration process are available.

Data harvesting tools come in handy, though, when the API and its documentation are absent or lack the details you need for a smooth integration. For instance, the API might return only some of the reviews, or omit their textual content. In this case, we have no other choice than to use web data scraping tools.

To make customer review collection possible, we used Node-Horseman (a PhantomJS-based tool). It supports Promises, which are somewhat more comfortable to work with than callbacks.
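The difference between the two styles can be shown with a short sketch. The `openPageCb` and `readTitleCb` functions below are hypothetical stand-ins for scraper steps; they are not part of Node-Horseman’s API and only simulate asynchronous work with timers.

```javascript
// Hypothetical callback-style steps standing in for scraper operations.
// They are illustrative only and are not part of Node-Horseman's API.
function openPageCb(url, callback) {
  setTimeout(() => callback(null, { url }), 0);
}

function readTitleCb(page, callback) {
  setTimeout(() => callback(null, `Title of ${page.url}`), 0);
}

// Callback style: every step nests inside the previous one,
// and each level has to check for its own error.
openPageCb("https://example.com/reviews", (openError, page) => {
  if (openError) return console.error(openError);
  readTitleCb(page, (readError, title) => {
    if (readError) return console.error(readError);
    console.log("callback:", title);
  });
});

// Wrap a callback-style function so that it returns a promise instead.
function toPromise(fn) {
  return (arg) =>
    new Promise((resolve, reject) => {
      fn(arg, (error, result) => (error ? reject(error) : resolve(result)));
    });
}

const openPage = toPromise(openPageCb);
const readTitle = toPromise(readTitleCb);

// Promise style: the steps form a flat chain with a single error handler.
const titlePromise = openPage("https://example.com/reviews").then(readTitle);

titlePromise
  .then((title) => console.log("promise:", title))
  .catch((error) => console.error(error));
```

The hand-written `toPromise` helper is there only to keep the example dependency-free; Node.js ships `util.promisify` for the same purpose.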

You might be interested in reading this amazing post by IBM’s developers on what’s the difference between Node.js promises and callbacks and the benefits behind using promises: Promises in Node.js — An Alternative to Callbacks.

As of today, official support for PhantomJS has stopped. Developers can currently use Headless Chrome instead.

To develop the user interface, we used React Native. We needed the user interface part of the app to store data so that the product could work in offline mode. To do this, we used the Redux Persist npm package and embedded it into the app’s architecture.

Let’s take a look at the bare bones of the customer feedback collection and review management app:

The startScraper function is the “entry point” of the data harvesting tool. In this function, you specify which URL to open. You can also log in here if necessary. Here you also specify the selectors of the key buttons that must be clicked to reach the part of the web page containing all the info needed for the data harvesting. Finally, you call a close function for the scraper once it has finished working.

The next one is the general scrape function, which runs two other functions: getReviews and saveReviews.

The last one is getReviews. This function works directly with the HTML page, based on the specified selectors. It selects the required data from the page and inserts it into the reviews array. The array is then returned and saved by means of the saveReviews function (which is called by the scrape function mentioned above).
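The functions described above could fit together roughly as follows. This is a minimal sketch and not the project’s actual code: the `browser` object with its `open`, `extract` and `close` methods is a hypothetical abstraction over a headless browser (it is not Node-Horseman’s API), the selectors are made up, and an in-memory array stands in for the database.

```javascript
// Assumption: `browser` is any object whose open(url), extract(selectors)
// and close() methods return promises. It is a hypothetical wrapper,
// NOT Node-Horseman's actual API.

// Reads review data from the opened page using the given selectors.
function getReviews(browser, selectors) {
  return browser.extract(selectors).then((rows) =>
    rows.map((row) => ({ author: row.author, text: row.text, date: row.date }))
  );
}

// Persists the reviews; an in-memory array stands in for the database here.
function saveReviews(store, reviews) {
  store.push(...reviews);
  return Promise.resolve(reviews.length);
}

// General function that runs getReviews and then saveReviews.
function scrape(browser, selectors, store) {
  return getReviews(browser, selectors).then((reviews) =>
    saveReviews(store, reviews)
  );
}

// Entry point: opens the URL, runs the scrape, and closes the browser
// whether or not the scrape succeeded.
function startScraper(browser, url, selectors, store) {
  return browser
    .open(url)
    .then(() => scrape(browser, selectors, store))
    .finally(() => browser.close());
}

// Fake browser so that the sketch runs without a real headless browser.
const fakeBrowser = {
  openedUrl: null,
  closed: false,
  open(url) {
    this.openedUrl = url;
    return Promise.resolve();
  },
  extract() {
    return Promise.resolve([
      { author: "A", text: "Great service", date: "2017-11-28" },
    ]);
  },
  close() {
    this.closed = true;
    return Promise.resolve();
  },
};

const store = [];
const run = startScraper(
  fakeBrowser,
  "https://example.com/reviews",
  { author: ".review-author", text: ".review-text", date: ".review-date" },
  store
);

run.then((savedCount) => console.log("reviews saved:", savedCount));
```

Passing the browser and the store in as arguments keeps each function small and makes it possible to swap the fake objects for a real headless browser and a real database later.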

Summary

To sum up, looking for an API and its documentation should be the very first task. Only when there is no suitable API, or the API lacks the required data, should you use web data extraction tools to get online reviews for your business.

Originally published at medium.com on November 28, 2017.


Apiko is a software development company that enters markets with digital businesses, using a solid process and clever strategies.