Brand new news aggregators are popping up on the digital media landscape every day. These companies are trying to capitalize on humanity’s insatiable appetite for news and information.
With so many potential readers to target, the number of news aggregators is likely to keep growing. But how exactly do news aggregators work?
Below, we'll explain what news aggregators are, why they are so popular, and how the technology behind them works. Let's have a closer look at these exciting sources of the latest scoops!
What is a News Aggregator?
In a nutshell, a news aggregator pulls together news items from loads of different websites for you, so you can have it all in one place.
In its simplest form, a news aggregator is just a continuous feed of news items copied from other websites, each usually shortened to a snippet of the first one or two paragraphs.
The snippet will contain a link to the actual article, so if you’re interested, you can just click through to read the full story on the site of the source.
Most news aggregators curate the news as well. This involves things like putting specific stories in the spotlight or grouping stories from different sources into a single content stream about one event or topic.
How and Why People Use News Aggregators
Quite frankly, to save time.
People want to know all the news that’s relevant and interesting to them, but they don’t want to have to search for it themselves all over the web. News aggregators fix that problem.
When you tell aggregators like Pulse and Flipboard what interests you and which topics you'd like to read about, they know which relevant news stories to gather for you. As a result, you get a fully customized, personal news feed that's unique to you.
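To make the idea of "telling the aggregator what interests you" concrete, here is a minimal sketch of interest-based filtering. It is a hypothetical illustration, not the method Pulse, Flipboard or any other app actually uses: it assumes each article has already been tagged with a set of topics (the `Article` type and `personalize` function are invented for this example), keeps only articles that share at least one topic with the reader's interests, and ranks them by how many topics overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical article record; real aggregators store far more metadata.
    title: str
    url: str
    topics: set[str] = field(default_factory=set)


def personalize(articles: list[Article], interests: set[str]) -> list[Article]:
    """Keep articles sharing at least one topic with the reader's interests,
    most overlapping topics first."""
    scored = [(len(a.topics & interests), a) for a in articles]
    matches = [(score, a) for score, a in scored if score > 0]
    # list.sort is stable even with reverse=True, so ties keep their feed order
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in matches]


feed = [
    Article("Rates rise again", "https://example.com/rates", {"economy", "finance"}),
    Article("Cup final tonight", "https://example.com/cup", {"sports"}),
    Article("Chip export rules tighten", "https://example.com/chips", {"technology", "economy"}),
]
for article in personalize(feed, {"economy", "technology"}):
    print(article.title)
```

Here the reader who follows economy and technology sees the chip story first (two matching topics), then the rates story (one), and never sees the sports item.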
Moreover, most of these aggregators push information to users: they notify you as soon as they find a new story for you. You never have to search for news yourself again, because it's delivered to you automatically.
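Before an aggregator can push a story, it has to decide which items are actually new. A minimal sketch of that step, under the assumption that every item carries a unique `"id"` (a hypothetical schema): remember the IDs already sent and notify only on unseen ones. The `notify` callback stands in for whatever delivery channel a real app uses; actual push delivery is outside the scope of this sketch.

```python
from typing import Callable, Iterable


def push_new_items(
    latest: Iterable[dict],
    seen_ids: set[str],
    notify: Callable[[dict], None],
) -> int:
    """Notify the reader once per item that has not been seen before.

    `latest` is whatever the most recent fetch returned; each item is assumed
    to carry a unique "id" (hypothetical schema). Returns how many
    notifications were sent."""
    sent = 0
    for item in latest:
        if item["id"] in seen_ids:
            continue  # already pushed on an earlier poll
        seen_ids.add(item["id"])
        notify(item)
        sent += 1
    return sent


seen: set[str] = set()
push_new_items([{"id": "a1", "title": "Storm hits coast"}], seen, print)
# A later poll returns the old story plus a new one; only the new one is pushed.
push_new_items(
    [{"id": "a1", "title": "Storm hits coast"}, {"id": "b2", "title": "Rates rise"}],
    seen,
    print,
)
```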
Partly due to this personalized experience, news content aggregators have now become so popular that they have started to outperform traditional news outlets.
How News Aggregators Work
To feed you all these stories, news aggregators first need to collect that data from somewhere. And that process begins with web crawling techniques.
Web Crawling and RSS Feeds
It all starts with data gathering, which is done with the help of RSS feeds and web crawlers. Many news sites publish an RSS feed, a machine-readable list of their latest articles, from which news aggregators can gather article information such as headlines, links and short summaries.
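As a rough sketch of the RSS route, the snippet below reads an RSS 2.0 document with Python's standard `xml.etree.ElementTree` and pulls out each item's title, link and description (the snippet). The sample feed is made up; a real aggregator would first download the feed, for example with `urllib.request.urlopen(url).read()` (`ET.fromstring` also accepts bytes). It only handles RSS 2.0 element names; Atom feeds use different ones. The Python documentation also warns that the standard XML parsers are not hardened against maliciously constructed input.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str) -> list[dict]:
    """Return the title, link and description (snippet) of every <item>
    in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "snippet": (item.findtext("description") or "").strip(),
        }
        for item in root.iter("item")
    ]


# Made-up sample feed for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example News</title>
<item><title>Storm hits coast</title><link>https://example.com/storm</link>
<description>Heavy rain is expected through Friday.</description></item>
</channel></rss>"""

for entry in parse_rss(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```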
Alternatively, a news aggregator can use a web crawler (also called a robot or spider) to crawl news websites and download entire web pages. Crawlers can also query search engines such as Google or Bing and collect results directly from them. Most news aggregators either build custom solutions or use third-party APIs to acquire data from Google News and other sources in a timely fashion.
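A web crawler is essentially a breadth-first walk over links. The sketch below is a simplified illustration, not a production crawler: it extracts `<a href>` links with the standard library's `html.parser`, resolves them with `urljoin`, and stays on the starting site's host. The page fetcher is passed in as a function (here backed by a made-up in-memory site) so the example runs offline; a real one might call `urllib.request.urlopen`.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> dict[str, str]:
    """Breadth-first crawl limited to the start URL's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]  # drop in-page anchors
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical in-memory "website" standing in for real HTTP requests.
SITE = {
    "https://news.example/": '<a href="/world">World</a> <a href="https://other.example/">Ad</a>',
    "https://news.example/world": '<a href="/world/storm">Storm</a>',
    "https://news.example/world/storm": "<p>Full story</p>",
}
pages = crawl("https://news.example/", lambda url: SITE.get(url, ""))
print(list(pages))
```

A real crawler would also respect each site's robots.txt (the standard library's `urllib.robotparser` can read it) and limit how often it requests pages.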
Data Extraction and Web Scraping
Next, the useful information is extracted from the gathered pages. During this web scraping process (also called data harvesting), generally only the parts of a page that are useful to the news aggregator are extracted and kept.
Typical pieces of information that are extracted are the news headline, the header image, the lead paragraph, the author of the article, and perhaps the first one or two paragraphs of the body content.
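Extracting those fields can be sketched with Python's built-in `html.parser`. This is a minimal illustration that assumes a page marks its headline with `<h1>`, its header image with an Open Graph `og:image` meta tag and its author with `<meta name="author">`; real sites vary widely, so aggregators often need per-site rules. The `ArticleExtractor` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Pull headline, header image, author and the first paragraphs from one page.
    Assumes <h1>, og:image and <meta name="author"> markup (hypothetical layout)."""

    def __init__(self, max_paragraphs: int = 2) -> None:
        super().__init__()
        self.max_paragraphs = max_paragraphs
        self.headline = ""
        self.image = ""
        self.author = ""
        self.paragraphs: list[str] = []
        self._in: str | None = None  # "h1" or "p" while inside those elements
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            if attrs.get("property") == "og:image":
                self.image = attrs.get("content") or ""
            elif attrs.get("name") == "author":
                self.author = attrs.get("content") or ""
        elif tag in ("h1", "p") and self._in is None:
            self._in = tag
            self._buffer = []

    def handle_data(self, data):
        if self._in:
            self._buffer.append(data)  # also collects text of nested tags like <b>

    def handle_endtag(self, tag):
        if tag != self._in:
            return
        text = " ".join("".join(self._buffer).split())  # normalize whitespace
        if tag == "h1" and not self.headline:
            self.headline = text
        elif tag == "p" and text and len(self.paragraphs) < self.max_paragraphs:
            self.paragraphs.append(text)  # first paragraph doubles as the lead
        self._in = None


PAGE = """<html><head><meta property="og:image" content="https://example.com/storm.jpg">
<meta name="author" content="Jane Doe"></head><body>
<h1>Storm hits coast</h1><p>Heavy rain is <b>expected</b> through Friday.</p>
<p>Officials urged caution.</p><p>Third paragraph.</p></body></html>"""

extractor = ArticleExtractor()
extractor.feed(PAGE)
extractor.close()
print(extractor.headline, extractor.author, extractor.paragraphs)
```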
Information Clustering and Categorization
Once the news aggregator has crawled the web and scraped the data, it needs to determine how to present this data to the reader. It does this by first clustering and categorizing articles based on related topics or events described in the articles.
The aggregator then uses a common numerical statistic such as term frequency-inverse document frequency (TF-IDF), which measures how important a word is to one article relative to the whole collection, to work out what each article is about and categorize it accordingly.
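Here is a small sketch of TF-IDF-based grouping. It uses one common TF-IDF variant, tf-idf(t, d) = (count of t in d / number of terms in d) × log(N / df(t)), where N is the number of articles and df(t) is how many of them contain t; libraries such as scikit-learn use smoothed variants. Articles are compared with cosine similarity and grouped by a simple single-pass ("leader") clustering with an arbitrary threshold of 0.2. The clustering method, threshold and stopword list are illustrative assumptions, not what any particular aggregator uses.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "as", "the", "of", "to", "in", "on", "for"}


def tokenize(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """One sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Put each article in the first cluster whose first member is similar enough."""
    vectors = tfidf_vectors(docs)
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for group in clusters:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster: start a new one
    return clusters


DOCS = [
    "Storm floods coastal towns as heavy rain continues",
    "Heavy storm rain floods coastal roads",
    "Central bank raises interest rates again",
    "Interest rates rise as the central bank acts",
]
print(cluster(DOCS))
```

Words that appear in every article get an IDF of log(1) = 0, so common words contribute nothing to the similarity, which is exactly why TF-IDF is preferred over raw word counts here.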
At this stage, the news aggregator has found many related news articles for you. The next step is to summarize these articles to get the most representative information from the entire batch of articles.
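One simple way to summarize a batch of related articles is extractive summarization: pick the existing sentences that best represent the whole cluster. The sketch below uses a basic frequency heuristic chosen for illustration (not necessarily what aggregators use): words that recur across the cluster are treated as the core of the story, each sentence is scored by the average cluster frequency of its words, and the top sentences are returned in their original order.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "as", "the", "of", "to", "in", "on", "for", "after", "were", "was"}


def words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(articles: list[str], max_sentences: int = 2) -> list[str]:
    """Frequency-based extractive summary of a cluster of related articles."""
    sentences = [
        s.strip()
        for article in articles
        for s in re.split(r"(?<=[.!?])\s+", article)  # naive sentence splitter
        if s.strip()
    ]
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence: str) -> float:
        ws = words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original reading order
    return [sentences[i] for i in keep]


CLUSTER = [
    "A storm flooded coastal towns on Monday. Schools were closed.",
    "Coastal towns flooded after the storm. The mayor praised volunteers.",
    "The storm left coastal towns flooded and without power.",
]
print(summarize(CLUSTER))
```

Note that the two chosen sentences say nearly the same thing; a fuller version would also skip sentences that largely repeat one already selected.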
This way, the news aggregator serves you only the most useful and “best” news information according to your preferences and news consumption needs.
Visualizing the Data
The final step is to visualize and present the acquired data in a way that is easy for the reader to consume. For example, the aggregator might show only the general topics first and then, once you choose a topic, the related articles in that topic cluster.
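A topic-first presentation can be sketched as grouping articles by their cluster label and listing the topics before the headlines. This plain-text rendering is only a stand-in for a real user interface; it assumes each article is a dict with hypothetical `"topic"` and `"title"` keys, and it orders topics by how many articles they contain (an arbitrary choice for the example).

```python
from collections import defaultdict


def topic_first_view(articles: list[dict]) -> str:
    """Render topics (largest cluster first), each followed by its headlines."""
    groups: dict[str, list[str]] = defaultdict(list)
    for article in articles:
        groups[article["topic"]].append(article["title"])
    lines = []
    # Sort by descending cluster size, then alphabetically to break ties.
    for topic, titles in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        lines.append(f"{topic} ({len(titles)})")
        lines.extend(f"  - {title}" for title in titles)
    return "\n".join(lines)


ARTICLES = [
    {"topic": "Weather", "title": "Storm hits coast"},
    {"topic": "Economy", "title": "Rates rise"},
    {"topic": "Weather", "title": "Floods close schools"},
]
print(topic_first_view(ARTICLES))
```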
The Technology Behind News Aggregators
That is the process behind news aggregators and how they work. The end goal of such aggregators is to continuously feed you information, but only information you actually find relevant and interesting. By employing web crawling, web scraping, and data curation techniques, they can do exactly that.