newspaper | rss + article scraping

Date | Feb 2019
Technologies | python · flask · heroku · feedparser · rss

Open & free access to a subscription-based regional newspaper through RSS + article scraping.

The diariovasco is a regional newspaper from Gipuzkoa, in Spain. The online version of the newspaper was open-access until 2017, but since then it has required an overpriced subscription. Even though we usually read the national news through other media, we were missing a lot of information about local matters. That is why I set out to provide open and free access to this regional newspaper.

Fortunately, the articles are still listed in the newspaper's RSS feed, which is updated every 30 minutes. The article pages can be accessed, but the content is blocked by a pop-up, so I use a simple article scraper (Newspaper3k) to extract the title, subtitle, main image and text. Finally, I built a simple Flask web page to host the content and deployed it on Heroku.
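
The pipeline described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the deployed code. `FEED_URL` is a placeholder, and the names `ScrapedArticle`, `feed_links`, `scrape`, `render_article`, `build_page` and `create_app` are hypothetical. Newspaper3k has no dedicated subtitle field, so the page's meta description is used as a stand-in here.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Iterable, List

# Placeholder: the real feed address is not given in the text.
FEED_URL = "https://example.com/rss/portada.xml"


@dataclass
class ScrapedArticle:
    url: str
    title: str
    subtitle: str
    image: str
    text: str


def feed_links(feed_url: str) -> List[str]:
    """Return the article URLs listed in the RSS feed."""
    import feedparser  # third-party: pip install feedparser

    parsed = feedparser.parse(feed_url)
    return [entry.get("link") for entry in parsed.entries if entry.get("link")]


def scrape(url: str) -> ScrapedArticle:
    """Download one article page and extract its main fields."""
    from newspaper import Article  # third-party: pip install newspaper3k

    article = Article(url, language="es")
    article.download()
    article.parse()
    return ScrapedArticle(
        url=url,
        title=article.title,
        # Assumption: the meta description stands in for the subtitle.
        subtitle=article.meta_description,
        image=article.top_image,
        text=article.text,
    )


def render_article(a: ScrapedArticle) -> str:
    """Render one article as escaped HTML, one <p> per non-empty line."""
    paragraphs = "".join(
        f"<p>{escape(p)}</p>" for p in a.text.split("\n") if p.strip()
    )
    image = f'<img src="{escape(a.image)}">' if a.image else ""
    return (
        f"<article><h1>{escape(a.title)}</h1>"
        f"<h2>{escape(a.subtitle)}</h2>{image}{paragraphs}</article>"
    )


def build_page(
    links: Iterable[str],
    fetch: Callable[[str], ScrapedArticle] = scrape,
    limit: int = 10,
) -> str:
    """Scrape up to `limit` links and join the rendered articles."""
    return "\n".join(render_article(fetch(link)) for link in list(links)[:limit])


def create_app(feed_url: str = FEED_URL):
    """Build a Flask app whose front page shows the latest feed articles."""
    from flask import Flask  # third-party: pip install flask

    app = Flask(__name__)

    @app.route("/")
    def index() -> str:
        return build_page(feed_links(feed_url))

    return app
```

Calling `create_app().run()` would serve the page locally. Scraping on every request is slow, so since the feed only changes every 30 minutes, caching the rendered page for that long would be a natural refinement.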