News Aggregator, 2011
Notepad++, Perl

Hacking the mainstream online media

Setting the agenda is now a common phrase in discussions of politics and public opinion. This phrase summarizes the continuing dialogue and debate in every community, from local neighbourhoods to the international arena, over what should be at the centre of public attention and action. In most of these dialogues the mass media have a significant and sometimes controversial role. (Maxwell McCombs, 2004)

'AgendaBuilding' is a news aggregator specializing in the German online news services run by publishing houses and TV stations. Instead of analyzing user preferences on the basis of blogs, forums and social networks (as Frank Westphal does with rivva), the service examines purely the thematic placement and development of topics in the mainstream media. The intention is to allow an objective look at news factors and relevance criteria, and to counteract the perceived flood of information by demonstrating that the reporting is (more or less) consonant.

All data is supplied automatically by a Type Two Requester, also called a spider. This program requests the front pages of several websites, then requests the top-story URLs that appear there. Parts of the HTML code are then isolated by means of regular expressions. The relevant components (date, headline, abstract and content, to be precise) are written into a database so that they are available for various analyses later. In the future, a graph will illustrate the rise and fall of different topics.
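
The extraction and storage step described above could look roughly like the following. This is a minimal sketch and not the project's actual code: the project itself was written in Perl, while the sketch uses Python for illustration. The sample markup, the regular expressions, the table layout and the function names `extract_article` and `store_article` are all assumptions. Real news sites each need their own patterns, and the fetching of front pages and top-story URLs is left out so that the example runs without network access.

```python
import re
import sqlite3

# Hypothetical article markup; every real site would need its own patterns.
SAMPLE_HTML = """
<html><body>
<span class="date">2011-03-14</span>
<h1 class="headline">Example headline</h1>
<p class="abstract">A short summary of the story.</p>
<div class="content">First paragraph. Second paragraph.</div>
</body></html>
"""

# One regular expression per component (assumed patterns for the sample above).
PATTERNS = {
    "date": re.compile(r'<span class="date">(.*?)</span>', re.S),
    "headline": re.compile(r'<h1 class="headline">(.*?)</h1>', re.S),
    "abstract": re.compile(r'<p class="abstract">(.*?)</p>', re.S),
    "content": re.compile(r'<div class="content">(.*?)</div>', re.S),
}


def extract_article(html):
    """Return a dict with the four components; missing ones become None."""
    article = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(html)
        article[field] = match.group(1).strip() if match else None
    return article


def store_article(conn, url, article):
    """Insert one article row, creating the table on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, date TEXT, headline TEXT, "
        "abstract TEXT, content TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO articles VALUES (?, ?, ?, ?, ?)",
        (url, article["date"], article["headline"],
         article["abstract"], article["content"]),
    )
    conn.commit()


# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
article = extract_article(SAMPLE_HTML)
store_article(conn, "http://example.com/top-story", article)
```

Keying the table on the URL means a story that stays on a front page across several runs is overwritten instead of duplicated. Whether the original database worked that way is not stated.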

The project is not yet complete because I'm still experimenting with different clustering algorithms.
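
To give an idea of what clustering means here, the following sketch groups headlines that share most of their words. It is only one simple possibility (greedy single-pass clustering with Jaccard similarity), not necessarily one of the algorithms tried in the project. The function names, the threshold of 0.3, the minimum word length and the sample headlines are assumptions made for illustration, and the sketch is in Python although the project itself used Perl.

```python
import re


def tokens(headline):
    """Lowercase word set of a headline, ignoring words of three letters or fewer."""
    return {w for w in re.findall(r"\w+", headline.lower()) if len(w) > 3}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_headlines(headlines, threshold=0.3):
    """Greedy single pass: each headline joins the first cluster whose
    seed headline is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_tokens, member_headlines)
    for headline in headlines:
        words = tokens(headline)
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append(headline)
                break
        else:
            # No existing cluster matched, so this headline becomes a new seed.
            clusters.append((words, [headline]))
    return [members for _, members in clusters]


# Invented sample headlines, not taken from the project's data.
headlines = [
    "Parliament votes on nuclear phase-out",
    "Nuclear phase-out: parliament votes today",
    "Football club signs new striker",
]
groups = cluster_headlines(headlines)
```

With these sample headlines the first two end up in one group and the third in its own. Counting the size of each group per day would be one way to obtain the numbers for a rise-and-fall graph.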