Friday, 6 September 2019

Using The Cloud To Explore The Linguistic Patterns Of Half A Trillion Words Of News Homepage Hyperlinks


When it comes to using “big data” to analyze the patterns of society, the conversation inevitably turns to social media. Yet the world’s news media actually offers larger and richer insights into global events, narratives, beliefs and emotions, if one has the right tools to explore it. What would it look like to take a year and a half of homepage hyperlinks from worldwide news front pages in 110 languages, totaling more than half a trillion words, and convert them into unigram and bigram datasets using just three SQL queries, an open source language detector, one script and the power of Google’s BigQuery cloud platform? What new insights could we learn about what the world has been paying attention to, about the linguistic patterns of the world’s journalists and about the power of the cloud to analyze language?
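To make the idea of unigram and bigram datasets concrete, here is a minimal local sketch in Python. It is not the post's actual pipeline, which ran as SQL queries in BigQuery alongside a language detector. It assumes each hyperlink's anchor text is a separate string and uses a naive regex tokenizer that only suits space-delimited languages. The function names `tokenize` and `ngram_counts` and the sample link texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, then keep runs of word
    # characters. Works for space-delimited languages, not for Chinese,
    # Japanese or Thai, which need a real segmenter.
    return re.findall(r"\w+", text.lower())


def ngram_counts(link_texts):
    # Count unigrams and bigrams across a collection of link texts.
    unigrams = Counter()
    bigrams = Counter()
    for text in link_texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        # Pair each token with its successor; bigrams never span two
        # different links because each link is processed on its own.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


if __name__ == "__main__":
    # Hypothetical homepage link texts, for illustration only.
    links = ["Hurricane Dorian hits the Bahamas", "Dorian hits Bahamas"]
    uni, bi = ngram_counts(links)
    print(uni.most_common(3))
    print(bi[("dorian", "hits")])
```

Treating each link as its own unit keeps bigrams from forming across unrelated headlines. At half a trillion words, the same counting would be expressed as a grouped aggregation in a warehouse such as BigQuery rather than in-memory `Counter` objects.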

