ER&L 2014 — Mining and Application of Diverse Cultural Perspectives in User-Generated Content

“Coal Mining in Brazil” by United Nations Photo

Speaker: Brent Hecht, Ph.D., University of Minnesota

His research sits at the junction of human-computer interaction, geography, and big data. What you won’t see here is libraries.

Wikipedia has revolutionized computing in two ways: by giving users access to a large repository of knowledge and by being hugely popular. In certain cases it has become the brains of modern computing.

However, it’s written by people who exist in a complex cultural context: region, gender, religion, etc. The cultural biases of Wikipedia have an impact on related computer processes. Librarians and educators are keenly aware of the caveats of Wikipedia such as accuracy and depth, but we also need to think about cultural bias.

Wikipedia exists in a large number of languages. Not much was understood about the relationships between them until recently. Computer scientists assume that larger language editions are supersets of smaller ones and that concepts are described consistently across editions. Social scientists know that each cultural community defines things differently and will cover its own unique set of concepts. Several studies have looked into this.

A vast majority of concepts appear in only one language. Only a fraction of a percent are in all languages.
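To make the superset assumption and the coverage numbers concrete, here is a minimal sketch of how one might check them. The edition names, the concept sets, and both function names are made-up toy values for illustration; they are not data or code from the talk.

```python
from collections import Counter

# Hypothetical toy data: language edition -> set of concepts it covers.
editions = {
    "en": {"coal", "jazz", "baseball", "tea"},
    "de": {"coal", "jazz", "karneval"},
    "pt": {"coal", "capoeira"},
}


def is_superset_chain(editions):
    """True if every larger edition contains all concepts of every smaller one."""
    ordered = sorted(editions.values(), key=len, reverse=True)
    # Superset is transitive, so checking neighbouring pairs is enough.
    return all(big >= small for big, small in zip(ordered, ordered[1:]))


def coverage_shares(editions):
    """Return (share of concepts in exactly one edition, share in all editions)."""
    counts = Counter(c for concepts in editions.values() for c in concepts)
    total = len(counts)
    single = sum(1 for n in counts.values() if n == 1)
    everywhere = sum(1 for n in counts.values() if n == len(editions))
    return single / total, everywhere / total


if __name__ == "__main__":
    print("superset assumption holds:", is_superset_chain(editions))
    single, everywhere = coverage_shares(editions)
    print(f"single-edition share: {single:.2f}, all-edition share: {everywhere:.2f}")
```

In this toy data the German edition has a concept the larger English edition lacks, so the superset assumption fails, which is the pattern the studies reportedly found at scale.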

If you only read the English article about a concept that has an article in at least one other language edition, you are missing about 29% of the information that you could have gotten if you could read that other article.
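One way to read that figure is as the share of the other article's content that the English article lacks. The sketch below assumes each article can be reduced to a set of "information units" (for example, the topics it mentions); that representation and the function name `missing_share` are assumptions made here, not necessarily how the studies measured it.

```python
def missing_share(english_units, other_units):
    """Share of the other article's information units absent from the English one."""
    if not other_units:
        return 0.0  # nothing to miss if the other article is empty
    return len(other_units - english_units) / len(other_units)


if __name__ == "__main__":
    # Hypothetical toy articles, each reduced to the topics it mentions.
    english = {"history", "geology", "economy"}
    other = {"history", "geology", "economy", "local folklore"}
    print(f"missing share: {missing_share(english, other):.2f}")
```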

Some of the differences are due to cultural factors. Each language edition will have a bias towards countries where the language is prominent.

What can we do if we take advantage of this? Omnipedia tries to break down the language silos to provide a diverse repository of world knowledge, and highlights the similarities and differences between the versions. The interface can be switched to display and search in any of the 25 languages covered.

Search engines are good for closed informational requests and navigational queries, but not so great for exploratory search. Atlasify tries to map concepts to regions. When the user clicks on an entity in the map, it displays (in natural language) the relationship between the query and the location. Its creators know this kind of mapping doesn’t work for every concept, but the idea of mapping search query concepts can be applied to other visualizations, like the periodic table or congressional seat assignments.

Bear in mind, though, that all of these tools are sensitive to the biases of their data sources. If they use only the English Wikipedia, they can miss important pieces, or worse, perpetuate the cultural biases.