Female Genital Mutilation and Search Engine Optimization: how are Google's data voids harming you?
Updated: Jan 28
In 2015, Dylann Roof walked into a church in South Carolina and opened fire, killing nine. All nine victims were black. Days after the shooting, a website registered to his name was found, and his online manifesto was uncovered: Roof was a white supremacist whose racially motivated crimes and extremist ideologies can largely be traced back to his interaction with data voids on Google.
‘Data voids’ refer to the lack of credible and relevant data associated with search queries online. The term, coined by danah boyd and Michael Golebiewski, denotes a gap in the information available about a given topic, person, idea, etc., like missing pieces of a puzzle. Usually, data voids result in search engines returning irrelevant, inaccurate, or outdated information on the term being searched.
For instance, boyd and Golebiewski give the example of the term “Harrold-Oklaunion”, a small town in Texas with a population of roughly 500 people. Aside from a few algorithmically-generated results on the weather, there is little information on this town. But sometimes, these data voids speak to more than just the lack of data available, and can have damaging ramifications – as is evidenced by the Roof example.
In his manifesto, Roof detailed how he Googled “black on white crimes” and found “pages upon pages of brutal black on white murders” that “awakened” him to the supposed injustice and aggression white people faced at the hands of black people (NPR). boyd and Golebiewski's research showed that this was actually the result of a targeted campaign by white supremacists to manipulate and fill a data void they had uncovered. In fact, there was so little information on the search term ‘black on white crimes’ precisely because there were so few documented instances of black on white crimes.
In this article, I draw on boyd and Golebiewski’s work to showcase troubling data voids in the Egyptian context. While they focus on how individuals can manipulate search engines to spread hate or misinformation, I explore how data voids can act as a mirror to injustices that are tightly woven into our society. First, let’s backtrack a little and get you acquainted with the inner workings of the algorithms that drive our search engines today.
History and present of search algorithms
When Google's founders developed the first search algorithm, they envisioned a near-objective content ranking system that would make search engines less susceptible to manipulation. Technology and information scholars at the time warned that rankings are inherently subjective, and that automated rankings would be no less susceptible to subjectivity than human ones.
Search engines like Google work by ‘crawling’ the internet for information and building a repository of what they find. Developers then face the feat of designing systems that structure and organise that large corpus in intuitive ways, so that when we conduct a search, the results returned to us are as close as possible to what we’re looking for. Distinct terms that are related to one another (like ‘break’ and ‘vacation’) are identified as belonging to the same “topological link structure…through [the use of] statistical probabilities and models derived from the data” (boyd and Golebiewski, 2019: 10). Through that same statistical reasoning, terms that look alike but are distinct in meaning (like ‘Metro’ the Egyptian supermarket and ‘metro’ the underground) are identified as belonging to different topological link structures.
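To make that statistical reasoning a little more concrete, here is a minimal sketch in Python. It is not Google's method: it swaps in a much simpler, well-known technique (counting which words appear alongside a term, then comparing those counts with cosine similarity) to show how two distinct words can end up looking related purely from the data. The corpus, the function names `context_vector` and `cosine`, and the sentences themselves are all made up for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(term, sentences):
    """Count the words that share a sentence with `term` (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        if term in words:
            counts.update(w for w in words if w != term)
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented toy corpus: 'break' and 'vacation' appear in near-identical contexts.
corpus = [
    "book a summer vacation by the beach",
    "book a summer break by the beach",
    "metro timetable for the morning train",
    "metro supermarket weekly grocery offers",
]

related = cosine(context_vector("break", corpus), context_vector("vacation", corpus))
unrelated = cosine(context_vector("break", corpus), context_vector("timetable", corpus))

print(f"break vs vacation:  {related:.2f}")
print(f"break vs timetable: {unrelated:.2f}")
```

In this toy corpus, ‘break’ and ‘vacation’ score far higher than ‘break’ and ‘timetable’, simply because they keep the same company. The same logic cuts the other way: when a language or topic has very little text to learn from, the counts are too thin to draw reliable links.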
But the act of ascribing meaning and correlation to data is, in and of itself, a subjective act: a machine learning system is only as good as the individual(s) driving it. Without parameters - defined by the creators of these systems and the users producing content within them - search engines would be entirely unable to understand the meanings of words. The Search Engine Optimisation (SEO) industry, for example, undermines efforts to make access to information equal by exploiting the ranking system. Individuals and organisations are able to reverse engineer these systems so that their content is more likely to show up first.
A key feature of most search algorithms today is their ability to predict what you’re searching for, and to make suggestions for related queries. They employ the same topological linking and statistical reasoning to make these predictions: how have other users completed that query, and which content did they then interact with? For these reasons, strange or problematic auto-suggestions can help indicate the presence of a data void. Because search engines operate on probability, they won’t get it right 100% of the time. For example, what might have started as a simple search for the metro’s timetable might end with you placing an order from Metro the supermarket.
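A rough sketch of that idea, in Python, might look like the following. It assumes the simplest possible model, where suggestions are just the most frequent past queries that begin with what you have typed so far. Real autosuggest systems weigh many more signals; the `suggest` function and both query logs here are invented for illustration.

```python
from collections import Counter


def suggest(prefix, query_log, k=3):
    """Return the k most frequent logged queries starting with `prefix`."""
    counts = Counter(q for q in query_log if q.startswith(prefix))
    return [query for query, _ in counts.most_common(k)]


# Invented log for a sparsely searched topic: three people typing the same
# loaded question are enough to decide the top suggestion.
sparse_log = ["why are widgets blue"] + ["why are widgets dangerous"] * 3

# Invented log for a heavily searched term.
busy_log = (
    ["metro timetable"] * 5
    + ["metro supermarket delivery"] * 3
    + ["metro map"] * 2
    + ["weather today"] * 4
)

print(suggest("why are widgets", sparse_log, k=1))
print(suggest("metro", busy_log))
```

In the sparse log, a handful of identical queries is all it takes to own the top suggestion, which is why odd predictions can hint at a data void. In the busy log, the ranking simply reflects the crowd, which is how a search for the metro timetable can sit one suggestion away from Metro the supermarket.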
This is a benign example. As you’ll come to see though, my experience with the autosuggest feature was far from benign. Google has been criticised time and time again for its predicted search terms, which sometimes return racist, sexist, and xenophobic suggestions. The company has responded (mostly) swiftly by integrating a “report inappropriate predictions” function, fine-tuning the algorithm’s iterative ability to identify ‘problematic’ content, and calling in human judgment in instances where the lines may be blurry. But, with over 4.8 billion webpages on the internet for search engines to crawl and index, troubling content is likely to fall through the cracks.
Data void at play
When I first learnt about data voids in 2018, I was curious to see if I would be able to find any myself. I decided to experiment with Google in Arabic. Most of the problematic English search suggestions that Google had come under fire for in the past had been removed by the company; but what about their Arabic counterparts? I began my search with “…ليه الست”, which translates to “why is a woman…”. One of the top autosuggestions at the time read, “ليه الست المصرية نكدية” which roughly translates to “why are Egyptian women such nags?”.
The corresponding results to this query were, as one might expect, filled with ‘funny ha-ha’ content surrounding the stereotype that Egyptian women are nags that love drama. What surprised me was that Google hadn’t preemptively prevented this type of misogynistic content from popping up as an autosuggestion the way it had for English searches.
Fast-forward to 2022: I wondered whether the company had addressed the issue. I began my search in the same way I had in 2018: “…ليه الست”. While “ليه الست المصرية نكدية” was no longer one of the suggestions (yay!), the top suggestion was instead “ليه الست بتتخن بعد الجواز”, which translates to “why do women get fat after marriage” (boo). I continued, interested to see what other types of misogynistic content might be recommended to me.
No longer a joke
This time, I searched “…ليه البنات” (why are girls…). The results were shocking. The top recommended search was “ليه البنات بتطاهر”, which translates to “why are girls circumcised”. Unlike the previous two, this suggestion isn’t one that you can argue could be taken with a grain of salt: it’s a real question, with very real and devastating effects. Female genital mutilation (FGM) continues to be a major challenge in Egypt, with reports estimating that 7 in 10 Egyptian women between the ages of 15 and 49 have undergone the practice.
The first result shown to me was a featured snippet that outlined the benefits of circumcision. The article was published in 2009; by no means a “relevant, up-to-date, or credible” piece of content.
Interestingly, we can see a (poor) attempt by Google at employing a topological link structure to synthesise search terms. In this instance, when I searched “ليه البنات بتطاهر”, the featured snippet I mentioned was titled “أسباب تختين البنات في مصر”, translating to “the reasons behind circumcising girls in Egypt”. But the lexical difference here is extremely important.
Academics and activists alike have been pushing for a change in the terminology from ‘تطهير’ (implying that this act has something to do with cleanliness) and ‘تختين’ (implying that female circumcision is similar to male circumcision) to ‘تشويه الأعضاء التناسلية للأنثى’, or female genital mutilation (FGM). Rather than dress up this dangerous and unfounded practice as something that is somehow positive, the term FGM calls it exactly what it is: mutilation.
So it’s both interesting and disappointing to see that Google’s search engine synthesised the two terms tat’heer and takhteen in this context, but failed to create a link to FGM.
This is a clear example of a dangerous data void at work - both on a structural level for the search technology, and on a societal level where damaging social perceptions and biases are being reproduced online. Depending on the motivations of the individual conducting the search, the ramifications can be devastating: a parent on the fence about circumcising their daughter might come across the featured article, and decide to circumcise her after reading the (entirely unfounded) claims the article makes. The demographics of users searching this topic reveal that the majority come from governorates where reports indicate that FGM is still very widely practiced: Dakahlia, Al Qaliyubia, Al Sharqia, and Menofia.
In much the same way that Dylann Roof’s experience with data voids led to him committing mass murder, it isn’t farfetched to imagine that an unsure parent, predisposed to following cultural norms, might be swayed in the wrong direction by the information they come across on Google.
This specific data void is the result of a lack of quality information surrounding the topic. But the fact that these results are returned and highlighted also reflects a clear failure on Google’s end: the company’s inability to proactively employ recursive, multilingual algorithms that are able to accurately structure etymological databases in a language other than English. It’s interesting to see the strides made in Google search functions in English, and how they still seem to lag behind in other languages (so much for “decolonising AI”). This is due, in part, to the data voids we speak of. Of all the online content being crawled by search engines like Google, only 1.3% is in Arabic, the 5th most spoken language in the world.
Search engines play vitally important roles in our everyday lives: they are the gateway to a seemingly infinite world of information about ourselves, our societies, and the world around us. The power that a search engine like Google, and technology at large, has in reflecting and reinforcing problematic stereotypes and perpetuating racial, social, political (the whole -al spectrum really) biases cannot be overstated, and should not be overlooked. Google has set out on a bold and ambitious mission to “organise the world’s information and make it universally accessible and useful.” The example I outlined serves as a stark reminder of the importance of critically interrogating our practices online, the type of content we produce and interact with, and the accountability we expect from the tech and big data companies that claim to control our access to “the world’s information”.
Michael Golebiewski and danah boyd, “Data voids: where missing data can easily be exploited,” (Data & Society, 2019).
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” (Stanford: Stanford InfoLab Technical Report, 1999). http://ilpubs.stanford.edu:8090/422/.
Lucas D. Introna and Helen Nissenbaum, “Shaping the Web: Why the Politics of Search Engines Matters,” (The Information Society, 16 no. 3; 2000).