Rumors About Data <Gossip> Demystifying Data, Essay, 2021
Algorithms have a nasty habit of spreading rumours. With their capacity to store copious amounts of deleterious information about people, their ability to reinforce stereotypes, perpetuate bias, and label targeted individuals as dangerous has come to dominate the narrative around machine learning technologies and artificial intelligence (AI). Google autosuggestions enforce sexist and racist ideas of what “beauty” and “professionalism” look like1; Google Reverse Image Search labels female-identifying persons as “racy”2; intelligent control systems at international airports survey travellers to detect “mal-intent” through biometric cues, marking them as suspicious3. And while we know this intelligence comes from a small, homogeneous group of people who embed bias in their work, confidence in machine learning technology goes unchallenged.
How are algorithms misled, and who can better guide them? Someone has to define the language of technology, as the data that trains machine learning models (from which algorithms are shaped) comes from human output.
In an article titled “Witch-hunt” (2017), writer Hannah Black revisited Silvia Federici’s Caliban and the Witch (2004) and acknowledged that despite its contemporary, trivial reputation, gossip has always been a secret language of friendship and resistance, particularly between women. The figure of the gossip brings news, warnings, and shared knowledge, having the capacity to move between public and private worlds.4 The gossip can afford complexity and context, passing information that reflects relational truths while controlling its dissemination.
There is a myth of objective truth in the realm of artificial intelligence: the data that trains AI is often incomplete, compromised, or lacking an understanding of the complexity inherent in the subjects it hopes to recognize. How can the gossip be a productive contributor when confronting the myth of objectivity in AI technologies? If gossip reveals the conditions of data, perhaps we can deviate from relying on data as a proxy for objective truth.
During the London Science Fiction Research Community’s 2020 Beyond Borders keynote lecture, user experience designer Florence Okoye unpacks the layers of mythology embedded within the infrastructure of information, exposing traditions of erasure in narratives of worldbuilding, particularly during eras of so-called great technological progress. Taking the sugar plantation map as an artifact of modernist colonial science, Okoye reveals that the details of human labour and death are not visible in the stories of mechanical innovation. Instead, mechanical wizardry was written into history books, while the existence of enslaved Black and Indigenous people was either removed or recodified. These foundations of colonial science established a recognizable habit of deleting bodies, a habit reflected in today’s machine learning technologies, specifically as they relate to surveillance and facial recognition.5 This vanishing effect erodes the belief system built on the power of data, and suspicion of its accuracy grows.
This uncertainty about data’s accuracy is taken up in Mimi Onuoha’s durational series The Library of Missing Datasets (2016–2018), which draws attention to the empty spaces within areas where large amounts of data are being collected. Onuoha’s website features an incomplete list of missing datasets, including titles like:
- Sales and prices in the art world (and relationships between artists and gallerists)
- People excluded from public housing because of criminal records
- Trans people killed or injured in instances of hate crime (note: existing records are notably unreliable or incomplete)
- Poverty and employment statistics that include people who are behind bars
- Muslim mosques/communities surveilled by the FBI/CIA6
Through her research, Onuoha identifies reasons why datasets like these don’t exist, reasons common throughout the practice of data collection. A major contributor is that data in the categories described above resists easy quantification, and we prioritize collecting data that fits our models of collection. Emotions, for example, are difficult to quantify as data, but are nevertheless influential variables that contribute to the formation of algorithms. Through these investigations, the word “missing” becomes loaded and multi-dimensional, suggesting both a lack and a desire. Onuoha’s work confirms that when it comes to algorithms, there will always be a tension between the apparent precision of machine learning technologies and the vast, incalculable cavities evident in available datasets.7
Themes of missing data and quantification also run through cyberfeminist artist and scholar Tiara Roxanne’s data visualization work. Roxanne introduces the term data colonization, a concept that combines the extractive practices of historic colonization with the abstract quantification methods of machine learning. Through an ongoing project titled They are. We are. I am (2020), she traces the patterns of violence associated with western categorization, revealing a history of Indigenous peoples being apathetically categorized as a demographic placeholder or assigned a colonizer’s name, creating generations of underrepresented and newly coded bodies. In reviewing contemporary census information collected to generate Statistics Canada’s Aboriginal Peoples Survey datasets, the categories offered to reflect First Nations people living off-reserve, Métis, and Inuit living in Canada do not mirror their customs, identities, or vernacular.8
In the exhibition Lurking Variable, artist Bahareh Khoshooee evaluated the inaccuracy inherent in AVATAR technologies used in international airports, which claim the capacity to recognize suspicious persons and to serve as authorities on objective truth. A conceptual strategy for the exhibition was imagining how we might train artificial intelligence to experience empathy, particularly in situations involving facial recognition and predictive policing. Khoshooee explains that to accomplish this, AI needs to understand the outside context and relational information surrounding the data it is absorbing, like trauma or lived experience. This kind of proximate data is akin to a lurking variable: a factor that affects the interpretation of a relationship between two variables but is left out of the analysis. The lurking variable can falsely suggest or hide a relationship between two variables, but when considered it can provide context for the correlations between forms of data.9
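The statistical idea behind the exhibition’s title can be sketched in a few lines of Python. In this toy example (the variables, numbers, and scenario are illustrative assumptions, not drawn from Khoshooee’s work), two quantities appear strongly correlated only because both depend on a hidden third variable; once that lurking variable is accounted for, the apparent relationship largely vanishes.

```python
import random

random.seed(0)

def corr(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def residuals(y, x):
    """Residuals of y after regressing out x (simple least squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [c - (my + b * (a - mx)) for a, c in zip(x, y)]

# The lurking variable: a hidden driver that both observed variables depend on.
# (Temperature/ice cream/drownings is a classic textbook illustration.)
temperature = [random.gauss(20, 5) for _ in range(1000)]

# Two quantities with no causal link to each other, both shaped by temperature.
ice_cream_sales = [2.0 * t + random.gauss(0, 3) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 3) for t in temperature]

# Without context, the analysis "finds" a strong relationship...
raw = corr(ice_cream_sales, drownings)

# ...which disappears once the lurking variable is controlled for
# (partial correlation: correlate the residuals after removing temperature).
partial = corr(residuals(ice_cream_sales, temperature),
               residuals(drownings, temperature))

print(f"raw correlation: {raw:.2f}")
print(f"partial correlation (controlling for temperature): {partial:.2f}")
```

Partial correlation is only one conventional way of “considering” a lurking variable; the point of the sketch is simply that the same data tells a different story once its hidden context enters the analysis.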
The desire for context and for creating relational information is a role for gossip. What becomes clear is that the acknowledgement of difference has never been a priority in data collection methodologies. As the collector of what is displaced, the gossip knows the state of the terms and the collective experience.
There are no secret ways of telling stories, only valid ways of exchanging information. With a commitment to social justice and to making data relational, I propose community-led verification of knowledge to support peer-to-peer, decentralized data governance. These systems can generate networks of trust to build structures of verifiability that we can all agree to, yet remain conversational, malleable, and reflective of multiplicities. I don’t know how to do this... but am trying to imagine possibilities.
How can we conjure functional opacity to demystify data?
Can we decode the hidden?
How can we generate whisper networks?
1 Noble, Safiya Umoja. Algorithms of Oppression: How Search Engines Reinforce Racism. New York, New York University Press, 2018.
2 Granata, Yvette. “Yvette Granata: Data-Slime & Xenovision.” Foreign Objekt Presentation Series: The New Nasty, August 20, 2020, https://www.youtube.com/watch?v=Tqbdtijzpfg
3 Khoshooee, Bahareh. “Lurking Variable.” Baxter St at the Camera Club, November 11 - December 16, 2020, www.baxterst.org/events/lurking-variable
4 Black, Hannah. “Witch-hunt: Gossip has always been a secret language of friendship and resistance between women.” TANK Magazine, issue 70, spring 2017.
5 Okoye, Florence. “Keynote Lecture 2.” Beyond Borders, September 2020, London Science Fiction Research Community, www.youtube.com/watch?v=6Ou0VhIeUmc
6 Onuoha, Mimi. “On Missing Data Sets: An Incomplete List of Missing Data Sets.” MimiOnuoha GitHub, github.com/MimiOnuoha/missing-datasets
7 Onuoha, Mimi. “On Missing Data Sets: Overview.” MimiOnuoha GitHub, github.com/MimiOnuoha/missing-datasets
8 Roxanne, Tiara. “They Are. We Are. I Am.” Blackwood Gallery, 2021, www.blackwoodgallery.ca/publications/sduk/mediating/they-are-we-are-i-am.
9 Khoshooee, Bahareh, and Nora N. Khan. “In Conversation: Bahareh Khoshooee and Nora N. Khan.” Baxter St at the Camera Club, December 12, 2020, www.baxterst.org/in-conversation-bahareh-khoshooee-and-nora-n-khan
Emily Fitzpatrick is a curator and arts administrator who holds a master’s degree in Curatorial Studies from the John H. Daniels Faculty of Architecture, Landscape and Design at the University of Toronto. Recent curatorial work involves temporary public art projects rooted in social practice and feminist perspectives on digital sustainability and survival. She has extensive experience working within Toronto’s artist-run centres and public institutions including Images Festival, the Art Museum at the University of Toronto, Gendai Gallery, and Art Metropole. She has contributed to such publications as Canadian Art, C Magazine, Peripheral Review, and CAROUSEL Magazine. Emily is currently the Artistic Director at Trinity Square Video and a member of the curatorial collective Aisle 4.