Session 7

Dongchen Hou: The Brokenness in Mediation: Speech-to-Text Technology in Chinese

When technological media become omnipresent and function perfectly, their presence goes unnoticed. The popularity of speech-to-text technology testifies to the convenience that technology brings to human life: it can smoothly transfer vocal information to a visual, textual mode of representation. Speech-to-text technology, though it emerged only in the past decade, follows a historical genealogy that contains other mediating technologies such as stenography and the typewriter. Stenographers and typists took the mediating position in transcribing auditory information into visual written texts, in either handwritten or typed form. With the development of Artificial Intelligence (AI) and deep learning, Google, Amazon (Alexa), Apple (Siri), and Microsoft have developed technologies that can achieve simultaneous transcription, as stenographers and typists once did. It is undeniable that speech-to-text technologies have significantly changed how humans communicate with both humans and machines; however, the technology also poses challenges to users in different linguistic and social contexts. This research juxtaposes speech-to-text technology with other mediating technologies, including stenography and the typewriter, to explore the ontological significance of broken media/mediation. Morador (2015) defines a technology that is “broken” as “those activities, directed towards the satisficing of human wants that are intended to produce changes in the material world that either do not manage to satisfy these wants or do not produce changes in the material world, or both” (17). This definition, however, fails to depict and understand the non-human side of the picture. In this paper, I draw on Harman’s object-oriented ontology (2018), which offers another picture of the significance of non-human objects in discourse. I argue that mediating technologies gain presence and significance in their “brokenness.” In other words, only when technologies break do they appear.
I look at how digital technologies, after their total replacement of human mediators such as stenographers and typists, function in mediating different modalities of communication. Specifically, this research examines when speech-to-text technology breaks, and for whom it is broken. Because a strong alphanumeric mechanism was built into the technology from its invention, the brokenness is more prominent and visible in the context of non-alphabetic linguistic practices, such as Chinese, or translingual practices. Incorporating Chinese writing into techno-linguistic modernity was never easy, due to the Chinese language’s “incompatibility” with alphabets and alphanumeric technologies (Mullaney 2017; Tsu 2014; Hou 2019). In the Chinese language context, speech-to-text technologies are often “broken”: the technology cannot accurately identify the correct pronunciation of Chinese words, or the wrong Chinese characters or expressions are chosen in the text, or the voice recognition fails to capture non-standard, dialect Chinese. In doing so, this research aims to offer an alternative picture for understanding the significance of brokenness in media technology, with a general ontological concern, exemplified by speech-to-text problems in the context of the Chinese language.


Valeria Lopez Torres: On glitches as indicators of authenticity in AI-generated images 

Ever since technology was first introduced into daily life, people have learned to co-exist with technological error. Thus, the cultural significance of these errors, also called glitches, has shifted along with technology, as they have been manipulated for various purposes: seeking and embracing them to create art, introducing them intentionally to improve interaction, or working endlessly towards their eradication. While one definition of glitch is related to a “minor malfunction” (Merriam-Webster Dictionary), another definition relates to authenticity, as in “a false or spurious electronic signal” (idem). This paper examines the cultural role of the glitch as a marker of authenticity and as a device to aid viewers in discerning truth from fakery in a context in which sophisticated algorithms are able to generate highly realistic (and convincing) images of human-looking faces through the use of Artificial Intelligence (AI). As these AI-generated images enter the current visual landscape, viewers must arm themselves against deception by looking for glitches, which usually manifest as irregularities in the background and in skin texture, interruptions of patterns, and unexplained blobs, among other inconsistencies. For the computer scientist who strives to refine these technologies to generate more convincing and realistic images, glitches are undesirable. Conversely, for the lay viewer who is confronted for the first time by these hyper-realistic AI-generated images, glitches are desirable, as they allow for verification of the images' origin and authenticity (where authentic = human, while unauthentic = computer-generated).
In an image-driven society, where the boundaries between the virtual and non-virtual blur, and traditional notions of truth and reality are challenged, these technologies are reminders of the exciting possibilities of their positive applications, as well as the dangers of potential misuses, particularly in light of the emergence of deep fakes (videos in which AI algorithms are used to show someone saying or doing something that they did not in fact do or say). The ethical consequences of these images and their potential uses are largely related to surveillance, control, and ultimately, power. Thus, designers, scientists, artists and scholars alike must think seriously about how technological progress might enable higher degrees of control to the detriment of human rights and freedom. Similarly, educators must ponder how to respond to the many ethical issues brought about by these technologies, and teach emerging designers how to engage, view, and work critically with these images as they permeate our visual cultural landscape.

Zach Whalen: The 2020 Presidential Debate But Only The Parts Where Someone Breathes Loudly Or Sighs

“This is the hard hitting debate that the government doesn’t want you to see.” - X Caines (7 months ago)

“This makes me feel like I can’t breathe.” - echilcote423 (3 weeks ago (edited))

The quotes above are two of the (currently) 2,293 comments on a video I created in October 2020, which currently has around 250,000 views on YouTube. This video lasts for one minute and thirty-five seconds, and other than a very brief title, it consists of exactly what the title describes: breathing, mostly from Donald Trump and Joe Biden, during one of their debates leading up to the 2020 presidential election. I created the supercut with a Python program called videogrep, and uploaded it to the YouTube channel that I use mostly for uploading lectures and tutorials for my classes. The video languished until some time in early April 2021 when, for reasons I don’t fully understand, YouTube began recommending it to certain viewers and the view count started ticking up rapidly. What does this video’s popularity reveal about algorithmically hegemonic taste silos and the current state of YouTube shitposting memes? What role does technological literacy play in viewers’ interest in the video? And, more importantly, how does this video refract or resonate with the figure of “breathing” in the midst of a global pandemic of respiratory disease, and while Black men are still being killed by police, with their last words, “I can’t breathe,” galvanizing national and global protests? In this presentation, I will demonstrate the method I used to create the video and argue that it fits into a particular category of YouTube videos that emphasize arbitrary technological interventions (for example, “The bee movie” but only when they say “E” or “‘All Star’ by Smashmouth but every word is someBODY”). Next, although the much-criticized recommendation engine in YouTube is a black box, the analytics provided for my video support some inferences about who that engine perceives as the audience for this video. Finally, I will use the demographic information included with those analytics to reflect on the meaning of this video in a cultural context.
Ultimately, I feel ambivalent about the “success” of this video, but I believe exploring and reflecting on the implications of its popularity will provide an interesting insight into YouTube as a slice of contemporary digital culture.