When I was a deaf kid growing up in the 1990s, I had two recurring fantasies. One was that more hearing people would learn American Sign Language. The other was that, one day, the whole world would be captioned, just like TV shows and movies. I imagined pulling on sleek sci-fi glasses, and voilà: The tangle of spoken words around me would unravel into beautiful, legible, written English.
The second of my childhood reveries came back to me recently when I sat down in a quiet on-campus atrium at Harvard with Alex and Marilyn Westner, the co-founders of the Boston-area start-up Xander, who had invited me to chat over coffee after seeing me quoted in a newspaper article about their company’s augmented-reality live-captioning glasses. They slid a bulky prototype across the table, and I put the glasses on my face. Immediately, written words scrolled across a translucent digital box above my right eye.
“How does that feel?” I saw the captioned words right after Alex uttered them. Because I have always watched videos with closed captions on, my initial thought was that he’d stepped out of a TV screen to talk to me.
Wow, I thought, feeling our conversation shifting away from lipreading—which, as I’ve explained elsewhere, isn’t really “reading”—and toward something closer to actual reading.
Although this was my first time trying captioned glasses—a still-nascent form of augmented-reality technology that companies such as XRAI Glass and Google are also competing to develop—I’ve been watching for years now as the possibilities of a live-captioned world have been advancing. Look around and you’ll find automated captions everywhere—on YouTube and Instagram Reels, on Google Meet and Zoom and FaceTime. Like other AI-generated tools, these captions are not perfect, and they aren’t an accessibility silver bullet, but for some uses they’ve gotten surprisingly good. In my discussion with the Xander founders, we mostly stuck to how the glasses worked—a tightly focused conversation is typically easier to follow—but live captions did ease the guesswork of chatting with my two hearing coffee companions.
Anyone who has turned on automated captions over the past decade knows that accuracy isn’t always their strong suit. I’ve hopped on Zoom lectures and seen opaque walls of text without punctuation and technical vocabulary butchered beyond recognition. I’ve gone to church without an interpreter, where I fixed my eyes on a live-captioning app that plunged me into non sequiturs about the “Cyanide Desert” (no wonder those Israelites were so unhappy), or about Abraham using his “phone” (instead of his son?) as a sacrifice to the “Clearview Lord” (whoever that might be). After those sermons ended, my head throbbed. I couldn’t help but think of all the people scattered after the fall of Babel, scrambled into all their varying languages. Like those ancients, we must remember that technological innovation, by itself, cannot transport us to the heavens. We must still choose when and how to use it.
For a while, like Rikki Poynter and many other deaf advocates, I associated auto-captions with #craptions—that is, captions so bad that they were less likely to tell a comprehensible story than to make the user unleash streams of profanity. (And with good reason: Sometimes nonobscene dialogue appears on-screen as starred-out curse words.) I’d always been able to request professional human-generated Communication Access Realtime Translation services for school and work events, and I cringed every time a naive hearing companion mentioned auto-generated captions. That was a sign that they didn’t understand how low the quality of those captions was.
When I started graduate school in 2015, I saw an academic administrator rightly apologize in front of a large assembly after she’d played a Harry Potter video clip for us during orientation. She’d forgotten to check whether the dialogue was accessible to everyone in the audience, and she might have assumed that the YouTube auto-captions would be just as good as the captions that accompanied the original video.
They weren’t. Harry and Ron and Hermione soon fell into such streams of cursing and nonsense that one would have thought they’d been bewitched.
While I sank in my seat, the hearing students burst into collective laughter at the bungled captions. To her credit, the administrator promptly stopped the video. She expressed regret to me and my ASL interpreter in the front row. Then she reprimanded the others: “How would you like to have this for your access?”
The room fell silent. The administrator had identified a fundamental lack of communicative equity. “At least it’s better than nothing,” hearing people often told me about auto-captions. But what was I supposed to do, settle for scraps? I, too, found some of the errors funny, but I mostly thought of them as garbage.
By the beginning of the pandemic, though, my relationship with auto-captioning had begun to shift. Stuck at home and dealing with physical isolation and the masks that made lipreading impossible, I sighed when some hearing friends suggested that I try speech-transcription apps and auto-captioned video calls. I remember logging tentatively into Google Meet for the first time, unsure if I would see something like my old dream of beautiful written captions or their mangled cousin.
Two of my hearing friends, who sign a little but not much, entered the video chat. One said, “Hey, Rachel, it’s so good to see you.”
The caption read, “Hey, Rachel, it’s so good to see you.”
Whoa.
We continued, relieved to see one another’s faces again. The captions still had some errors, but they largely kept up. I sensed that the game had just changed.
During the pandemic, I videochatted blissfully with deaf and signing friends—captions were unnecessary—but I also felt freer to join spontaneous chats with non-signing hearing people. Auto-captions became an unexpected lifeline. I used them for informal work and social calls, and I saw them appear with greater accuracy across more online content. At the same time, more hearing people around me started regularly using captions for watching movies, TV shows, and videos. This captioned life was suddenly everywhere.
Deaf and disabled people have always been supreme life hackers, and I have learned to embrace auto-captions as an everyday communication-hacking tool. I love them for smaller discussions, where my online companions and I revel in the mutual act of shaping meaning. We stop for clarification. We gesture or type to one another in the chat box. The speech-transcription technology still struggles with specialized vocabulary and certain voices, including my own deaf voice—but, at their best, the captions can transform piecemeal exchanges into lively, coherent, easily legible paragraphs.
High-quality auto-captioning, as wondrous as it can be, does not automatically create access. Not all deaf people prefer to encounter conversations through captions, for one thing. Communicating through ASL, for many of us, is still easier and allows for far greater expressive fluency. And take the auto-captions out into the wide and noisy world, into larger professional events or lectures or multiperson interactions, and they can quickly turn precarious. “We’ll turn on the live captions for you!” hearing people say. But people who don’t rely on those captions for comprehension might not realize how often they still leave some of us stranded in the Cyanide Desert. Interpretation by human professionals is by no means obsolete.
So when I went to test the Xander glasses, I had my doubts about how well they would work. I also wondered how I might opt to use such a device in my own multilayered communicative life. Research by Xander, Google, and other companies invites us to consider how “accessibility” tech often enters and shapes the mainstream: More widespread use of captions and auxiliary text could benefit not just hard-of-hearing and late-deafened people, but also anyone else who savors the multisensory pleasures of seeing (rather than just listening to) spoken dialogue.
My first conversation with captioned glasses did feel like something out of the movies. I kept shaking my head in wonder at the captions floating in the air before me. “This is so cool,” I kept saying. Other deaf and hard-of-hearing users have expressed similar enthusiasm, noting that reading captioned conversations felt more intuitive and enjoyable than fighting to lipread or straining to hear sounds garbled by hearing aids.
Yet using captioned glasses came with its own demands. Every time I nodded, the captions jumped around. My vision got a bit blurry. I held my head absurdly still, trying to adjust my eyes to take in the captions and my companions at the same time. The Xander founders asked me how clear and useful the captions were, where they were appearing on the lenses, and how large they were. I felt very aware of how much practice I still needed, of how the captioned life awaiting us may never be as straightforward as toggling something on or off with a device.
Furthermore, our immediate environment was more conducive to using the captioned glasses than the typical coffee shop or classroom would be. We had chosen a quiet spot with little background noise and few distractions. Perhaps one day improved language-processing software will be able to cut through overlapping chatter. Or perhaps, just like in my other principal childhood fantasy, more people will learn ASL and we won’t have to—but in the meantime I noted how our conversational setting affected the ways we communicated. Because it always does. I’d mentally toggled myself into English-speaking mode for the afternoon, and I also knew that using these glasses depended on my ability and willingness to do such a thing. I enjoyed talking with the Xander co-founders about speech, ASL, sound engineering, and the joys and complications of language, but I also felt grateful later that weekend to plunge into signing gatherings with deaf friends, sans glasses and caption-reading and text-scrolling. Both types of conversations felt meaningful, but for different reasons.
Our sleek sci-fi present offers no panaceas, even though technological advances such as automated captions bear immense promise for bridging our physiological differences. To use these forms of technology well, we must also consider what communicative equity can look like in different circumstances. I still dream of beautiful written captions, but I also believe they can be part of something much bigger: a social world more attuned to the deeply human need to be part of the conversation, and more cognizant of the variety of ways in which each of us can uncover linguistic meaning.