We all believed VR and AR experiences would be the next big thing in social, but it turns out that good old-fashioned audio is where the real buzz is!
Audio chat rooms have exploded in popularity over the last year, and they are quickly becoming the hottest new way to share and consume content. The pandemic has fundamentally altered our lives, so it’s understandable why audio platforms are so popular – people are tired of visual stimulation, endless Zoom calls, watching Netflix, and reading and responding to emails. Audio is easy to consume and places less burden on the listener to engage fully at all times, which means you can keep it in the background while you’re doing something else.
Live audio chat rooms such as those on Clubhouse and Discord are especially popular since they allow users to hear the live, unfiltered thoughts of celebrities and entrepreneurs, talk to friends or strangers, or just sit back and listen. It feels like a combination of a live podcast and a group phone call, and you can make it feel as personal or as distant as you like. The discussions start spontaneously but disappear as soon as they are over. In a way, this experience evokes the fluidity and impermanence of an IRL conversation, which makes it compelling and captivating.
Furthermore, major platforms are making inroads and developing their own audio products: Twitter launched Spaces, Spotify acquired Locker Room, and now Facebook and LinkedIn are working on their own versions. All of these platforms are laser-focused on attracting creators by providing avenues for monetisation and incredible content development toolkits. This indicates the appeal is here to stay and is influencing product development across the tech ecosystem.
There are incredible examples of creativity sparked by this newly rediscovered medium, along with important discussions about society, remote work, parenting, climate change, mental health, and more. We could hear Elon Musk providing commentary [1] on whether humans will be able to live on Mars and asking the CEO of a major share-trading company why their app prevented users from buying GameStop stock. Facebook’s CEO Mark Zuckerberg was interviewed by Casey Newton on Discord [2], where he announced major audio product launches in the next 3-6 months.
As more users flock to these communities, there will inevitably be bad actors creating and sharing content that violates the community’s guidelines.
Some of social media’s problems have already reached audio platforms – such as antisemitism [3], bullying, misogyny, disinformation, and, in some cases, coordinated harmful behavior that encourages real-world violence.
In light of how tough it has been to get content moderation “right”, it’s not surprising people are concerned about the challenges that live audio presents from a moderation and community safety perspective. Balancing issues of free speech, safety, and censorship is no easy task, but it is crucial in ensuring the long-term success of these platforms.
Interpreting context and intent in an audio environment will be challenging for AI technologies, which either match findings to already known harmful content or predict whether content will be in violation based on learned patterns.
Audio can be a significant source of misinformation and coordinated harmful behavior, as was evident during the Capitol Riot of January 6th, 2021. However, automated moderation solutions have traditionally focussed on text and visual content. We’ve also seen podcasts used to spread misinformation and thinly veiled incitement to violence. Some bad actors have become so expert at walking the line to avoid sanctioning by platforms that the phenomenon even has a name: “lawful but awful.”
And how do you even start moderating live, ephemeral conversations at scale?
Usually, groups that engage in coordinated harmful behavior first congregate on more obscure platforms such as 8kun or 4chan. They create meme content, establish “battle plans,” and organize themselves around the goal of sowing as much disruption and harm as possible. Once topics or targets are selected, the content is shared and amplified on major social media platforms and, more recently, on audio-first platforms. These tried-and-tested tactics are designed to spread misinformation, attack individuals or groups online, and, in the most egregious cases, inspire real-world violence.
Moderating these communities requires a different approach. There will be no recorded content to remove in most situations, and it will be difficult to document violations.
Providing robust moderation tools and training for both creators and community moderators will be essential in keeping discussions civil. Platforms should create safe forums for moderators to exchange best practices and discuss approaches to dealing with abusive users.
Another option is to set up a text-based side channel, so moderators can reach out to reported users, warn them, and give them a chance to correct their behavior and remain in the room.
Third-party developers should be encouraged and enabled to build tools and add-ons that allow users and moderators to customize their experiences. This approach has certainly worked well for Twitch, to the extent that it has adopted several of the developer community’s features into its core product.
The reporting flow should be intuitive, since it is crucial to react quickly during live broadcasts. Users should have the option to report problematic chat rooms and users. Certain thresholds could help determine the severity of the situation, such as the number of reports, reports from trusted sources, and AI flags. Once a threshold is crossed, a moderator can caution or remove bad actors and – in extreme situations – shut down a room entirely.
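As a rough illustration, the signals mentioned above (report counts, reports from trusted sources, and AI flags) could be combined into a single score that decides how quickly a room reaches a human moderator. The Python sketch below is only one possible approach: the weights, the cut-off values, and names such as `RoomSignals` and `triage` are assumptions made for the example, not figures from any real platform. Note that it only prioritises rooms for review; the decision to caution, remove, or shut down stays with the moderator.

```python
from dataclasses import dataclass

# Hypothetical weights and cut-offs, chosen purely for illustration.
# A real platform would tune these against its own data and policies.
USER_REPORT_WEIGHT = 1.0
TRUSTED_REPORT_WEIGHT = 3.0
AI_FLAG_WEIGHT = 2.0
REVIEW_THRESHOLD = 5.0
URGENT_THRESHOLD = 10.0


@dataclass
class RoomSignals:
    """Signals collected for one live room (all fields are assumed)."""
    user_reports: int = 0
    trusted_reports: int = 0
    ai_flags: int = 0


def severity_score(signals: RoomSignals) -> float:
    """Weighted sum of the signals; trusted reports count for more."""
    return (
        signals.user_reports * USER_REPORT_WEIGHT
        + signals.trusted_reports * TRUSTED_REPORT_WEIGHT
        + signals.ai_flags * AI_FLAG_WEIGHT
    )


def triage(signals: RoomSignals) -> str:
    """Map a room's score to a review queue for human moderators."""
    score = severity_score(signals)
    if score >= URGENT_THRESHOLD:
        return "urgent_review"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "monitor"
```

With these example values, two ordinary reports leave a room in `monitor`, while two ordinary reports plus one trusted report are enough to place it in the `review` queue.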
Sometimes, content goes beyond simple violation of the community’s guidelines and poses a potential threat to real-world safety. Audio platforms need well-defined policies on passing on this information to law enforcement and clear boundaries around what information can be shared and when.
Content Incident Protocols, such as those created by the Global Internet Forum to Counter Terrorism [4], could provide valuable lessons and best practices in this space.
Podcasts and recorded audio messages can be moderated by transcribing the recordings and passing the text through an NLP filter, which further categorizes the content into themes and flags certain words or phrases. This technology can help to review content at scale and indicate a problematic fragment by pinpointing it in the recording.
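To make the idea concrete, here is a minimal Python sketch of the flagging step. It assumes the recording has already been transcribed by some speech-to-text service into timestamped segments, and it uses plain phrase matching as a stand-in for a full NLP filter. The `Segment` and `Flag` types, the `flag_segments` function, and the watchlist entries are all hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk of a recording (assumed input format)."""
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


@dataclass
class Flag:
    """A flagged fragment, pinpointed by its position in the recording."""
    start: float
    end: float
    theme: str
    phrase: str


# Hypothetical watchlist mapping a theme to phrases of concern.
WATCHLIST = {
    "threat": ["burn it down", "make them pay"],
    "harassment": ["nobody wants you here"],
}


def flag_segments(segments, watchlist=WATCHLIST):
    """Return a Flag for every watchlist phrase found in a segment."""
    flags = []
    for seg in segments:
        for theme, phrases in watchlist.items():
            for phrase in phrases:
                # Whole-phrase, case-insensitive match.
                pattern = r"\b" + re.escape(phrase) + r"\b"
                if re.search(pattern, seg.text, flags=re.IGNORECASE):
                    flags.append(Flag(seg.start, seg.end, theme, phrase))
    return flags
```

Each flag carries the start and end time of the segment, so a reviewer can jump straight to that point in the audio and listen to the fragment in context rather than relying on the transcript alone.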
Like any other form of content moderation, audio moderation is prone to false positives and grey areas, which require human oversight, continued revision, and refining of guidelines. Whether we are talking about live or recorded content, a human moderator needs to evaluate tone, determine intent, and decide whether a piece of content is appropriate within the broader context of a conversation.
The ability to process and access a text transcript can help prioritize content for review and drive some lower-complexity automation. Still, nothing replaces the experience of hearing someone’s actual voice to understand the magnitude of the spoken word and to capture hidden nuance and intent.
We believe audio-first platforms need a tech-enabled human-in-the-loop solution to help their moderators make better decisions, improve community safety, and prevent real-world harm.
At TaskUs we go beyond content moderation to deliver policy development, workflow & enforcement strategy, tooling assessments, and UX design consultation to help our clients solve these complex challenges.
To support and protect our moderators we have developed a comprehensive psychological health and safety program, guided by the practice of evidence-based psychology and grounded in neuroscience.
Learn more and get in touch here.