Facebook's Former Integrity Chief Reveals the Future of Content Moderation
The challenges of content moderation have plagued online platforms for years, but the rapid advancement of artificial intelligence, particularly large language models (LLMs), has dramatically raised the stakes. Brett Levenson, former head of business integrity at Facebook (now Meta) and today CEO of Moonbounce, understands this better than most. His journey from believing technology could “fix” Facebook’s content issues to realizing the problem was far more systemic led him to a new approach, “policy as code,” and to a $12 million funding round for his new venture. This article delves into Levenson’s insights, the rise of Moonbounce, and the evolving landscape of AI safety.
The Limitations of Traditional Content Moderation
When Levenson joined Facebook in 2019, following the Cambridge Analytica scandal, he initially believed that improved technology could solve the platform’s content moderation woes. He quickly discovered a more fundamental issue: the human element. Human reviewers were tasked with memorizing extensive, often machine-translated, policy documents – sometimes exceeding 40 pages – and making split-second decisions on flagged content.
“It was kind of like flipping a coin, whether the human reviewers could actually address policies correctly, and this was many days after the harm had already occurred anyway,” Levenson told GearTech. This reactive approach, relying on delayed human intervention, proved unsustainable, especially in the face of sophisticated and well-resourced malicious actors. The emergence of AI chatbots has only exacerbated the problem, leading to concerning incidents like chatbots offering harmful advice to vulnerable users and AI-generated imagery bypassing safety filters.
From Facebook to Moonbounce: The Birth of "Policy as Code"
Levenson’s frustration with the limitations of traditional content moderation sparked the idea of “policy as code.” This innovative concept involves transforming static policy documents into executable, updatable logic directly integrated with enforcement mechanisms. Instead of relying on human interpretation of lengthy documents, the rules themselves become actively enforced by the system. This insight became the foundation for Moonbounce, which officially launched with $12 million in funding co-led by Amplify Partners and StepStone Group.
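The article doesn’t show what Moonbounce’s policy format actually looks like, but the core idea of “policy as code” can be sketched in a few lines of Python. Everything below, the rule structure, the field names, and the toy keyword check, is a hypothetical illustration of the pattern, not a description of the company’s system:

```python
from dataclasses import dataclass
from typing import Callable

# "Policy as code," sketched: each rule pairs the human-readable policy
# sentence with an executable check, so the document and the enforcement
# logic can no longer drift apart. All names here are illustrative.

@dataclass
class PolicyRule:
    rule_id: str
    description: str               # the sentence a reviewer once had to memorize
    check: Callable[[str], bool]   # True if the content violates the rule
    action: str                    # "allow", "hold_for_review", or "block"

RULES = [
    PolicyRule(
        rule_id="SH-01",
        description="Content encouraging self-harm is prohibited.",
        check=lambda text: "encourage self-harm" in text.lower(),  # toy stand-in for a model call
        action="block",
    ),
]

def enforce(text: str) -> str:
    """Apply every rule; the most severe triggered action wins."""
    severity = {"allow": 0, "hold_for_review": 1, "block": 2}
    decision = "allow"
    for rule in RULES:
        if rule.check(text) and severity[rule.action] > severity[decision]:
            decision = rule.action
    return decision
```

The point is the coupling: updating a rule means updating the object that is actually executed, rather than revising a 40-page document and hoping reviewers absorb the change.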
How Moonbounce Works: A Real-Time Safety Layer
Moonbounce provides an additional safety layer for platforms and AI companies wherever content is generated, whether by users or by AI. The company has developed its own LLM capable of analyzing policy documents, evaluating content in real time (within 300 milliseconds), and taking appropriate action, from slowing down content distribution for human review to immediately blocking high-risk material. This proactive approach contrasts sharply with the reactive methods of the past.
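The article describes this pipeline only at a high level, so the sketch below is an assumption-laden illustration rather than Moonbounce’s actual design: classify() stands in for the company’s in-house LLM, and the risk thresholds, function names, and fail-safe behavior are all invented for the example:

```python
import time

# Hypothetical real-time safety loop: score the content, then map the
# score to a graduated action within a latency budget. classify() is a
# placeholder for a policy-tuned model; the thresholds are invented.

def classify(text: str) -> float:
    """Toy risk scorer in [0, 1]; a real system would call an LLM."""
    return 0.9 if "high-risk phrase" in text.lower() else 0.1

def review(text: str, deadline_ms: float = 300.0) -> str:
    start = time.monotonic()
    risk = classify(text)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > deadline_ms:
        return "hold_for_review"   # budget blown: fail safe to human review
    if risk >= 0.8:
        return "block"             # high risk: stop distribution immediately
    if risk >= 0.5:
        return "hold_for_review"   # medium risk: slow distribution, queue a human
    return "allow"
```

Note the fail-safe: if the check cannot complete inside the budget, the sketch degrades to human review rather than letting content through unexamined, which is one plausible way to honor a hard latency constraint.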
Target Verticals: Where Moonbounce is Making an Impact
Currently, Moonbounce focuses on three key verticals:
- User-Generated Content Platforms: Including dating apps, where safety and trust are paramount.
- AI Companion Companies: Developers building AI characters and companions that require robust safety protocols.
- AI Image Generators: Platforms creating AI-generated imagery, addressing concerns about harmful or inappropriate content.
Moonbounce is already processing over 40 million daily reviews and serving more than 100 million daily active users. Notable customers include AI companion startup Channel AI, image and video generation company Civitai, and character roleplay platforms Dippy AI and Moescape.
Safety as a Product Differentiator
Levenson believes that safety can be a competitive advantage. “Safety can actually be a product benefit,” he stated in an interview with GearTech. “It just never has been because it’s always a thing that happens later, not a thing you can actually build into your product.” By integrating safety directly into the product experience, companies can build trust with users and stand out in a crowded market. Tinder, for example, has reportedly seen a 10x improvement in detection accuracy by using LLM-powered services like Moonbounce.
The Investor Perspective: Why Amplify Partners Backed Moonbounce
Lenny Pruss, general partner at Amplify Partners, highlighted the growing importance of real-time safety guardrails in the age of AI. “Content moderation has always been a problem that plagued large online platforms, but now with LLMs at the heart of every application, this challenge is even more daunting,” Pruss said in a statement. “We invested in Moonbounce because we envision a world where objective, real-time guardrails become the enabling backbone of every AI-mediated application.”
The Increasing Legal and Reputational Risks
AI companies are facing increasing scrutiny and potential legal liabilities. Incidents involving chatbots providing harmful advice, including encouraging suicidal ideation, and AI image generators being used to create nonconsensual explicit content have raised serious concerns. These failures in internal safety measures are becoming a significant liability. As a result, companies are increasingly seeking external solutions to bolster their safety infrastructure.
“We’re a third party sitting between the user and the chatbot, so our system isn’t inundated with context the way the chat itself is,” Levenson explained. “The chatbot itself has to remember, potentially, tens of thousands of tokens that have come before…We’re solely worried about enforcing rules at runtime.” This focused approach allows Moonbounce to provide a more efficient and effective safety layer.
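This in-the-middle position is a familiar proxy pattern, and a minimal sketch makes the division of labor clear. The function names below are invented; the article confirms the architecture (a third party between user and chatbot), not any of these details:

```python
# Hypothetical middleware between the user and the chatbot: the safety
# layer checks only the current exchange, while the chatbot carries the
# full conversation history. All function names are illustrative.

def moderate(text: str) -> str:
    """Stand-in for the runtime policy check; returns an action."""
    return "block" if "policy-violating phrase" in text.lower() else "allow"

def chatbot(prompt: str) -> str:
    """Stand-in for the underlying model, with its long context."""
    return f"Response to: {prompt}"

def guarded_chat(prompt: str) -> str:
    # Check the prompt before it reaches the model...
    if moderate(prompt) == "block":
        return "This request can't be completed."
    reply = chatbot(prompt)
    # ...and the reply before it reaches the user.
    if moderate(reply) == "block":
        return "This response was withheld for review."
    return reply
```

Because the middleware judges each exchange against the rules rather than against the tens of thousands of tokens behind it, its job stays small and fast even as the conversation grows.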
Iterative Steering: A Proactive Approach to Harm Reduction
Moonbounce, led by Levenson and his former Apple colleague Ash Bhardwaj, is continuing to build out the product. The company’s next major focus is “iterative steering,” a capability developed in response to tragedies like the 2024 suicide of a 14-year-old Florida boy who had become fixated on a Character AI chatbot. Instead of simply blocking harmful topics, iterative steering aims to intercept conversations and redirect them, modifying prompts in real time to encourage more supportive and helpful responses from the chatbot.
“We hope to be able to add to our actions toolkit the ability to steer the chatbot in a better direction to, essentially, take the user’s prompt and modify it to force the chatbot to be not just an empathetic listener, but a helpful listener in those situations,” Levenson said.
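Moonbounce hasn’t published how steering will work, but the quote implies a prompt-rewriting step in the kind of middleware described above. The sketch below is purely illustrative; the distress keywords and the steering prefix are invented stand-ins for whatever classifier and rewrite logic the company actually builds:

```python
# Hypothetical "iterative steering": instead of blocking a distressed
# user's message, rewrite the prompt so the chatbot responds as a
# helpful listener. Detection and rewrite rules here are toy examples.

STEERING_PREFIX = (
    "The user may be in distress. Respond with empathy, do not "
    "reinforce harmful ideas, and gently point toward real-world help. "
    "User message: "
)

def detect_distress(prompt: str) -> bool:
    """Placeholder; a production system would use a trained classifier."""
    return any(k in prompt.lower() for k in ("hopeless", "no way out"))

def steer(prompt: str) -> str:
    # Intercept and modify the prompt in flight rather than refusing it.
    return STEERING_PREFIX + prompt if detect_distress(prompt) else prompt
```

The design choice is the one Levenson describes: the actions toolkit grows from allow/hold/block to include rewrite, so the system can shape a conversation instead of only stopping it.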
The Future of Moonbounce: Acquisition or Independence?
When asked about a potential acquisition by a company like Meta, which would bring his work on content moderation full circle, Levenson acknowledged the potential synergies, though he was candid that his answer might not please his backers. “My investors would kill me for saying this, but I would hate to see someone buy us and then restrict the technology,” he said. “Like, ‘Okay, this is ours now, and nobody else can benefit from it.’” The comment suggests a desire to keep Moonbounce’s technology broadly accessible, whatever the company’s eventual path.
The future of content moderation is undoubtedly intertwined with the advancements in AI. Moonbounce, under Levenson’s leadership, is poised to play a crucial role in shaping that future, offering a proactive and scalable solution to the challenges of ensuring safety in an increasingly AI-driven world. The company’s “policy as code” approach represents a significant shift from reactive measures to a preventative framework, ultimately aiming to build a safer and more trustworthy online experience for everyone.