Over the past few days, you would have seen a flurry of publications on the European Union’s Artificial Intelligence Act entering into force on 2 August 2024. In the eyes of many this represents a landmark effort to regulate AI, emphasising safety, transparency and ethical use. It also reflects a global trend toward comprehensive AI governance, influencing regulatory frameworks worldwide and underscores a perceived need for robust oversight to ensure responsible and safe AI innovation and deployment.
In the context of such safety efforts, you may have come across the concept of Red Teaming. But did you know there are also Green Teams? This edition of Guaranteed Dissent will cover Green Teams, why they might be useful, what advantage they have and what argues against them.
I. Red Teaming
Red Teaming, a well-known practice in cybersecurity and involves simulating attacks on systems to identify and mitigate vulnerabilities. By adopting an adversarial approach, Red Teams can uncover and close off weaknesses that might otherwise be exploited by malicious actors. This proactive defensive strategy is seen as crucial for enhancing the security and resilience of various technologies, including AI. Effective Red Teaming is often expensive and time consuming.
In the context of generative AI Red Teaming could employ techniques to bypass content filters, such as prodding the AI through a sequence of specific prompts or ‘misleading’ the model by making false statements around the context. An example could be telling the model that it is already 2050 and it is permissible to reproduce a particular piece of literary work because the copyright protection period has already expired, or instructing the model to produce the recipe for making a harmful substance, because the question is asked in a ‘safe lab and testing environment only’.
II. Green Teaming
Green Teaming in the context of AI on the other hand has emerged more recently as a novel approach proposed by researchers from the University of Minnesota and Carnegie Mellon in 2023. The definition is somewhat in flux. The basic idea of Green Teaming, in contrast to the adversarial nature of Red Teaming, is that it focuses on bypassing content filters in generative AI models to unlock beneficial applications, which is then fed back to the team that builds the model. Rather than seeking vulnerabilities for harm, Green Teaming aims to reveal the positive potential of restricted content. For its ability to highlight and address the limitations of current AI content moderation practices the concept has gained limited traction.
Techniques and Applications
Green Teaming employs techniques similar to Red Teams to bypass content filters, enabling the creation of context-specific content that serves positive purposes. In the paper the researchers highlight three use cases to illustrate the potential: Suicide prevention support training (by create realistic scenarios), debugging training (by creating buggy code) and social activism (by using AI tools for social critique and activism).
Benefits, Potential and Ethics
Applied in the right way Green Teaming can offer significant advantages by enabling beneficial applications that might otherwise be blocked by overly cautious content filters and moderation. It’s of course a very liberal approach to technology, because it starts by asking the question ‘What good can come of this?’ rather than assuming many bad outcomes will be AI’s consequence and cataloguing them comprehensively to make sure they are prevented at source.
The approach of Green Teaming is of course significantly more difficult than sending in a team that essentially sensors what the employees of the model provider believe to be acceptable or not. The entire debate around Google’s Gemini is an example of such DPO (Direct Preference Optimisation) over-reach: Essentially Gemini refused to render historically accurate images that – for obvious reasons – would not conform to today’s generally accepted diversity, equity and inclusion ideas.
The Green Team needs to have to freedom to explore creatively and ‘tinker’. The biggest hurdle to Green Teaming is probably that it is hard to measure its outcomes. After all, it is much easier to count the number of ‘attack-vectors’ that have been prevented, than comprehensively listing out the positive creative benefits unleashed.
The entire topic also needs to be put into context with AI doom scenarios that some paint (https://intelligence.org/about/). As mentioned in previous posts (http…), at this time, I am not particularly worried about AI taking over the world any time soon. However, in a setup, where a new technology poses significant destructive potential the burden of proof of its safety should rest with those who want to introduce it and build a business out of it, rather than the other way around.
Historically, of course, humanity hasn’t had a particularly good track record of introducing only safe technologies into the public: We added lead to gasoline as an anti-knocking agent which cost the US alone over 800 million IQ points collectively, brought to market drugs that lead to severe birth defects in new-borns and plastered the planet with PFAS, a group of synthetic ‘forever chemicals’ resistant to water, heat, grease and oil. They have been found in air, water, soil and even the human bloodstreams and their removal from the environment at the same rate as they are being added is virtually impossible in economic terms, at a 20 to 7,000 trillion USD per year.
While all these examples are alarming, they do not prescribe exactly how to go about ensuring product safety. Ironically, attempting to list out cases of abuse we need to prevent is exactly what doomers argue will necessarily fail in delivering safe AI. It would be a never-ending game of catch-up and no one will successfully define a behaviour and values catalogue exhaustively. For this very reason I would like to see more Green Teaming efforts to see what new insights this approach can deliver.
III. Conclusion
Green Teaming offers a powerful addition to the AI safety techniques toolkit. It addresses different aspects of AI safety and seeks to balance concerns and potential alike. While Red Teaming is fixated on malignant actors, Green Teaming focuses on the importance of context as defining whether or not a particular AI application is safe and positive. Together, these methodologies complement each other well in the quest for AI safety, encouraging responsible innovation that benefits society as a whole, outside of regulation.