Technology & Innovation

How Do OpenAI’s Efforts To Make GPT-4 “Safer” Stack Up Against The NIST AI Risk Management Framework?

05.11.23 | 9 min read | 文字Liam Alexander&Divyansh Kaushik

3月,Openai释放了GPT-4,这是最近AI进展的另一个里程碑。这是Openai迄今为止最先进的模型,它已经被广泛部署给数百万用户,并且businesses, with the potential for drastic effects acrossa range of industries

But before releasing a new, powerful system like GPT-4 to millions of users, a crucial question is:“我们怎么知道这个系统是安全,值得信赖和可靠的,可以被释放?”Currently, this is a question that leading AI labs are free to answer on their own–for the most part. But increasingly, the issue has garnered greater attention as many have become worried that the current pre-deployment risk assessment and mitigation methods like those done by OpenAI are insufficient to prevent potential risks, including the spread of misinformation at scale, the entrenchment of societal inequities, misuse by bad actors, and catastrophic accidents.

这种担忧是最近的一个核心open letter, signed by several leading machine learning (ML) researchers and industry leaders, which calls for a 6-month pause on the training of AI systems “more powerful” than GPT-4 to allow more time for, among other things, the development of strong standards which would “ensure that systems adhering to them are safe beyond a reasonable doubt” before deployment. There’s a lot of disagreement over this letter, from experts who竞争这封信的基本叙述, to others who think that the pause is“一个可怕的主意” because it would unnecessarily halt beneficial innovation (not to mention that it would be impossible to implement). But almost all of the participants in this conversation tend to agree, pause or no, that the question of how to assess and manage risks of an AI system before actually deploying it is an important one.

这里寻找指导的自然场所是国家标准与技术研究院(NIST), which released itsAI Risk Management Framework(AI RMF) and an相关的剧本in January. NIST is leading the government’s work to set technical standards and consensus guidelines for managing risks from AI systems, and一些citeits standard-setting work as a potential basis for future regulatory efforts.

In this piece we walk through both what OpenAI actually did to test and improve GPT-4’s safety before deciding to release it, limitations of this approach, and how it compares to current best practices recommended by the National Institute of Standards and Technology (NIST). We conclude with some recommendations for Congress, NIST, industry labs like OpenAI, and funders.

OpenAI在部署GPT-4之前做了什么?

OpenAI claims to have taken several steps to make their system “safer and more aligned”. What are those steps? OpenAI describes these in theGPT-4 “system card,”一份文件,概述了OpenAI在部署之前如何管理和减轻GPT-4的风险。这是该过程的简化版本:

这够了吗?

尽管Openai表示,它们通过上述过程大大降低了不希望的模型行为的速率,但实施的控件并不强大,缓解不良模型行为的方法仍然是漏水和不完美的。

OpenAI did not eliminate the risks they identified. The system card documents numerous failures of the current version of GPT-4, including一个例子,便同意“生成一个程序计算ing attractiveness as a function of gender and race.”

Current efforts to measure risks also need work, according to GPT-4 red teamers. TheAlignment Research Center(ARC) which assessed these models for “emergent” risks这么说“到目前为止,我们已经进行的测试不足以出于多种原因,但是我们希望随着AI系统变得更加有能力,评估的严格性将扩大。”另一位GPT-4红色团队Aviv Ovadya说:“如果红色的GPT-4教会了我任何东西,那就是红色的团队是不够的。”Ovadya建议使用未来的剥离前风险评估工作来改善“violet teaming,”in which companies identify “how a system (e.g., GPT-4) might harm an institution or public good, and then support the development of tools using that same system to defend the institution or public good.”

Since current efforts to measure and mitigate risks of advanced systems are not perfect, the question comes down to when they are “good enough.” What levels of risk are acceptable? Today, industry labs like OpenAI can mostly rely on their own judgment when answering this question, but there are many different standards that could be used. Amba Kak, the executive director of theAI Now Institute,建议一个更严格的标准,认为监管机构应要求AI公司“证明他们在发布系统之前不会造成任何伤害”。要满足这样一种标准的,新的,更系统的风险管理和测量方法。

How did OpenAI’s efforts map on to NIST’s Risk Management Framework?

NIST’sAI RMF Coreconsists of four main “functions,” broad outcomes which AI developers can aim for as they develop and deploy their systems: map, measure, manage, and govern.

Framework users canmap的overall context in which a system will be used to determine relevant risks that should be “on their radar” in that identified context. They can then措施identified risks quantitatively or qualitatively, before finally管理的m, acting to mitigate risks based on projected impact. Thegovernfunction is about having a well-functioning culture of risk management to support effective implementation of the three other functions.

回顾OpenAI的过程,然后再发布GPT-4,我们可以看到他们的动作如何与RMF核心中的每个功能保持一致。这并不是说Openai在其工作中应用了RMF。我们只是在尝试评估他们的努力如何与RMF保持一致。

Some of the specific actions described by OpenAI are also laid out in the剧本。这Measure 2.7 functionhighlights “red-teaming” activities as a way to assess an AI system’s “security and resilience,” for example.

NIST’s resources provide a helpful overview of considerations and best practices that can be taken into account when managing AI risks, but they are not currently designed to provide concrete standards or metrics by which one can assess whether the practices taken by a given lab are “adequate.” In order to develop such standards, more work would be needed. To give some examples of current guidance that could be clarified or made more concrete:

So, across NIST’s AI RMF, while determining whether a given “outcome” has been achieved could be up for debate, nothing stops developers from going above and beyond the perceived minimum (and we believe they should). This is not a bug of the framework as it is currently designed, rather a feature, as the RMF“不开处方风险承受能力。”However, it is important to note that more work is needed to establish both stricter guidelines which leading labs can follow to mitigate risks from leading AI systems, and concrete standards and methods for measuring risk on top of which regulations could be built.

Recommendations

有几种方法可以改善前部风险评估和缓解前部系统的标准:可以改善:

国会

NIST

行业实验室

资助者