
How does OpenAI's effort to make GPT-4 "safer" stack up against the NIST AI Risk Management Framework?

05.11.23 | 9 min read | by Liam Alexander and Divyansh Kaushik

In March, OpenAI released GPT-4, another milestone in the recent wave of AI progress. It is OpenAI's most advanced model yet, it has already been deployed widely to millions of users and businesses, and it has the potential for drastic effects across a range of industries.

But before releasing a new, powerful system like GPT-4 to millions of users, a key question is: "How do we know that this system is safe, trustworthy, and reliable enough to be released?" Currently, this is a question that leading AI labs are free to answer on their own, for the most part. But increasingly, the issue has garnered greater attention, as many have become worried that current pre-deployment risk assessment and mitigation methods, like those used by OpenAI, are insufficient to prevent potential risks, including the spread of misinformation at scale, the entrenchment of societal inequities, misuse by bad actors, and catastrophic accidents.

This concern was central to a recent open letter, signed by several leading machine learning (ML) researchers and industry leaders, which calls for a six-month pause on training AI systems "more powerful" than GPT-4, in part to allow more time before deployment to develop robust standards that would "ensure that systems adhering to them are safe beyond a reasonable doubt." There has been plenty of disagreement over the letter, from experts who contest its basic narrative to others who think a pause is "a terrible idea" because it would needlessly halt beneficial innovation (not to mention that it would be impossible to implement). But nearly everyone involved in this conversation seems to agree, pause or no pause, that the question of how to assess and manage the risks of an AI system before actually deploying it is an important one.

A natural place to look for guidance is the National Institute of Standards and Technology (NIST), which released its AI Risk Management Framework (AI RMF) and an associated Playbook in January. NIST is leading the government's efforts to develop technical standards and consensus guidelines for managing risks from AI systems, and some cite its standard-setting work as a potential basis for future regulatory efforts.

In this piece we walk through what OpenAI actually did to test and improve GPT-4's safety before deciding to release it, the limitations of this approach, and how it compares to current best practices recommended by the National Institute of Standards and Technology (NIST). We conclude with some recommendations for Congress, NIST, industry labs like OpenAI, and funders.

What did OpenAI do before deploying GPT-4?

OpenAI claims to have taken several steps to make their system "safer and more aligned." What are those steps? OpenAI describes them in the GPT-4 "system card," a document that outlines how OpenAI managed and mitigated risks from GPT-4 before deployment, and that lays out a simplified version of that process.

Is this enough?

Although OpenAI says that the process described above substantially reduced the rate of undesired model behavior, the controls it implemented are not robust, and methods for mitigating bad model behavior remain leaky and imperfect.

OpenAI did not eliminate the risks they identified. The system card documents numerous failures of the current version of GPT-4, including one example in which it agrees to "generate a program calculating attractiveness as a function of gender and race."

Current efforts to measure risks also need work, according to GPT-4 red teamers. The Alignment Research Center (ARC), which assessed these models for "emergent" risks, says that "the testing we've done so far is insufficient for a number of reasons, but we expect the rigor of evaluations to scale up as AI systems become more capable." Another GPT-4 red teamer, Aviv Ovadya, says that "if red-teaming GPT-4 taught me anything, it is that red teaming alone is not enough." Ovadya recommends improving future pre-deployment risk assessment efforts with "violet teaming," in which companies identify "how a system (e.g., GPT-4) might harm an institution or the public good, and then support the development of tools using that same system to defend the institution or public good."

Since current efforts to measure and mitigate the risks of advanced systems are not perfect, the question comes down to when they are "good enough." What levels of risk are acceptable? Today, industry labs like OpenAI can mostly rely on their own judgment when answering this question, but there are many different standards that could be used. Amba Kak, the executive director of the AI Now Institute, suggests a stricter standard, arguing that regulators should require AI companies to demonstrate that their systems will not cause harm before they are released. Meeting such a standard would require new, more systematic approaches to risk management and measurement.

How do OpenAI's efforts map onto NIST's Risk Management Framework?

NIST's AI RMF Core consists of four main "functions," broad outcomes which AI developers can aim for as they develop and deploy their systems: map, measure, manage, and govern.

Framework users can map the overall context in which a system will be used, in order to determine the relevant risks that should be "on their radar" in that identified context. They can then measure identified risks quantitatively or qualitatively, and finally manage them, acting to mitigate risks based on projected impact. The govern function is about having a well-functioning culture of risk management to support effective implementation of the other three functions.
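
To make these four functions a bit more concrete, here is a minimal, purely illustrative sketch of how a team might organize a risk register around them. The risk entries, scoring scale, and field names below are hypothetical choices for illustration only; they are not prescribed by the RMF and are not drawn from OpenAI's system card.

```python
# Illustrative sketch only: a toy risk register organized around the four
# NIST AI RMF core functions (map, measure, manage, govern).
# All entries, scores, and names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Risk:
    description: str                     # MAP: a risk identified for the deployment context
    likelihood: int = 0                  # MEASURE: qualitative score, 1 (rare) to 5 (near certain)
    impact: int = 0                      # MEASURE: qualitative score, 1 (minor) to 5 (severe)
    mitigations: list = field(default_factory=list)  # MANAGE: actions taken to reduce the risk
    owner: str = "unassigned"            # GOVERN: who is accountable for this risk

    def priority(self) -> int:
        """Projected impact used to order mitigation work."""
        return self.likelihood * self.impact

# MAP: enumerate risks relevant to the system's intended context of use.
register = [
    Risk("Model generates persuasive misinformation at scale"),
    Risk("Model outputs entrench societal inequities"),
]

# MEASURE: score each risk, e.g. based on red-team findings or benchmark evaluations.
register[0].likelihood, register[0].impact = 4, 4
register[1].likelihood, register[1].impact = 3, 5

# MANAGE: address the highest-priority risks first and record what was done.
for risk in sorted(register, key=lambda r: r.priority(), reverse=True):
    risk.mitigations.append("fine-tune against flagged failure cases")
    risk.owner = "safety-review-board"   # GOVERN: assign accountability for follow-up
    print(f"{risk.description}: priority={risk.priority()}, owner={risk.owner}")
```

The point of the sketch is simply that the functions are complementary: mapping feeds measurement, measurement informs management, and governance assigns accountability across the whole loop.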

Looking back at OpenAI's process before releasing GPT-4, we can see how its actions line up with each function in the RMF Core. This is not to say that OpenAI applied the RMF in its work; we are simply trying to assess how its efforts align with the framework.

Some of the specific actions described by OpenAI are also laid out in the Playbook. The Measure 2.7 function, for example, highlights "red-teaming" activities as a way to assess an AI system's "security and resilience."

NIST's resources provide a helpful overview of considerations and best practices that can be taken into account when managing AI risks, but they are not currently designed to provide concrete standards or metrics by which one can assess whether the practices taken by a given lab are "adequate." Developing such standards would require more work, for example to clarify parts of the current guidance or make them more concrete.

So under NIST's AI RMF, while it can be debatable whether a given "outcome" has been achieved, nothing stops developers from going beyond the bare minimum (and we believe they should). This is not a flaw of the framework as currently designed, but a feature, since the RMF "does not prescribe risk tolerance." However, it is important to note that more work is needed to establish both stricter guidelines that leading labs can follow to mitigate risks from leading AI systems, and concrete standards and methods for measuring risk on top of which regulations could be built.

Recommendations

There are several ways that standards for pre-deployment risk assessment and mitigation of advanced AI systems could be improved:

Congress

NIST

Industry labs

Funders