Alignment Research Center

Alignment Research Center
Founded	April 2021
Founders	Paul Christiano; Beth Barnes; Mark Xu
Type	Nonprofit research institute
Legal status	501(c)(3) tax-exempt charitable organization
Headquarters	Berkeley, California, United States
Purpose	AI alignment and AI safety research
Website	alignment.org

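The Alignment Research Center (ARC) is an American nonprofit research institute dedicated to aligning the behavior of artificial intelligence with human values and intended interests.^[1] ARC was founded by Paul Christiano, a former researcher at the American AI research lab OpenAI, and focuses on identifying and understanding the potential harms of AI models.^[2]^[3]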

Overview

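ARC's mission is to ensure that future machine learning systems are designed and developed safely and for the benefit of humanity. The center was founded in April 2021 by Paul Christiano and other researchers, and works mainly on theoretical challenges in AI alignment.^[4] A key concern in this work is that, as AI systems become more advanced, they may circumvent or find loopholes in the alignment techniques developed by their human designers.^[5] ARC has also sought to expand from theoretical work into empirical research, industry collaboration, and policy-making.^[6]^[7]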

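In March 2022, ARC received US$265,000 from Open Philanthropy.^[8] That same year, the cryptocurrency exchange FTX declared bankruptcy, and ARC said it would return a US$1.25 million donation from the FTX Foundation, the foundation of FTX founder Sam Bankman-Fried.^[9]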

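In March 2023, OpenAI asked ARC to help test its language model GPT-4, assessing the model's capacity for power-seeking behavior and the associated risks.^[10] ARC evaluated GPT-4's ability to make strategic plans, replicate itself, acquire resources, stay hidden on a server, and carry out phishing operations.^[11] Solving a CAPTCHA was also part of the test.^[12] GPT-4 hired a human through the gig-work platform TaskRabbit to complete the task and, when its identity was questioned, deceived the worker into believing that the employer (GPT-4) was a vision-impaired human rather than a robot.^[13] ARC confirmed that GPT-4 was 82% less likely than GPT-3.5 to respond impermissibly to prompts eliciting restricted information, and 60% less likely to produce AI hallucinations.^[14]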

References

[1] MacAskill, William. How Future Generations Will Remember Us. The Atlantic. 2022-08-16. Retrieved 2023-04-23. Archived from the original on 2023-06-08.
[2] Klein, Ezra. This Changes Everything. The New York Times. 2023-03-12. Retrieved 2023-04-30. ISSN 0362-4331. Archived from the original on 2023-08-05.
[3] Piper, Kelsey. How to test what an AI model can — and shouldn't — do. Vox. 2023-03-29. Retrieved 2023-04-30. Archived from the original on 2023-06-01.
[4] Christiano, Paul. Announcing the Alignment Research Center. Medium. 2021-04-26. Retrieved 2023-04-16. Archived from the original on 2023-08-07.
[5] Christiano, Paul; Cotra, Ajeya; Xu, Mark. Eliciting Latent Knowledge: How to tell if your eyes deceive you. Google Docs. Alignment Research Center. 2021-12. Retrieved 2023-04-16. Archived from the original on 2023-04-20.
[6] Alignment Research Center. Alignment Research Center. Retrieved 2023-04-16. Archived from the original on 2023-07-18.
[7] Pandey, Mohit. Stop Questioning OpenAI's Open-Source Policy. Analytics India Magazine. 2023-03-17. Retrieved 2023-04-23. Archived from the original on 2023-05-01.
[8] Alignment Research Center — General Support. Open Philanthropy. 2022-06-14. Retrieved 2023-04-16. Archived from the original on 2023-04-20.
[9] Wallerstein, Eric. FTX Seeks to Recoup Sam Bankman-Fried's Charitable Donations. Wall Street Journal. 2023-01-07. Retrieved 2023-04-30. ISSN 0099-9660. Archived from the original on 2023-06-28.
[10] GPT-4 System Card (PDF). OpenAI. 2023-03-23. Retrieved 2023-04-16. Archived (PDF) from the original on 2023-04-07.
[11] Edwards, Benj. OpenAI checked to see whether GPT-4 could take over the world. Ars Technica. 2023-03-15. Retrieved 2023-04-30. Archived from the original on 2023-04-05.
[12] Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude. Alignment Research Center. 2023-03-17. Retrieved 2023-04-16. Archived from the original on 2023-04-05.
[13] Cox, Joseph. GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be 'Vision-Impaired' Human. Vice News Motherboard. 2023-03-15. Retrieved 2023-04-16. Archived from the original on 2023-04-10.
[14] Burke, Cameron. 'Robot' Lawyer DoNotPay Sued For Unlicensed Practice Of Law: It's Giving 'Poor Legal Advice'. Yahoo Finance. 2023-03-20. Retrieved 2023-04-30. Archived from the original on 2023-05-04.

External links

Alignment Research Center


