3 Threat 1: Control Problem
3.1 Technical Reasons for Lack of Control
3.1.1 Guiding Questions
- What is the control problem?
- What are some examples of “breaks” in reward functions?
- How do emergent abilities in technology develop?
- Why might safety concerns about AI be overblown? Why might they be underappreciated?
3.1.2 Readings (Recommended and Assigned)
Required Readings
- Dan Hendrycks & Mantas Mazeika, X-Risk Analysis for AI Research, (2022), https://arxiv.org/pdf/2206.05862.pdf
- Useful for answering guiding question #1. The reading also includes a Q&A that may be useful for in-class discussion of threats and future impact.
- Ben Gilburt, AI—The Control Problem, Medium (May 24, 2018), https://towardsdatascience.com/ai-the-control-problem-c82bb485bc54
- Short article on AI safety. The website is not user-friendly: most of the article is hidden behind a sign-up prompt. Consider dropping this reading, since little would be lost without it.
- Jack Clark & Dario Amodei, Faulty Reward Functions in the Wild, OpenAI (Dec. 21, 2016), https://openai.com/research/faulty-reward-functions
- Playful example of instructing AI agents how to act in a game, with a video showing the results; relates to safety but provides only minimal insight into how to deal with certain AI functions.
- Dario Amodei, et al., Concrete Problems in AI Safety (2016), https://arxiv.org/pdf/1606.06565.pdf
- Pages 2-21; would help in answering guiding question #4.
- Stephen Ornes, The Unpredictable Abilities Emerging From Large AI Models, Quanta Magazine (Mar. 16, 2023), https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316/
- Fun article on emergence in AI models and their unpredictable powers and pitfalls. May be useful for a lighthearted class discussion about what kinds of questions may arise when researching or interacting with AI models.
- AI Impacts, Likelihood of Discontinuous Progress Around the Development of AGI, https://aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi/
- Discusses human brain evolution in comparison to that of apes.
Recommended
- Steven Kerr, On the Folly of Rewarding A, While Hoping for B, 9 Acad. Mgmt. Exec. (1995), https://www.ou.edu/russell/UGcomp/Kerr.pdf
- Did not find this article particularly useful.
- Tim G. J. Rudner & Helen Toner, Key Concepts in AI Safety: Specification in Machine Learning, CSET (Dec. 2021), https://cset.georgetown.edu/publication/key-concepts-in-ai-safety-specification-in-machine-learning/
- Discusses misspecification and how ML works; has a graphic as well that discusses levels of specification and makes it easy to understand.
- Katja Grace, Counterarguments to The Basic AI x-Risk Case, AI Impacts (Aug. 31, 2022), https://aiimpacts.org/counterarguments-to-the-basic-ai-x-risk-case/
- Good explanations of and counterarguments to AI risks and concerns; would suggest switching this article to required and moving the Ben Gilburt article to recommended.
3.1.3 Exercise
- Threat practice—in groups, analyze the following regarding the threat of a loss of control over AI:
- Identify key risk factors that impact probability of the threat;
- identify key actors that could shape these factors;
- identify key actions these actors could take to shape these factors; and
- identify key pathways to ensuring actors take key actions.
3.2 Alignment
3.2.1 Guiding Questions
- What is AI alignment?
- Why is alignment such a difficult problem?
- What can we learn from alignment issues in other contexts?
3.2.2 Readings (Recommended and Assigned)
Required Readings
- Jaime Fisac et al., Pragmatic-Pedagogic Value Alignment (2017), https://arxiv.org/pdf/1707.06354.pdf [8 pages]
- Article concerns robots and includes a study; would suggest this as a recommended reading instead of a required reading.
- [Start at Alignment] Dan Hendrycks, ML Safety Newsletter #6 (Oct. 13, 2022), https://open.substack.com/pub/mlsafety/p/ml-safety-newsletter-6?utm_campaign=post&utm_medium=web
- Goes into the mechanics of things, provides charts and graphs. May be useful in answering guiding questions.
- Betty L. Hou & Brian P. Green, A Multilevel Framework for the AI Alignment Problem, Markkula Center (Jul. 25, 2022), https://www.scu.edu/ethics/focus-areas/technology-ethics/resources/a-multilevel-framework-for-the-ai-alignment-problem/
- Extremely helpful in defining what AI alignment is and provides framework; would definitely be useful in answering the guiding questions (easy to understand.)
- Christoph Winter et al., Value Alignment for Advanced Artificial Judicial Intelligence, 60 Am. Phil. Quart. 187 (2023), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4252645 [16 pages].
- Very informative article on the use of AI in judicial contexts. Beneficial for seeing how AI can be used in legal settings; would support a fascinating in-class conversation about whether AI should be considered in judicial proceedings and, if so, what implications follow and how possible issues should be resolved.
- [Skim] Andreas Kopf et al., OpenAssistant Conversations—Democratizing Large Language Model Alignment (2023), https://arxiv.org/abs/2304.07327
- The risk section beginning on page 11 is useful with regard to human interaction and would be helpful for discussing alignment.
Recommended
- Jan Leike et al., Our Approach to Alignment Research, OpenAI (Aug. 24, 2022), https://openai.com/blog/our-approach-to-alignment-research
- One particular company's approach to AI alignment; recommend skimming this article for some general insight.
- Ondrej Bajgar & Jan Horenovsky, Negative Human Rights as a Basis for Long-term AI Safety and Regulation, JAIR (2023), https://doi.org/10.1613/jair.1.14020
- Read pages 1054-1063; discusses human rights risks with AI.
- Anton Korinek & Avital Balwit, Aligned With Whom?, Brookings, https://www.brookings.edu/wp-content/uploads/2022/05/Aligned-with-whom-1.pdf
- Suggest making this article a required skim; it gives a good general overview of alignment and explains some key terms.
- Azim Shariff et al., Whose Life Should Your Car Save?, New York Times (Nov. 3, 2016), https://www.nytimes.com/2016/11/06/opinion/sunday/whose-life-should-your-car-save.html
- Would allow for good in-class conversation topics: how do students feel about driverless cars? How would they answer the question presented in the article?
3.2.3 Exercise
- Threat practice—in groups, analyze the following regarding the threat of misaligned AI:
- Identify key risk factors that impact probability of the threat;
- identify key actors that could shape these factors;
- identify key actions these actors could take to shape these factors; and
- identify key pathways to ensuring actors take key actions.
3.3 Power Seeking and Enfeeblement
3.3.1 Guiding Questions
- What's another example of technology-induced enfeeblement? What permitted such enfeeblement and what, if anything, could have been done to slow or prevent such enfeeblement?
- Can you explain why even giving AI the "right" goals may still lead to power-seeking behavior?
3.3.2 Readings (Recommended and Assigned)
Required Readings
- Joseph Carlsmith, Is Power-Seeking AI an Existential Risk?, https://arxiv.org/abs/2206.13353
- Helpful for answering the guiding questions.
- Spyscape, Does Artificial Power Corrupt Absolutely? GPT-4 Isn't Saying, https://spyscape.com/article/do-ais-seek-power
- Fun article on AI safety that breaks the topic down in a digestible manner using Pac-Man as an example; discusses AI safeguarding.
- Could lead to an in-class discussion about how certain commands could lead to power seeking; raises the question of whether certain words may trigger power seeking.
- Victoria Krakovna & Janos Kramar, Power-Seeking Can be Probable and Predictive for Trained Agents (2023), https://arxiv.org/pdf/2304.06528v1.pdf
- Reports a study but does not provide much information. Would suggest this as recommended rather than required.
- Holden Karnofsky, AI Could Defeat All of Us Combined, Cold Takes (Jun. 9, 2022), https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/
- The article poses questions that may spark in-class discussion; conveniently, an audio recording lets students listen to the article.
Recommended
- Jan Leike et al., AI Safety Gridworlds, https://arxiv.org/pdf/1711.09883.pdf
- Middle of page 2 until page 15. There are possible discussion questions on page 15 of the article.
- Migle Laukyte, Averting Enfeeblement and Fostering Empowerment: Algorithmic Rights and the Right to Good Administration, 46 Comp. L. & Sec. Rev. (2022), https://www.sciencedirect.com/science/article/pii/S0267364922000607
- Sections 2.1-2.6 seem to be the most helpful parts of the article.
- Ian Sample, Human Compatible by Stuart Russell review—AI and Our Future, Guardian (Oct. 24, 2019), https://www.theguardian.com/books/2019/oct/24/human-compatible-ai-problem-control-stuart-russell-review
- Quick and easy read.
3.3.3 Exercise
- Individuals who completed Threat Exercise 1 will share their “key actions” for actors to mitigate exacerbating risk factors
- Groups will select what they deem the most efficacious action and share that action with the class