3 Threat 1: Control Problem

3.1 Technical Reasons for Lack of Control

3.1.1 Guiding Questions

    • What is the control problem?
    • What are some examples of “breaks” in reward functions?
    • How do emergent abilities in technology develop?
    • Why might safety concerns about AI be overblown? Why might they be underappreciated?

3.1.2 Readings (Recommended and Assigned)

Required Readings

  • Dan Hendrycks & Mantas Mazeika, X-Risk Analysis for AI Research, (2022), https://arxiv.org/pdf/2206.05862.pdf
    • Useful in answering guiding question #1. The reading also provides a Q&A that may be useful for in-class discussion of threats and future impact.
  • Ben Gilburt, AI—The Control Problem, Medium (May 24, 2018), https://towardsdatascience.com/ai-the-control-problem-c82bb485bc54
    • Short article on AI safety. The website is not user-friendly: most of the article is hidden behind a sign-up request. Consider dropping this article, since little would be lost without it.
  • Jack Clark & Dario Amodei, Faulty Reward Functions in the Wild, OpenAI (Dec. 21, 2016), https://openai.com/research/faulty-reward-functions
    • Entertaining example of instructing an AI agent how to act in a game, with a video showing the results; relates to safety but provides only minimal insight into how to deal with certain AI functions.
  • Dario Amodei, et al., Concrete Problems in AI Safety (2016), https://arxiv.org/pdf/1606.06565.pdf
    • Pages 2-21; would assist in answering guiding question #4.
  • Stephen Ornes, The Unpredictable Abilities Emerging From Large AI Models, Quanta Magazine (Mar. 16, 2023), https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316/
    • Fun article on emergence in AI and its unpredictable powers and pitfalls. May be useful for a lighthearted class discussion about what kinds of questions may arise when researching or interacting with AI models.
  • AI Impacts, Likelihood of Discontinuous Progress Around the Development of AGI,  https://aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi/
    • Discusses human brain evolution in comparison to that of apes.

Recommended

  • Steven Kerr, On the Folly of Rewarding A, While Hoping for B, 9 Acad. Mgmt. Exec. (1995), https://www.ou.edu/russell/UGcomp/Kerr.pdf
    • Did not find this article particularly useful.
  • Tim G. J. Rudner & Helen Toner, Key Concepts in AI Safety: Specification in Machine Learning, CSET (Dec. 2021), https://cset.georgetown.edu/publication/key-concepts-in-ai-safety-specification-in-machine-learning/
    • Discusses misspecification and how ML works; also includes a graphic illustrating levels of specification, which makes the material easy to understand.
  • Katja Grace, Counterarguments to The Basic AI x-Risk Case, AI Impacts (Aug. 31, 2022), https://aiimpacts.org/counterarguments-to-the-basic-ai-x-risk-case/
    • Good explanations of, and counterarguments to, AI risks and concerns; would suggest switching this article to required and moving the Ben Gilburt article to recommended.

3.1.3 Exercise

  • Threat practice—in groups, analyze the following regarding the threat of a loss of control over AI:
          • identify key risk factors that impact the probability of the threat;
          • identify key actors that could shape these factors;
          • identify key actions these actors could take to shape these factors; and
          • identify key pathways to ensuring actors take key actions.

3.2 Alignment

3.2.1 Guiding Questions

    • What is AI alignment?
    • Why is alignment such a difficult problem?
    • What can we learn from alignment issues in other contexts?

3.2.2 Readings (Recommended and Assigned)

Required Readings

  • Jaime Fisac et al., Pragmatic-Pedagogic Value Alignment (2017), https://arxiv.org/pdf/1707.06354.pdf [8 pages]
    • Article concerns robots and includes a study; would suggest this as a recommended reading instead of a required reading.
  • [Start at Alignment] Dan Hendrycks, ML Safety Newsletter #6 (Oct. 13, 2022), https://open.substack.com/pub/mlsafety/p/ml-safety-newsletter-6?utm_campaign=post&utm_medium=web
    • Goes into the mechanics of things, provides charts and graphs. May be useful in answering guiding questions. 
  • Betty L. Hou & Brian P. Green, A Multilevel Framework for the AI Alignment Problem, Markkula Center (Jul. 25, 2022), https://www.scu.edu/ethics/focus-areas/technology-ethics/resources/a-multilevel-framework-for-the-ai-alignment-problem/
    • Extremely helpful in defining what AI alignment is, and provides a framework; would definitely be useful in answering the guiding questions (easy to understand).

  • Christoph Winter et al., Value Alignment for Advanced Artificial Judicial Intelligence, 60 Am. Phil. Quart. 187 (2023), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4252645 [16 pages].
    • Very informative article on the use of AI in judicial contexts. Beneficial for seeing how AI can be utilized in legal settings; would make for a fascinating in-class conversation about whether AI should be used in judicial proceedings and, if so, what implications follow and how possible issues should be resolved.
  • [Skim] Andreas Kopf et al., OpenAssistant Conversations—Democratizing Large Language Model Alignment (2023), https://arxiv.org/abs/2304.07327
    • The risk section beginning on page 11 is useful with regard to human interaction and would be helpful for discussing alignment.

Recommended

  • Jan Leike et al., Our Approach to Alignment Research, OpenAI (Aug. 24, 2022), https://openai.com/blog/our-approach-to-alignment-research
    • One particular company's approach to AI alignment; recommend skimming this article for some general insight. 
  • Ondrej Bajgar & Jan Horenovsky, Negative Human Rights as a Basis for Long-term AI Safety and Regulation, JAIR (2023), https://doi.org/10.1613/jair.1.14020
    • Read pages 1054-1063; discusses human rights risks with AI.
  • Anton Korinek & Avital Balwit, Aligned With Whom?, Brookings, https://www.brookings.edu/wp-content/uploads/2022/05/Aligned-with-whom-1.pdf
    • Suggest making this article a required skim; has a good general overview of alignment and explains some key terms.
  • Azim Shariff et al., Whose Life Should Your Car Save?, New York Times (Nov. 3, 2016), https://www.nytimes.com/2016/11/06/opinion/sunday/whose-life-should-your-car-save.html
    • Would provide good in-class conversation topics: how do students feel about driverless cars? How would they answer the question presented in the article?

3.2.3 Exercise

    • Threat practice—in groups, analyze the following regarding the threat of misaligned AI:
      • identify key risk factors that impact the probability of the threat;
      • identify key actors that could shape these factors;
      • identify key actions these actors could take to shape these factors; and
      • identify key pathways to ensuring actors take key actions.

3.3 Power Seeking and Enfeeblement

3.3.1 Guiding Questions

  • What's another example of technology-induced enfeeblement? What permitted such enfeeblement and what, if anything, could have been done to slow or prevent such enfeeblement?
  • Can you explain why even giving AI the "right" goals may still lead to power-seeking behavior?

3.3.2 Readings (Recommended and Assigned)

Required Readings

  • Joseph Carlsmith, Is Power-Seeking AI an Existential Risk?, https://arxiv.org/abs/2206.13353
    • Helpful for answering the guiding questions.
  • Spyscape, Does Artificial Power Corrupt Absolutely? GPT-4 Isn't Saying, https://spyscape.com/article/do-ais-seek-power
    • Fun article on AI safety that breaks the topic down in a digestible manner using Pac-Man as an example; discusses AI safeguarding.
      • Could lead to an in-class discussion about how certain commands could lead to power seeking; raises the question of whether certain words may trigger power seeking.
  • Victoria Krakovna & Janos Kramar, Power-Seeking Can be Probable and Predictive for Trained Agents (2023), https://arxiv.org/pdf/2304.06528v1.pdf
    • Reports on a study but does not provide a lot of information. Would suggest this as recommended and not required.
  • Holden Karnofsky, AI Could Defeat All of Us Combined, Cold Takes (Jun. 9, 2022), https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/
    • The article contains discussion questions that may spark in-class discussion; conveniently, there is an audio recording so students can listen to the article.

Recommended

  • Jan Leike et al., AI Safety Gridworlds, https://arxiv.org/pdf/1711.09883.pdf
    • Middle of page 2 until page 15. There are possible discussion questions on page 15 of the article.
  • Migle Laukyte, Averting Enfeeblement and Fostering Empowerment: Algorithmic Rights and the Right to Good Administration, 46 Comp. L. & Sec. Rev. (2022), https://www.sciencedirect.com/science/article/pii/S0267364922000607
    • Sections 2.1-2.6 seem to be the most helpful parts of the article.
  • Ian Sample, Human Compatible by Stuart Russell review—AI and Our Future, Guardian (Oct. 24, 2019), https://www.theguardian.com/books/2019/oct/24/human-compatible-ai-problem-control-stuart-russell-review
    • Quick and easy read.

3.3.3 Exercise

    • Individuals who completed Threat Exercise 1 will share their “key actions” that actors could take to mitigate exacerbating risk factors.
    • Groups will select the action they deem most efficacious and share that action with the class.