The alignment problem is the challenge of ensuring that artificial intelligence systems, especially powerful, general-purpose ones, act in accordance with human values and intentions. Work on it is multidisciplinary, drawing on computer science, ethics, psychology, and law. Its subproblems include:
- How to define and measure human values and preferences in a way that can be understood and optimized by AI systems (one common approach, learning a reward model from pairwise human preferences, is sketched after this list).
- How to avoid or mitigate undesirable side effects, biases, or errors in AI systems that may harm humans or the environment.
- How to ensure that AI systems are transparent, accountable, and trustworthy, and that humans can understand and control their behavior.
- How to prevent or resolve conflicts between different AI systems or between AI systems and humans, especially in cases of moral dilemmas or existential risks.
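To make the first subproblem concrete, here is a minimal sketch of preference-based reward learning: fitting a Bradley-Terry model to pairwise human judgments so that preferences become an optimizable reward. The outcomes, features, preference data, and hyperparameters are all invented for illustration; this is one textbook technique, not any particular organization's method.

```python
import numpy as np

# Hypothetical outcomes, each described by two features: [task_success, honesty].
outcomes = np.array([
    [1.0, 1.0],   # succeeds at the task honestly
    [1.0, 0.0],   # succeeds by deception
    [0.0, 1.0],   # fails, but honestly
])

# Pairwise human judgments as (preferred_index, rejected_index): this rater
# prefers honesty even at the cost of task success.
prefs = [(0, 1), (0, 2), (2, 1)]

w = np.zeros(2)   # weights of a linear reward r(x) = w @ x, to be learned
lr = 0.5

for _ in range(500):
    grad = np.zeros_like(w)
    for better, worse in prefs:
        diff = outcomes[better] - outcomes[worse]
        # Bradley-Terry model: P(better is preferred) = sigmoid(w @ diff)
        p = 1.0 / (1.0 + np.exp(-w @ diff))
        grad += (1.0 - p) * diff          # gradient of the log-likelihood
    w += lr * grad / len(prefs)           # gradient ascent step

print("learned reward weights [success, honesty]:", w)
# The honesty weight comes out larger than the success weight,
# matching the stated preferences.
```

Even this toy exposes the difficulty: the learned reward is only as good as the chosen features and the collected preference data, which is exactly where the definition-and-measurement subproblem bites.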
The alignment problem is important and urgent: as AI systems become more capable and ubiquitous, their impact on human society and well-being grows. Many researchers and organizations work on aspects of the problem, including OpenAI, DeepMind, the Partnership on AI, and the Center for Human-Compatible AI.
- Goodhart’s law
Goodhart’s law is the principle that when a measure becomes a target, it ceases to be a good measure: once an indicator is used to evaluate the performance of a system or policy, it loses its reliability, because people optimize the indicator itself rather than the underlying goal it was meant to track. For example, if a school uses test scores to measure the quality of education, teachers and students may teach and learn to the test rather than acquire the skills and knowledge the test is supposed to measure. The law is named after British economist Charles Goodhart, who formulated a version of it in 1975 based on his observations of monetary policy in the United Kingdom; the popular "measure becomes a target" phrasing is due to anthropologist Marilyn Strathern.
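A toy simulation makes the mechanism concrete; the scenario, payoffs, and numbers below are invented for illustration, not taken from Goodhart's work. While effort splits vary naturally, the test score tracks knowledge; once the score is maximized directly, it decouples from knowledge entirely.

```python
import numpy as np

def knowledge(study_hours):
    return study_hours                      # only real study builds knowledge

def score(study_hours, cram_hours):
    return study_hours + 2.0 * cram_hours   # cramming inflates the score

# Unoptimized population: students mostly study and cram only a little,
# so the measure tracks the goal well (correlation near 0.9).
rng = np.random.default_rng(0)
study = rng.uniform(0, 10, size=1000)
cram = rng.uniform(0, 2, size=1000)
print("correlation before targeting:",
      round(np.corrcoef(knowledge(study), score(study, cram))[0, 1], 2))

# Now the score becomes the target: with 10 hours to allocate, the
# score-maximizing split is all cramming.
splits = [(s, 10.0 - s) for s in np.linspace(0, 10, 101)]
best_study, best_cram = max(splits, key=lambda sc: score(*sc))
print("score-maximizing study hours:", best_study)        # 0.0
print("score achieved:", score(best_study, best_cram))    # 20.0 (maximal)
print("knowledge achieved:", knowledge(best_study))       # 0.0 (minimal)
```

The score-maximizing policy achieves the best possible measure and the worst possible goal: the measure stopped being informative the moment it became the target. This is the same failure mode that makes proxy objectives hazardous for AI systems.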