The $1 billion effort to ensure AI doesn’t harm humanity
Anthropic’s founders left OpenAI to build a safer AI company, but that is a challenging task. Their broad goal is to ensure AI systems do what humans want. One project in service of that goal involves deliberately building a system that misleads users, in order to learn how deception arises and how it might be prevented. The project uses a highly capable text model called Claude, similar to OpenAI’s GPT models. The idea is to test whether standard training methods can stop the system from being deceptive, revealing where AI safety techniques still fall short.
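To make the shape of that experiment concrete, here is a minimal sketch of how such an evaluation loop might be structured. Every function below is a hypothetical placeholder for illustration only; the article does not describe Anthropic’s actual code or methods.

```python
# Highly simplified sketch of a "deliberately deceptive model" safety test.
# All function names are hypothetical placeholders, not Anthropic's actual code.

def train_deceptive_model(base_model, hidden_objective):
    """Fine-tune a copy of the base model to pursue a concealed, misleading goal."""
    raise NotImplementedError

def apply_standard_safety_training(model):
    """Apply ordinary alignment techniques (e.g. preference-based fine-tuning)."""
    raise NotImplementedError

def probe_for_deception(model, test_prompts) -> float:
    """Return the fraction of test prompts on which the model still misleads the user."""
    raise NotImplementedError

def run_experiment(base_model, hidden_objective, test_prompts) -> float:
    # 1. Build a model that is deceptive on purpose.
    deceptive = train_deceptive_model(base_model, hidden_objective)
    # 2. Try to "fix" it with standard safety training.
    patched = apply_standard_safety_training(deceptive)
    # 3. If deception survives, the usual methods are insufficient,
    #    which points to where safety research needs improvement.
    return probe_for_deception(patched, test_prompts)
```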
Anthropic emphasizes safety and ethical considerations in AI development, and the company is even open to relinquishing control of its board to outside experts in order to uphold those standards.
However, they believe that true progress in AI safety requires experimentation with advanced models, which necessitates substantial financial investment.
This approach raises the question of whether Anthropic’s efforts are making AI safer or simply more powerful, potentially accelerating the very harms the company hopes to avoid. Anthropic has secured significant funding from Google, Amazon, and other investors, and it aims to raise billions more to develop advanced AI models that could have a profound impact on the economy.
Despite concerns about their ambitions and funding, Anthropic operates as a public benefit corporation, a structure that gives them legal cover to prioritize ethics over profits. They’ve also introduced a unique governance mechanism, the Long-Term Benefit Trust, which aims to guard against dangerous AI by placing control in the hands of financially disinterested trustees.
It’s also worth noting that Anthropic is closely linked to the effective altruism movement, which seeks to do the most good possible with the resources at hand. The connection runs through both the company’s team and its investors, and underscores its stated dedication to AI safety.
These relationships became widely known when FTX’s financial information was made public last year. Surprisingly, FTX listed a $500 million investment in Anthropic as an asset. This implies that the investors who were supposedly misled by Bankman-Fried now have a strong interest in seeing Anthropic thrive. The more valuable this investment becomes, the better chance FTX has of repaying its investors and customers who are owed approximately $8 billion.
However, many effective altruists have reservations about Anthropic’s approach. This group has long been connected to the AI safety community, with influential figures like writer Eliezer Yudkowsky and philosopher Nick Bostrom expressing concerns that highly intelligent AI could pose an existential threat to humanity. Their worry comes from the fact that advanced AI may be too intelligent to be precisely controlled by humans, potentially leading to unwanted outcomes ranging from humans living in the AI’s shadow to extinction.
More recently, the doom-laden theorizing associated with figures like MIRI founder Yudkowsky has given way to hands-on research at labs like Anthropic and OpenAI. These labs, both connected to the effective altruism movement, are actually building advanced AI rather than just discussing it theoretically.
Some critics worry about the outcomes of this approach. They fear that the desire to stay competitive and attract investment might lead Anthropic to release advanced models too quickly, potentially compromising safety. Academic experts, such as David Krueger, a computer science professor at the University of Cambridge, also express concerns about relying solely on testing advanced models to learn about safety, as it can be difficult to detect deceptive or challenging behavior.
To tackle these worries, some believe that major AI organizations, including Anthropic, should come together and collectively decide to slow down their race to develop more powerful models, focusing on safety research instead.
Anthropic’s AI model, Claude, is similar to ChatGPT in that it generates text from prompts. Claude has received strong reviews for being “friendly” and effective at working with documents.
Claude’s training differs from ChatGPT’s in that it uses a technique called “constitutional AI.” This approach gives the model a set of principles, or a “constitution,” and instructs it to critique and revise its own responses to better align with those principles. One principle is based on the Universal Declaration of Human Rights, promoting equality, freedom, and brotherhood. Another asks for the response least likely to be harmful or offensive to non-Western audiences. This self-critique process helps reduce the harmful content the model generates.
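As a rough illustration of how such a critique-and-revise loop works, here is a minimal sketch in Python. The generate function is a hypothetical stand-in for a call to a large language model, and the two example principles are paraphrased from this article, not Anthropic’s actual constitution.

```python
# Minimal sketch of a constitutional-AI-style critique-and-revise loop.
# `generate` is a placeholder for a call to a text model; the principles
# below are illustrative paraphrases, not Anthropic's actual constitution.

PRINCIPLES = [
    "Choose the response most consistent with equality, freedom, and brotherhood.",
    "Choose the response least likely to be harmful or offensive to a non-Western audience.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError

def constitutional_revision(user_prompt: str) -> str:
    # 1. Draft an initial answer.
    response = generate(user_prompt)

    # 2. For each principle, have the model critique its own answer,
    #    then rewrite the answer to better satisfy that principle.
    for principle in PRINCIPLES:
        critique = generate(
            f"Critique this response according to the principle: '{principle}'\n\n"
            f"Response: {response}"
        )
        response = generate(
            f"Rewrite the response to address this critique: {critique}\n\n"
            f"Original response: {response}"
        )

    # 3. The resulting (prompt, revised response) pairs can then serve as
    #    training data to fine-tune the model itself.
    return response
```

The key design idea is that the model supervises itself against written principles, rather than relying solely on human raters to flag harmful outputs.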
Constitutional AI aims to provide a higher-level understanding of the model’s behavior, making it more interpretable than other advanced AI models. Although it doesn’t provide full transparency at the individual weight or neuron level, it offers a clearer grasp of the system’s functioning and the reasons behind its particular responses to prompts.
Anthropic’s AI is less widely known than OpenAI’s ChatGPT, largely because the company has been wary of the risks of releasing powerful models quickly. Anthropic chose not to release Claude to the public at first, to avoid fueling a competitive rush toward ever more advanced models. Their focus is on safety rather than speedy deployment.
Anthropic believes that their slower rollout is less likely to drive an arms race, since they are not trying to leapfrog OpenAI but to keep AI development focused on safety. Yet some critics argue that even a lab that deliberately trails OpenAI adds another competitor to the field and could speed up the race.
Despite their different strategies, Anthropic and OpenAI share a history: all seven of Anthropic’s co-founders previously worked at OpenAI. They cite differences in research vision and style as reasons for their departure. Anthropic aims to complement OpenAI’s efforts by emphasizing safety and pushing the field in a safer direction.
Anthropic’s unique approach to AI development, constitutional AI, focuses on interpretability, making it easier for researchers to understand and track the model’s behavior. This interpretability helps identify issues and refine the model’s performance. However, while OpenAI and other organizations have also embraced mechanistic interpretability, Anthropic believes its approach can further contribute to safety research.
In terms of corporate governance, Anthropic has taken fewer formal steps than OpenAI to commit itself to specific safety measures. OpenAI’s charter commits it to stop competing with, and start assisting, a safety-conscious project that comes close to building AGI. This “merge and assist” provision is designed to prevent a hasty and unsafe race to develop advanced AI.
Anthropic, in contrast, has not made such commitments. Their establishment of the Long-Term Benefit Trust is the most significant step they’ve taken to ensure that their executives and board prioritize the societal impact of their work. However, they have not committed to actions like “merge and assist” or any specific measures if AI approaches human-level capabilities.
“I am quite skeptical about matters related to corporate governance because I believe corporate incentives are often distorted, including our own,” Anthropic co-founder Jack Clark remarks.
After my visit, Anthropic announced a significant partnership with Zoom, the video conferencing company, to integrate Claude into their product. While this aligns with their for-profit goals and the pursuit of investment and revenue, it raises concerns about potential incentives becoming skewed over time.
“If we felt that things were getting close, we might consider actions like ‘merge and assist,’ or if we had something that seemed to generate significant profits to the extent that it disrupted capitalism, we would find a way to distribute those gains equitably, as not doing so would lead to severe societal consequences,” Clark explains. “But I’m not inclined to make many commitments like that because I believe the critical commitments need to be made by governments regarding the actions of private sector actors like us.”
“It’s really strange that this isn’t a government project,” Clark noted at one point. Indeed, it is. Anthropic’s safety mission appears to align more naturally with a government agency than a private company. Would you trust a private pharmaceutical company to conduct safety trials on smallpox or anthrax, or would you prefer a government biodefense lab to handle such work?
Sam Altman, the CEO of OpenAI under whose leadership the Anthropic team departed, has been pushing for the establishment of new government agencies to oversee AI in different nations. This has sparked worries about regulatory capture: Altman’s initiatives might shape policies that discourage new companies from challenging OpenAI’s supremacy. But it also raises a core question: why are cutting-edge AI advancements carried out mainly by private firms like Anthropic and OpenAI?
Although academic institutions lack the resources to engage in cutting-edge AI research, federally funded national laboratories such as Lawrence Berkeley, Lawrence Livermore, Argonne, and Oak Ridge have been actively involved in substantial AI advancements. However, their research does not appear, at first glance, to focus on alignment and safety questions to the same degree as Anthropic. Additionally, private sector firms offer significantly higher salaries, making it challenging for government entities to attract top talent. For instance, a recent job listing at Anthropic for a software engineer with a bachelor’s degree and two to three years of experience offered a salary range of $300,000 to $450,000, along with stock options in a rapidly growing billion-dollar company. In contrast, the expected salary range at Lawrence Berkeley for a machine learning scientist with a PhD and two or more years of experience is $120,000 to $144,000.
In a world where AI talent is highly sought after and scarce, it becomes challenging for the government and government-funded organizations to compete. Starting a venture capital-funded company dedicated to advanced safety research may seem like a reasonable choice compared to establishing a government agency for the same purpose, given the better financial incentives and access to high-quality staff.
Some may find this situation acceptable if they don’t consider AI particularly dangerous and believe its benefits outweigh the risks. But for those deeply concerned about safety, as the Anthropic team says it is, letting tech investors and the potential “misaligned motivations” of private companies shape AI safety work, as Clark suggests, is risky. The need to strike deals with firms like Zoom or Google to sustain the business might encourage deploying technology before its safety is assured. Government agencies have their own incentive-related challenges, but they don’t face this particular dilemma.
My visit to Anthropic left me with an understanding of why its leaders chose the private-sector path. They built a formidable AI lab in just two years, almost certainly faster than it would take Congress to pass a law authorizing a comparable government lab. Given those options, I would also have opted for the private approach.
However, as policymakers evaluate these companies, Clark’s observation that it’s “strange this isn’t a government project” should be a significant consideration. If cutting-edge AI safety work indeed requires substantial funding, and if it is genuinely one of the most critical missions at this moment, the question of where that funding should come from—public or private interests—demands careful consideration.