As the Federal government strives for responsible use of generative AI technologies, public sector leaders said Wednesday that they are focused on testing and evaluating AI models to help strike a balance between regulatory frameworks and innovation.

At the AWS Summit in Washington, D.C., the tech leaders stressed that both industry and the Federal government should be looking to test and evaluate AI models.

“If you can test and evaluate, you can determine: ‘Is this model safe and responsible to deploy or is it not?’” explained Anthropic Head of Global Affairs Michael Sellitto. “I think a system that’s based on actually testing and evaluating and not having like rigid ‘checkbox’ compliance is really key.”

The Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Officer (CDAO) Radha Plumb noted that her office continually tests and evaluates models “to make sure the models are doing what we want,” and to ensure they align with the DoD’s Responsible Artificial Intelligence (RAI) Toolkit.

“I think having test and evaluation, having these ongoing conversations where on the government side we can understand where the models can and should be tested – both by the companies and then inside the government – and building that in as part of adoption is part of what proving the value proposition of responsible AI globally is,” Plumb said.

“The most sophisticated customers that we see in this space are the ones that know how to test and evaluate whether the model actually works for their use case,” added Sellitto. “So, I think that the work that DoD is doing here to develop good evaluations for their use cases is the most important thing you can do to not only ensure that the technology is being used responsibly but also speed adoption.”

One entity in the Federal government that is focused on building out concrete ways to test and evaluate AI is the National Institute of Standards and Technology’s (NIST) recently established U.S. AI Safety Institute (AISI).

NIST stood up the AISI at the direction of President Biden to support the responsibilities assigned to the Department of Commerce under the administration’s landmark AI executive order.

NIST AISI Director Elizabeth Kelly said Wednesday that a key goal of the new institute is advancing the science of AI safety. It plans to do that through direct testing of AI models and systems, focusing especially on “frontier” generative AI models.

“We’re going to be building out a suite of evaluations to use data on how AI models perform, what capabilities they exhibit, and what risks are posed by those capabilities,” Kelly said. “This is going to be an entirely new U.S. government capacity to directly test frontier AI models and systems before deployment across a range of risks.”

“In our initial pilot stage, we’re focused especially on capabilities that could pose a threat to national security,” she added.

For example, Kelly said AISI will look at whether these models could be used to perpetrate more effective cyberattacks or to develop biological weapons.

“We’re also going to be looking at societal harm perpetuated by frontier models and systems,” Kelly said. “And we’re going to be sharing feedback with model developers on where mitigations may be needed prior to deployments.”

Sellitto added that developing a regulatory framework that is based on “empirical data” from test and evaluation of AI models “is the way that you balance innovation and adoption and responsibility.”

Regulatory Outlook

Rep. Ami Bera, D-Calif., a member of the House AI Task Force, pointed out that much of the regulatory work will fall to Congress.

“Right now, Congress really isn’t moving on this … the problem is the states are moving ahead,” Rep. Bera said. “I think it’s real dangerous if you have 50 different frameworks – that could really stifle innovation.”

“I also think we ought to talk to our like-minded allies and think about what’s that framework for folks that share our values. So, again, that we don’t end up a year, five years from now with a set of European guidelines, with a set of American guidelines, and with a set of Asian guidelines,” the congressman said. “Those are conversations that we’re thinking about right now.”

Similarly, Rep. Bera said that California already has a set of privacy laws and stressed that “we can’t have 50 different data privacy laws.”

“I think we’ve got to think about this framework as a Federal framework, and then to the extent possible, as an international framework,” he concluded.

Kelly also emphasized the importance of looking at AI from an international perspective – for both corporate and government leaders.

“It’s part of why the work that we’re doing in the AI Safety Institute is launching a network of AI Safety Institutes globally, where we’re able to get the technical folks in the room and talk about what should test and evaluation look like, what should risk mitigations be – both to make sure that we’re learning from each other, given how quickly it’s moving, but also that we’re moving towards aligned and interoperable standards and evaluation so we can really enable the innovation to thrive,” Kelly said.

“We really encourage folks to think about this in a global perspective and keep an eye on everything that’s happening in so many other countries,” she added.
