As the Federal government strives for responsible use of generative AI technologies, public sector leaders said Wednesday that they are focused on testing and evaluating AI models to help strike a balance between regulatory frameworks and innovation.

At the AWS Summit in Washington, D.C., the tech leaders stressed that both industry and the Federal government should be looking to test and evaluate AI models.

“If you can test and evaluate, you can determine: ‘Is this model safe and responsible to deploy or is it not?’” explained Anthropic Head of Global Affairs Michael Sellitto. “I think a system that’s based on actually testing and evaluating and not having like rigid ‘checkbox’ compliance is really key.”

The Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Officer (CDAO) Radha Plumb noted that her office continually tests and evaluates models “to make sure the models are doing what we want,” and to ensure they align with the DoD’s Responsible Artificial Intelligence (RAI) Toolkit.

“I think having test and evaluation, having these ongoing conversations where on the government side we can understand where the models can and should be tested – both by the companies and then inside the government – and building that in as part of adoption is part of what proving the value proposition of responsible AI globally is,” Plumb said.

“The most sophisticated customers that we see in this space are the ones that know how to test and evaluate whether the model actually works for their use case,” added Sellitto. “So, I think that the work that DoD is doing here to develop good evaluations for their use cases is the most important thing you can do to not only ensure that the technology is being used responsibly but also speed adoption.”

One entity in the Federal government that is focused on building out concrete ways to test and evaluate AI is the National Institute of Standards and Technology’s (NIST) recently established U.S. AI Safety Institute (AISI).

NIST stood up the AISI at the direction of President Biden to support the responsibilities assigned to the Department of Commerce under the administration’s landmark AI executive order.

NIST AISI Director Elizabeth Kelly said Wednesday that a key goal of the new institute is advancing the science of AI safety. It plans to do that through direct testing of AI models and systems, focusing especially on “frontier” generative AI models.

“We’re going to be building out a suite of evaluations to use data on how AI models perform, what capabilities they exhibit, and what risks are posed by those capabilities,” Kelly said. “This is going to be an entirely new U.S. government capacity to directly test frontier AI models and systems before deployment across a range of risks.”

“In our initial pilot stage, we’re focused especially on capabilities that could pose a threat to national security,” she added.

For example, Kelly said AISI will look at whether these models could be used to perpetrate more effective cyberattacks or to develop biological weapons.

“We’re also going to be looking at societal harm perpetuated by frontier models and systems,” Kelly said. “And we’re going to be sharing feedback with model developers on where mitigations may be needed prior to deployments.”

Sellitto added that developing a regulatory framework that is based on “empirical data” from test and evaluation of AI models “is the way that you balance innovation and adoption and responsibility.”

Regulatory Outlook

Rep. Ami Bera, D-Calif., a member of the House AI Task Force, pointed out that much of the regulatory work will fall to Congress.

“Right now, Congress really isn’t moving on this … the problem is the states are moving ahead,” Rep. Bera said. “I think it’s real dangerous if you have 50 different frameworks – that could really stifle innovation.”

“I also think we ought to talk to our like-minded allies and think about what’s that framework for folks that share our values. So, again, that we don’t end up a year, five years from now with a set of European guidelines, with a set of American guidelines, and with a set of Asian guidelines,” the congressman said. “Those are conversations that we’re thinking about right now.”

Similarly, Rep. Bera said that California already has a set of privacy laws and stressed that “we can’t have 50 different data privacy laws.”

“I think we’ve got to think about this framework as a Federal framework, and then to the extent possible, as an international framework,” he concluded.

Kelly also emphasized the importance of looking at AI from an international perspective – for both corporate and government leaders.

“It’s part of why the work that we’re doing in the AI Safety Institute is launching a network of AI Safety Institutes globally, where we’re able to get the technical folks in the room and talk about what should test and evaluation look like, what should risk mitigations be – both to make sure that we’re learning from each other, given how quickly it’s moving, but also that we’re moving towards aligned and interoperable standards and evaluation so we can really enable the innovation to thrive,” Kelly said.

“We really encourage folks to think about this in a global perspective and keep an eye on everything that’s happening in so many other countries,” she added.
