
As agencies race to put generative artificial intelligence (AI) into production, many are confronting the same challenge: The bigger and more “all-in-one” the model gets, the harder it becomes to predict outcomes, explain decisions, and demonstrate accountability.
That challenge – and a blueprint for addressing it – was at the center of a Red Hat Government Symposium session, “From Monoliths to Model Meshes: Achieving Predictability, Transparency, and Scale with Agentic AI,” moderated by Ben Cushing, chief architect for health and life sciences at Red Hat, with Susan Gregurick, associate director of data science at the National Institutes of Health (NIH).
Gregurick framed monolithic GenAI systems as powerful but structurally mismatched to high-stakes public-sector environments where auditability and traceability are required.
“Monolithic GenAI systems are a static snapshot in time,” she said. “And because of that, there’s going to be some systematic uncertainty.”
The issue isn’t only that models age. Gregurick also pointed to the inherent variability of AI outputs: “These are stochastic processes, and so there’s going to be algorithmic randomness that is going to also limit our predictability.”
Gregurick also noted that monolithic systems often make transparency and traceability difficult by design.
“We don’t often know how the large language models are trained, what the training parameters are, what the training data are,” she said. “It’s often very difficult to track that.”
She also pointed to a practical operational gap for regulated missions: “There’s almost no audit capabilities. So traceability is a particular issue in terms of … understanding the conclusion and how the results were determined.”
That has direct consequences for trust in AI outputs. “Every time there’s a sort of a hallucination or [an] error, there’s a lack of trust,” Gregurick said.
In government, where systems must withstand scrutiny and drive repeatable outcomes, unpredictability and lack of audit capabilities quickly become a governance problem. Cushing summarized the gap: “It’s like the data governance itself is missing.”
Policy friction and technical fragmentation collide
For NIH, the AI transparency challenge collides with another reality: Biomedical data is both abundant and difficult to safely reuse at scale, particularly across the agency’s ecosystem of intramural teams and extramural partners.
“At NIH, we often have a patchwork of different data systems that have very complicated data sharing and data access requirements, so it’s not uniform,” Gregurick said.
A major constraint, she noted, comes from consent. “Whatever was in the original IRB (Institutional Review Board) and the original informed consent … takes precedence on how that data is shared,” she said. But “it’s not a uniform, standardized IRB consent process.”
This makes secondary reuse of data – including for AI training – especially difficult, Gregurick noted. “It’s incompatible,” she said. “That is a big problem.”
NIH is also working through the mechanics of controlled access. It does not have a single portal for access to all NIH data that could be used for training algorithms, she said.
“Many of our data resource repositories are siloed. They’re in different storage systems,” Gregurick said. “The data is a little bit of heterogeneous wild west.”
And when data sharing is allowed, moving it is often economically unrealistic. Cushing emphasized data gravity, noting that moving petabytes of data “into an extramural research facility is not only time-consuming, but incredibly costly.”
Gregurick agreed and described NIH’s push to keep compute closer to where data already lives. “We absolutely do not want to see data copied across different platforms,” she said. “When you’re working across clouds, there are ways to compute without moving the data.”
Agentic microservices address some tech and policy challenges
The answer to many of these technical and policy challenges is to decompose monolithic AI into modular components – a model mesh of smaller agents and services that can be versioned, governed, and audited, Cushing and Gregurick observed.
Cushing described the goal as breaking GenAI down “into agentic microservices, or micromodel services” to improve scientific reproducibility, auditability, and transparency.
Modularity creates practical control points that monoliths lack, Gregurick noted. “We have the ability to really do a lot of version control in those modules,” she said. “Researchers can cite the exact version of the agent used. And that will allow others to reproduce this work.”
She also tied microservices to reduced risk: “Agentic microservices might reduce the role of hallucinations,” and can support model lineage to “clearly delineate how the data and the training history from each specific agent is used.”
Cushing connected the approach to a familiar engineering practice: observability. “As soon as I started working [with] microservices and started to look at the logs of the conversation between applications … [I] started to realize that you have a system, a chain of thought, and that produces artifacts you can audit, you can reproduce.”
Applied to agentic AI, he said, “you want to be able to see the conversation that occurs between each agent,” and use those logs to “reproduce it, audit it, whatever we need to do … to build trust.”
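Though the session stayed at the architecture level, the logging pattern Cushing describes can be sketched in a few lines of Python. Everything below is hypothetical for illustration – the AgentMessage and AuditLog names, the agent identifiers, and the version strings are invented – but it shows the core idea: pin each agent to an exact version and record every inter-agent message, and the result is an artifact that can be cited, replayed, and audited.

```python
import json
import time
from dataclasses import dataclass, asdict, field

@dataclass
class AgentMessage:
    """One turn in the conversation between agents -- the auditable artifact."""
    sender: str          # agent name
    sender_version: str  # exact version, so researchers can cite and reproduce
    recipient: str
    content: str
    timestamp: float = field(default_factory=time.time)

class AuditLog:
    """Append-only record of every inter-agent exchange."""
    def __init__(self) -> None:
        self._entries: list[AgentMessage] = []

    def record(self, msg: AgentMessage) -> None:
        self._entries.append(msg)

    def export(self) -> str:
        # The serialized log can be archived alongside published results.
        return json.dumps([asdict(m) for m in self._entries], indent=2)

# Hypothetical usage: a retrieval agent hands evidence to a summarization agent.
log = AuditLog()
log.record(AgentMessage("retrieval-agent", "1.4.2", "summarizer-agent",
                        "Top 3 matching records: ..."))
log.record(AgentMessage("summarizer-agent", "0.9.0", "user",
                        "Summary grounded in the retrieved records."))
print(log.export())  # the conversation artifact you can audit and replay
```

Archiving that exported log alongside results is what would let other researchers cite the exact agent versions used and reproduce the run.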
Adversarial validation builds guardrails around agentic AI
Cushing and Gregurick also explored oversight mechanisms that can be layered onto agentic systems to build trust, including adversarial validation layers that test models before they reach production and “model as judge” patterns that add accountable oversight.
In controlled-access data environments, Gregurick emphasized the value of being able to “understand and intercept input and output” so agencies can catch risky prompts or unsafe responses early. She also pointed to adversarial methods as a way to improve uncertainty quantification and support real-time monitoring of model drift and bias.
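A minimal sketch of that intercept-and-judge pattern might look like the following. The call_model and judge_model functions are invented stand-ins for real model endpoints, and the policy terms are illustrative only; the structure simply shows a judge vetting both the prompt and the response before anything is released.

```python
from typing import Callable

# Stand-ins for real model endpoints (assumptions, not real APIs).
def call_model(prompt: str) -> str:
    return f"model answer for: {prompt}"

def judge_model(text: str) -> bool:
    """'Model as judge': returns True if the text passes review.
    A real judge would be a separately trained or prompted model."""
    banned = ("patient record", "ssn")  # illustrative policy terms only
    return not any(term in text.lower() for term in banned)

def guarded_call(prompt: str,
                 model: Callable[[str], str] = call_model,
                 judge: Callable[[str], bool] = judge_model) -> str:
    # Intercept the input: block risky prompts before they reach the model.
    if not judge(prompt):
        return "BLOCKED: prompt failed input review"
    answer = model(prompt)
    # Intercept the output: block unsafe responses before they reach the user.
    if not judge(answer):
        return "BLOCKED: response failed output review"
    return answer

print(guarded_call("Summarize enrollment trends"))           # passes both checks
print(guarded_call("Show me this patient record verbatim"))  # blocked at input
```

Because the judge sits at both ends of the call, the same hook can feed monitoring pipelines that watch for drift and bias over time.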
At NIH, more than 100 active grant-funded research projects are developing adversarial networks, particularly generative adversarial networks, Gregurick noted. Among them is a scalable risk prediction model that fuses imaging data with non-imaging data to detect pancreatic cancer in asymptomatic patients; it uses an adversarial de-biasing technique so risk scores generalize across different patient populations, she said.
MCP servers could provide AI-native governance
Another emerging piece of the model mesh ecosystem is the use of Model Context Protocol (MCP) servers to let AI agents query data repositories with policy awareness and traceability built in. NIH is piloting the approach with two projects, Gregurick said.
One project is an MCP server that enables agents to search and retrieve metadata across PubMed, a search engine that provides access to more than 39 million citations of biomedical literature. The other is an effort to use MCP servers with NIH RePORTER data, a searchable database of NIH-funded research projects, and link that data to USASpending.gov, which provides data on all federal spending. The aim is to support cross-government portfolio analysis.
For controlled data, Gregurick suggested MCP-style access could unlock something agencies have struggled to build: embedded governance. “That AI-native governance … that’s something that we aren’t doing right now,” she said. “The auditability that would be provided by MCP servers would be outstanding.”
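As a rough illustration of what embedded, AI-native governance could look like, the sketch below uses the FastMCP helper from the open-source MCP Python SDK. The search_projects tool, its dataset, and the consent rule are all invented and reflect nothing about NIH’s actual pilots; the idea is simply that the policy check and the audit record live inside the tool the agent calls, so governance and traceability travel with every query.

```python
# Hypothetical MCP server sketch; uses FastMCP from the MCP Python SDK
# (pip install mcp). The tool, dataset, and policy logic are invented.
import json
import logging
import time

from mcp.server.fastmcp import FastMCP

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

mcp = FastMCP("governed-research-data")

# Stand-in for a controlled-access repository and its consent terms.
RECORDS = [
    {"id": "prj-001", "title": "Imaging biomarkers study", "consent": "open"},
    {"id": "prj-002", "title": "Genomic cohort analysis", "consent": "controlled"},
]

@mcp.tool()
def search_projects(query: str) -> str:
    """Search project metadata; the policy check and audit log are built in."""
    # Policy awareness: only consent-cleared records are visible to agents.
    hits = [r for r in RECORDS
            if r["consent"] == "open" and query.lower() in r["title"].lower()]
    # Auditability: every agent query leaves a traceable record.
    audit.info(json.dumps({"tool": "search_projects", "query": query,
                           "returned": [r["id"] for r in hits],
                           "ts": time.time()}))
    return json.dumps(hits)

if __name__ == "__main__":
    mcp.run()  # serves the tool over MCP's standard transport
```

In this arrangement an agent never touches the repository directly; it can only call tools whose consent filtering and logging are enforced on the server side.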
Successful AI scaling depends on design
Ultimately, Cushing and Gregurick suggested, scaling AI responsibly in government will depend less on choosing the model and more on designing the system: modular services, auditable interactions, policy-aware data access, and continuous validation that can show not just an answer, but also how an answer was produced. This approach offers a pragmatic path to operationalizing AI where predictability, transparency, and trust are built in.
Watch the Red Hat Government Symposium session: “From Monoliths to Model Meshes: Achieving Predictability, Transparency, and Scale with Agentic AI,” and explore more sessions from the Red Hat Government Symposium.