U.S. federal laboratories focused on energy and critical infrastructure (CI) research and development (R&D) are investing in new artificial intelligence (AI) and machine learning (ML) technologies to advance mission objectives. From driving emerging nuclear energy and particle physics capabilities to enabling data analytics at scale, AI/ML has the potential to help labs move the needle on groundbreaking R&D that strengthens our Nation’s energy security and advances our economy.
To take full advantage of AI/ML and realize economies of scale and enterprise analytics, robust model operations (ModelOps) will be critical. Yet all labs face a common challenge: addressing the fundamental differences between agile software engineering and AI/ML engineering. Doing so will require established development policies and paradigms to evolve to handle the nuances of full-lifecycle AI model management, from development to deployment to maintenance.
Working closely with AI/ML experts, labs can successfully address key considerations to optimize ModelOps:
Recognize that there is no one-size-fits-all solution to fundamental AI/ML development challenges
Labs will need robust capabilities that help ensure the accuracy and performance of AI/ML tools once they move into production. Out-of-the-box ModelOps solutions exist, but they come with significant vendor lock-in, difficult-to-predict costs, and closed ecosystems. Instead, labs can work with experts to integrate open, interoperable technologies and capabilities, ensuring their AI/ML tools achieve goals and objectives in line with their vision and mission, but without proprietary tooling, vendor lock-in, or unexpected costs.
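As a minimal illustration of what an open, interoperable approach can look like (not a prescribed stack), the sketch below trains a model with familiar open-source tooling and exports it to ONNX, an open model-interchange format that many runtimes can serve. The dataset, model choice, and file names are placeholders for illustration only.

```python
# Sketch: export a model to an open, portable format (ONNX) so it can be
# served on any ONNX-compatible runtime rather than a single vendor's stack.
# Assumes scikit-learn, skl2onnx, and onnxruntime are installed; the dataset
# and model choice are placeholders.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as rt

# Train any model with familiar, open tooling.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Convert to ONNX, an open interchange format with many compatible runtimes.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Any ONNX-compatible runtime can now serve the model; no vendor lock-in.
session = rt.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
predictions = session.run(None, {input_name: X[:5].astype(np.float32)})[0]
print(predictions)
```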
Ensure AI/ML development pipelines include vigilant model monitoring and management
Unlike software engineering, AI/ML engineering involves producing not only code but also the AI/ML model itself, which must be maintained as production data changes and the model is refined over time. Labs can look to trusted experts to ensure their development pipeline includes vigilant model monitoring and management. Doing so will help ensure the quality and accuracy of both the model and the overall AI solution, which is critical for research labs whose R&D has implications for energy, security, and our nation’s infrastructure.
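To make "vigilant model monitoring" concrete, here is a minimal sketch of data-drift detection: production feature distributions are compared against the training baseline with a two-sample Kolmogorov–Smirnov test, and drifted features are flagged for review. The feature names, synthetic data, and 0.05 threshold are illustrative assumptions, not a prescribed standard.

```python
# Sketch: simple data-drift monitoring for a deployed model. Production feature
# distributions are compared against the training baseline with a two-sample
# Kolmogorov-Smirnov test; the p-value threshold and synthetic data are
# illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(baseline, production, feature_names, p_threshold=0.05):
    """Return the names of features whose production distribution has drifted."""
    drifted = []
    for i, name in enumerate(feature_names):
        result = ks_2samp(baseline[:, i], production[:, i])
        if result.pvalue < p_threshold:
            drifted.append(name)
    return drifted

# Example with synthetic data: the second feature shifts in production.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(1000, 2))
production = np.column_stack([
    rng.normal(0.0, 1.0, 500),   # stable feature
    rng.normal(0.8, 1.0, 500),   # shifted feature
])
print(detect_drift(baseline, production, ["sensor_a", "sensor_b"]))
```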
Include production-level data during AI/ML model retraining to optimize accuracy
In setting up AI/ML production environments, labs can leverage AI engineering experts to establish model training protocols and choose the types of data used to retrain models. Refining models with production-level data in a continuous, iterative loop maximizes accuracy and effectiveness by allowing the model to learn patterns that are newly emerging or were absent from the original training set.
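A minimal sketch of such a retraining loop appears below, assuming recent production records have been labeled: a candidate model is trained on the combined data and promoted only if it matches or beats the deployed model on a holdout set. The model type, data, and promotion criterion are placeholders, not a recommended protocol.

```python
# Sketch: periodic retraining that folds recent, labeled production data back
# into the training set. The model, data sources, and promotion criterion are
# illustrative assumptions; real pipelines would add validation, versioning,
# and approval gates.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def retrain_with_production_data(X_train, y_train, X_prod, y_prod,
                                 X_holdout, y_holdout, current_model):
    """Retrain on combined data; promote only if at least as accurate as the deployed model."""
    X_combined = np.vstack([X_train, X_prod])
    y_combined = np.concatenate([y_train, y_prod])

    candidate = LogisticRegression(max_iter=1000).fit(X_combined, y_combined)

    current_acc = accuracy_score(y_holdout, current_model.predict(X_holdout))
    candidate_acc = accuracy_score(y_holdout, candidate.predict(X_holdout))
    return candidate if candidate_acc >= current_acc else current_model

# Synthetic example: production data drifts away from the original training set.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 3)); y_train = (X_train[:, 0] > 0).astype(int)
X_prod = rng.normal(loc=0.5, size=(200, 3)); y_prod = (X_prod[:, 0] > 0.5).astype(int)
X_hold = rng.normal(loc=0.5, size=(200, 3)); y_hold = (X_hold[:, 0] > 0.5).astype(int)
deployed = LogisticRegression(max_iter=1000).fit(X_train, y_train)
updated = retrain_with_production_data(X_train, y_train, X_prod, y_prod,
                                        X_hold, y_hold, deployed)
```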
Design a robust feedback approach as part of ModelOps to maximize performance
This ongoing feedback process is tied closely to model monitoring, enabling model governance, performance evaluation, and the ability to flag a model for retraining when monitoring data indicates degraded accuracy or performance in production. AI/ML experts play a key role in ensuring this feedback approach is integrated into the overall ModelOps ecosystem.
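One simple way to wire monitoring into a retraining trigger is sketched below: recent ground-truth outcomes are tracked in a rolling window, and the model is flagged for retraining once accuracy degrades past a tolerance relative to its accuracy at deployment. The window size and tolerance are assumptions a lab's governance process would tune.

```python
# Sketch: a monitoring-driven feedback check that flags a deployed model for
# retraining when its recent accuracy drops below a tolerance relative to the
# accuracy recorded at deployment. Window size and tolerance are illustrative.
from collections import deque

class ModelFeedbackMonitor:
    def __init__(self, baseline_accuracy, window_size=500, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window_size)  # 1 = correct, 0 = incorrect

    def record(self, prediction, actual):
        """Record a delayed ground-truth label as it becomes available."""
        self.outcomes.append(1 if prediction == actual else 0)

    def needs_retraining(self):
        """Flag the model once recent accuracy degrades past the tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough recent feedback yet
        recent_accuracy = sum(self.outcomes) / len(self.outcomes)
        return recent_accuracy < self.baseline_accuracy - self.tolerance

# Usage: feed labeled outcomes as they arrive; kick off the retraining pipeline
# (e.g., the loop sketched above) when needs_retraining() returns True.
monitor = ModelFeedbackMonitor(baseline_accuracy=0.92)
```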
Prioritize current tools to maximize development environment efficiency
For energy and CI labs and their R&D environments, advanced, accurate AI/ML capabilities are critical to scaling research; however, progress in ModelOps should not mean relearning how to create AI/ML models. In addition to robust data and model governance, labs will need to work with industry partners to integrate the development tools they already use into their ModelOps pipeline. Doing so helps avoid standing up a customized development environment for each new AI/ML project and retraining the entire workforce on proprietary development paradigms.
Leverage model management expertise to integrate data analysis capabilities into ModelOps
AI-powered data analysis capabilities are key to scaling R&D for energy and CI projects. Integrating these capabilities into production-ready AI/ML solutions, however, again requires a shift away from what labs are used to in software engineering, where analytics could simply run on top of the development stack. With AI/ML engineering, model management must be integrated into both the R&D and production environments rather than layered on top of the deployment pipeline.
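As a sketch of what "integrated rather than layered on top" can mean in practice, the snippet below defines a small, shared model-management layer that research code and the production service call in exactly the same way. The filesystem-backed registry and naming scheme are simplifying assumptions; a lab would likely back this with a full model registry.

```python
# Sketch: a minimal, shared model-management layer used identically by R&D
# notebooks and the production service, rather than bolted onto deployment
# afterward. The filesystem-backed registry and naming scheme are illustrative.
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path

REGISTRY_DIR = Path("model_registry")

def save_model(model, name, metrics):
    """Version and store a model alongside its evaluation metrics."""
    version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = REGISTRY_DIR / name / version
    target.mkdir(parents=True, exist_ok=True)
    with open(target / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    (target / "metrics.json").write_text(json.dumps(metrics))
    return version

def load_latest(name):
    """Load the newest registered version; called by analysis and serving code alike."""
    versions = sorted((REGISTRY_DIR / name).iterdir())
    with open(versions[-1] / "model.pkl", "rb") as f:
        return pickle.load(f)
```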
By addressing these key considerations for robust ModelOps in consultation with expert, trusted technology integrators, national labs can position their AI/ML projects to flourish and to drive forward the innovation their scientists and technologists are focused on.
Learn more about how we are leveraging innovation to advance critical infrastructure.