Leadership

Rethinking Workload Placement In The Age Of Generative AI

By admin | July 5, 2023 | 5 Mins Read

Generative AI workloads require powerful compute and storage, making it more critical than ever for IT leaders to be intentional about how and where they place their application workloads.

IT leaders have made considerable strides modernizing their computing operations to deliver better business outcomes. Less clear is whether IT leaders are prepared for the unprecedented growth and acceleration of generative AI, which is poised to disrupt businesses worldwide.

As these workloads become more embedded throughout enterprises, IT leaders will have to grapple with not only where to run workloads but also how to do so for maximum efficiency. This presents both a challenge and an opportunity for IT leaders seeking to make their mark on their businesses.

Generative AI software comprises classes of hyperarticulate conversational assistants that generate text, images, video and even software code from user prompts. Microsoft, Google, OpenAI, Midjourney and Stability AI are just a handful of the tech incumbents and startups steering the market.

Hype for these emerging classes of technologies is palpable. Generative AI could add the equivalent of $2.6 trillion to $4.4 trillion in value annually, boosting the impact of all AI by 15 to 40 percent, according to the McKinsey Global Institute. That estimate roughly doubles if you count instances where generative AI is embedded into software used for tasks beyond the 63 business use cases McKinsey analyzed across customer operations, sales and marketing, software development and R&D.

IT leaders must figure out not only how to run generative AI workloads but also how to implement guardrails that prevent the exposure of proprietary data as employees adopt these tools.

The Pursuit of Enterprise-grade Generative AI

Most generative AI technologies run in public clouds, which is practical considering that large language models require vast amounts of elastic computing and storage. But as with many technologies popularized among consumers, enterprises will take more differentiated approaches to adoption.

This will include building and running some AI applications in-house to optimize performance and reduce latency, or because the apps will handle sensitive information, such as proprietary or customer data.

IT leaders may run certain workloads leveraging generative AI algorithms in-house, such as applications that fuel tailored marketing campaigns by personalizing content on the fly based on users’ preference patterns or activity history.
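
As a rough illustration of what such an in-house personalization workload might look like, the sketch below builds a prompt from a customer's interests and asks a locally hosted model to draft tailored copy. The Hugging Face Transformers library and the public "gpt2" checkpoint are stand-ins here, not a recommendation; an enterprise would swap in its own tuned model and serving stack.

```python
# Illustrative sketch only: generating personalized marketing copy with a
# locally hosted model. Hugging Face Transformers and the small public
# "gpt2" checkpoint are stand-ins for an organization's own tuned LLM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def personalize_copy(product: str, interests: list[str]) -> str:
    """Build a prompt from a user's interest history and generate tailored copy."""
    prompt = (
        f"Write a short promotional blurb for {product} "
        f"aimed at a customer interested in {', '.join(interests)}:\n"
    )
    result = generator(prompt, max_new_tokens=60, do_sample=True)
    # The pipeline returns the prompt plus the continuation; keep only the new text.
    return result[0]["generated_text"][len(prompt):].strip()

print(personalize_copy("noise-cancelling headphones", ["frequent travel", "podcasts"]))
```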

Regulated businesses may deploy their generative AI apps internally for fear of IP leakage and other security and compliance concerns. For instance, banks and healthcare organizations may use generative AI algorithms to comb through copious amounts of data to flag fraudulent activity and answer IT staff's questions about suspicious behaviors.

Some enterprises may have more technical reasons to run AI workloads in-house.

Some apps may run better in a corporate datacenter or a colocation facility because they require specific hardware that organizations can't procure from a cloud provider, such as specialized GPUs or particular memory, storage or networking configurations.

Moreover, some engineers prefer to manage their own systems internally so they can customize them to best train and tune their AI models, according to Andreessen Horowitz. Finally, by running apps internally, IT staffers have more fine-grained control over how often they update and tune their secret sauce: the large language models (LLMs) that fuel the predictions.

Added up, generative AI solutions could, in theory, help drive competitive advantages for organizations that get ahead of how to leverage and implement these technologies.

Generative AI in a Multicloud World

Yet as IT leaders experiment with bespoke applications fueled by LLMs, they must exercise caution in how they deploy their workloads.

With the calculus including such considerations as data locality and security, performance and cost efficiency, many organizations are pivoting toward an intentional approach to allocating workloads. In this multicloud-by-design strategy, purpose-built applications run on premises in corporate datacenters or colocation facilities, as well as in public and private clouds—and even in edge environments.
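
To make that calculus concrete, here is a deliberately simplistic, hypothetical sketch of how a placement decision might be scored. Nothing about it reflects a Dell or industry framework; a real assessment would also weigh regulatory constraints, egress costs, existing contracts and skills.

```python
# Illustrative sketch only: a toy scoring function for the placement calculus
# described above. The venues, criteria and weights are hypothetical, not a
# Dell or industry framework; a real assessment would weigh far more factors.
from dataclasses import dataclass

VENUES = ["on_prem", "colocation", "public_cloud", "edge"]

@dataclass
class Workload:
    handles_sensitive_data: bool   # proprietary or customer data involved?
    latency_sensitive: bool        # must sit close to users or data sources?
    gpu_hours_per_month: float     # rough, sustained compute demand

def score_placement(w: Workload) -> dict[str, float]:
    scores = dict.fromkeys(VENUES, 0.0)
    if w.handles_sensitive_data:            # favor venues the organization controls
        scores["on_prem"] += 2; scores["colocation"] += 1
    else:                                   # elasticity wins for non-sensitive data
        scores["public_cloud"] += 2
    if w.latency_sensitive:                 # push toward users and data sources
        scores["edge"] += 2; scores["on_prem"] += 1
    if w.gpu_hours_per_month > 5000:        # sustained heavy use: owning can beat renting
        scores["on_prem"] += 1; scores["colocation"] += 1
    else:
        scores["public_cloud"] += 1
    return scores

fraud_detection = Workload(handles_sensitive_data=True,
                           latency_sensitive=False,
                           gpu_hours_per_month=8000)
scores = score_placement(fraud_detection)
print(max(scores, key=scores.get))  # -> on_prem
```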

Indeed, 92% of IT decision makers Dell surveyed said that they have a formal strategy for deciding where to place workloads.

Regardless of where IT leaders choose to run their workloads, generative AI requires significant compute resources to train its models and fuel inferencing—the predictions that serve up words or phrases to answer a prompt in ChatGPT or a similar text tool.
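
For readers less familiar with the term, inferencing here simply means the model producing its answer one token at a time. The sketch below shows that loop using the small public GPT-2 model via Hugging Face Transformers; it illustrates the mechanics rather than how a production service would be built.

```python
# Rough sketch of what "inferencing" means here: producing an answer one token
# at a time from a prompt. GPT-2 is used only because it is small and public;
# an enterprise deployment would serve its own model behind an API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("Workload placement matters because", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                                  # generate 20 tokens greedily
        logits = model(ids).logits[:, -1, :]             # scores for the next token
        next_id = torch.argmax(logits, dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)          # append and predict again

print(tokenizer.decode(ids[0]))
```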

Supporting these tasks requires rigor in determining the right technologies and roadmaps to fuel these applications.

Emerging classes of servers feature multiple processors or GPUs to accommodate modern parallel processing techniques, in which workloads are split across multiple cores or devices to speed up training and inference tasks.
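
A minimal sketch of that data-parallel pattern in PyTorch, assuming a single server with one or more GPUs: the model is replicated across whatever devices are available and each batch is split among them.

```python
# Minimal sketch of the data-parallel idea: replicate a model across the GPUs
# in a single server and split each batch among them. nn.DataParallel is used
# for brevity; multi-node training would typically use DistributedDataParallel.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Each forward pass scatters the batch across the available GPUs,
    # runs the replicas in parallel, then gathers the outputs.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = torch.randn(64, 512, device=device)  # synthetic inputs
outputs = model(batch)                        # shape: (64, 10)
print(outputs.shape)
```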

Equally important are approaches that help organizations accelerate their AI goals. At Dell, we recently announced Project Helix, our collaboration with NVIDIA that offers organizations Dell infrastructure, software and services; NVIDIA accelerated computing and software; and a comprehensive AI framework, so they can quickly turn their AI initiatives into results.

Regardless of the path organizations take to building applications that harness the power of generative AI, getting workload placement right is critical. How organizations build, tune and run these algorithms may mean the difference between gaining a competitive edge—or losing one.

The good news? Many organizations are already factoring workload types, operational requirements and costs into their multicloud strategies. Choosing the best generative AI use cases, blueprints and supporting infrastructure is the next step for innovation.

Learn more about Project Helix.
