Selecting the right generative AI model.

It seems like a new AI model is released every other day. We’ve spoken with many CEOs and tech leaders looking to deploy AI within their organizations, and many are overwhelmed by the sheer number of options now available. As of February 2024, Hugging Face had nearly 500K AI models listed. In truth, most of the models out there are good enough for the majority of applications. We believe most decision makers can radically simplify this choice by focusing on two factors: licensing and size.

Figure 1: A snapshot of Hugging Face’s model count, taken in February 2024. Back in October 2023, there were only 320K models listed.

Licensing
Like most enterprise software, Large Language Models (LLMs) are now offered under two primary license types: proprietary and open source. Each path offers distinct advantages:

Proprietary Models: Speed, Scalability, Simplicity
Proprietary LLMs, such as GPT-4 or Claude 2, are built to be as turnkey as possible. These models have well-documented APIs and come with tooling to facilitate deployment. This means your team can get up and running far more quickly, as they won’t need to deal with details like hosting, moderation guardrails, security, and observability. Scaling up your AI application is also straightforward: you simply pay more as you use more. Finally, the complexity of the model itself is abstracted behind an API, reducing the need for extensive in-house technical expertise.
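
To make this concrete, here is a minimal sketch of what calling a proprietary model looks like, using OpenAI’s Python client as one example; the model name and prompt are purely illustrative.

```python
# Minimal sketch: calling a proprietary model through its hosted API.
# Assumes the openai package (v1+) is installed and an API key is set.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key risks in this contract: ..."},
    ],
)
print(response.choices[0].message.content)
```

Notice how much is absent: there is no hosting, no GPU provisioning, and no model management. That is the turnkey appeal.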

That said, you inevitably sacrifice customizability and control. While you’ll be able to modify the architecture surrounding the model and take advantage of techniques like Retrieval-Augmented Generation (RAG), you won’t be able to fine-tune the model itself. Your application is also beholden to the model provider, who may choose to modify the service or its pricing without your consent.
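
To illustrate the RAG point, here is a minimal sketch of wrapping retrieval around a proprietary model. The search_docs helper is a hypothetical stand-in for whatever vector store you use; the point is that the model is grounded in your data without being fine-tuned.

```python
# Minimal RAG sketch around a hosted model. search_docs is a
# hypothetical placeholder for a vector-store query function.
from openai import OpenAI

client = OpenAI()

def answer_with_rag(question: str, search_docs) -> str:
    # Retrieve the documents most relevant to the question.
    context = "\n\n".join(search_docs(question, top_k=3))
    # Inject the retrieved context into the prompt so the model
    # grounds its answer in your data; no fine-tuning required.
    prompt = (
        f"Answer the question using only this context:\n{context}\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```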

Open Source Models: Flexibility, Control, Community
Open source LLMs such as Llama 2 or BLOOM represent a more bespoke approach. One key advantage is flexibility: businesses can modify and adapt these models to their specific requirements. You can tune and tweak these models as you see fit, adjusting not only the output but also the response time. The second major advantage is the ability to mitigate data and privacy concerns. Since you’re the one operating the model, you maintain control over where the data sits and how it is or isn’t consumed.

While you benefit from the collective expertise of the community developing these open source models, using open source LLMs generally requires deeper technical expertise. You will also need to commit to setting up, developing, and maintaining the model itself in addition to your application.
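
For a sense of what self-hosting looks like at its simplest, here is a minimal sketch using the Hugging Face transformers library with Mistral 7B; it assumes a GPU with sufficient memory, and the prompt is illustrative. Everything the proprietary API hides (hardware, scaling, serving infrastructure) becomes your responsibility from here.

```python
# Minimal sketch: running an open source model you host yourself.
# Assumes transformers and accelerate are installed; weights download
# on first run (roughly 15 GB for this model).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the weights on available GPU/CPU memory.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(
    "Summarize our refund policy in one sentence:", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```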

Size
Generally speaking, the larger the model, the more capable it will be. The size of a model is typically measured by the number of parameters it contains. The latest wave of generative AI has been driven by large language models, which pack in far more parameters than their predecessors, made possible by far more compute and training data.

Figure 2: Model sizes are growing exponentially. For context, AlexNet has 62 million parameters, whereas BERT has 345 million. Source: OpenAI
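
If you want to check a model’s size yourself, a quick sketch looks like the following, assuming a model loaded via the transformers library (GPT-2 is used here purely as a small, fast-to-download example).

```python
# Count a model's parameters directly.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")  # roughly 124M for gpt2
```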

Within the realm of large language models, there’s still a wide spread in size. To give you a sense, Microsoft’s Phi-2 (2.7 billion parameters) and Mistral’s 7B model (7 billion parameters) are on the smaller side. Scaling up, you’ll find models like Llama 2 70B (70 billion parameters) and Falcon 180B (180 billion parameters). At the very top sits OpenAI’s GPT-4, which is estimated to contain an astounding 1.76 trillion parameters.

Bigger, however, is not always better. First, the largest models may be overkill for your needs. Many of the largest models are multimodal, meaning they’re designed to handle multiple data types (text, images, video, audio, and so on). Many AI applications won’t need that full spectrum of capability, so why pay for it? Second, larger models tend to have longer inference times, that is, a longer lag between when you submit a prompt and when the model returns a response. It’s well understood that page load times have a significant impact on the performance of web applications like e-commerce; many AI applications will similarly benefit from faster inference times. Finally, smaller models are faster and cheaper to fine-tune on your proprietary data sets than larger ones.

Figure 3: On the same hardware (shown in column 3), the 7 billion parameter model is nearly 6x faster than the 70 billion parameter model. Source: Databricks
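
If latency matters for your use case, it is worth measuring it directly for each candidate model rather than relying on published figures. A minimal sketch, again assuming the OpenAI Python client and an illustrative prompt:

```python
# Measure end-to-end inference latency for one call.
import time

from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Classify this ticket: 'My order never arrived.'"}],
)
elapsed = time.perf_counter() - start
print(f"End-to-end latency: {elapsed:.2f}s")
```

In practice you would run this many times per model and compare the distributions, not a single call.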

What does this all mean in practice?
The right model choice depends heavily on how far along you are in your AI journey and the needs of your specific use case. At Eskridge, we believe you should always start small with a pilot or proof of concept. The best models for pilots are the large, proprietary ones like GPT-4. Your team will be able to stand up something more quickly and cost-effectively, and because they’ll be using a more powerful model, the outputs will be more likely to meet expectations and drive positive ROI. Finally, you most likely won’t need significant customization at this stage.

Once you’ve established the value of your AI pilot and start to scale it to production, we believe the timing is right to consider an open source model to optimize your application. At higher scale, the performance nuances of your AI will matter more. For instance, the cost per inference isn’t material at a few uses a day, but as volume grows to thousands of inferences per day, the cost adds up. Or you may start to see the limits of the response quality you can get from an off-the-shelf model and wish to explore how much better the output could be from a model tuned on your own data sets. In these situations, you’ll want to use the smallest open source model that gives you comparable results, for all the reasons outlined above. Many modern applications are built in a modular fashion, so you can take a more bespoke route to AI down the road, once the associated costs become clearly justified.
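
To see how quickly per-inference costs compound at scale, consider an illustrative back-of-the-envelope calculation; the per-call cost below is a hypothetical figure, not a quoted price.

```python
# Hypothetical cost per inference; substitute your provider's actual pricing.
cost_per_call = 0.03  # dollars

for calls_per_day in (10, 1_000, 100_000):
    monthly_cost = cost_per_call * calls_per_day * 30
    print(f"{calls_per_day:>7,} calls/day -> ${monthly_cost:,.0f}/month")
```

At 10 calls a day that works out to about $9 a month, which nobody will notice; at 100,000 calls a day it is $90,000 a month, which is when a smaller self-hosted model starts to pay for its own engineering effort.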

Of course, there are exceptions to every rule. There are certain niche use cases where it makes sense to use an open source or small model right out of the gate. For one, if your use case involves extremely sensitive data, you may decide that even the limited exposure of that data to third parties during a pilot is simply unacceptable; this is a common scenario in verticals like healthcare and defense. Second, your application may need to run the model on local hardware like a phone or IoT device. In these instances, proprietary models by definition won’t be an option, and you won’t have sufficient compute to run the larger open source models. Finally, if your use case requires a very specific type of output, you may need to fine-tune the model upfront, in which case you’ll find yourself reaching for open source.

We do expect the practical differences between open and closed, and large and small, models to diminish over time. Just as the proprietary players are slowly opening up their walled gardens to give customers finer-grained control, new AI infrastructure companies like Baseten are emerging to make it much easier to deploy open source models. But for the foreseeable future, you’ll want to pilot with a large proprietary model like OpenAI’s GPT-4 and, over time, migrate to a custom implementation of a smaller open source model like Mistral 7B.
