Connecting to Gen AI Models#
Introduction#
ARTKIT model classes support connecting to Gen AI model APIs from a range of providers. This tutorial walks through setting up ARTKIT connectors to send requests to, and receive responses from, popular Gen AI models.
We currently support the following model connections:
OpenAI - Text, Image Generation, Vision
Anthropic - Text only
Groq - Text only
Google’s Gemini - Text only
Hugging Face - Text only
Amazon’s Titan - Text only
We also support connectivity to Gen AI platforms like:
Amazon Bedrock
Google Vertex AI
Key benefits of using ARTKIT model classes include:
Built-in support for asynchronous requests
Asynchronicity improves the performance of I/O bound tasks like API calls or file operations by allowing them to run concurrently without blocking the execution of other parts of your program.
All ARTKIT model classes are designed for asynchronicity. This design decision promotes asynchronous programming in ARTKIT pipelines and keeps our library minimalistic.
ARTKIT pipelines are capable of handling synchronous steps and users are welcome to define custom synchronous API connectors. However, running ARTKIT synchronously negates the performance gains enabled by the underlying asynchronous computing engine.
Chat history management
Chat history management is essential for realistic multiturn interactions. Without chat history management, a chat model can only respond to the most recent request.
ARTKIT chat model classes manage chat history in the background. See the Multi-Turn Personas tutorial for a demonstration of automated multiturn chats between a challenger bot and a target system.
Additionally, ARTKIT offers the HistorizedChatModel wrapper class, which manages chat histories for any given ChatModel (see the sketch after this list).
Local cache management
Caching Gen AI model responses is useful for ensuring reproducibility and avoiding repetitive API calls which can run up costs and runtime during development.
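As a minimal sketch of the chat history wrapper mentioned above: we assume here that HistorizedChatModel accepts the underlying ChatModel as its first constructor argument; check the API Reference for the exact signature.
# Sketch only: assumes import artkit.api as ak (see Setup below) and that
# HistorizedChatModel wraps the ChatModel passed to its constructor
base_model = ak.OpenAIChat(api_key_env="OPENAI_API_KEY", model_id="gpt-3.5-turbo")
historized_model = ak.HistorizedChatModel(base_model)

# Each call now sees the prior turns, so the follow-up question can be answered
await historized_model.get_response(message="My name is Ada.")
response = await historized_model.get_response(message="What is my name?")
print(response[0])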
For technical details on ARTKIT model classes and all supported models, see the API Reference. If you need to connect to an endpoint which is not supported by ARTKIT, see our tutorial on Creating New Model Classes, and please consider Contributing to ARTKIT!
Setup#
To connect to different model providers, you must install the necessary platform-specific libraries:
pip install openai
pip install anthropic
pip install groq
pip install google-generativeai
pip install huggingface
pip install boto3
pip install google-cloud-aiplatform
You must obtain an API key for each platform you plan to use. Platform-specific links are provided in the sections below. We recommend storing environment variables in a .env file within your project and loading them using load_dotenv, as shown below. For a step-by-step guide, see our documentation on Environment Variables.
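For example, a .env file for the providers used in this tutorial might contain entries like the following (placeholder values; the key names match the api_key_env arguments used throughout this notebook):
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
GROQ_API_KEY=...
GEMINI_API_KEY=...
HF_TOKEN=hf_...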
[1]:
import artkit.api as ak
from dotenv import load_dotenv
load_dotenv();
Important: The await keyword can normally only be used within an asynchronous function. In the code blocks below, we are able to use await on its own because Jupyter notebooks implicitly execute cells asynchronously. If running code in a Python script, you must use ARTKIT connectors within the context of asynchronous functions, as demonstrated throughout the documentation (e.g., Building Your First ARTKIT Pipeline).
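For example, in a standalone Python script you would wrap the calls in an async function and execute it with asyncio. Below is a minimal sketch which assumes the openai_chat_model defined later in the OpenAI section; it also illustrates how multiple requests can run concurrently:
import asyncio

async def main() -> None:
    # Assumes openai_chat_model has been created as shown in the OpenAI section below.
    # Both requests run concurrently; neither blocks the other.
    responses = await asyncio.gather(
        openai_chat_model.get_response(message="What is 2+2?"),
        openai_chat_model.get_response(message="What is the capital of France?"),
    )
    for response in responses:
        print(response[0])

if __name__ == "__main__":
    asyncio.run(main())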
OpenAI#
To make API calls to OpenAI, you must install the openai
Python library and obtain an OpenAI API key.
ARTKIT connectors currently support three types of OpenAI models:
Chat (LLMs)
Text-to-image (diffusion models)
Image-to-text (vision models)
Below we show how to instantiate each OpenAI model class with caching, send a request, and view the response.
Chat#
Chat models are used to emulate a conversation between a user and a language model by utilizing its text completion capabilities. The OpenAIChat class returns a list of strings, which are probable candidates for a conversational reply given the prompt.
[2]:
# Initialize an OpenAI chat model
openai_chat_model = ak.CachedChatModel(
    model=ak.OpenAIChat(
        api_key_env="OPENAI_API_KEY",
        model_id="gpt-3.5-turbo",
        system_prompt=None,
        temperature=1,
        seed=0,
    ),
    database="cache/connecting_to_openai_gpt3-5.db",
)
# Send a message to the chat model and print the response
response = await openai_chat_model.get_response(message="What is 2+2?")
print(response[0])
2 + 2 equals 4.
Text-to-Image#
The OpenAIDiffusion
class returns an ARTKIT Image
object, which has a show
method for visualizing images.
[3]:
# Initialize an OpenAI diffusion model
openai_diffusion_model = ak.CachedDiffusionModel(
    model=ak.OpenAIDiffusion(
        api_key_env="OPENAI_API_KEY",
        model_id="dall-e-2",
        size="256x256",
    ),
    database="cache/connecting_to_openai_dalle.db",
)
# Send text to the diffusion model and display the image response
winged_cat = await openai_diffusion_model.text_to_image(text="a cat with wings")
winged_cat[0].show(width=200)
Vision#
Here, we use ARTKIT’s OpenAIVision
class to send an image and a prompt to GPT-4-turbo, which recently enabled image inputs through the API:
[4]:
# Initialize an OpenAI vision model
openai_vision_model = ak.CachedVisionModel(
    model=ak.OpenAIVision(
        api_key_env="OPENAI_API_KEY",
        model_id="gpt-4-turbo",
    ),
    database="cache/connecting_to_openai_gpt4v.db",
)
# Send an image and a prompt to the vision model and print the response
response = await openai_vision_model.image_to_text(
    image=winged_cat[0],
    prompt="What is this, in 10 words?")
print(response[0])
Mythical winged cat, serene expression, feathers, tail, artistic creature portrayal.
Anthropic#
Anthropic is an AI research company that prioritizes ethics and social impact in their products and services. To interact with the Anthropic API, register an account with Anthropic and generate an API key in your Account Settings. You must also install the anthropic
Python library. Visit Anthropic Models for a list of
available models.
Just as with the OpenAI example, ARTKIT offers an AnthropicChat class that returns a list of possible responses in string format. Below we show how to instantiate and use an Anthropic chat model:
[5]:
# Initialize an Anthropic chat model
anthropic_chat_model = ak.CachedChatModel(
    model=ak.AnthropicChat(
        api_key_env="ANTHROPIC_API_KEY",
        model_id="claude-3-sonnet-20240229",
        max_tokens=100,
    ),
    database="cache/connecting_to_anthropic_chat.db",
)
# Send a message to the chat model and print the response
response = await anthropic_chat_model.get_response(message="What is 2+2?")
print(response[0])
2 + 2 = 4
Groq#
Groq is a Gen AI hardware and software platform that prioritizes inference speed. To make API calls to Groq, register an account and create your API key in your GroqCloud console. Groq currently hosts a few popular Gen AI models, including LLaMA3, Mixtral, Gemma, and Whisper. Visit Groq Models for the models you can use with Groq.
Below we show how to instantiate a Groq chat model to interact with Llama 3 8B:
[6]:
# Initialize a Groq chat model
groq_chat_model = ak.CachedChatModel(
    model=ak.GroqChat(
        api_key_env="GROQ_API_KEY",
        model_id="llama3-8b-8192",
    ),
    database="cache/connecting_to_groq_chat.db",
)
# Send a message to the chat model and print the response
response = await groq_chat_model.get_response(message="What is 2 + 2")
print(response[0])
The answer to 2 + 2 is 4.
Google#
Gemini#
Gemini is a family of multimodal Gen AI models developed by Google. To interact with the Gemini API, install the Python library google-generativeai and generate a Gemini API key. For a list of currently available Gemini models, visit Gemini Models.
Below we show how to instantiate a Gemini model class to interact with the Gemini API:
[2]:
# Initialize a Gemini chat model
gemini_chat_model = ak.CachedChatModel(
    model=ak.GeminiChat(
        api_key_env="GEMINI_API_KEY",
        model_id="gemini-1.5-pro-latest",
    ),
    database="cache/connecting_to_gemini_chat.db",
)
# Send a message to the chat model and print the response
response = await gemini_chat_model.get_response(message="What is 2 + 2?")
print(response[0])
2 + 2 = 4
Vertex AI#
Vertex AI is Google’s fully-managed AI development platform for building and using Gen AI. Vertex AI gives you access to over 100 models from third-party AI companies, including Anthropic Claude 3.5 Sonnet, Meta Llama 3, and Mistral AI Mixtral 8x7B.
Here’s an example of how to connect:
Authenticating to Vertex AI#
You can authenticate to Vertex AI by setting up Application Default Credentials (ADC) in GCP, which is a strategy used by the authentication libraries to automatically find credentials based on the application environment. To start, make sure you have:
A GCP project ID with Vertex AI enabled for billing purposes
The Google Cloud CLI installed
Next, configure your ADC by following these steps in your terminal (details here):
gcloud init
gcloud auth application-default login
The above commands will set up your ADC and create a credentials.json file, which is saved in a location that depends on your environment.
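Optionally, you can confirm that ADC is configured correctly by requesting an access token:
gcloud auth application-default print-access-token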
You are now all set! You should be able to run the Vertex AI chat model below and connect to the model you need.
[ ]:
# Initialize a Vertex AI chat model
vertex_chat_model = ak.CachedChatModel(
    model=ak.VertexAIChat(
        model_id="gemini-1.5-pro-latest",
        gcp_project_id="gcp-gp-xyz",
    ),
    database="cache/connecting_to_gemini_chat.db",
)
# Send a message to the chat model and print the response
response = await vertex_chat_model.get_response(message="What is 2 + 2?")
print(response[0])
Note for both Gemini and Vertex AI:
The system prompt should only be set when you initialize an instance of the chat model. It will remain in effect for all subsequent conversations with that chat model and should not be updated after you create the instance. Here's an example of correct and incorrect usage:
[ ]:
# Setup code: system prompts and a step function used in both examples
FIRST_SYSTEM_PROMPT = "Your name is first bot, and you are helpful, creative, clever, and very friendly."
SECOND_SYSTEM_PROMPT = "Your name is second bot, and you are mean, snarky and misleading."

async def ask_vertex(question: str, llm: ak.ChatModel):
    for response in await llm.get_response(message=question):
        yield dict(response=response)

prompt = {"question": "Why is the sky green?"}
[ ]:
# Correct way to handle system prompts: each chat model has its own system prompt
first_model = ak.VertexAIChat(
    model_id="gemini-1.5-pro-001", gcp_project_id="gcp-gp-abc-123"
).with_system_prompt(FIRST_SYSTEM_PROMPT)

second_model = ak.VertexAIChat(
    model_id="gemini-1.5-pro-001", gcp_project_id="gcp-gp-abc-123"
).with_system_prompt(SECOND_SYSTEM_PROMPT)

first_step = ak.step("ask_first", ask_vertex, llm=first_model)
second_step = ak.step("ask_second", ask_vertex, llm=second_model)

results = ak.run(input=prompt, steps=ak.chain(first_step, second_step))
[ ]:
# Wrong way to handle system prompts: the second step will use the first system prompt
shared_model = ak.VertexAIChat(
    model_id="gemini-1.5-pro-001", gcp_project_id="gcp-gp-abc-123"
)

first_step = ak.step("ask_first", ask_vertex, llm=shared_model.with_system_prompt(FIRST_SYSTEM_PROMPT))
second_step = ak.step("ask_second", ask_vertex, llm=shared_model.with_system_prompt(SECOND_SYSTEM_PROMPT))

results = ak.run(input=prompt, steps=ak.chain(first_step, second_step))
Amazon Bedrock - Titan#
Bedrock is an LLM platform supported by AWS that hosts a variety of Gen AI foundation models. To interact with Bedrock, you will need an appropriate AWS account and a specific IAM user with the permissions associated with Bedrock. Visit Getting Started for additional details on getting started with AWS.
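Since the Bedrock connector relies on boto3 (see the Setup section), credentials are picked up from the standard AWS credential chain, such as environment variables, the shared credentials file, or an attached IAM role. For example, one common way to configure them locally is with the AWS CLI:
aws configure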
Below, we show how to instantiate a Titan model class with the Bedrock API:
[3]:
import json
# Initialize a Titan chat model
titan_bedrock_model = ak.CachedChatModel(
    model=ak.TitanBedrockChat(
        model_id="amazon.titan-text-lite-v1",
        region="us-east-1",
    ),
    database="cache/connecting_to_titan1.db",
)
# Send a message to the chat model and print the response
response = await titan_bedrock_model.get_response(json.dumps({"inputText": "What is the best Javascript framework?"}))
print(response[0])
{"inputText": "What is the best Javascript framework?"}
{}
There are many frameworks available for JavaScript development. Some of the popular ones are AngularJS, ReactJS, VueJS, and EmberJS. Each framework has its own strengths and weaknesses, and the best choice depends on the specific needs of the project. AngularJS is known for its strong MVC structure, reactive programming, and large ecosystem of libraries and tools. ReactJS is known for its component-based architecture, declarative programming, and performance. VueJS is known for its simplicity, flexibility, and performance. EmberJS is known for its component-based architecture, declarative programming, and integration with other frameworks.
Hugging Face#
Hugging Face is a popular platform for hosting open source models, datasets, and code. To make API calls to Hugging Face, you must first register an account with Hugging Face and create a Hugging Face API key in your profile settings. You must also install the huggingface Python library.
Given the diversity of models hosted on Hugging Face, it is important to read the documentation for individual models to understand the expected input and output formats, model-specific parameters, limitations, and best practices. See Hugging Face Models to browse the models available on Hugging Face.
Note that some models require you to request access by visiting the model’s repository site. Gated or private models require running the following:
[8]:
# Needed to access gated or private models on the Hugging Face Hub
from huggingface_hub import login
login()
Chat templates#
A key concept for working with Hugging Face models is chat templates, which specify how to convert conversation histories into a format that works for a specific model. A conversation history consists of a list of messages, where each message is a dictionary containing role and content keys. For example:
[9]:
messages = [
    {"role": "system",
     "content": "You are a friendly assistant."},
    {"role": "user",
     "content": "What is 2+2?"},
    {"role": "assistant",
     "content": "2 + 2 = 4"},
    {"role": "user",
     "content": "Looks correct, thanks."},
]
Individual LLMs are trained to recognize chat histories in a particular format. The Hugging Face standard for chat templates is to represent them as Jinja templates, which look like this:
{% for message in messages %}
{% if message['role'] == 'user' %}
{{ ' ' }}
{% endif %}
{{ message['content'] }}
{% if not loop.last %}
{{ ' ' }}
{% endif %}
{% endfor %}
{{ eos_token }}
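To make this concrete, here is an illustrative sketch that renders a compact version of the template above over the messages list from the previous cell using the jinja2 library. In practice, Hugging Face tokenizers apply chat templates for you; the eos_token value below is just an example:
from jinja2 import Template

# Compact version of the example template above (illustrative only)
example_template = Template(
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}"
    "{{ message['content'] }}"
    "{% if not loop.last %}{{ ' ' }}{% endif %}"
    "{% endfor %}"
    "{{ eos_token }}"
)

# The rendered string is the raw prompt the model would actually receive
print(example_template.render(messages=messages, eos_token="</s>"))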
There are two types of Hugging Face models:
Models with native chat templates: The chat template is specified as the chat_template parameter inside the model's tokenizer_config.json file. For example, the chat template for the zephyr-7b-beta model can be found here.
Models with empty chat templates: For many models on Hugging Face, the chat_template parameter in the tokenizer_config.json file is empty. Usually, the chat template can be found elsewhere in the documentation, e.g., on the model card page. However, if you're unable to find any chat template, make sure that the model you're trying to use is actually trained for chat applications.
Below, we show how to work with both types of models. We’ll start with the first option, as it is the easiest:
Native chat templates#
We’ll demonstrate usage of a native chat template using the zephyr-7b-beta
model referenced above. The fact that the chat template is specified here in the tokenizer_config.json
allows us to set up a Hugging Face connector like this:
[10]:
hf_chat_native = ak.HuggingfaceChat(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    api_key_env="HF_TOKEN",
    chat_template_params={"add_generation_prompt": True},
)
Note the usage of the add_generation_prompt argument. When enabled, it appends the tokens that indicate the start of a bot response to the end of the formatted prompt. This helps ensure the model generates the next response in the conversation, which is useful for conversational applications of LLMs.
Now let’s get a response from the model:
[11]:
response = await hf_chat_native.get_response("What is 2 + 2?")
print(response[0])
/Users/lontkealexander/.pyenv/versions/rai_red/lib/python3.12/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
4 is the answer to the mathematical expression "2 + 2". This is a basic arithmetic operation called addition, where you add two numbers (2 in this case) to get a sum (4).
This approach works for models that have a chat template defined in their tokenizer_config.json
file, but what if the chat template parameter is empty? That’s when custom chat templates are needed.
Custom chat templates#
For models without a chat template specified in the tokenizer_config.json
file, it is essential to determine the expected template from another source. If you cannot find and specify a correct chat template, then the model is likely to return unexpected and unpredictable responses.
We saw in the previous section that the zephyr-7b
model has a native template defined, so we don’t need to specify a custom template. However, for illustrative purposes, we will define and use a custom chat template for zephyr-7b
using the ARTKIT HuggingfaceChat
connector and confirm that it produces the same output as the native template approach.
Here is the zephyr-7b
chat template defined as a Python string:
[12]:
zephyr_template = (
    "{% for message in messages %}\n{% if message['role'] == 'user' %}\n"
    "{{ '<|user|>\n' + message['content'] + eos_token }}\n"
    "{% elif message['role'] == 'system' %}\n"
    "{{ '<|system|>\n' + message['content'] + eos_token }}\n"
    "{% elif message['role'] == 'assistant' %}\n"
    "{{ '<|assistant|>\n' + message['content'] + eos_token }}\n"
    "{% endif %}\n"
    "{% if loop.last and add_generation_prompt %}\n"
    "{{ '<|assistant|>' }}\n"
    "{% endif %}\n"
    "{% endfor %}"
)
Now, we pass the Jinja template to the chat_template
argument when we set up our HuggingfaceChat
connector. Note that this argument was not used in the previous section because we used the native template. When we execute this code, ARTKIT will show a warning alerting us that we are overriding the native chat template with our custom template:
[13]:
hf_chat_custom = ak.HuggingfaceChat(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    api_key_env="HF_TOKEN",
    chat_template=zephyr_template,
    chat_template_params={"add_generation_prompt": True},
)
Now let’s get a response from the model. Since our custom chat template is identical to the native template for zephyr-7b
, the response should be identical:
[14]:
response = await hf_chat_custom.get_response("What is 2 + 2?")
print(response[0])
Tokenizer (HuggingFaceH4/zephyr-7b-beta) natively supports chat templates, but will be overridden by the given custom chat template.
4 is the answer to the mathematical expression "2 + 2". This is a basic arithmetic operation called addition, where you add two numbers (2 in this case) to get a sum (4).
Looks like it worked!
Chat templates are one of the more challenging aspects of working with Hugging Face models, but understanding how they work and using them correctly is necessary for anyone who wishes to tap into the vast Gen AI model landscape available on Hugging Face.
Concluding Remarks#
We hope this tutorial helps ARTKIT users to leverage our asynchronous model connectors with confidence. We aim to keep this notebook updated with the latest ARTKIT model classes, but please note that the API Reference always contains the most comprehensive list of connectors and capabilities.
If you find any bugs or feature limitations, please open an issue and we will follow up as soon as possible. If you need a connector which is not supported by ARTKIT, our tutorial on Creating New Model Classes provides guidance on how to build a custom connector class. If you create a connector which is likely to be useful for others, consider Contributing to ARTKIT!