artkit.model.llm.huggingface.HuggingfaceChatLocal#

class artkit.model.llm.huggingface.HuggingfaceChatLocal(*, model_id, api_key_env=None, use_cuda=False, **model_params)[source]#

Huggingface chat model, run locally.

Bases:

HuggingfaceLocalConnectorMixin, HuggingfaceChatConnectorMixin[AutoModelForCausalLM]

Metaclasses:

ABCMeta

Parameters:
  • model_id (str) – the ID of the model to use

  • api_key_env (Optional[str]) – the environment variable that holds the API key; if not specified, use the default API key environment variable for the model as returned by get_default_api_key_env()

  • use_cuda (bool) – if True, use CUDA; if False, use CPU

  • model_params (Any) – additional parameters for the model; parameters with None values are considered unset and will be ignored

Raises:

RuntimeError – if CUDA is requested but not available
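
A minimal construction sketch; the model ID, environment variable name, and max_new_tokens parameter below are illustrative assumptions, not library defaults:

    from artkit.model.llm.huggingface import HuggingfaceChatLocal

    llm = HuggingfaceChatLocal(
        model_id="microsoft/Phi-3-mini-4k-instruct",  # hypothetical model ID
        api_key_env="HF_TOKEN",  # hypothetical environment variable name
        use_cuda=False,  # run on CPU
        max_new_tokens=200,  # example model parameter, passed with every request
    )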

Method summary

get_api_key

Get the API key from the environment variable specified by api_key_env.

get_client

Get a shared client instance for this connector; return a cached instance for this model's API key if available.

get_default_api_key_env

Get the default name of the environment variable that holds the API key.

get_model_params

Get the parameters of the model as a mapping.

get_response

Get a response, or multiple alternative responses, from the chat system.

get_tokenizer

The tokenizer used by this connector.

to_expression

Render this object as an expression.

with_system_prompt

Set the system prompt for the LLM system.

Attribute summary

model_id

The ID of the model to use.

system_prompt

The system prompt used to set up the LLM system.

use_cuda

If True, use CUDA; if False, use CPU.

chat_template

The chat template to use for the model; None if using the default template.

chat_template_params

Additional chat template parameters.

api_key_env

The environment variable that holds the API key.

initial_delay

The initial delay in seconds between client requests.

exponential_base

The base for the exponential backoff.

jitter

Whether to add jitter to the delay.

max_retries

The maximum number of retries for client requests.

model_params

Additional model parameters, passed with every request.

Definitions

get_api_key()#

Get the API key from the environment variable specified by api_key_env.

Return type:

str

Returns:

the API key

Raises:

ValueError – if the environment variable is not set
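
A short sketch, reusing the hypothetical llm instance and HF_TOKEN variable from the construction example above:

    import os

    os.environ["HF_TOKEN"] = "hf_..."  # hypothetical token value
    api_key = llm.get_api_key()  # returns the value of HF_TOKEN; raises ValueError if unset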

get_client()#

Get a shared client instance for this connector; return a cached instance for this model's API key if available.

Return type:

AutoModelForCausalLM

Returns:

the shared client instance

classmethod get_default_api_key_env()#

Get the default name of the environment variable that holds the API key.

Return type:

str

Returns:

the default name of the API key environment variable
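
As a classmethod, this can be called without constructing an instance:

    default_env = HuggingfaceChatLocal.get_default_api_key_env()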

get_model_params()#

Get the parameters of the model as a mapping.

This includes all parameters that influence the model’s behavior, but not parameters that determine the model itself or are specific to the client, such as the model ID or the API key.

Return type:

Mapping[str, Any]

Returns:

the model parameters
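
Continuing the construction example above, the mapping would contain the extra generation parameter but neither the model ID nor the API key variable (the exact contents shown are an assumption for illustration):

    params = llm.get_model_params()
    print(dict(params))  # e.g. {'max_new_tokens': 200}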

async get_response(message, *, history=None, **model_params)#

Get a response, or multiple alternative responses, from the chat system.

Parameters:
  • message (str) – the user prompt to pass to the chat system

  • history (Optional[ChatHistory]) – the chat history preceding the message

  • model_params (dict[str, Any]) – additional parameters for the chat system

Return type:

list[str]

Returns:

the response or alternative responses generated by the chat system

Raises:

RequestLimitException – if an error occurs while communicating with the chat system
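
Since get_response() is a coroutine, it must be awaited. A minimal sketch, reusing the llm instance from the construction example above:

    import asyncio

    async def main() -> None:
        responses = await llm.get_response(
            message="Summarize exponential backoff in one sentence."
        )
        for response in responses:
            print(response)

    asyncio.run(main())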

get_tokenizer()#

The tokenizer used by this connector.

The tokenizer is loaded lazily on first access.

Return type:

AutoTokenizer

Returns:

the tokenizer
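
A short sketch of the lazy loading behavior; the first call loads the tokenizer, and subsequent calls reuse it:

    tokenizer = llm.get_tokenizer()  # loaded on first access
    encoding = tokenizer("Hello, world!")  # standard Huggingface tokenizer call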

to_expression()#

Render this object as an expression.

Return type:

Expression

Returns:

the expression representing this object

with_system_prompt(system_prompt)#

Set the system prompt for the LLM system.

Parameters:

system_prompt (str) – the system prompt to use

Return type:

Self

Returns:

a new LLM system with the system prompt set
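
Because the method returns a new instance rather than mutating the receiver, the original object is left unchanged; a minimal sketch, assuming no system prompt was set on llm at construction:

    polite_llm = llm.with_system_prompt("You are a concise, polite assistant.")
    assert polite_llm.system_prompt is not None
    assert llm.system_prompt is None  # original instance unchanged (assumes none was set before)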

api_key_env: str#

The environment variable that holds the API key.

chat_template: Optional[str]#

The chat template to use for the model; None if using the default template.

chat_template_params: Mapping[str, Any]#

Additional chat template parameters.

exponential_base: float#

The base for the exponential backoff.

initial_delay: float#

The initial delay in seconds between client requests.

jitter: bool#

Whether to add jitter to the delay.

max_retries: int#

The maximum number of retries for client requests.
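
Taken together, initial_delay, exponential_base, jitter, and max_retries describe an exponential backoff schedule for retried client requests. The sketch below illustrates how such a delay is typically computed; it is a generic illustration of the pattern, not the library's exact implementation:

    import random

    def backoff_delay(
        attempt: int,
        initial_delay: float,
        exponential_base: float,
        jitter: bool,
    ) -> float:
        # the delay grows exponentially with the retry attempt number
        delay = initial_delay * exponential_base**attempt
        if jitter:
            # randomize the delay to avoid synchronized retry bursts
            delay *= 1.0 + random.random()
        return delay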

property model_id: str#

The ID of the model to use.

model_params: dict[str, Any]#

Additional model parameters, passed with every request.

property system_prompt: str | None#

The system prompt used to set up the LLM system.

use_cuda: bool#

If True, use CUDA; if False, use CPU.