Embedding Models served by DeepInfra
Available models
DeepInfra provides a large variety of models; here's the one we use in our example. You can replace it with any embedding model supported by DeepInfra:
Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metric |
---|---|---|---|---|---|
thenlper/gte-large | 1024 | 512 | $0.010 / 1M tokens | 63.23 | cosine |
Usage
Note that DeepInfra doesn't have its own SDK. Their documentation shows how to call the REST API directly with an HTTP client library in your language of choice, or you can use OpenAI's SDK, since DeepInfra's API is OpenAI-compatible.
In the examples below, we use OpenAI's SDK with DeepInfra's URL, API key, and models.
Installing dependencies
Generating embeddings with DeepInfra
Storing and retrieving the embeddings
Additional information
Reducing dimensions
Larger embeddings generally cost more and consume more compute, memory, and storage than smaller ones. This is especially true for embeddings stored with pgvector.
When storing embeddings in Postgres, it is important that each vector is stored in a row that fits in a single Postgres block (typically 8 KB). If this size is exceeded,
the vector is moved to TOAST storage, which can slow down queries. In addition, "TOASTed" vectors are not indexed, which means you can't reliably use vector indexes.
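As a quick sanity check on the block-size constraint, a 1024-dimensional vector fits comfortably in a default 8 KB block (the exact per-vector header size is a small constant; the ~8-byte figure below is our assumption):

```python
dims = 1024
bytes_per_component = 4                # pgvector stores 4-byte float components
header_bytes = 8                       # small per-vector overhead (assumed ~8 bytes)
vector_bytes = header_bytes + dims * bytes_per_component

block_size = 8192                      # default Postgres block size (8 KB)
print(vector_bytes, vector_bytes < block_size)  # 4104 True
```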
DeepInfra supports multiple models. `gte-large` and `nomic-embed-text-v1.5` are two of the models available.
The `gte-large` model has 1024 dimensions and does not support scaling down. The `nomic-embed-text-v1.5` model has 768 dimensions and can scale down to 512, 256, 128, and 64.
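Dimension scaling for models like nomic-embed-text-v1.5 (trained Matryoshka-style) is typically done by truncating the vector and re-normalizing it; here is a minimal sketch (the helper name is ours):

```python
import math

def scale_down(embedding, target_dims):
    """Truncate an embedding to target_dims and re-normalize to unit length.

    A sketch of the truncate-then-renormalize approach used with
    Matryoshka-style models such as nomic-embed-text-v1.5.
    """
    truncated = embedding[:target_dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

# Stand-in for a real 768-dim nomic-embed-text-v1.5 embedding
vec = [0.1] * 768
small = scale_down(vec, 256)
print(len(small))  # 256
```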