Oracle 23ai v23.6 new feature: Vector Search and a local Ollama Service
Oracle released Database 23ai in May 2024, but did you know that new sub-versions are released regularly, each with significant functional additions? This article zooms in on one of the features added in version 23.6: calling a local Ollama service, running on an environment of your choice.
In my previous article “Setting up Oracle Database 23ai with AI Vector Search” I already described how to upload your embedding model into the database itself.
So now there is a third option, next to calling an external AI provider and running the model in-database: call a “local” Ollama service running the model of your choice alongside the database, on the infrastructure of your choice. You can use this model to create the embeddings that populate Vector columns in tables, as well as to generate the actual answers to user questions, enriched with your proprietary content coming from a semantic Vector search in the database.
Overview of Options to implement RAG
So let’s detail the available options for creating embeddings in the Oracle Database.
1. Using an external public AI provider
Oracle Database allows you to configure a connection to a number of third-party AI services for vector generation: Oracle GenAI, Cohere, Google AI, Hugging Face, and OpenAI are currently available.
In this use case you don’t have to set up anything yourself: you basically outsource the entire embedding LLM setup to your preferred GenAI provider. There are however some drawbacks to this option: first of all you need a subscription to a GenAI service, and each API call comes with a cost. You are also sending your data to a public AI service, where you have little control over the protection of that data or its potential reuse for training the next-generation model. And of course each call comes with a certain latency, impacting the performance of the embedding, especially if the database is not running close to the GenAI provider of your choice.
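As an illustration, a call to such an external provider looks roughly like the minimal sketch below, based on the DBMS_VECTOR package; the credential name, API key, URL, and model are placeholders you would adapt to your own subscription:
-- Store the provider API key as a database credential (placeholder values)
BEGIN
  DBMS_VECTOR.CREATE_CREDENTIAL(
    credential_name => 'OPENAI_CRED',
    params          => JSON('{"access_token": "<your API key>"}'));
END;
/
-- Generate an embedding through the external provider
SELECT dbms_vector.utl_to_embedding(
         'Hello world',
         JSON('{
           "provider"        : "openai",
           "credential_name" : "OPENAI_CRED",
           "url"             : "https://api.openai.com/v1/embeddings",
           "model"           : "text-embedding-3-small"
         }'))
FROM dual;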
2. Uploading an Embedding model in the database
As already mentioned in my previous article “Setting up Oracle Database 23ai with AI Vector Search”, you can upload an embedding model into the database and run your embedding generation natively inside it. This gives you direct access to the LLM, as well as the ability to leverage the power of your database hardware (for example an Exadata machine). Some drawbacks: you need to manage the uploading and updating of the LLM models in your database yourself, and you are using part of the database’s available CPU power to perform the embeddings.
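For reference, loading an ONNX embedding model into the database looks roughly like this; a minimal sketch in which the directory, file, and model names are placeholders (see the previous article for the full walkthrough):
-- Load an ONNX embedding model from a database directory (placeholder names)
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'MODEL_DIR',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'doc_model');
END;
/
-- Embed natively in the database with the loaded model
SELECT VECTOR_EMBEDDING(doc_model USING 'Hello world' AS data) FROM dual;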
3. Using a local Ollama server
As of version 23.6 of the database, you can now also call a “local” Ollama server, running the model of your choice. This gives you full control over the machines and the models used, and also allows you to segregate database CPU power from embedding CPU power. And of course this can run 100% within your chosen environment, either on-premise or on a (public or private) Cloud environment of your choosing. The disadvantage is that you are now required to manage an extra service: your local Ollama server needs to be configured and maintained.
It’s this third choice I will describe in detail in the remainder of this article. To make things simple, I’ll run all the components on my personal laptop — a Mac M1. But you could easily do the same on a Windows PC or on a Linux VM in a Cloud environment.
Practical set-up on a Mac
There are a lot of options for running the latest version of the database: on Oracle Cloud, on Engineered Systems, as a classic database deployment or as an Autonomous Database. To illustrate the usefulness for developers, I ran everything on my local Mac: both the Oracle database (in a podman container) and the Ollama server.
What you’ll need on your machine:
- podman
- ollama
- VSCode with the SQLDeveloper plugin
For more details on the installation of these components, please follow the appropriate how-tos; a sketch for Homebrew users follows below. In the next chapter I’ll explain the steps required to configure these components to get to the desired result of using the local Ollama server to create vector embeddings in the database.
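Assuming you use Homebrew on the Mac (otherwise follow the official installers for each tool), the installations look roughly like this:
brew install podman
brew install ollama
brew install --cask visual-studio-code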
Oracle offers 2 flavours of the database image: the “Full” and the “Light”. In order to have all the Vector features available, you have to go for the “Full” image, which is approximately 9 GB in size. So although the initial download takes some time, spinning the database up afterwards is really fast.
We’ll use podman to pull the 23.6 Oracle Database container image from the Oracle Container Registry. But before we try to run the database, make sure to give the podman VM at least 8 GB of memory:
podman machine init
podman machine set --memory 8192
podman machine start
podman pull container-registry.oracle.com/database/free:23.6.0.0
Now you can run the database, with a few important parameters:
- Name the container (mydb)
- Map port 1521 to your local machine so you can access the database through SQL Developer
- Run detached (-d) and set a password for SYS via ORACLE_PWD
podman run --name mydb -p 1521:1521 \
-d -e ORACLE_PWD=Welcome#123123123 \
container-registry.oracle.com/database/free:23.6.0.0
You can see the resulting container status, and inspect the logs for debugging if needed, with the commands below:
podman ps -a
podman logs mydb
Once the mydb container status shows “Up” and “(healthy)”, you can try to connect to the database through the shell with this command:
podman exec -it mydb sqlplus SYS/Welcome#123123123 as sysdba
If all went well you are now connected as SYS to the database!
To make interacting with the database a bit easier, you can connect using the SQL Developer extension in VSCode. For this, configure a new connection with the following parameters:
- Username: sys
- Password: as you defined it
- Role: Sysdba
- Connection type: basic
- Hostname: localhost
- Port: 1521
- Servicename: freepdb1
This will connect you to the freepdb1 pluggable database. If you want to connect to the container database instead, use the service name “FREE”. Once you have defined a user for running your vector tests (docuser in this article), you can create a second connection using that user’s credentials; a shell-based alternative is shown below.
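If you prefer the shell, you can also reach the pluggable database directly through the container, for example (using the docuser account created later in this article):
podman exec -it mydb sqlplus sys/Welcome#123123123@//localhost:1521/freepdb1 as sysdba
podman exec -it mydb sqlplus docuser/docuser@//localhost:1521/freepdb1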
Running the Ollama server
Before we start working with the vector setup in the database, we need to make sure the Ollama server is up and running, using the appropriate model and addressable by the DB in the podman container.
By default Ollama listens on localhost:11434, but that address is not reachable from a container running in podman. So you first need to bind Ollama to your local IP address via the OLLAMA_HOST environment variable. The first command below lists your local IP address in case you don’t know it; the second sets OLLAMA_HOST:
ipconfig getifaddr en0
export OLLAMA_HOST=<your IP address, for example 192.168.1.20>:11434
Make sure to kill any running Ollama server, then relaunch it and run the model of your choice (I’ll be using llama3.1):
ollama serve &
ollama run llama3.1
You’ll end up in the interactive Ollama prompt; you can leave it with “/bye” or Ctrl+D without stopping the server.
Test if you can address the server using curl:
curl -X POST http://192.168.1.20:11434/api/generate -d \
'{"model": "llama3.1","prompt":"Give a brief description of a Spanish Waterdog."}'
Notice I use the external IP address rather than localhost, to mimic how the database container will reach the service.
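Since the database will call the /api/embeddings endpoint, it’s worth testing that one directly as well; the response should contain an “embedding” array:
curl -X POST http://192.168.1.20:11434/api/embeddings -d \
'{"model": "llama3.1", "prompt": "Just some text"}'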
Alternative option: you could also run the Ollama service on a separate machine (or a Cloud VM environment). In that case, verify that you can reach the service from the machine running the database, and open the required port 11434 both on the machine and in the Cloud network.
Calling Ollama from the DB
OK, all elements are now in place to use vectors in the database and call the external Ollama server to generate the embeddings.
First we’ll use the connection we defined earlier to set up a DB user and give it the required privileges:
CREATE USER docuser IDENTIFIED BY docuser;
GRANT DB_DEVELOPER_ROLE, CREATE CREDENTIAL TO docuser;
ALTER USER docuser QUOTA UNLIMITED ON users;
GRANT EXECUTE ON sys.utl_http TO docuser;
GRANT EXECUTE ON sys.dbms_network_acl_admin TO docuser;
Once you’ve done this, you can create a new connection using “docuser”. We’ll create a table to store some embeddings, and see how we can call the Ollama server to create the embedding vectors and insert them.
First you need to allow outgoing network connections for docuser. Note that host => '*' opens connections to all hosts; in a real environment you would restrict this to the address of your Ollama server:
BEGIN
sys.DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
host => '*',
ace => xs$ace_type(privilege_list => xs$name_list('connect'),
principal_name => 'docuser',
principal_type => xs_acl.ptype_db));
END;
/
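You can verify the ACL took effect by querying the ACE views as docuser; a quick check, assuming the USER_HOST_ACES data dictionary view:
SELECT host, privilege FROM user_host_aces;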
Next you create the table to insert the results:
create table my_test
(
id number,
the_v vector
);
insert into MY_TEST values (0, '[1,2,3,4,5]');
commit;
select * from my_test;
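Note that the_v is declared as a plain vector, without a fixed dimension count or format, so it accepts vectors of any size. You can inspect what was stored with the vector helper functions, for example:
SELECT id, VECTOR_DIMENSION_COUNT(the_v) AS dims FROM my_test;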
And finally you can run the example below to create 100 embeddings:
DECLARE
  embed_ollama_params clob;
  my_res              vector;
  my_string           varchar2(2000);
BEGIN
  -- Parameters telling DBMS_VECTOR to call the local Ollama server
  embed_ollama_params := '{
    "provider" : "ollama",
    "host"     : "local",
    "url"      : "http://192.168.1.20:11434/api/embeddings",
    "model"    : "llama3.1"
  }';
  for i in 1..100 loop
    my_string := 'Just some text' || i;
    DBMS_OUTPUT.PUT_LINE('The value of my_string is: ' || my_string);
    -- Each call makes one round trip to the Ollama server
    my_res := dbms_vector.utl_to_embedding(my_string, JSON(embed_ollama_params));
    insert into MY_TEST values (i, my_res);
  end loop;
END;
/
Hopefully you see the list of output lines, followed by the “PL/SQL procedure successfully completed” message!
You can check if the embeddings were successfully generated by rerunning the select statement on the table.
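With the embeddings in place, you can already run a small semantic search. Below is a minimal sketch that embeds a query string with the same Ollama parameters and returns the three closest rows by cosine distance; the 5-dimensional seed row is excluded, since distances can only be computed between vectors of the same dimension:
DECLARE
  embed_ollama_params clob := '{
    "provider" : "ollama",
    "host"     : "local",
    "url"      : "http://192.168.1.20:11434/api/embeddings",
    "model"    : "llama3.1"
  }';
  query_vec vector;
BEGIN
  -- Embed the search text with the same model used for the table rows
  query_vec := dbms_vector.utl_to_embedding('some text 42', JSON(embed_ollama_params));
  FOR r IN (
    SELECT id
    FROM   my_test
    WHERE  id > 0  -- skip the 5-dimensional seed row
    ORDER  BY vector_distance(the_v, query_vec, COSINE)
    FETCH FIRST 3 ROWS ONLY
  ) LOOP
    DBMS_OUTPUT.PUT_LINE('Closest match: ' || r.id);
  END LOOP;
END;
/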
Conclusions
In this article I explained how I tested a new option in Database 23ai version 23.6: using a local Ollama server for vector generation, running on a machine of my choice.
There are more new features in the Oracle Database 23.6 release, so make sure to check them out in the “Release Update” chapter of the documentation!
If you want to know more about the full RAG setup in the database, check out my previous article on the topic.