Giving the Oracle AI Services a spin using the Python SDK

Jan Leemans
9 min read · Jan 26, 2024


OCI AI Services Overview

Introduction

Oracle AI services can be easily integrated into your applications. This article will guide you through setting up and running the main AI services available on Oracle Cloud. I chose the Python SDK and Jupyter notebooks because they allow for easy, step-by-step interaction with the code, and because you can copy the API calls straight into your actual application.

I’ll cover the following AI services in this article:

  1. AI Language Service: This service allows you to detect language, analyse text sentiment, and translate between a large set of languages.
  2. AI Vision Service: This service allows you to detect objects, faces, and text in image files.
  3. AI Speech: This service allows you to transcribe sound from video or audio into a text file.
The 3 services covered in this article

I’ll also be using the identity service as well as the object storage service to manipulate files.

First, I’ll give a brief explanation of the capabilities of these services. Then, I’ll provide a detailed description of how you can set up your machine to run the code yourself. If you want to dive straight in, you can download my notebook here.

Overview of the services

The OCI AI Language service

Oracle OCI Language is a cloud-based AI service that helps you analyse text. It contains a whole range of pre-trained language processing capabilities such as Language Detection, Text Classification, Named Entity Recognition, Key Phrase Extraction, Sentiment Analysis, Text Translation, and removal of Personally Identifiable Information (PII). With these capabilities, you can process any text and extract insights without data science expertise.

In this article, we’ll explore 3 of these capabilities:

  1. Language detection: This service will detect the language of the provided text among 100+ languages, typically as a first step in processing the information. You just pass the text along as a string and get the language code in return.
  2. Sentiment Analysis: The Aspect-Based Sentiment Analysis feature extracts the critical components of text and provides the associated sentiment — either positive, negative, or neutral. You can do this on the level of the whole text, or detect the sentiment in each sentence.
  3. Text Translation: This service allows you to translate text between 10+ languages. This is especially useful for managing multilingual chatbots or document searches in different languages.

AI Vision Service

You can use the Vision service to detect and classify objects in pictures. If you have lots of images, you can process them in batches.

I’ve provided a few simple example images to cover the most obvious use-cases, but you can easily specify your own images and see the results:

  • Object detection: This service will return a list of the detected objects (e.g. flower, electronic device, etc.), as well as the “bounding boxes” on the picture where these objects are located,
  • Face detection: This service can detect human faces, again with the “bounding boxes” of the face location,
  • Text detection: This service can detect text in the image and return the strings of text detected.

To use this service, you need to have the image files in a bucket in the OCI object storage. The code includes the upload of a local image into the bucket.

Object detection through the OCI Console

AI Speech Service

The last service we’ll be exploring is the Speech to Text service. To use it, you need to first upload your video or sound file to an object storage bucket. Then, launch the Transcription job and wait for it to finish. You can read the results from a file in the bucket, which can be either JSON or SRT.

In the example code, we’ll upload a short sound file, launch the transcription job, and then read back the result file so we can display the text on screen. Because the transcription happens asynchronously, I wrote a simple loop over the result of the job to wait for the result to be available before reading the actual result file.

Getting your hands dirty

So now it’s time to dive into the practical setup. Start by downloading the GitHub repo https://github.com/janleemans/oci-ai-python, which contains the notebook as well as a few example image and sound files. In the “notebook” folder you’ll find the file “oci-ai-python.ipynb” containing the Python code.

Installation of the environment

Before you can start, you need to install the required components. I’m assuming you already have a Python environment; if not, you can go to the python.org website (https://www.python.org/downloads/) or use Anaconda to install a richer set of tools, including Python and Jupyter notebooks. I would also suggest installing Visual Studio Code for easy development.

The next step is to set up your access to the Oracle OCI instance.

  • First install the OCI Python SDK. You can do this by running the command `pip install oci` (or `pip3 install oci`, depending on your setup). More info can be found here.
  • Next you need to install the OCI Command Line interface (oci cli). If you’re on a mac and have homebrew set up, you can simply run the `brew update && brew install oci-cli` command. If not, please consult the documentation for more details.
  • Next you need an Oracle Cloud environment. If you’re reading this article, I’m assuming you already have access to one; if not, you can obtain a free trial environment via this link.
  • In your OCI environment you need to set up API access. Go to your User Profile page and select the “API Keys” tab. Here you can add a new API key, which will provide you with a fingerprint and a private PEM key, as well as an example of the .oci/config file. While you’re in the console, also create a compartment for this experiment and note down the compartment OCID; this is one of the parameters you will need later to interact with the environment.
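
For reference, a typical ~/.oci/config file looks like the one below. All values here are placeholders; use the ones generated when you added your API key:

[DEFAULT]
user=ocid1.user.oc1..<your_user_ocid>
fingerprint=<your_key_fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<your_tenancy_ocid>
region=eu-frankfurt-1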

Validating your setup

Before you start in Python, I would advise testing the OCI CLI itself from a shell, for example by executing the command `oci iam region list`. This is a simple command without any parameters that should list the available regions for your tenancy. If it works correctly, you can be sure the `.oci/config` file is set up correctly, and you are ready to move to Python.

At this point we’ll be following the script in the file “oci-ai-python.ipynb”.

  • Import the Python OCI SDK, then store the connection information in the `config` variable, which we’ll be using throughout this article:
import oci
config = oci.config.from_file("~/.oci/config", "DEFAULT")
  • Initialize the IdentityClient object, and use it to obtain the list of regions of the tenancy:
idd = oci.identity.IdentityClient(config)
regions = idd.list_regions().data
for region in regions:
    print(region.name)

If this piece of code is printing a list of OCI regions, your setup is working correctly, and you are ready to start exploring the AI services!

Exchanging files with OCI

Before we can start working with the various AI services, we need to ensure we are ready to upload and download image, text and video files from OCI. This is done through Object Storage buckets.

Below is a series of interactions with the Object Storage API to ensure we are ready to upload files:

  • The compartment OCID needs to be copied from your OCI environment,
  • the bucket name can be chosen to be any value,
  • initialize the Object Storage client using the ObjectStorageClient() function.
# Let's create a bucket to use in this lab:
bucket_name = "ai-lab-bucket"

# Set the OCID of the compartment you created for this lab:
compart = "ocid1.compartment.oc1..aaaaa… your OCID …"

object_storage_client = oci.object_storage.ObjectStorageClient(config)
namespace = object_storage_client.get_namespace().data

We can now use the various functions of our object_storage_client to interact with the bucket:

  • list_buckets() to list all the buckets in the compartment,
  • list_objects() to get the available objects in a bucket
  • create_bucket() to create a new bucket

In the script you can see we print the relevant results to track what is happening.
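As an illustration, the snippet below creates the bucket only if it does not exist yet. This is a minimal sketch without error handling, not the notebook’s exact code:

# Create the bucket only if it is not there yet (minimal sketch, no error handling):
existing = [b.name for b in object_storage_client.list_buckets(namespace, compart).data]
if bucket_name not in existing:
    object_storage_client.create_bucket(
        namespace,
        oci.object_storage.models.CreateBucketDetails(
            name=bucket_name,
            compartment_id=compart))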

The AI Language Service

Let’s get started with the actual AI services. We’ll start with the Language service, which needs to be initialized as below:

ai_language_client = oci.ai_language.AIServiceLanguageClient(config)

We can now use the ai_language_client to interact with the service, for example to get the dominant language of a text:

response = ai_language_client.detect_dominant_language(
    oci.ai_language.models.DetectDominantLanguageDetails(
        text="Some text you want to have analyzed"
    )
)

In a very similar way, you can use the function batch_detect_language_sentiments() to do sentiment analysis on a text or a series of texts, and the function batch_language_translation() to translate your text into another language, as sketched below.
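To give you an idea of the shape of these calls, here is a rough sketch of both. The document key and texts are made-up examples, and the exact field names are my reading of the SDK models, so verify them against the SDK reference:

# Aspect-based sentiment analysis on a batch of documents:
sentiments = ai_language_client.batch_detect_language_sentiments(
    oci.ai_language.models.BatchDetectLanguageSentimentsDetails(
        documents=[
            oci.ai_language.models.TextDocument(
                key="doc1",
                text="The room was lovely, but the service was slow.",
                language_code="en")]))
for doc in sentiments.data.documents:
    print(doc.aspects)  # each aspect carries a positive, negative, or neutral sentiment

# Translation of the same document into French:
translation = ai_language_client.batch_language_translation(
    oci.ai_language.models.BatchLanguageTranslationDetails(
        documents=[
            oci.ai_language.models.TextDocument(
                key="doc1",
                text="The room was lovely, but the service was slow.",
                language_code="en")],
        target_language_code="fr"))
print(translation.data.documents[0].translated_text)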

The AI Vision Service

The next service to investigate is the AI Vision service. The key difference here is that you need to upload the images to a bucket before you can invoke the service. As an example, you can do this as per the below code:

# Set up the storage object
object_storage = oci.object_storage.ObjectStorageClient(config)

# Set up the file path
img1_file_path = "desk.jpeg"

with open(img1_file_path, "rb") as f:
    put_object_response = object_storage.put_object(
        namespace_name=namespace,
        bucket_name=bucket_name,
        object_name=img1_file_path.split("/")[-1],
        put_object_body=f)
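
To confirm the upload worked, you can list the bucket contents; this is a quick sanity check, not part of the original flow:

objects = object_storage_client.list_objects(namespace, bucket_name).data.objects
print([o.name for o in objects])  # should now include "desk.jpeg"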

With the file available in the bucket, we can start preparing the AI Vision service:

ai_vision_client = oci.ai_vision.AIServiceVisionClient(config)
namespace = object_storage_client.get_namespace().data

analyze_image_response = ai_vision_client.analyze_image(
    analyze_image_details=oci.ai_vision.models.AnalyzeImageDetails(
        features=[
            oci.ai_vision.models.ImageClassificationFeature(
                feature_type="IMAGE_CLASSIFICATION",
                max_results=130)],
        image=oci.ai_vision.models.ObjectStorageImageDetails(
            source="OBJECT_STORAGE",
            namespace_name=namespace,
            bucket_name=bucket_name,
            object_name=img1_file_path),
        compartment_id=compart)
)

# Get the data from response
print(analyze_image_response.data)

To see the raw result, you simply print response.data.
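
For image classification specifically, the interesting part of the response is the list of detected labels. A minimal sketch of reading them, with field names as I understand the response model:

# Each label has a name and a confidence score:
for label in analyze_image_response.data.labels:
    print(label.name, label.confidence)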

Take note of the feature_type parameter, which was set to “IMAGE_CLASSIFICATION” in the above example. By changing this parameter, you can invoke the other types of image analysis:

  • “FACE_DETECTION”: to activate the face detection,
  • “TEXT_DETECTION”: to extract the text from the image.

In the code you can see examples of these calls and how to visualize the results. Again, check the notebook for the full working code of the service.
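
For illustration, the feature objects for the two other analysis types would look roughly like this (class names as found in the oci.ai_vision.models package; verify against the SDK reference):

# Swap the features list in AnalyzeImageDetails to change the analysis type:
face_feature = oci.ai_vision.models.FaceDetectionFeature(
    feature_type="FACE_DETECTION")
text_feature = oci.ai_vision.models.ImageTextDetectionFeature(
    feature_type="TEXT_DETECTION")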

The AI Speech Service

So we’ve come to the last service to explore: transcribing sound into text.
This service is a bit more complex to handle from python, because the result comes in asynchronously, and because the result is stored in a JSON file in a bucket.

We already uploaded files to the bucket in the previous example. Basically, you can upload any sound or video file. Once your file is uploaded it is relatively easy to launch the transcription job, although you must pay attention to the naming of the objects:

ai_speech_client = oci.ai_speech.AIServiceSpeechClient(config)

# video_file_path holds the object name of the sound/video file uploaded earlier
create_transcription_job_response = ai_speech_client.create_transcription_job(
    create_transcription_job_details=oci.ai_speech.models.CreateTranscriptionJobDetails(
        compartment_id=compart,
        input_location=oci.ai_speech.models.ObjectListInlineInputLocation(
            location_type="OBJECT_LIST_INLINE_INPUT_LOCATION",
            object_locations=[
                oci.ai_speech.models.ObjectLocation(
                    namespace_name=namespace,
                    bucket_name=bucket_name,
                    object_names=[video_file_path])]),
        output_location=oci.ai_speech.models.OutputLocation(
            namespace_name=namespace,
            bucket_name=bucket_name),
        additional_transcription_formats=["SRT"],
        display_name=video_file_path,
        model_details=oci.ai_speech.models.TranscriptionModelDetails(
            domain="GENERIC",
            language_code="en-GB",
            transcription_settings=oci.ai_speech.models.TranscriptionSettings(
                diarization=oci.ai_speech.models.Diarization(
                    is_diarization_enabled=False,
                    number_of_speakers=2)))))

The response you get back only indicates whether the job was submitted correctly, not whether the transcription itself succeeds. Because the transcription runs asynchronously, you first wait for the job to finish, then read the result file from the output bucket.
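A minimal polling sketch, assuming job_id holds the job OCID returned by the create call:

import time

# The job OCID comes back in the create response:
job_id = create_transcription_job_response.data.id

# Wait until the job reaches a terminal state:
while True:
    job = ai_speech_client.get_transcription_job(transcription_job_id=job_id).data
    if job.lifecycle_state in ("SUCCEEDED", "FAILED", "CANCELED"):
        break
    time.sleep(5)
print("Job finished with state:", job.lifecycle_state)

Once the job has succeeded, you can locate and read the result file: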

import json

get_transcription_job_response = ai_speech_client.get_transcription_job(
    transcription_job_id=job_id)
ori_name = get_transcription_job_response.data.input_location.object_locations[0].object_names[0]
print("Original name = ", ori_name)

# The job writes its results under a prefix in the output bucket:
out_loc = get_transcription_job_response.data.output_location.prefix

# Compose the path to the result file:
res_file = out_loc + namespace + "_" + bucket_name + "_" + ori_name + ".json"
print("Result filename: ", res_file)

get_object_response = object_storage_client.get_object(
    namespace_name=namespace,
    bucket_name=bucket_name,
    http_response_content_type='text/plain',
    object_name=res_file)

data = json.loads(get_object_response.data.content)

print("Transcription text: ", data['transcriptions'][0]['transcription'])

Note that the get_object() method returns a stream, something that is a bit under-documented in the Python code examples of the API reference.

To get the actual content of the file and the result of the transcription, you need to use json.loads(), which lets you extract the actual transcription as illustrated in the print statement above.

Again, for the full code, check the notebook.

Conclusions

You’ve now played with 3 of the out-of-the-box OCI AI Services, allowing you to incorporate this functionality into your own applications. In this article we used the Python SDK, but note that you can also make direct REST calls to the API, or use any of the other language SDKs that are available (Java, Go, Ruby, TypeScript, JavaScript, or .NET).

Stay tuned for exploring more AI services!

Written by Jan Leemans

Technology enthusiast, amateur cyclist, and Business Development Director for Oracle EMEA, focusing on Application Development and Data Management
