Deploying Responsible AI with Vertex AI and Gemini Models

From Zero to Vertex AI: Invoking Gemini Using Responsible AI Principles

This article serves as a tutorial on deploying a FastAPI application to Google Cloud Run that invokes Gemini models via Vertex AI while adhering to responsible AI principles.

Introduction

The guide illustrates how to configure safety filters for four harm categories: dangerous content, harassment, hate speech, and sexually explicit content, using strict blocking thresholds. It utilizes Vellox as an adapter for running ASGI applications in Google Cloud Functions and implements Bearer token authentication for enhanced security.

Moreover, the tutorial details the entire setup process, including enabling necessary Google Cloud services, configuring IAM roles, and deploying the function with environment variables.

The tutorial focuses on practical safety: it demonstrates how Vertex AI screens both inputs and outputs, returning a “SAFETY” finish reason with detailed safety ratings when harmful content is detected. This is useful for developers who want content moderation and basic access control built into their AI applications.

Technology Used

Cloud Run Functions:

Designed for quick responses to events or HTTP triggers.
Minimal configuration required — all infrastructure is managed for you.
Suitable for concise functions rather than full-fledged services.

Vellox:

Vellox is an adapter that allows ASGI (Asynchronous Server Gateway Interface) applications, such as FastAPI apps, to run in Google Cloud Functions.
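To make the idea of an ASGI adapter concrete, here is a minimal, standard-library-only sketch of what such an adapter has to do: build a request scope, feed it to the application, and collect the response events. This is an illustration of the ASGI calling convention, not Vellox's actual implementation; the names hello_app and call_asgi are hypothetical.

```python
import asyncio


async def hello_app(scope, receive, send):
    """A bare ASGI application that always answers 200 with a text body."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({
        "type": "http.response.body",
        "body": b"hello from " + scope["path"].encode(),
    })


def call_asgi(app, method, path, body=b""):
    """Toy adapter: turn one request into ASGI events and collect the reply."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "headers": [],
    }
    sent = []

    async def receive():
        # Deliver the whole request body in a single event.
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app(scope, receive, send))
    status = next(m["status"] for m in sent if m["type"] == "http.response.start")
    payload = b"".join(
        m.get("body", b"") for m in sent if m["type"] == "http.response.body"
    )
    return status, payload
```

A real adapter does the same translation in both directions, but starts from the request object that the Cloud Functions runtime passes to the entry point and returns a response object the runtime understands.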

HTTPBearer:

HTTPBearer is a security utility provided by FastAPI's fastapi.security module. It handles Bearer token authentication, a common way to secure API endpoints: it checks that the Authorization header is present and uses the Bearer scheme, then extracts the token so that your code can validate it.

Steps to Implement

Development Environment Setup:

Use a dev container to install all necessary components. Set up Docker and the Dev Containers tooling; once you have pulled the code and opened it in the container, you are ready to go.

Enable Services:

First, initialize the gcloud CLI:

gcloud init

Then enable the required services:

gcloud services enable artifactregistry.googleapis.com cloudbuild.googleapis.com run.googleapis.com logging.googleapis.com aiplatform.googleapis.com

IAM Permissions:

In IAM, grant the ‘roles/aiplatform.user’ role on the current project to the service account that the function runs as, so that it is allowed to call Vertex AI.

Deploy with Environment Variables:

Use the following command to deploy:

gcloud run deploy fastapi-func --source . --function handler --base-image python313 --region asia-south1 --set-env-vars API_TOKEN="damn-long-token",GOOGLE_GENAI_USE_VERTEXAI=True,GOOGLE_CLOUD_LOCATION=global --allow-unauthenticated

This command:

Deploys the service fastapi-func from your local folder, using the function named handler as its entry point.
Runs on Python 3.13, in the Mumbai (asia-south1) region.
Sets environment variables for API tokens and Google Vertex AI usage.
Makes the function publicly reachable: Cloud Run itself requires no authentication, so the Bearer token check inside the application is the only access control.
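Once deployed, a client has to send the token in the Authorization header. The following standard-library sketch builds such a request; the /generate route and the {"prompt": ...} body are assumptions made for illustration, since the actual routes are defined in the repository's main.py.

```python
import json
import urllib.request


def build_request(base_url: str, token: str, prompt: str) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token (hypothetical route)."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",  # assumed path, adjust to the real one
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen sends it; a request without the header, or with the wrong token, should be rejected by the application before any call to Gemini is made.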

Walkthrough of main.py

The core of the implementation is a simple FastAPI application that calls Gemini through Vertex AI and applies content safety filters to every request.

import httpx, os, uuid
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from vellox import Vellox
from pydantic import BaseModel
from typing import Optional
from google import genai
from google.genai.types import (
    GenerateContentConfig,
    HarmCategory,
    HarmBlockThreshold,
    HttpOptions,
    SafetySetting,
)

The safety_settings variable is defined as a list of SafetySetting objects, each specifying a harm category along with a block threshold. This includes:

Dangerous content

Harassment

Hate speech

Sexually explicit content

All categories are configured to block at the BLOCK_LOW_AND_ABOVE threshold, ensuring strict moderation.

These settings let the application screen both inputs and outputs. If the model assesses the content as harmful, the call is blocked and no text is returned. By default, Gemini employs a severity-aware harm-block method in Vertex AI, which can be adjusted as necessary.
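A blocked call can be detected by inspecting the response rather than its text. The helper below is a hedged sketch that relies only on attribute names from the google-genai response objects (prompt_feedback, candidates, finish_reason, safety_ratings); the function summarize_safety is hypothetical, and the demo at the end uses a hand-built stand-in object instead of a real API response.

```python
from types import SimpleNamespace


def _name(value):
    """Return an enum member's name, or the value itself if it is a plain string."""
    return getattr(value, "name", value)


def summarize_safety(response) -> dict:
    """Report whether a response was blocked and which ratings came with it."""
    feedback = getattr(response, "prompt_feedback", None)
    prompt_block = getattr(feedback, "block_reason", None)
    if prompt_block is not None:
        # The prompt itself was rejected, so there are no candidates to inspect.
        return {"blocked": True, "reason": _name(prompt_block), "ratings": []}

    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return {"blocked": False, "reason": None, "ratings": []}

    candidate = candidates[0]
    reason = _name(candidate.finish_reason)
    ratings = [
        {
            "category": _name(rating.category),
            "probability": _name(rating.probability),
            "blocked": bool(rating.blocked),
        }
        for rating in (candidate.safety_ratings or [])
    ]
    return {"blocked": reason == "SAFETY", "reason": reason, "ratings": ratings}


# Stand-in for a blocked response, built by hand for demonstration only.
demo = SimpleNamespace(
    prompt_feedback=None,
    candidates=[
        SimpleNamespace(
            finish_reason="SAFETY",
            safety_ratings=[
                SimpleNamespace(
                    category="HARM_CATEGORY_HARASSMENT",
                    probability="HIGH",
                    blocked=True,
                )
            ],
        )
    ],
)
summary = summarize_safety(demo)
```

An endpoint can return such a summary to the caller in place of the missing text, so that clients see why a request produced no output.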

Conclusion

The implementation of responsible AI principles in deploying FastAPI applications using Google Cloud Run and Vertex AI is crucial in developing secure, ethical AI solutions. By leveraging these technologies, developers can effectively manage content moderation while ensuring the safety and integrity of their applications.

For further details, visit the GitHub repository.