Gemma 4 & Strands Evals

Introduction to Gemma 4 Models and Their Features

AWS has made the Gemma 4 family available on Amazon Bedrock. Gemma 4 is a family of open-weight models developed by Google DeepMind and designed to prioritize intelligence per parameter. (Source: Introducing Gemma 4 models on Amazon Bedrock)

The Gemma 4 family includes three instruction-tuned variants:

google.gemma-4-31b (30.7B parameters, Dense architecture)
google.gemma-4-26b-a4b (25.2B total / 3.8B active parameters, Mixture-of-Experts architecture)
google.gemma-4-e2b (5.1B total parameters, PLE architecture)
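
To make the Mixture-of-Experts figures concrete, the short sketch below computes the share of parameters that is active for google.gemma-4-26b-a4b, using only the 25.2B total and 3.8B active counts listed above. Reading "active" as "used per token" is the usual Mixture-of-Experts convention and is an assumption here, not something the source spells out.

```python
# Parameter counts (in billions) as listed above for google.gemma-4-26b-a4b.
TOTAL_PARAMS_B = 25.2
ACTIVE_PARAMS_B = 3.8


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of the model's parameters that is active."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share: {fraction:.1%}")  # prints "Active share: 15.1%"
```

Roughly 15% of the weights are active, which is the sense in which this variant is sized like a 25B model but computes more like a 4B one.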

These models accept text and image input and support more than 35 languages. Gemma 4 31B is reported to score 39 on an intelligence index, above the median of 15 for models in the 4B-40B class. (Source: not specified in official documentation)

AI Agent Failure Detection and Root Cause Analysis

AWS provides failure detection and root cause analysis for AI agents through the Strands Evals SDK. The tool automatically detects failures in execution traces and identifies the causal relationships between them, reducing diagnosis time from hours to minutes. (Source: AI Agent Failure Detection and Root Cause Analysis with Strands Evals)

The detection process consists of two stages:

Failure Detection: Scans spans using a failure taxonomy organized into nine parent categories (tool selection, incorrect action, etc.)
Root Cause Analysis: Tracks causal relationships from detected failures and generates repair proposals
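
The two stages can be illustrated with a toy pipeline. This is a minimal sketch of the idea only and does not use the Strands Evals API. The `Span` record, its `depends_on` and `failure` fields, and both helper functions are hypothetical names invented for this example, and the failure labels are placeholders rather than the SDK's taxonomy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical trace span; real traces and SDK types will differ."""
    span_id: str
    name: str
    depends_on: List[str] = field(default_factory=list)  # upstream span IDs
    failure: Optional[str] = None  # placeholder failure label, None on success


def flag_failed_spans(spans: List[Span]) -> List[Span]:
    """Stage 1 (toy): return every span that carries a failure label."""
    return [s for s in spans if s.failure is not None]


def trace_root_cause(failed: Span, spans: List[Span]) -> Span:
    """Stage 2 (toy): follow failed upstream dependencies as far as they go."""
    by_id = {s.span_id: s for s in spans}
    current = failed
    seen = {current.span_id}  # guards against cycles in malformed traces
    while True:
        upstream = [
            by_id[d]
            for d in current.depends_on
            if d in by_id and d not in seen and by_id[d].failure is not None
        ]
        if not upstream:
            return current
        current = upstream[0]
        seen.add(current.span_id)


trace = [
    Span("s1", "plan"),
    Span("s2", "select_tool", depends_on=["s1"], failure="incorrect action"),
    Span("s3", "call_tool", depends_on=["s2"], failure="downstream error"),
]

failed = flag_failed_spans(trace)
root = trace_root_cause(failed[-1], trace)
print(root.name)  # prints "select_tool"
```

The last failure observed (`call_tool`) is traced back to the earlier failed step it depends on, which is the kind of causal link the root cause analysis stage is described as reporting.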

This feature can be installed with pip install strands-agents-evals and provides an API for analyzing agent execution traces.

Entry Points to Get Started

Using Gemma 4 Models

Access Gemma 4 models from Amazon Bedrock’s model catalog. To see which foundation models are available in your Region, run the following AWS CLI command:

aws bedrock list-foundation-models
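
The model catalog can also be queried from Python. The sketch below uses boto3's `list_foundation_models` call on the `bedrock` client and filters the result by the `google.gemma-4` ID prefix taken from the variant list earlier in this article. The helper names and the sample response are illustrative assumptions, and the live call needs boto3, AWS credentials, and a Region where the models are offered.

```python
from typing import List


def gemma4_model_ids(response: dict, prefix: str = "google.gemma-4") -> List[str]:
    """Pick model IDs that start with `prefix` from a ListFoundationModels response."""
    return sorted(
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if summary.get("modelId", "").startswith(prefix)
    )


def list_gemma4_models(region_name: str) -> List[str]:
    """Query Bedrock and return matching model IDs (requires boto3 and credentials)."""
    import boto3  # imported lazily so the filter above works without boto3 installed

    client = boto3.client("bedrock", region_name=region_name)
    return gemma4_model_ids(client.list_foundation_models())


# Offline demonstration with a hand-written response of the same shape.
sample = {
    "modelSummaries": [
        {"modelId": "google.gemma-4-31b"},
        {"modelId": "example.other-model"},  # made-up ID, should be filtered out
    ]
}
print(gemma4_model_ids(sample))  # prints ['google.gemma-4-31b']
```

Filtering on the client side avoids guessing the provider name that Bedrock expects in its server-side filter.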

Introducing Strands Evals SDK

Install the SDK in a Python environment with the following command:

pip install strands-agents-evals
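
After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The check below relies only on the standard library's `importlib.metadata` and on the distribution name from the pip command above. `installed_version` is a helper name chosen for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("strands-agents-evals")
print(found if found else "strands-agents-evals is not installed")
```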

Summary

Use google.gemma-4-31b, the 30.7B-parameter Dense variant, for natural language processing tasks where output quality matters most
Use the detect_failures() function from the Strands Evals SDK to automatically analyze the causes of agent failures and obtain repair proposals
Use the AWS CLI command aws bedrock list-foundation-models to check which Gemma 4 models are currently available in your Region
Agent development teams may be able to reduce debugging time with the causal analysis feature of Strands Evals; the source describes diagnosis dropping from hours to minutes, but an 80% reduction is an unverified estimate (Note: Official documentation does not specify an improvement figure)