GLM 4.7 Flash

GLM 4.7 Flash is the speed-optimized variant in Z.ai's GLM-4.7 generation. It delivers faster inference for high-throughput workloads while retaining the coding, tool-use, and conversational improvements introduced in GLM-4.7.

Reasoning, Tool Use
index.ts
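import { streamText } from 'ai'

// Start a streaming text generation with GLM 4.7 Flash.
const result = streamText({
  model: 'zai/glm-4.7-flash',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}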

More models by Z.ai

| Model | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | Providers | ZDR | No Training | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 205K | 0.3s | 149 tps | $1.30/M | $4.30/M | Read: $0.26/M; Write: - | | | +1 | baseten, deepinfra, fireworks, +3 | | | 04/07/2026 |
| | 203K | 4.0s | 63 tps | $1.20/M | $4.00/M | Read: $0.24/M; Write: - | | | +1 | zai | | | 03/15/2026 |
| | 203K | 0.6s | 128 tps | $0.80/M | $2.56/M | Read: $0.16/M; Write: - | | | +1 | baseten, bedrock, deepinfra, +4 | | | 02/12/2026 |
| | 205K | 0.1s | 758 tps | $2.25/M | $2.75/M | Read: $2.25/M; Write: - | | | +1 | baseten, bedrock, cerebras, +3 | | | 12/22/2025 |
| | 205K | 0.4s | 186 tps | $0.60/M | $2.20/M | Read: $0.11/M; Write: - | | | +1 | deepinfra, novita, zai | | | 09/30/2025 |
| | 128K | 1.6s | 145 tps | $0.30/M | $0.90/M | Read: $0.05/M; Write: - | | | +3 | zai | | | 09/30/2025 |
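
The per-million-token prices listed above can be turned into a rough per-request estimate. The sketch below is illustrative only: `TokenRates`, `estimateCostUsd`, and the token counts are hypothetical names and values, not part of any SDK. The example rates ($0.30/M input, $0.90/M output) are copied from one of the listed entries, and cache read/write pricing is ignored.

```typescript
// Hypothetical helper: per-million-token rates in USD.
interface TokenRates {
  inputPerMillion: number
  outputPerMillion: number
}

// Estimate the cost of one request from its token counts.
// Assumes simple linear pricing and ignores cached-token discounts.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  rates: TokenRates
): number {
  const inputCost = (inputTokens / 1_000_000) * rates.inputPerMillion
  const outputCost = (outputTokens / 1_000_000) * rates.outputPerMillion
  return inputCost + outputCost
}

// Example rates taken from one listed entry ($0.30/M input, $0.90/M output).
const exampleRates: TokenRates = { inputPerMillion: 0.30, outputPerMillion: 0.90 }

// Assumed workload: 200K input tokens and 50K output tokens.
const cost = estimateCostUsd(200_000, 50_000, exampleRates)
console.log(`Estimated cost: $${cost.toFixed(4)}`)
```

Under these assumed token counts the estimate comes to roughly $0.105. Actual billing may differ by provider and by how many input tokens are served from cache.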