Pikumo · probe image-gen pipeline · aspect ratio A/B/C

What's the right aspect ratio for a story panel?

Same 4-beat grief story, same character photo, same style, same gpt-5.4-mini orchestrator with <story_source> attached. The only variable is the image_generation size parameter: 1024×1024 (square, current production default), 1024×1536 (portrait 2:3), 1536×1024 (landscape 3:2).

All panels rendered natively at that size — not resized post-hoc. Below each panel: which composition genuinely "won" the beat (green pill) vs runner-up (faded green).

Beat 1 — Entryway

"I stand in the entryway for too long with the key still in my palm."

square 1:1 square beat 1
portrait 2:3 portrait beat 1 runner-up
landscape 3:2 landscape beat 1 winner

Square crops Mei into a portrait shot — the hall is barely visible. Portrait stretches the hall deep into the apartment, terracotta tiles + key clearly visible. Landscape goes further: reveals the coat hook on the left wall, scarf detail, and the kitchen at the end of the hall. Landscape reads as "she is in a place" rather than "she is centered in a frame."

Beat 2 — Kitchen (one bowl, one spoon, one teacup)

"one teacup with a chip I'd forgotten about, on its little wire shelf … the dimmer over the stove still at her setting."

square 1:1 square beat 2
portrait 2:3 portrait beat 2 runner-up
landscape 3:2 landscape beat 2 winner

Square turns the kitchen into a stage behind Mei. Portrait gives the full vertical kitchen — range hood, stove with kettle, sage-green curtain, dish rack with bowl & cup. Landscape reveals a red sitting nook on the right we never see otherwise — the apartment reads as a lived-in space, not just a kitchen. Counter-intuitive but consistent finding: landscape on a small interior feels expansive, not cramped, because interior architecture is horizontal.

Beat 3 — Bedroom (sleeve of the green sweater)

"I open the closet because I have to and the sleeve of her green sweater swings out … I press it to my face."

square 1:1 square beat 3
portrait 2:3 portrait beat 3 winner
landscape 3:2 landscape beat 3 runner-up

The one beat portrait clearly wins. Vertical light from the window → seated body → green sweater sleeve to face → pink bedspread anchoring the bottom — feels like a page from a picture book. Landscape is cinematic but loses the storybook intimacy. Square crops the bed entirely and squeezes the closet detail.

Beat 4 — Leaving

"I don't look back. I walk to the stairs and I keep walking."

square 1:1 square beat 4
portrait 2:3 portrait beat 4 runner-up
landscape 3:2 landscape beat 4 winner

Square reverts to the "smiling at camera in a fluorescent box" failure mode from the story-source probe. Portrait gives a closed apartment door on the left and twilight city through the open-air walkway on the right. Landscape reveals both walls of the walkway plus the full skyline — the most evocative of the three on this corridor geometry. Reads as departure, not portrait.

The numbers

metricsquare 1:1portrait 2:3landscape 3:2
size (px)1024×10241024×15361536×1024
wall seconds (direct OpenAI)137.4143.2135.2
input tokens9,27311,32111,003
cached tokens01,7921,792
reasoning tokens1,239661486
output tokens2,0391,6741,327
image cost (low qual, est.)$0.044$0.064$0.064
panels delivered4/44/44/4

All three runs: gpt-5.4-mini · OpenAI direct · image_generation quality: low · same prompt (variant B from story-source probe). Latency parity is the headline — non-square ratios are not meaningfully slower.

Verdict

Retire square as the default. The 1:1 crop forces character-dominates-frame on every beat — environment is always sacrificed. It's a worse composition for every beat tested.

Default to landscape 3:2 (1536×1024). Wins beats 1, 2, 4 outright. Strong second on beat 3. Reveals architectural context — corridors, kitchens, walls, beds, and outdoor scenes all have horizontal geometry, and landscape lets that breathe. Reads as "she is in a place" rather than "she is centered in a frame."

Hold portrait 2:3 (1024×1536) as a per-story or per-beat option. Wins on intimate interior beats (the bedroom) where the vertical composition reads as a picture-book page. Worth a follow-up probe that mixes the two within a single story — maybe per-beat: outdoor / wide → landscape; close intimate → portrait.

Cost +45% per image vs square. $0.044 → $0.064 per 4-panel story at low quality. Negligible against orchestrator + extract costs. Latency is unchanged (~140s on direct OpenAI).

Caveat. This is one story (interior-heavy grief). Trip stories with outdoor vistas would lean even harder landscape. Pure single-character intimate-moment stories might favor portrait. Worth a second probe on a trip-shaped story before fully committing — but landscape is the safe default move now.