AI video generation in 2026: what actually works (and what doesn't)
Current limitations of AI video generation in 2026: an honest comparison of Runway Gen-4, Veo, Kling, Sora and Seedance 2.0 by a VFX artist who uses them daily for real clients. What works, what doesn't, where we're headed.

Last updated: March 2026
The AI video generation market has exploded. Every month a new model drops, and every week someone declares that "traditional cinema is dead." I use these tools daily in my AI video production workflow for real clients. Here's what actually works in 2026, with no grand proclamations.
The tools I use and how I judge them
Runway Gen-4. The most reliable for professional work. Camera control is the best available: pan, tilt, zoom and dolly all respond coherently and predictably. I use it for generating environmental elements, backgrounds and sequences that need precise movement control. The weak point remains face coherence in long sequences: after 4-5 seconds the features start to drift.
Veo (Google). Impressive visual quality: individual frames are often indistinguishable from real footage. I used it heavily for the Roche project, where we needed a broadcast look on a budget that wouldn't have covered even one day of a traditional crew. The limitation: less camera control than Runway and longer generation times. I choose it when the priority is single-frame quality rather than movement controllability. I use it directly from Google AI Studio, not from Flow or third-party platforms, because the control is finer and the output is better. It's expensive, very expensive, but it works better.
Kling AI. Excellent for human subject movement — walks, gestures, facial expressions. Where Runway and Veo produce rigid body movements, Kling generates natural fluidity. I use it when the video protagonist is a person in motion. A concrete example: for a walking sequence in a recent project, Runway produced a mechanical stride. Kling generated a believable gait on the second attempt.
Sora (OpenAI). Strong on cinematic quality and genuine physics understanding: water, smoke, bouncing light. But the workflow is less flexible for professional production, with fewer control parameters and less predictability in the result. I use it for concept and pre-visualization more than for final output.
Seedance 2.0 (ByteDance). The newcomer that made noise — for good reason. Human subject motion is the most realistic I've seen: weight, gravity, believable physics on moving bodies. Characters run, fall, fight with a naturalness other models still can't match. The multi-shot system maintains visual coherence across different clips, and native lip-sync with integrated audio opens interesting possibilities for narrative content. The limits are concrete: native resolution is 720p — upscaling with Topaz Video is stunning and I use it often, but the starting point remains a real limitation, especially on large screens where artifacts show. Access outside China goes through third-party platforms, and censorship on real faces is aggressive. For now I use it for previsualization and social content where motion realism matters more than resolution. But I'm watching every update — when native 1080p arrives, things will change.
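For clips where a full Topaz pass isn't worth the time, a plain ffmpeg upscale is enough to get a 720p Seedance clip to 1080p for review copies. Here's a minimal sketch in Python shelling out to ffmpeg; the filenames and quality settings are illustrative, not a fixed recipe, and this is a stopgap, not a Topaz replacement:

```python
import subprocess

def upscale_720p_to_1080p(src: str, dst: str) -> None:
    """Quick 720p -> 1080p upscale for review copies; real detail
    recovery still needs a dedicated upscaler like Topaz."""
    subprocess.run([
        "ffmpeg", "-y",
        "-i", src,
        # Lanczos scaling keeps edges reasonably crisp without inventing detail.
        "-vf", "scale=1920:1080:flags=lanczos",
        "-c:v", "libx264", "-crf", "18", "-preset", "slow",
        "-c:a", "copy",  # pass the audio track through untouched
        dst,
    ], check=True)

upscale_720p_to_1080p("seedance_clip_720p.mp4", "seedance_clip_1080p.mp4")
```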
Higgsfield. More than a single model, it's a platform that aggregates the best — Sora 2, Kling, Veo 3.1 — under a single interface with cinematic controls. Cinema Studio lets you set specific camera movements (dolly, tracking, steadicam) and the preset system replicates real cinema grammar. The internal generator isn't the strongest, but the value is in the workflow: choose the right model for each shot, apply controls, and work in a single environment without jumping between five different platforms. For those producing high volumes of content — social, advertising, creative variants — it's an efficiency multiplier.
For reference frames: Nano Banana and Midjourney. The key frame (the starting image that guides video generation) is the most important step in the workflow: 90% of the final video's quality is decided here. Midjourney remains the reference for aesthetic quality, especially for cinematic atmospheres and lighting. But Google's Nano Banana 2 has changed the game: it generates Pro-quality images almost instantly, maintains character consistency across multiple images, and renders readable text, a historic weak point of all generators. I use it increasingly to iterate quickly on creative directions: ten variants in five minutes, pick the best, and from there move to video generation.
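To make that ten-variant comparison faster, I find it helps to lay the exported stills out side by side before committing one to video generation. A small sketch with Pillow, assuming the variants have already been exported as files (the folder layout and grid size are placeholders, not part of any tool's workflow):

```python
from pathlib import Path
from PIL import Image

def contact_sheet(variant_dir: str, out_path: str,
                  cols: int = 5, cell: int = 512) -> None:
    """Tile exported key-frame variants into one grid for a quick side-by-side pick."""
    paths = sorted(Path(variant_dir).glob("*.png"))
    rows = -(-len(paths) // cols)  # ceiling division
    sheet = Image.new("RGB", (cols * cell, rows * cell), "black")
    for i, p in enumerate(paths):
        img = Image.open(p)
        img.thumbnail((cell, cell))  # fit each variant into its cell, keeping aspect ratio
        sheet.paste(img, ((i % cols) * cell, (i // cols) * cell))
    sheet.save(out_path)

contact_sheet("variants/", "contact_sheet.png")
```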
What works for professional production
Concept and pre-visualization. Here AI is already irreplaceable. A director asks me to explore ten creative directions? I generate visual variants in hours, not days. For Doppelganger, AI generated the project's entire visual base, work that in traditional pre-production would have required weeks of storyboarding and concept art.
Environmental elements and backgrounds. Skies, landscapes, fantastic environments — AI produces excellent material I then integrate in compositing with traditional VFX techniques. The key is never using AI material as-is: it always needs integration work to make it live in the scene.
Fully AI-generated content. For projects where the budget doesn't allow traditional production, AI produces professional results. The Roche project demonstrates this: a complete video with broadcast quality, entirely made with AI tools then refined in post-production. The client got the result they wanted, at a fraction of the cost of traditional production.
Prototyping for commercial pitches. An agency needs to sell a concept to a client? Instead of a static mood board, I deliver an AI video concept in a few days. The difference in approval rate is huge: the client sees the result instead of having to imagine it.
What doesn't work (yet)
Character coherence. Same person, same face, same clothes for thirty seconds of video. No tool does this reliably — Seedance 2.0 has made strides with multi-shot, but we're not at total coherence yet. Workarounds are needed — face swap, compositing, frame-by-frame generation with reference — and all require hours of manual work. It's the number one problem in AI video today.
Readable text. AI generates text in video like a drunk writes on a blackboard. For any content with on-screen text — titles, lower thirds, name straps — traditional motion graphics is still needed. Nano Banana 2 solved the problem for still images, but in video we're still far off.
Precise subject-object interaction. A hand gripping a specific object, a finger pressing a button, a product manipulated by human hands. AI produces obvious artifacts — fingers merging, objects floating, impossible grips. For product videos with human interaction, traditional 3D animation is needed.
Synchronized audio. Lipsync, coherent ambient sounds, Foley — AI-generated audio is still primitive by professional standards. Seedance 2.0 has the most advanced native lip-sync, but for serious production AI video still needs to be paired with separate audio production. There are no shortcuts here.
The factor nobody mentions: post-processing
Here's the truth that AI tool demo reels don't show: raw AI video is never the final video. Never. Every generated clip goes through my post-production workflow — color correction, stabilization, artifact cleanup, compositing with real elements, grading for coherence with the rest of the project. The video you see in the tool's showreel and the video I deliver to the client are two different things.
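To give a sense of what "every clip goes through post" means in practice, here's a deliberately simplified sketch of a first cleanup pass using standard ffmpeg filters. The real grade happens in dedicated tools, and the filter values below are placeholders for illustration, not my actual settings:

```python
import subprocess

def first_pass_cleanup(src: str, dst: str) -> None:
    """Rough denoise plus a neutral base balance before the clip
    enters the real grading pipeline."""
    filters = ",".join([
        "hqdn3d=3:3:6:6",                    # light spatio-temporal denoise to soften AI artifacts
        "eq=contrast=1.05:saturation=0.95",  # nudge toward a neutral starting point for grading
    ])
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", filters,
        "-c:v", "libx264", "-crf", "16", "-preset", "slow",
        "-c:a", "copy",
        dst,
    ], check=True)

first_pass_cleanup("gen_clip.mp4", "gen_clip_clean.mp4")
```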
This is why twenty years of post-production experience is my real competitive advantage in the AI era. Anyone can generate a video. Very few know how to turn it into a professional product.
Where we're heading
The pace has accelerated compared to six months ago. Seedance 2.0 has shown that realistic human motion is within reach — in one year character coherence will be solved. Platforms like Higgsfield are transforming AI video from isolated individual tools into complete production ecosystems. Nano Banana 2 has made reference frame generation instant and nearly free.
In two years, frame-by-frame control will be standard. In three, the distinction between "shot" and "generated" will be irrelevant for 90% of applications.
But the principle doesn't change: someone with the eye, experience and taste to direct these tools will always be needed. Not "use them": direct them. The way a director directs a crew, an experienced professional directs AI. And the difference shows in the result.
Have a project in mind?
If this article gave you useful ideas and you want to understand how to apply them to your project, tell me what you need.

