Zum Inhalt springen

Audio Out of Sync in Your Video — How to Fix Drift and Sync Problems

Sound running 500ms ahead of or behind the picture? A burst at the start? These aren't player bugs, they're container problems left over from repair. With ffmpeg commands and automated fixes.

J

Jonas Becker

Codec-Research · Tech Writer · May 19, 2026 · 5min read

You repaired a broken video file (with untrunc, VLC or some other tool), and the picture is there — but the audio is offset. Sometimes ahead, sometimes behind. Sometimes with a deafening noise at the very start. This isn’t your player, and it isn’t an encoder quirk either.

It’s a container-repair artifact, and it can be fixed — but you need to understand what’s happening.

Why audio drift appears after repair

In an MP4 file, video and audio samples are stored interleaved inside the mdat block. The sample table in the moov atom says, for every audio sample: “starts at byte offset X, is Y bytes long, belongs to timestamp T.”

When moov is missing and a tool like untrunc reconstructs the index, it has to guess where the first audio sample begins in the mdat stream. For most codec combinations that works fine. In two specific cases it fails predictably:

Case 1: Sony XAVC with 24-bit PCM (classic drift)

For XAVC, Sony writes the audio stream as 24-bit PCM with a pre-roll of a few hundred samples ahead of the first video frame. Technically those samples belong to the video sync track. untrunc’s default counts them as “normal audio” and shifts the entire audio stream ahead of the picture — typically 480ms in the simple case, up to 640ms in hard cases.

Case 2: Header bytes read as audio (noise burst)

When the audio decoder doesn’t know where the first real audio sample sits, it sometimes starts decoding right in the middle of the container header bytes. Those bytes aren’t valid audio — but the decoder renders them anyway, which produces a deafening full-scale clipping in the first 400–1050ms. On many Sony and Canon files this happens predictably.

How to spot drift and burst

Three tests you can run in 30 seconds:

  1. Play the video and clap your hands while it runs (for real, in front of you). If you hear the clap offset in the playback, you’ve got audio drift.
  2. Listen to the first 2 seconds at moderate volume. If the start is a full-scale roar, you’ve got a noise burst.
  3. Play the video in VLC and watch the status bar (View → Audio Filters → Spectrum or similar). Some players show the audio-track offset relative to the video.

Manual fix with ffmpeg

For a one-off file with clearly measured drift, you can fix this manually in ffmpeg. The -itsoffset flag shifts the audio stream relative to the video.

# Example: audio runs 480ms ahead of the picture → shift audio 480ms BACKWARD
ffmpeg -i input.mp4 -itsoffset 0.480 -i input.mp4 \
  -map 0:v:0 -map 1:a:0 -c copy output.mp4

# Example: audio runs 300ms BEHIND the picture → shift audio 300ms FORWARD
ffmpeg -i input.mp4 -itsoffset -0.300 -i input.mp4 \
  -map 0:v:0 -map 1:a:0 -c copy output.mp4

This assumes you’ve measured the drift exactly — which is hard in practice. A guess of “somewhere around 500ms” is often good enough, but it leaves subtle drift in the 50ms range that’s noticeable on voice recordings.

Removing a noise burst with ffmpeg

The burst at the start can be brute-forced away with silence:

# Mute the first 1 second
ffmpeg -i input.mp4 -af "volume=enable='between(t,0,1)':volume=0" \
  -c:v copy output.mp4

This is a crude fix — the first 1000ms of audio is gone. On recordings that start with a clap or a loud opening you won’t notice. On ambient recordings you will.

A cleaner approach is a per-50ms peak scan: each 50ms audio block is checked for whether it exceeds −3dB. If it does → that’s probably decoder garbage, not real audio → silence it. With 50ms of padding front and back, so the burst transition stays smooth.

In ffmpeg that’s a pipe of several filters and not trivial to express in a single command. It’s exactly the kind of algorithm Haven runs automatically.

Fully automatic with Haven

In Haven, audio repair runs as a 3-stage pipeline after every container repair:

  1. Per-50ms-window peak scan of each audio track. Anything above −3dB within the 1-second window after file start is flagged as a burst and silenced (with padding for the transition).
  2. Cross-correlation between the audio stream and the video-frame pulses (extracted from the mdat) measures the drift to millisecond precision.
  3. The audio track is shifted (-itsoffset internally) and truncated to video duration.

The result: every audio stream is exactly video_duration, no burst, no drift. That’s the real differentiator over generic repair tools.

Download Haven →

Side by side: ffmpeg vs Haven on audio sync

Problemffmpeg manualHaven automatic
Measure driftYour own guess, often ±50msCross-correlation, ±1ms
Correct drift-itsoffsetAuto
Remove burstCrude, with volume=enablePer-50ms scan with padding
Multi-track audio (4-track FX3)Per track, manuallyAll tracks in parallel
VerificationTrial and errorVisual preview

Special case: drift in iPhone Cinematic mode

With iPhone Cinematic (Dolby Vision Profile 8.4) there’s another audio-drift path: the audio stream has its own track edit (elst atom) with a pre-skip, which often gets lost during container reconstruction. That results in ~50ms of drift that doesn’t come from Sony XAVC drift but has its own cause. Haven recognizes iPhone files by the mvhd brand identifier and applies the right correction.

See also: Repair iPhone video

Summary

Audio drift after video repair is not a player bug, it’s a container artifact. The key takeaways:

  1. Sony XAVC almost always has 480ms of drift after a naive repair — it’s a stream pre-roll quirk
  2. A noise burst at the start comes from decoder garbage bytes — it can be removed automatically with a peak scan
  3. ffmpeg can fix both manually, but it’s trial and error
  4. Automated tools (Haven) do it via cross-correlation and a per-50ms scan in one step

For a single file with measured drift, ffmpeg is enough. For multiple files or unknown drift amounts, the automatic pipeline is faster and more accurate.

Check your file for audio drift →

More context: Understanding the MP4 container explains why the sample table plays the central role in sync problems.

About the author

J

Jonas Becker

Codec-Research · Tech Writer

A computer-science background with a focus on video containers and stream parsing. Writes Haven’s deep-dives — from HEVC NAL-unit structures to Dolby Vision RPU metadata.

Specialty · HEVC · ProRes · Container standards · NAL-unit analysis

Newsletter

When a new repair guide goes up — one email. Monthly at most.

No marketing sequences, no promo blasts. Just a heads-up when a new pillar guide, codec deep-dive or honest comparison goes live.

No spam. One-click unsubscribe from every email.