Understanding the MP4 Container: moov, mdat, and Why Videos Break
What technically happens when an MP4 file breaks? An engineering deep-dive into the ISO-BMFF box structure, NAL units, and the repair algorithms behind modern video-repair tools.
Jonas Becker
Codec-Research · Tech Writer · May 19, 2026 · 8min read
Heads up: This article goes deep on the engineering. If you just want to repair your video, the complete repair guide is the better starting point. If you want to understand why videos break and how tools repair them — read on.
An MP4 file isn’t magic. It’s a simple container format that follows a handful of clear rules — and anyone who knows them can make far better sense of damaged files than any marketing material ever could.
What MP4 really is: ISO BMFF
“MP4” is colloquial. Technically the format is called ISO BMFF, the ISO Base Media File Format, specified in ISO/IEC 14496-12. MOV (Apple QuickTime, the ancestor of the format), 3GP, M4A, M4V and the HEIC image format are all close relatives built on the same box structure. They differ mainly in the brand identifier at the start and in the list of permitted codecs.
The format traces back to Apple's QuickTime (1991). ISO adopted the QuickTime file format as the basis for the MPEG-4 file format in 1998, and the box structure has changed only marginally since. It’s a “get out of my way” format: the spec only lays down how you structure the file, not what goes inside it.
Box structure: the only concept you need
An MP4 file is a sequence of boxes (Apple calls them atoms — it’s exactly the same thing). Every box has the same header:
[4 bytes: box size in big-endian, incl. header]
[4 bytes: box type as 4-character ASCII]
[body: box size minus 8 bytes]
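That’s it, with one wrinkle: a size of 1 means the real size follows the type as a 64-bit field, and a size of 0 means the box runs to the end of the file. mdat boxes larger than 4 GB rely on the 64-bit form. The entire ISO-BMFF spec is conceptually just this structure, nested recursively.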
If you open an MP4 in a hex editor, you’ll typically see this at the start:
00 00 00 20 66 74 79 70 6D 70 34 32 00 00 00 00
^^^^^^^^^^^ ^^^^^^^^^^^
size=32     "ftyp"
The first 4 bytes say “this box is 32 bytes long,” the next 4 say “I’m of type ftyp.” Then comes the body, which starts with a major-brand identifier such as “mp42”, “isom”, or “qt  ” (QuickTime; padded with two spaces to fill four bytes).
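The header rule above can be sketched in a few lines of Python. This is a minimal illustration, not a complete parser: the function name read_box_header is hypothetical, and the only spec behavior it handles beyond the plain 32-bit case is the 64-bit size that follows the type when the size field is 1.

```python
import struct

def read_box_header(buf: bytes, offset: int = 0):
    """Return (box_type, box_size, header_len) for the box starting at offset.

    box_size is the total size including the header; 0 means the box
    runs to the end of the file.
    """
    if offset + 8 > len(buf):
        raise ValueError("not enough bytes for a box header")
    # 4-byte big-endian size, then 4-byte ASCII type.
    size, raw_type = struct.unpack_from(">I4s", buf, offset)
    header_len = 8
    if size == 1:
        # A size of 1 signals that a 64-bit size follows the type field.
        if offset + 16 > len(buf):
            raise ValueError("truncated 64-bit size field")
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_len = 16
    return raw_type.decode("latin-1"), size, header_len

# The 16 bytes from the hex dump above.
sample = bytes.fromhex("00 00 00 20 66 74 79 70 6D 70 34 32 00 00 00 00")
box_type, box_size, header_len = read_box_header(sample)
major_brand = sample[header_len:header_len + 4].decode("ascii")
print(box_type, box_size, major_brand)  # ftyp 32 mp42
```

Decoding the type as latin-1 rather than ASCII is a deliberate choice for damaged files: it never raises, so a garbage type field can still be logged and inspected.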
The top-level boxes that matter
The spec and its extensions define hundreds of box types. At the top level you only need to know a handful:
ftyp — File Type
Sits at the start. Tells the decoder: “I’m an MP4 of brand X.” If ftyp is missing, the decoder often gives up right here. If ftyp doesn’t match the content (e.g. qt brand but MP4 codecs), you get edge-case behavior.
moov — Movie Atom
The index. Contains:
- mvhd: movie header with total duration, timescale
- trak: track definitions (typically two: one video track, one audio track)
- Inside each trak: tkhd (track header), mdia (media container) with minf (media information) with stbl (sample table)
- The sample table is the real gold: stts (decode times), stss (sync samples, i.e. keyframes), stsc (sample-to-chunk mapping), stsz (sample sizes), stco (chunk offsets in the file)
Without moov the file won’t play. The decoder doesn’t know where each frame is, how long it is, or which codec it’s encoded in.
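To make the role of the sample table concrete, here is a rough sketch of how a reader turns stsc, stsz and stco into one file offset per sample. The function name and the simplified inputs are assumptions for illustration: real stsc entries carry a third field (the sample description index), which is omitted here, and the tables are passed in as plain Python lists rather than parsed from a file.

```python
def sample_offsets(stsc, stsz, stco):
    """Compute the absolute file offset of every sample.

    stsc: list of (first_chunk, samples_per_chunk), first_chunk is 1-based
    stsz: list of sample sizes in bytes, in decode order
    stco: list of absolute chunk offsets in the file
    """
    offsets = []
    sample_index = 0
    for chunk_number, chunk_offset in enumerate(stco, start=1):
        # The applicable stsc run is the last one whose first_chunk
        # is less than or equal to the current chunk number.
        samples_in_chunk = 0
        for first_chunk, per_chunk in stsc:
            if first_chunk <= chunk_number:
                samples_in_chunk = per_chunk
            else:
                break
        # Samples inside a chunk sit back to back.
        position = chunk_offset
        for _ in range(samples_in_chunk):
            if sample_index >= len(stsz):
                return offsets
            offsets.append(position)
            position += stsz[sample_index]
            sample_index += 1
    return offsets

# Chunks 1 and 2 hold two samples each, chunk 3 holds one.
example = sample_offsets(
    stsc=[(1, 2), (3, 1)],
    stsz=[100, 50, 70, 30, 40],
    stco=[1000, 2000, 3000],
)
print(example)  # [1000, 1100, 2000, 2070, 3000]
```

This is exactly the information that is lost when moov is missing: the bytes are still in mdat, but nothing says where one sample ends and the next begins.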
mdat — Media Data
The actual bytes. This is where all the video frames and all the audio samples sit back to back, in a plain bytestream sequence. The sample table in moov points into this block with offsets.
mdat is usually 99% of the file size. If the file is 1.4 GB, roughly 1.38 GB of that is mdat.
free and skip
Padding boxes. Can be ignored.
uuid
Extension boxes: the 4-char type is the literal uuid, and a 128-bit UUID follows it to identify the actual extension. Sony, Canon and others use them for proprietary metadata (camera model, profile settings, XML sidecars). During DiskDrill recovery, uuid can cause trouble because bytes from adjacent sectors get mixed in (see Sony FX3 card crash).
Where does the encoder write the index?
Here’s the critical detail: ISO BMFF allows three different layouts.
Layout A — moov at the end (standard for live recording):
[ftyp][mdat (all raw bytes)][moov (index)]
During recording the encoder doesn’t know how big the video will be. So it writes ftyp at the start, then continuously pumps frames into mdat, and right at the end — when the user hits “stop” — it writes the index. Upside: minimal RAM usage.
And this is exactly where the problem comes in. If the recording is interrupted (dead battery, card pulled, crash), moov is never written. The file has ftyp + mdat but no index. Players can’t do anything with it.
Layout B — moov at the front (streaming-optimized):
[ftyp][moov (index, often with placeholder offsets)][mdat]
Produced with -movflags faststart in ffmpeg, or by other web-optimizing muxers. The muxer either knows the sample sizes ahead of time or, more commonly, writes Layout A first and then rewrites the file with moov moved to the front. A finished Layout B file tolerates truncation (an aborted copy or download) because moov is already there, but camera recordings are rarely written this way.
Layout C — fragmented MP4 (live streaming):
[ftyp][moov (init segment, empty)][moof+mdat][moof+mdat][moof+mdat]…[mfra]
DASH/HLS streaming uses this. Each fragment has its own mini-index (moof). Robust against interruption, but more complex. Camera recordings rarely use it — some cinema cameras (RED, Sony Cinema) do.
Most broken recordings are Layout A with a missing moov. That’s the statistical reality.
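The three layouts can be told apart by walking the top-level boxes and looking at their order. The sketch below is an illustration under simplifying assumptions: it works on an in-memory bytes object (a real tool would seek through the file instead of loading 1.4 GB), and make_box, top_level_types and classify_layout are hypothetical helper names.

```python
import struct

def make_box(box_type: bytes, body: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (helper for the demo only)."""
    return struct.pack(">I4s", 8 + len(body), box_type) + body

def top_level_types(data: bytes):
    """Walk the top-level boxes and return their types in file order."""
    types, offset = [], 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:  # 64-bit size follows the type
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:  # box runs to the end of the file
            size = len(data) - offset
        if size < header_len:
            break  # corrupt header: stop rather than loop forever
        types.append(raw_type.decode("latin-1"))
        offset += size
    return types

def classify_layout(types):
    """Map a list of top-level box types to layout A, B or C."""
    if "moof" in types:
        return "C"
    if "moov" not in types:
        return "A (interrupted, moov missing)"
    if "mdat" in types and types.index("moov") < types.index("mdat"):
        return "B"
    return "A"

finished = make_box(b"ftyp", b"mp42") + make_box(b"mdat", b"\x00" * 64) + make_box(b"moov")
# Simulated crash: the mdat header claims 1 MiB, but only 64 bytes were written.
interrupted = make_box(b"ftyp", b"mp42") + struct.pack(">I4s", 1 << 20, b"mdat") + b"\x00" * 64

print(classify_layout(top_level_types(finished)))     # A
print(classify_layout(top_level_types(interrupted)))  # A (interrupted, moov missing)
```

What a camera actually leaves in the mdat size field after a crash varies by model, so the simulated header here is only one plausible case.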
How do you repair an MP4 without a moov?
This is where it gets interesting. The mdat block is there — all the frames are in it. What’s missing is the index. That has to be reconstructed from the raw bytes.
Step 1: Find frame boundaries
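In a raw H.264 or HEVC byte stream (Annex B format), NAL units are separated by start codes: 0x00 00 00 01 (4 bytes) or 0x00 00 01 (3 bytes). If you walk through the mdat block looking for start codes, you find the start of every NAL unit. A frame consists of one or more slice NAL units, so frame boundaries follow from grouping them.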
... 0F 3A 89 00 00 00 01 65 88 12 ...
             ^^^^^^^^^^^ NAL start
                         ^^ NAL header (Type=5 = IDR frame)
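From the NAL header you can read the NAL-unit type: IDR slice, non-IDR slice, SPS, PPS, and so on. Telling P-frames from B-frames additionally requires reading the slice header. Together with the offsets and sizes, that’s enough to rebuild the sample sizes, chunk offsets and keyframe list of a sample table; the timing has to be derived separately, for instance from the frame rate.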
Important: In the MP4 container the NAL units are usually length-prefixed rather than separated by start codes. With H.264 under the avc1 tag and HEVC under the hvc1 or hev1 tag, each NAL is preceded by its length as a big-endian integer, typically 4 bytes wide (the exact width comes from lengthSizeMinusOne in avcC/hvcC), with no start codes. Tools have to know this, otherwise they find no frames.
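Both framing styles from Step 1 can be sketched side by side. The helper names below are hypothetical, and the length-prefixed walker assumes a buffer that contains only video NAL units. In a real mdat, audio samples are interleaved, so a plain walk like this stops at the first audio chunk and a repair tool needs extra logic to resynchronize.

```python
def find_start_codes(data: bytes):
    """Return the offset of the first NAL byte after each start code.

    Searching for the 3-byte code also catches the 4-byte form,
    because 00 00 00 01 ends in 00 00 01.
    """
    positions = []
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        positions.append(i + 3)
        i = data.find(b"\x00\x00\x01", i + 3)
    return positions

def iter_length_prefixed(data: bytes, length_size: int = 4):
    """Yield (offset, nal_bytes) for length-prefixed NAL units.

    Stops at the first zero or implausible length instead of guessing.
    """
    offset = 0
    while offset + length_size <= len(data):
        nal_len = int.from_bytes(data[offset:offset + length_size], "big")
        start = offset + length_size
        if nal_len == 0 or start + nal_len > len(data):
            break
        yield start, data[start:start + nal_len]
        offset = start + nal_len

def h264_nal_type(nal: bytes) -> int:
    """H.264: the type is the low 5 bits of the 1-byte NAL header."""
    return nal[0] & 0x1F

def hevc_nal_type(nal: bytes) -> int:
    """HEVC: the type is bits 1..6 of the first byte of the 2-byte header."""
    return (nal[0] >> 1) & 0x3F

# The start-code example from the hex dump above.
annex_b = bytes.fromhex("0F 3A 89 00 00 00 01 65 88 12")
starts = find_start_codes(annex_b)
print(starts, h264_nal_type(annex_b[starts[0]:]))  # [7] 5

# The same IDR slice, length-prefixed, followed by a non-IDR slice.
prefixed = b"\x00\x00\x00\x03\x65\x88\x12" + b"\x00\x00\x00\x02\x41\x9a"
types = [h264_nal_type(nal) for _, nal in iter_length_prefixed(prefixed)]
print(types)  # [5, 1]
```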
Step 2: Reconstruct the codec parameters
Even with all the frame offsets, the decoder still needs the codec parameters: SPS (Sequence Parameter Set), PPS (Picture Parameter Set), and for HEVC also VPS (Video Parameter Set). These normally sit in moov (in the avcC or hvcC atom). If that’s missing, there are two options:
(a) Extract them from the bitstream. If the encoder wrote the parameters “in-band” (NAL-unit types 7/8 for H.264, 32–34 for HEVC), you can read them from the mdat itself. Often works with H.264, less often with HEVC.
(b) Steal them from a reference recording. This is the reliable method. An intact recording from the same camera with identical settings has exactly the same SPS/PPS — codec profile, level, color primaries, bit depth. You copy the avcC/hvcC box from the reference into the reconstructed moov of the broken file.
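Option (a) can be sketched as a filter over the NAL units already found in the mdat. The function names are hypothetical, and the input is assumed to be a list of NAL units with their length prefixes or start codes already stripped. If any required parameter set is missing in-band, the sketch reports that a reference recording is needed, which is option (b).

```python
def extract_parameter_sets(nals, codec):
    """Collect unique in-band parameter sets from a list of NAL units.

    codec is "h264" or "hevc". Returns a dict mapping
    "vps"/"sps"/"pps" to lists of NAL byte strings.
    """
    if codec == "h264":
        wanted = {7: "sps", 8: "pps"}
        nal_type = lambda nal: nal[0] & 0x1F
    elif codec == "hevc":
        wanted = {32: "vps", 33: "sps", 34: "pps"}
        nal_type = lambda nal: (nal[0] >> 1) & 0x3F
    else:
        raise ValueError("unsupported codec: " + codec)

    found = {name: [] for name in wanted.values()}
    for nal in nals:
        if not nal:
            continue
        name = wanted.get(nal_type(nal))
        # Encoders often repeat parameter sets before every keyframe;
        # keep each distinct one only once.
        if name and nal not in found[name]:
            found[name].append(nal)
    return found

def needs_reference(found) -> bool:
    """True if at least one required parameter set was not found in-band."""
    return any(len(sets) == 0 for sets in found.values())

# Made-up payloads: only the first byte (the NAL header) matters here.
h264_nals = [b"\x67\x64\x00\x28", b"\x68\xee\x3c\x80", b"\x65\x88", b"\x67\x64\x00\x28"]
h264_sets = extract_parameter_sets(h264_nals, "h264")
print(len(h264_sets["sps"]), len(h264_sets["pps"]), needs_reference(h264_sets))  # 1 1 False

hevc_sets = extract_parameter_sets([b"\x26\x01\xaf"], "hevc")
print(needs_reference(hevc_sets))  # True
```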
Step 3: Audio stream sync
Here’s the most common failure in naive repair tools. Audio samples are interleaved between video frames in the mdat. If you compute the audio offsets wrong, the sound runs ahead or behind.
A concrete example: with Sony XAVC and 24-bit PCM audio, there’s a 480ms drift between audio and video, because the XAVC audio stream begins with a pre-roll that doesn’t arrive in the standard sample-table format. untrunc’s default returns these files with 480ms of audio lead. Haven measures the drift by cross-correlation with the reference recording and corrects it automatically — see Fix audio drift in your video.
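Cross-correlation itself is simple to illustrate. The sketch below is a generic brute-force version in pure Python, not Haven's implementation: it slides one sequence of audio samples against another and returns the shift with the highest correlation score. The function names are hypothetical, and a practical tool would use an FFT-based correlation on real audio rather than this quadratic loop.

```python
def best_lag(reference, candidate, max_lag):
    """Return the lag (in samples) at which candidate best matches reference.

    A positive lag means candidate is delayed relative to reference.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):
                score += ref_value * candidate[j]
        if score > best_score:
            best, best_score = lag, score
    return best

def lag_to_ms(lag_samples: int, sample_rate: int) -> float:
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag_samples / sample_rate

# Toy signals: the same pulse, delayed by 3 samples in the candidate.
reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
candidate = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
print(best_lag(reference, candidate, max_lag=5))  # 3

# At 48 kHz, a 480 ms offset corresponds to 23040 samples.
print(lag_to_ms(23040, 48000))  # 480.0
```

Once the lag is known, the correction is a constant shift of the audio track's timing, not a change to the audio data.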
Why the hvcC box is so touchy
HEVC has an additional problem: the hvcC box contains not just SPS/PPS but also:
- lengthSizeMinusOne: length of the length-prefix in the NAL units (typically 3, i.e. a 4-byte prefix)
- arrays[]: VPS/SPS/PPS NAL units as binary blobs
- chromaFormat: monochrome (0), 4:2:0 (1), 4:2:2 (2), 4:4:4 (3)
- bitDepthLumaMinus8 and bitDepthChromaMinus8: 8-bit (0) or 10-bit (2)
- avgFrameRate and constantFrameRate: frame-rate hints
If a field the decoder depends on is wrong (e.g. because DiskDrill read from the wrong sector), decoder initialization can fail and the player shows nothing but a blank picture, even when every frame in the mdat is intact.
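A quick way to sanity-check a recovered hvcC is to decode the fields listed above and compare them with a reference file. The sketch below follows the HEVCDecoderConfigurationRecord layout from ISO/IEC 14496-15 as I understand it (a 23-byte fixed head followed by the NAL arrays). The function name and the test record are made up for illustration, and the input is the box body without its 8-byte box header. chromaFormat uses HEVC's chroma_format_idc values: 0 monochrome, 1 4:2:0, 2 4:2:2, 3 4:4:4.

```python
import struct

CHROMA_NAMES = {0: "monochrome", 1: "4:2:0", 2: "4:2:2", 3: "4:4:4"}

def parse_hvcc(body: bytes):
    """Decode the main fields of an hvcC body.

    Raises ValueError, IndexError or struct.error on truncated input.
    """
    if len(body) < 23:
        raise ValueError("hvcC body too short")
    info = {
        "configurationVersion": body[0],
        "general_profile_idc": body[1] & 0x1F,
        "general_level_idc": body[12],
        "chromaFormat": CHROMA_NAMES[body[16] & 0x03],
        "bitDepthLuma": (body[17] & 0x07) + 8,
        "bitDepthChroma": (body[18] & 0x07) + 8,
        "avgFrameRate": struct.unpack_from(">H", body, 19)[0],
        "constantFrameRate": body[21] >> 6,
        "lengthSize": (body[21] & 0x03) + 1,
        "arrays": {},
    }
    offset = 23
    for _ in range(body[22]):  # numOfArrays
        nal_type = body[offset] & 0x3F
        (count,) = struct.unpack_from(">H", body, offset + 1)
        offset += 3
        nals = []
        for _ in range(count):
            (length,) = struct.unpack_from(">H", body, offset)
            offset += 2
            nals.append(body[offset:offset + length])
            offset += length
        info["arrays"][nal_type] = nals
    return info

# Hand-built record: profile 2, 4:2:0, 10-bit, 4-byte NAL lengths, one SPS array.
head = bytearray(23)
head[0] = 1            # configurationVersion
head[1] = 0x02         # general_profile_idc = 2
head[12] = 153         # general_level_idc
head[16] = 0xFC | 1    # chromaFormat = 1
head[17] = 0xF8 | 2    # bitDepthLumaMinus8 = 2
head[18] = 0xF8 | 2    # bitDepthChromaMinus8 = 2
head[21] = 0x0F        # lengthSizeMinusOne = 3
head[22] = 1           # numOfArrays
sps_array = bytes([0x80 | 33]) + struct.pack(">HH", 1, 3) + b"\x42\x01\x01"
record = parse_hvcc(bytes(head) + sps_array)
print(record["chromaFormat"], record["bitDepthLuma"], record["lengthSize"])  # 4:2:0 10 4
```

An empty arrays dict in the output is the signature of the hev1-to-hvc1 repackaging mistake: the box exists, but it carries no parameter sets.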
With HEVC there’s also the hev1/hvc1 tag difference:
- hvc1: SPS/PPS/VPS sit exclusively in the hvcC box (out-of-band)
- hev1: SPS/PPS/VPS can also sit in-band in the bitstream
If a tool repackages a file from hev1 to hvc1 without copying the parameters from the bitstream into the hvcC box, the result is a file with an empty hvcC — and black playback.
What repair tools actually do internally
When you open a generic repair tool now and hit “repair,” here’s what runs:
- Scan: Read the file in blocks of a few MB. Look for box headers. Identify ftyp, mdat, moov (if present), uuid.
- Diagnosis: Which top-level boxes are missing? If moov is missing → path A (index reconstruction). If moov is there but mdat is shorter than expected → path B (mdat repair). If both are there but the codec parameters are missing → path C (codec patch).
- Path A (the most common case): Run a NAL-unit scanner through the mdat, find all sample boundaries. Extract SPS/PPS/VPS from the reference. Synthesize a new moov structure. Write the file with the new layout.
- Verification: A dedicated decode pipeline opens the reconstructed file, checks that the first IDR frame is decodable and that audio samples are delivered.
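The diagnosis step boils down to a small decision function. The sketch below mirrors the three paths described in the list. ScanResult and choose_repair_path are hypothetical names, the scan itself is assumed to have happened already, and the two extra return values (no mdat at all, nothing wrong) are my own additions to keep the function total.

```python
from dataclasses import dataclass

@dataclass
class ScanResult:
    box_types: list          # top-level box types in file order
    mdat_declared_size: int  # size field of the mdat box (0 if no mdat)
    mdat_bytes_present: int  # mdat bytes actually present in the file
    has_codec_config: bool   # avcC/hvcC found inside moov

def choose_repair_path(scan: ScanResult) -> str:
    """Pick a repair path from the result of the scan step."""
    if "mdat" not in scan.box_types:
        # Assumption: without media data there is nothing to rebuild from.
        return "unrecoverable: no media data"
    if "moov" not in scan.box_types:
        return "A: index reconstruction"
    if scan.mdat_bytes_present < scan.mdat_declared_size:
        return "B: mdat repair"
    if not scan.has_codec_config:
        return "C: codec patch"
    return "no structural damage found"

crashed = ScanResult(["ftyp", "mdat"], 0, 1_380_000_000, False)
truncated = ScanResult(["ftyp", "moov", "mdat"], 2_000_000, 1_500_000, True)
print(choose_repair_path(crashed))    # A: index reconstruction
print(choose_repair_path(truncated))  # B: mdat repair
```

The order of the checks matters: a missing moov is tested first because nothing else can be judged without an index.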
There are dozens of points in this process where a tool can get a detail wrong. Experience with different cameras (Sony XAVC vs GoPro HEVC vs iPhone Cinematic) matters just as much as the code itself (see the device-specific guides under Repair).
What you can take away as a user
Three insights that make you a smarter buyer of any repair tool:
- A broken MP4 is usually repairable, because 99% of the file is intact. What’s missing is the small index at the end.
- A reference recording raises success rates dramatically. If a tool is offered without a reference option, it’s either very generic (lower success rate) or it claims magic it doesn’t have.
- “AI repair” is marketing BS. Video repair is deterministic container reconstruction. If a vendor advertises “AI,” ask them specifically about the methodology — often it’s just untrunc under a new brand.
Done — what next?
- Want to see the complete repair strategies? → Repair a damaged video — the complete guide
- Want to see how the specific moov-atom error gets fixed? → moov atom missing — guide
- Want to check your video right now? → Free online diagnosis (local, in your browser, no upload)
About the author
Jonas Becker
Codec-Research · Tech Writer
A computer-science background with a focus on video containers and stream parsing. Writes Haven’s deep-dives — from HEVC NAL-unit structures to Dolby Vision RPU metadata.
Specialty · HEVC · ProRes · Container standards · NAL-unit analysis