SVO2 Format Reverse Engineering¶
.svo2 files are MCAP files with 5 topics:
| Topic | Description |
|---|---|
svo_header |
Contains the SVO metadata in JSON format. It has only one message near the beginning of the file. |
svo_footer |
Contains the SVO footer in JSON format. It has only one message near the end of the file. |
Camera_SN(?<sn>\d+)/side_by_side |
Contains the video stream encoded as H.264/H.265, with left and right stereo images concatenated side-by-side (along the width axis). |
Camera_SN(?<sn>\d+)/sensors |
IMU and temperature sensor data. |
Camera_SN(?<sn>\d+)/sensors_integrated |
IMU and temperature sensor data which has been integrated (?) to match the camera data rate (?). |
Here, (?<sn>\d+) refers to the camera serial number, which is expected to be a decimal number.
Based on our analysis, .svo2 files do not appear to have any gaps between messages which could indicate data stored outside of MCAP message payloads.
Header¶
The header is a JSON object with the following fields:
Calib_acc: Base64-encoded binary containing two 3×3 float32 matrices (72 bytes = 18 floats). The first matrix appears to contain bias correction terms; the second is a scale/cross-axis calibration matrix close to the identity.Calib_gyro: Base64-encoded binary containing two 3×3 float32 matrices (72 bytes = 18 floats). Same layout asCalib_acc: first matrix contains gyroscope offset terms, second is a full scale+cross-axis calibration matrix.IMU_frequency: IMU measurement frequency in Hz (e.g., 200.0 for ZED X).ZED_SDK_version: The version of the Zed SDK used to record the SVO as a string (e.g.,"5.0.0").header: A base64-encoded binary structure.
Header format: see metadata.Header for additional details.
Header Format
| Offset | Size | Content |
|---|---|---|
| 0x00 | 4 | width: u32 = 3840 |
| 0x04 | 4 | height: u32 = 1080 |
| 0x08 | 4 | serial_number: u32 = 40735594 |
| 0x0C | 4 | fps: u32 = 30 |
| 0x10 | 4 | unknown: u32 = 0 |
| 0x14 | 4 | unknown: u32 = 8 |
| 0x18 | 4 | unknown: u32 = 0 |
| 0x1C | 4 | unknown: u32 = 1000 |
| 0x20 | 4 | unknown: u32 = 2001 |
| 0x24 | 48 | transform: f32[12] |
| 0x54 | 4 | unknown: u32 = 0 |
| 0x58 | 4 | unknown: u32 = 0 |
| 0x5C | 4 | unknown: u32 = 0 |
| 0x60 | 4 | unknown: f32 = 1.0 |
| 0x64 | 4 | unknown: u32 = 5 |
| 0x68 | 4 | unknown: u32 = 0 |
| 0x6C | 4 | unknown: u32 = 0 |
| 0x70 | 4 | unknown: u32 = 0 |
| 0x74 | 4 | unknown: u32 = 0 |
| 0x78 | 4 | unknown: u32 = 1 |
| 0x7C | 4 | unknown: u32 = 0 |
Footer¶
The footer is a JSON object with one entry for each time series stream (Camera_SN(?<sn>\d+)/side_by_side, Camera_SN(?<sn>\d+)/sensors, Camera_SN(?<sn>\d+)/sensors_integrated).
- Each entry is a base64-encoded array of timestamps for the messages in the corresponding stream.
- The timestamps are stored as 64-bit integers representing milliseconds since the Unix epoch. These are presumably unsigned integers.
Video Stream¶
The video payload is stored as a standard h264/h265 bitstream, with left and right cameras concatenated side by side (i.e., along the width axis). Each encoded key-frame or i-frame is stored as a separate message.
Based on analysis of a sample h265 svo recording, we believe that video frames have the following structure:
- An 8-byte header containing a 32-bit total size (excluding itself) and a 32-bit H.265 size (excluding the header and footer)
- Standard h264/h265 header and payload
- A 56-byte footer containing metadata about the frame (resolution, timestamp, frame type, etc)
See metadata.FrameFooter for additional details.
Frame Footer Format
| Offset | Size | Content |
|---|---|---|
| 0x000000 | 4 | total_size: u32 = 917080 |
| 0x000004 | 4 | payload_size: u32 = 917020 |
| 0x000008 | 4 | 0x00000001 - H.265 start code |
| 0x00000C | 24 | VPS NAL unit |
| 0x000024 | 4 | 0x00000001 - H.265 start code |
| 0x000028 | 48 | SPS NAL unit (with VUI) |
| 0x000058 | 4 | 0x00000001 - H.265 start code |
| 0x00005C | 7 | PPS NAL unit |
| 0x000063 | 4 | 0x00000001 - H.265 start code |
| 0x000067 | 916981 | IDR slice data (video payload) |
| 0x0dfe24 | 56 | [00-03] width: u32 = 3840 |
| [04-07] height: u32 = 1080 | ||
| [08-0B] unknown: unknown = 0x5c002c00 | ||
| [0C-0F] unknown: i32 = 1 | ||
| [10-13] unknown: i32 = 2 | ||
| [14-17] unknown: i32 = -1 | ||
| [18-1F] timestamp: u64 = 1738434526000 ns | ||
| [20-23] frame_size: u32 = 917020 | ||
| [24-27] frame_type: i32 = | ||
| [28-2F] last_keyframe_index: i32 | ||
| [30-33] frame_idx: int32 = 0, 1, 2, ... | ||
| [34-37] unknown: keyframe id? changes to a seemingly random value every keyframe | ||
| 0x0dfe5c | END |
Sensor Data¶
Both Camera_SN(?<sn>\d+)/sensors and Camera_SN(?<sn>\d+)/integrated_sensors topics use the same 360-byte binary format, wrapped in a JSON object with a single data field containing a base64-encoded payload:
The two topics differ only in the first 8 bytes (a type byte differs: 0x33 for /sensors, 0x26 for /integrated_sensors) and in a few fields that are zero in /sensors but non-zero in /integrated_sensors (noted below).
The /sensors topic streams raw IMU data at the IMU sampling rate (200 Hz for ZED X). The /integrated_sensors topic streams data at the camera frame rate (~30 fps), with IMU readings synchronized to camera frame timestamps.
All multi-byte values are little-endian.
Message Format
| Offset | Size | Type | Content |
|---|---|---|---|
| 0x000 | 1 | u8 | unknown: constant 0x01 |
| 0x001 | 1 | u8 | topic_type: 0x33 = /sensors, 0x26 = /integrated_sensors |
| 0x002 | 2 | u16 | unknown: constant 0xd6f5 |
| 0x004 | 4 | u32 | unknown: constant 0x0000fffe |
| 0x008 | 8 | u64 | timestamp_boot_ns: internal monotonic clock timestamp in nanoseconds (not Unix epoch; differences match MCAP log_time) |
| 0x010 | 8 | u64 | timestamp_unix_ns: Unix epoch timestamp in nanoseconds; identical to MCAP log_time |
| 0x018 | 32 | u8[32] | unknown: constant zeros |
| 0x038 | 4 | u32 | imu_new_sample: 1 when IMU DMP produced a new quaternion estimate, 0 otherwise |
| 0x03C | 16 | f32[4] | orientation: quaternion (x, y, z, w); unit quaternion representing IMU orientation |
| 0x04C | 12 | f32[3] | orientation_covariance_diagonal: (x, y, z) diagonal of orientation covariance matrix; zeros in /sensors, non-zero in /integrated_sensors |
| 0x058 | 12 | f32[3] | angular_velocity_uncalibrated: calibrated gyroscope (x, y, z) in deg/s |
| 0x064 | 12 | f32[3] | linear_acceleration_uncalibrated: calibrated accelerometer (x, y, z) in m/s² |
| 0x070 | 12 | f32[3] | angular_velocity: raw gyroscope (x, y, z) in deg/s, without bias/cross-axis correction |
| 0x07C | 12 | f32[3] | linear_acceleration: raw accelerometer (x, y, z) in m/s², without scale correction |
| 0x088 | 12 | f32[3] | angular_velocity_covariance_diagonal: (x, y, z) diagonal of gyro covariance (rad²/s²); constant per recording |
| 0x094 | 12 | f32[3] | linear_acceleration_covariance_diagonal: (x, y, z) diagonal of accel covariance (m²/s⁴); constant per recording |
| 0x0A0 | 64 | f32[16] | unknown: 16 very small floats (~1e-8 to ~1e-6), varying; possibly 4×4 orientation covariance |
| 0x0E0 | 4 | u32 | unknown: constant 0x00000100 in /sensors, differs in /integrated_sensors |
| 0x0E4 | 4 | f32 | temperature: IMU temperature in °C |
| 0x0E8 | 4 | f32 | unknown: 0.0 in /sensors; a large float (~1e10) in /integrated_sensors |
| 0x0EC | 24 | f32[6] | unknown: 6 NaN values (mixed positive 0x7fc00000 and negative 0xffc00000 NaN); possibly unavailable magnetometer x,y,z uncalibrated + calibrated, or temperature sensor slots |
| 0x104 | 4 | u32 | unknown: constant 0x00000004 |
| 0x108 | 60 | u8[60] | unknown: mix of zeros and special values; 1 NaN (0x7fc00000) at 0x110; 0x3f800000 (1.0f) at 0x120 in /sensors, a large float (~1e10) in /integrated_sensors; 2 NaN values at 0x124–0x12B |
| 0x144 | 4 | u8[4] | unknown: constant bytes 0x64 0x00 0x07 0x00 (reads as u16 pair: 100, 7) |
| 0x148 | 16 | u8[16] | unknown: constant zeros |
| 0x158 | 4 | u32 | unknown: constant 0x00000001 |
| 0x15C | 4 | f32 | effective_rate: IMU effective sampling rate as a fraction of the nominal rate (observed ~0.990–1.000) |
| 0x160 | 8 | u8[8] | unknown: constant zeros |
Confirmed fields
The following fields have been confirmed against ZED SDK reference output:
timestamp_unix_nsat0x010: exactly matches MCAPlog_timefor all messages (zero error across 24,623 messages)angular_velocityat0x058: matchesSensorsData.imu.angular_velocitylinear_accelerationat0x064: matchesSensorsData.imu.linear_accelerationangular_velocity_covariance_diagonalat0x088: matchesSensorsData.imu.angular_velocity_covariancediagonallinear_acceleration_covariance_diagonalat0x094: matchesSensorsData.imu.linear_acceleration_covariancediagonaltemperatureat0x0E4: physically consistent (42.25°C for ZED X under normal operation)