
open_svo2

OpenSVO2: An open-source reverse-engineered interface for SVO2 files.

open_svo2.FrameFooter

Bases: Structure

Memory mapping for the SVO2 stereo frame footer (56 bytes, 12 fields).

Attributes:

  • width: Image width in pixels.
  • height: Image height in pixels.
  • _unknown_magic: Magic number (0x5c002c00).
  • _unknown_1: Unknown constant (1).
  • _unknown_2: Unknown constant (2).
  • _unknown_3: Unknown constant (-1).
  • timestamp: Timestamp in nanoseconds.
  • payload_size: Size of H.264/H.265 payload in bytes.
  • frame_type: 3 for key-frame, 0 for i-frame.
  • last_keyframe_index: Index of the last keyframe.
  • frame_id: Sequential frame index.
  • _unsure_keyframe_id: Possible keyframe ID.

Source code in src/open_svo2/metadata.py
class FrameFooter(Structure):
    """Memory mapping for the SVO2 stereo frame footer (56 bytes, 12 fields).

    Attributes:
        width: Image width in pixels.
        height: Image height in pixels.
        _unknown_magic: Magic number (0x5c002c00).
        _unknown_1: Unknown constant (1).
        _unknown_2: Unknown constant (2).
        _unknown_3: Unknown constant (-1).
        timestamp: Timestamp in nanoseconds.
        payload_size: Size of H.264/H.265 payload in bytes.
        frame_type: 3 for key-frame, 0 for i-frame.
        last_keyframe_index: Index of the last keyframe.
        frame_id: Sequential frame index.
        _unsure_keyframe_id: Possible keyframe ID.
    """

    _fields_ = [
        ("width", c_uint32),
        ("height", c_uint32),
        ("_unknown_magic", c_uint32),
        ("_unknown_1", c_int32),
        ("_unknown_2", c_int32),
        ("_unknown_3", c_int32),
        ("timestamp", c_uint64),
        ("payload_size", c_uint32),
        ("frame_type", c_int32),
        ("last_keyframe_index", c_int32),
        ("frame_id", c_uint32),
        ("_unsure_keyframe_id", c_int32),
    ]
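Since the layout is reverse-engineered, it can be verified independently. A minimal sketch of the same 56-byte mapping using only the standard library; the field names and the magic number come from the table above, while all other values are invented for illustration:

```python
import struct
from ctypes import Structure, c_uint32, c_int32, c_uint64, sizeof

class FooterDemo(Structure):
    """Standalone copy of the documented FrameFooter layout (not the library class)."""
    _fields_ = [
        ("width", c_uint32), ("height", c_uint32),
        ("_unknown_magic", c_uint32),
        ("_unknown_1", c_int32), ("_unknown_2", c_int32), ("_unknown_3", c_int32),
        ("timestamp", c_uint64),
        ("payload_size", c_uint32), ("frame_type", c_int32),
        ("last_keyframe_index", c_int32), ("frame_id", c_uint32),
        ("_unsure_keyframe_id", c_int32),
    ]

# The 12 fields sum to 52 bytes; the u64 member forces 8-byte alignment,
# so ctypes pads the struct to 56 bytes -- matching the documented size.
assert sizeof(FooterDemo) == 56

# Pack 12 little-endian fields plus 4 bytes of trailing padding ("4x").
raw = struct.pack(
    "<IIIiiiQIiiIi4x",
    1920, 1080, 0x5C002C00, 1, 2, -1,
    1_700_000_000_000_000_000, 4096, 3, 120, 121, 7)
footer = FooterDemo.from_buffer_copy(raw)
print(footer.width, footer.timestamp, footer.frame_type)
```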

open_svo2.Header

Bases: Structure

Memory mapping for the SVO2 binary header (128 bytes, 32 fields).

Field naming conventions

  • Confirmed fields are named directly
  • Unconfirmed fields are prefixed with '_unsure_'
  • Likely correct fields are prefixed with '_likely_'

Warning

The parsed transformation matrix does not match the stereo transformation values given by the Zed SDK. The exact meaning and relationship are currently unknown.

Attributes:

  • width: Image width in pixels (for a single camera).
  • height: Image height in pixels.
  • serial_number: Camera serial number (e.g., 40735594).
  • fps: Frames per second.
  • _unsure_frame_counter: Possibly frame index or counter.
  • _unsure_bit_depth: Bits per channel (typically 8).
  • _unsure_exposure_mode: Exposure control mode.
  • _likely_exposure_time: Likely exposure time (units unknown, observed: 1000).
  • _likely_camera_model: Camera model/SKU (e.g., 2001 = ZED 2).
  • _unsure_ts_sec: Timestamp seconds (often 0).
  • _unsure_ts_nsec: Timestamp nanoseconds (often 0).
  • _unsure_imu_status: IMU-related status flag.
  • w_scale: Scale factor (typically 1.0).
  • _likely_lens_id: Lens type identifier (observed: 5).
  • _unsure_isp_gain: ISP gain setting.
  • _unsure_isp_wb_r: White balance red channel.
  • _unsure_isp_wb_b: White balance blue channel.
  • _unsure_isp_gamma: Gamma correction value.
  • _likely_sync_status: Sync status flag (1 = synced?).
  • _unsure_padding: Padding or reserved field.

Source code in src/open_svo2/metadata.py
class Header(Structure):
    """Memory mapping for the SVO2 binary header (128 bytes, 32 fields).

    !!! info "Field naming conventions"

        - Confirmed fields are named directly
        - Unconfirmed fields are prefixed with '_unsure_'
        - Likely correct fields are prefixed with '_likely_'

    !!! warning

        The parsed transformation matrix does not match the stereo
        transformation values given by the Zed SDK. The exact meaning and
        relationship are currently unknown.

    Attributes:
        width: Image width in pixels (for a single camera).
        height: Image height in pixels.
        serial_number: Camera serial number (e.g., 40735594).
        fps: Frames per second.
        _unsure_frame_counter: Possibly frame index or counter.
        _unsure_bit_depth: Bits per channel (typically 8).
        _unsure_exposure_mode: Exposure control mode.
        _likely_exposure_time: Likely exposure time (units unknown, observed: 1000).
        _likely_camera_model: Camera model/SKU (e.g., 2001 = ZED 2).
        _unsure_ts_sec: Timestamp seconds (often 0).
        _unsure_ts_nsec: Timestamp nanoseconds (often 0).
        _unsure_imu_status: IMU-related status flag.
        w_scale: Scale factor (typically 1.0).
        _likely_lens_id: Lens type identifier (observed: 5).
        _unsure_isp_gain: ISP gain setting.
        _unsure_isp_wb_r: White balance red channel.
        _unsure_isp_wb_b: White balance blue channel.
        _unsure_isp_gamma: Gamma correction value.
        _likely_sync_status: Sync status flag (1 = synced?).
        _unsure_padding: Padding or reserved field.
    """

    _fields_ = [
        ("width", c_uint32),
        ("height", c_uint32),
        ("serial_number", c_uint32),
        ("fps", c_uint32),
        ("_unsure_frame_counter", c_uint32),
        ("_unsure_bit_depth", c_uint32),
        ("_unsure_exposure_mode", c_uint32),
        ("_likely_exposure_time", c_uint32),
        ("_likely_camera_model", c_uint32),

        ("r00", c_float), ("r01", c_float), ("r02", c_float), ("tx", c_float),
        ("r10", c_float), ("r11", c_float), ("r12", c_float), ("ty", c_float),
        ("r20", c_float), ("r21", c_float), ("r22", c_float), ("tz", c_float),

        ("_unsure_ts_sec", c_uint32),
        ("_unsure_ts_nsec", c_uint32),
        ("_unsure_imu_status", c_uint32),
        ("w_scale", c_float),

        ("_likely_lens_id", c_uint32),
        ("_unsure_isp_gain", c_uint32),
        ("_unsure_isp_wb_r", c_uint32),
        ("_unsure_isp_wb_b", c_uint32),
        ("_unsure_isp_gamma", c_uint32),
        ("_likely_sync_status", c_uint32),
        ("_unsure_padding", c_uint32),
    ]

    @classmethod
    def from_base64(cls, data: str) -> Self:
        """Create an SVO2Header instance from encoded base64."""
        decoded = base64.b64decode(data)
        if len(decoded) != sizeof(cls):
            raise ValueError(
                f"Data length {len(decoded)} does not match SVO2Header "
                f"(requires {sizeof(cls)} bytes).")
        return cls.from_buffer_copy(decoded)

from_base64 classmethod

from_base64(data: str) -> Self

Create a Header instance from a base64-encoded string.

Source code in src/open_svo2/metadata.py
@classmethod
def from_base64(cls, data: str) -> Self:
    """Create an SVO2Header instance from encoded base64."""
    decoded = base64.b64decode(data)
    if len(decoded) != sizeof(cls):
        raise ValueError(
            f"Data length {len(decoded)} does not match SVO2Header "
            f"(requires {sizeof(cls)} bytes).")
    return cls.from_buffer_copy(decoded)
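The decoding and strict length check can be sketched without the library, using `struct` with an equivalent 128-byte layout. The format string mirrors the `_fields_` order above; the helper name `decode_header` is illustrative, not part of the API:

```python
import base64
import struct

# 128-byte header: 9 u32, a 3x4 float32 transform (12 floats),
# 3 u32 (timestamps / IMU status), float w_scale, then 7 trailing u32s.
HEADER_FMT = "<9I12f3If7I"
assert struct.calcsize(HEADER_FMT) == 128

def decode_header(data: str) -> tuple:
    """Minimal stand-in for Header.from_base64 built on struct."""
    decoded = base64.b64decode(data)
    if len(decoded) != struct.calcsize(HEADER_FMT):
        raise ValueError(f"Expected 128 bytes, got {len(decoded)}")
    return struct.unpack(HEADER_FMT, decoded)

# Round-trip a synthetic header: width=1920, height=1080, everything else zeroed.
payload = struct.pack(HEADER_FMT, 1920, 1080, *([0] * 30))
fields = decode_header(base64.b64encode(payload).decode("ascii"))
print(fields[0], fields[1])  # width, height
```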

open_svo2.IMUData dataclass

Zed IMU data.

Attributes:

  • timestamp (float64): IMU measurement timestamp in seconds.
  • accel (Float32[ndarray, 3]): Linear acceleration in m/s^2, in the Zed camera coordinate frame, without calibration.
  • avel (Float32[ndarray, 3]): Angular velocity in deg/s, in the Zed camera coordinate frame, without calibration.

Source code in src/open_svo2/imu.py
@dataclass
class IMUData:
    """Zed IMU data.

    Attributes:
        timestamp: IMU measurement timestamp in seconds.
        accel: Linear acceleration in m/s^2, in the Zed camera coordinate
            frame, without calibration.
        avel: Angular velocity in deg/s, in the Zed camera coordinate frame,
            without calibration.
    """

    timestamp: np.float64
    accel: Float32[np.ndarray, "3"]
    avel: Float32[np.ndarray, "3"]

    @classmethod
    def from_raw_data(cls, raw_data: bytes) -> "IMUData":
        """Parse raw IMU data from the ZED SDK binary format."""
        timestamp_ns = np.frombuffer(
            raw_data, dtype=np.uint64, count=1, offset=0x010)[0]

        accel = np.frombuffer(
            raw_data, dtype=np.float32, count=3, offset=0x064)
        avel = np.frombuffer(
            raw_data, dtype=np.float32, count=3, offset=0x058)

        return cls(
            timestamp=np.float64(timestamp_ns) / 1e9, accel=accel, avel=avel)

from_raw_data classmethod

from_raw_data(raw_data: bytes) -> IMUData

Parse raw IMU data from the ZED SDK binary format.

Source code in src/open_svo2/imu.py
@classmethod
def from_raw_data(cls, raw_data: bytes) -> "IMUData":
    """Parse raw IMU data from the ZED SDK binary format."""
    timestamp_ns = np.frombuffer(
        raw_data, dtype=np.uint64, count=1, offset=0x010)[0]

    accel = np.frombuffer(
        raw_data, dtype=np.float32, count=3, offset=0x064)
    avel = np.frombuffer(
        raw_data, dtype=np.float32, count=3, offset=0x058)

    return cls(
        timestamp=np.float64(timestamp_ns) / 1e9, accel=accel, avel=avel)
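A self-contained illustration of the offsets used above (0x010 for the nanosecond timestamp, 0x058 for angular velocity, 0x064 for acceleration). The sensor values written into the synthetic buffer are made up:

```python
import numpy as np

# Build a synthetic record large enough for all documented offsets.
raw = bytearray(0x070)
raw[0x010:0x018] = np.uint64(2_000_000_000).tobytes()                    # 2 s in ns
raw[0x058:0x064] = np.array([0.1, 0.2, 0.3], dtype=np.float32).tobytes()   # avel
raw[0x064:0x070] = np.array([0.0, 0.0, 9.81], dtype=np.float32).tobytes()  # accel
raw = bytes(raw)

# Same parsing as IMUData.from_raw_data:
timestamp_ns = np.frombuffer(raw, dtype=np.uint64, count=1, offset=0x010)[0]
avel = np.frombuffer(raw, dtype=np.float32, count=3, offset=0x058)
accel = np.frombuffer(raw, dtype=np.float32, count=3, offset=0x064)
print(timestamp_ns / 1e9, accel[2])
```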

open_svo2.Intrinsics dataclass

Camera intrinsic parameters using the Brown-Conrady (OpenCV) model.

This dataclass represents the calibration parameters for a single camera in OpenCV-compatible format, ready to be passed directly to OpenCV functions.

  • The camera_matrix is the camera intrinsic matrix in the form:
    [[fx,  0, cx],
     [ 0, fy, cy],
     [ 0,  0,  1]]
    
    where fx/fy are focal lengths in pixels and cx/cy is the principal point.
  • The dist_coeffs array contains the distortion coefficients in the order (k1, k2, p1, p2, k3) following OpenCV's standard 5-parameter distortion model:
    • k1, k2, k3: Radial distortion coefficients (2nd, 4th, 6th order)
    • p1, p2: Tangential distortion coefficients

Attributes:

  • camera_matrix (Float64[ndarray, '3 3']): Camera intrinsic matrix in OpenCV format.
  • dist_coeffs (Float64[ndarray, 5]): Distortion coefficients in OpenCV order.

Notes
  • Compatible with cv2.undistort(), cv2.calibrateCamera(), etc.
  • Distortion coefficients are dimensionless and resolution-independent
  • Camera matrix scales linearly with image resolution
  • The distortion model follows OpenCV convention (Brown-Conrady model)
Source code in src/open_svo2/intrinsics.py
@dataclass
class Intrinsics:
    """Camera intrinsic parameters using the Brown-Conrady (OpenCV) model.

    This dataclass represents the calibration parameters for a single camera in
    OpenCV-compatible format, ready to be passed directly to OpenCV functions.

    - The `camera_matrix` is the camera intrinsic matrix in the form:
        ```
        [[fx,  0, cx],
         [ 0, fy, cy],
         [ 0,  0,  1]]
        ```
        where fx/fy are focal lengths in pixels and cx/cy is the principal
        point.
    - The `dist_coeffs` array contains the distortion coefficients in the order
        (k1, k2, p1, p2, k3) following OpenCV's standard 5-parameter distortion
        model:
        - k1, k2, k3: Radial distortion coefficients (2nd, 4th, 6th order)
        - p1, p2: Tangential distortion coefficients


    Attributes:
        camera_matrix: camera intrinsic matrix in OpenCV format.
        dist_coeffs: distortion coefficients in OpenCV order.

    Notes:
        - Compatible with cv2.undistort(), cv2.calibrateCamera(), etc.
        - Distortion coefficients are dimensionless and resolution-independent
        - Camera matrix scales linearly with image resolution
        - The distortion model follows OpenCV convention (Brown-Conrady model)
    """

    camera_matrix: Float64[np.ndarray, "3 3"]
    dist_coeffs: Float64[np.ndarray, "5"]

    @classmethod
    def from_config(cls, cfg: dict) -> Self:
        """Create Intrinsics from a parsed configuration dictionary.

        Args:
            cfg: Zed SDK sensor configuration dictionary.
        """
        camera_matrix = np.array([
            [cfg["fx"], 0.0, cfg["cx"]],
            [0.0, cfg["fy"], cfg["cy"]],
            [0.0, 0.0, 1.0]
        ], dtype=np.float64,)

        dist_coeffs = np.array(
            [cfg["k1"], cfg["k2"], cfg["p1"], cfg["p2"], cfg["k3"]],
            dtype=np.float64)

        return cls(camera_matrix=camera_matrix, dist_coeffs=dist_coeffs)

from_config classmethod

from_config(cfg: dict) -> Self

Create Intrinsics from a parsed configuration dictionary.

Parameters:

  • cfg (dict, required): Zed SDK sensor configuration dictionary.
Source code in src/open_svo2/intrinsics.py
@classmethod
def from_config(cls, cfg: dict) -> Self:
    """Create Intrinsics from a parsed configuration dictionary.

    Args:
        cfg: Zed SDK sensor configuration dictionary.
    """
    camera_matrix = np.array([
        [cfg["fx"], 0.0, cfg["cx"]],
        [0.0, cfg["fy"], cfg["cy"]],
        [0.0, 0.0, 1.0]
    ], dtype=np.float64,)

    dist_coeffs = np.array(
        [cfg["k1"], cfg["k2"], cfg["p1"], cfg["p2"], cfg["k3"]],
        dtype=np.float64)

    return cls(camera_matrix=camera_matrix, dist_coeffs=dist_coeffs)
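For reference, the same construction with a hypothetical calibration dictionary; the key names match those read by `from_config`, but the numeric values are invented:

```python
import numpy as np

# Hypothetical calibration values (illustrative only).
cfg = {"fx": 1400.0, "fy": 1400.0, "cx": 960.0, "cy": 540.0,
       "k1": -0.17, "k2": 0.02, "p1": 0.0, "p2": 0.0, "k3": 0.0}

camera_matrix = np.array([
    [cfg["fx"], 0.0, cfg["cx"]],
    [0.0, cfg["fy"], cfg["cy"]],
    [0.0, 0.0, 1.0]], dtype=np.float64)
dist_coeffs = np.array(
    [cfg["k1"], cfg["k2"], cfg["p1"], cfg["p2"], cfg["k3"]], dtype=np.float64)

# These arrays drop straight into OpenCV, e.g.
# cv2.undistort(img, camera_matrix, dist_coeffs).
print(camera_matrix.shape, dist_coeffs.shape)
```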

open_svo2.Metadata dataclass

SVO2 file metadata extracted from MCAP container.

Attributes:

  • imu_frequency (float): IMU sampling frequency in Hz (e.g., 200.0).
  • zed_sdk_version (str): Version of the ZED SDK used to create the file.
  • calib_acc_matrix1 (Float32[ndarray, '3 3']): 3x3 float32 matrix for accelerometer calibration.
  • calib_acc_matrix2 (Float32[ndarray, '3 3']): 3x3 float32 matrix for accelerometer calibration.
  • calib_gyro_matrix1 (Float32[ndarray, '3 3']): 3x3 float32 matrix for gyroscope calibration.
  • calib_gyro_matrix2 (Float32[ndarray, '3 3']): 3x3 float32 matrix for gyroscope calibration.
  • header (Header): Parsed SVO2Header.
  • version (str): SVO2 file format version string (e.g., "2.0.3").
  • channels (dict[str, int]): Mapping of topic names to channel IDs in the MCAP file.
  • timestamps (dict[str, UInt64[ndarray, '?N']]): Dictionary mapping topic names to arrays of uint64 timestamps (in nanoseconds since epoch) for each sensor reading.

Source code in src/open_svo2/metadata.py
@dataclass
class Metadata:
    """SVO2 file metadata extracted from MCAP container.

    Attributes:
        imu_frequency: IMU sampling frequency in Hz (e.g., 200.0).
        zed_sdk_version: Version of the ZED SDK used to create the file.
        calib_acc_matrix1: 3x3 float32 matrix for accelerometer calibration.
        calib_acc_matrix2: 3x3 float32 matrix for accelerometer calibration.
        calib_gyro_matrix1: 3x3 float32 matrix for gyroscope calibration.
        calib_gyro_matrix2: 3x3 float32 matrix for gyroscope calibration.
        header: Parsed SVO2Header.
        version: SVO2 file format version string (e.g., "2.0.3").
        channels: Mapping of topic names to channel IDs in the MCAP file.
        timestamps: Dictionary mapping topic names to arrays of uint64
            timestamps (in nanoseconds since epoch) for each sensor reading.
    """

    imu_frequency: float
    zed_sdk_version: str
    calib_acc_matrix1: Float32[np.ndarray, "3 3"]
    calib_acc_matrix2: Float32[np.ndarray, "3 3"]
    calib_gyro_matrix1: Float32[np.ndarray, "3 3"]
    calib_gyro_matrix2: Float32[np.ndarray, "3 3"]
    header: Header
    version: str
    channels: dict[str, int]
    timestamps: dict[str, UInt64[np.ndarray, "?N"]]

    @staticmethod
    def _read_json_msg(stream, topic: str = "") -> dict:
        _schema, _channel, msg = next(stream, (None, None, None))
        if msg is None:
            raise ValueError(f"No {topic} message found in the SVO2 file.")
        return json.loads(msg.data)

    @staticmethod
    def _get_raw_data(reader: McapReader):
        footer_stream = reader.iter_messages(topics=["svo_footer"])
        footer = Metadata._read_json_msg(footer_stream, topic="svo_footer")

        header_stream = reader.iter_messages(topics=["svo_header"])
        header = Metadata._read_json_msg(header_stream, topic="svo_header")

        return header, footer

    @classmethod
    def from_mcap(cls, mcap: McapReader | str) -> Self:
        """Extract metadata from the MCAP reader.

        Args:
            mcap: file path to a svo2 mcap file or a `McapReader` handle.
        """
        if isinstance(mcap, str):
            with open(mcap, "rb") as f:
                return cls.from_mcap(make_reader(f))

        summary = mcap.get_summary()
        if summary is None:
            raise ValueError("Failed to read summary from the SVO2 file.")
        summary_short = {v.topic: k for k, v in summary.channels.items()}

        header, footer = cls._get_raw_data(mcap)
        timestamps = {
            k: np.array(v, dtype=np.uint64)
            for k, v in footer.items()}
        decoded_header = Header.from_base64(header.get("header", ""))

        # Parse calibration data (each is 18 float32s = two 3x3 matrices)
        calib_acc_raw = base64.b64decode(header.get("Calib_acc", ""))
        calib_gyro_raw = base64.b64decode(header.get("Calib_gyro", ""))

        acc_floats = np.frombuffer(calib_acc_raw, dtype=np.float32)
        gyro_floats = np.frombuffer(calib_gyro_raw, dtype=np.float32)

        return cls(
            imu_frequency=header.get("imu_frequency_hz", 0.0),
            zed_sdk_version=header.get("zed_sdk_version", "unknown"),
            header=decoded_header,
            version=header.get("version", "unknown"),
            calib_acc_matrix1=acc_floats[:9].reshape(3, 3),
            calib_acc_matrix2=acc_floats[9:].reshape(3, 3),
            calib_gyro_matrix1=gyro_floats[:9].reshape(3, 3),
            calib_gyro_matrix2=gyro_floats[9:].reshape(3, 3),
            timestamps=timestamps,
            channels=summary_short
        )

    def consistency_check(self) -> None:
        """Check parsed metadata for consistency."""
        for channel in self.channels:
            if channel.startswith("Camera"):
                m = re.match(r"Camera_SN(\d+)/(.*)", channel)
                if m is None:
                    logger.warning(
                        f"Channel name has unexpected pattern: {channel}")
                elif int(m.group(1)) != self.header.serial_number:
                    logger.warning(
                        f"Serial number mismatch: channel {channel} "
                        f"vs {self.header.serial_number} (from header)")

consistency_check

consistency_check() -> None

Check parsed metadata for consistency.

Source code in src/open_svo2/metadata.py
def consistency_check(self) -> None:
    """Check parsed metadata for consistency."""
    for channel in self.channels:
        if channel.startswith("Camera"):
            m = re.match(r"Camera_SN(\d+)/(.*)", channel)
            if m is None:
                logger.warning(
                    f"Channel name has unexpected pattern: {channel}")
            elif int(m.group(1)) != self.header.serial_number:
                logger.warning(
                    f"Serial number mismatch: channel {channel} "
                    f"vs {self.header.serial_number} (from header)")

from_mcap classmethod

from_mcap(mcap: McapReader | str) -> Self

Extract metadata from the MCAP reader.

Parameters:

  • mcap (McapReader | str, required): file path to a svo2 mcap file or a McapReader handle.
Source code in src/open_svo2/metadata.py
@classmethod
def from_mcap(cls, mcap: McapReader | str) -> Self:
    """Extract metadata from the MCAP reader.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return cls.from_mcap(make_reader(f))

    summary = mcap.get_summary()
    if summary is None:
        raise ValueError("Failed to read summary from the SVO2 file.")
    summary_short = {v.topic: k for k, v in summary.channels.items()}

    header, footer = cls._get_raw_data(mcap)
    timestamps = {
        k: np.array(v, dtype=np.uint64)
        for k, v in footer.items()}
    decoded_header = Header.from_base64(header.get("header", ""))

    # Parse calibration data (each is 18 float32s = two 3x3 matrices)
    calib_acc_raw = base64.b64decode(header.get("Calib_acc", ""))
    calib_gyro_raw = base64.b64decode(header.get("Calib_gyro", ""))

    acc_floats = np.frombuffer(calib_acc_raw, dtype=np.float32)
    gyro_floats = np.frombuffer(calib_gyro_raw, dtype=np.float32)

    return cls(
        imu_frequency=header.get("imu_frequency_hz", 0.0),
        zed_sdk_version=header.get("zed_sdk_version", "unknown"),
        header=decoded_header,
        version=header.get("version", "unknown"),
        calib_acc_matrix1=acc_floats[:9].reshape(3, 3),
        calib_acc_matrix2=acc_floats[9:].reshape(3, 3),
        calib_gyro_matrix1=gyro_floats[:9].reshape(3, 3),
        calib_gyro_matrix2=gyro_floats[9:].reshape(3, 3),
        timestamps=timestamps,
        channels=summary_short
    )

open_svo2.StereoIntrinsics dataclass

Stereo camera pair parameters.

Info

Zed uses a convention where the left camera is transformed relative to the right camera, which is considered the reference frame.

Attributes:

  • left (Intrinsics): Intrinsics for the left camera.
  • right (Intrinsics): Intrinsics for the right camera.
  • baseline (float): Horizontal separation between cameras in mm.
  • ty (float): Translation offset in Y direction (vertical) in mm.
  • tz (float): Translation offset in Z direction (depth) in mm.
  • cv (float): Convergence angle in radians (angle at which optical axes converge).
  • rx (float): Rotation around X axis (pitch) in radians.
  • rz (float): Rotation around Z axis (roll) in radians.

Source code in src/open_svo2/intrinsics.py
@dataclass
class StereoIntrinsics:
    """Stereo camera pair parameters.

    !!! info

        Zed uses a convention where the left camera is transformed relative
        to the right camera, which is considered the reference frame.

    Attributes:
        left: Intrinsics for the left camera.
        right: Intrinsics for the right camera.
        baseline: Horizontal separation between cameras in mm.
        ty: Translation offset in Y direction (vertical) in mm.
        tz: Translation offset in Z direction (depth) in mm.
        cv: Convergence angle in radians (angle at which optical axes converge).
        rx: Rotation around X axis (pitch) in radians.
        rz: Rotation around Z axis (roll) in radians.
    """

    left: Intrinsics
    right: Intrinsics
    baseline: float
    ty: float
    tz: float
    cv: float
    rx: float
    rz: float

    @classmethod
    def from_config(
        cls, cfg: dict | str,
        mode: str | None = None, height: int | None = None
    ) -> Self:
        """Parse Zed SDK `sensor.conf` contents.

        Args:
            cfg: Zed SDK sensor configuration dictionary or path to dictionary.
            mode: Camera mode (e.g., `FHD1200|FHD|SVGA` for the Zed X).
            height: Image height in pixels, used to infer mode if mode is not
                provided. Must be one of {1200, 1080, 600} corresponding to
                modes {FHD1200, FHD, SVGA} respectively.
        """
        if isinstance(cfg, str):
            with open(cfg, "r") as f:
                cfg = toml.load(f)

        if mode is None:
            if height is None:
                raise ValueError("Either mode or height must be provided")
            mode = cls.infer_mode(height)

        try:
            left = Intrinsics.from_config(cfg[f"LEFT_CAM_{mode}"])
            right = Intrinsics.from_config(cfg[f"RIGHT_CAM_{mode}"])
        except KeyError as e:
            raise ValueError(
                f"Missing camera configuration for mode '{mode}': {e}") from e

        return cls(
            left=left, right=right,
            baseline=cfg["STEREO"]["Baseline"],
            ty=cfg["STEREO"]["TY"],
            tz=cfg["STEREO"]["TZ"],
            cv=cfg["STEREO"][f"CV_{mode}"],
            rx=cfg["STEREO"][f"RX_{mode}"],
            rz=cfg["STEREO"][f"RZ_{mode}"],
        )

    def as_dict(self) -> dict:
        """Convert StereoIntrinsics to a dictionary format."""
        return {
            "left": {
                "camera_matrix": self.left.camera_matrix.tolist(),
                "dist_coeffs": self.left.dist_coeffs.tolist(),
            },
            "right": {
                "camera_matrix": self.right.camera_matrix.tolist(),
                "dist_coeffs": self.right.dist_coeffs.tolist(),
            },
            "baseline": self.baseline,
            "ty": self.ty,
            "tz": self.tz,
            "cv": self.cv,
            "rx": self.rx,
            "rz": self.rz,
        }

    @staticmethod
    def infer_mode(height: int) -> str:
        """Infer Zed camera mode from image height."""
        if height == 1200:
            return "FHD1200"
        elif height == 1080:
            return "FHD"
        elif height == 600:
            return "SVGA"
        else:
            raise ValueError(f"Unrecognized image height: {height}")

as_dict

as_dict() -> dict

Convert StereoIntrinsics to a dictionary format.

Source code in src/open_svo2/intrinsics.py
def as_dict(self) -> dict:
    """Convert StereoIntrinsics to a dictionary format."""
    return {
        "left": {
            "camera_matrix": self.left.camera_matrix.tolist(),
            "dist_coeffs": self.left.dist_coeffs.tolist(),
        },
        "right": {
            "camera_matrix": self.right.camera_matrix.tolist(),
            "dist_coeffs": self.right.dist_coeffs.tolist(),
        },
        "baseline": self.baseline,
        "ty": self.ty,
        "tz": self.tz,
        "cv": self.cv,
        "rx": self.rx,
        "rz": self.rz,
    }

from_config classmethod

from_config(
    cfg: dict | str, mode: str | None = None, height: int | None = None
) -> Self

Parse Zed SDK sensor.conf contents.

Parameters:

  • cfg (dict | str, required): Zed SDK sensor configuration dictionary or path to dictionary.
  • mode (str | None, default None): Camera mode (e.g., FHD1200|FHD|SVGA for the Zed X).
  • height (int | None, default None): Image height in pixels, used to infer mode if mode is not provided. Must be one of {1200, 1080, 600} corresponding to modes {FHD1200, FHD, SVGA} respectively.
Source code in src/open_svo2/intrinsics.py
@classmethod
def from_config(
    cls, cfg: dict | str,
    mode: str | None = None, height: int | None = None
) -> Self:
    """Parse Zed SDK `sensor.conf` contents.

    Args:
        cfg: Zed SDK sensor configuration dictionary or path to dictionary.
        mode: Camera mode (e.g., `FHD1200|FHD|SVGA` for the Zed X).
        height: Image height in pixels, used to infer mode if mode is not
            provided. Must be one of {1200, 1080, 600} corresponding to
            modes {FHD1200, FHD, SVGA} respectively.
    """
    if isinstance(cfg, str):
        with open(cfg, "r") as f:
            cfg = toml.load(f)

    if mode is None:
        if height is None:
            raise ValueError("Either mode or height must be provided")
        mode = cls.infer_mode(height)

    try:
        left = Intrinsics.from_config(cfg[f"LEFT_CAM_{mode}"])
        right = Intrinsics.from_config(cfg[f"RIGHT_CAM_{mode}"])
    except KeyError as e:
        raise ValueError(
            f"Missing camera configuration for mode '{mode}': {e}") from e

    return cls(
        left=left, right=right,
        baseline=cfg["STEREO"]["Baseline"],
        ty=cfg["STEREO"]["TY"],
        tz=cfg["STEREO"]["TZ"],
        cv=cfg["STEREO"][f"CV_{mode}"],
        rx=cfg["STEREO"][f"RX_{mode}"],
        rz=cfg["STEREO"][f"RZ_{mode}"],
    )

infer_mode staticmethod

infer_mode(height: int) -> str

Infer Zed camera mode from image height.

Source code in src/open_svo2/intrinsics.py
@staticmethod
def infer_mode(height: int) -> str:
    """Infer Zed camera mode from image height."""
    if height == 1200:
        return "FHD1200"
    elif height == 1080:
        return "FHD"
    elif height == 600:
        return "SVGA"
    else:
        raise ValueError(f"Unrecognized image height: {height}")

open_svo2.imu_from_svo2

imu_from_svo2(
    mcap: McapReader | str, metadata: Metadata | None = None
) -> dict[str, ndarray]

Extract raw IMU data from an SVO2 MCAP file into arrays.

Parameters:

  • mcap (McapReader | str, required): file path to a svo2 mcap file or a McapReader handle.
  • metadata (Metadata | None, default None): Optional pre-parsed metadata. If not provided, it will be extracted from the MCAP reader.

Returns:

  • dict[str, ndarray]: A dictionary containing timestamps, angular_velocity, and linear_acceleration arrays.

Source code in src/open_svo2/convert.py
def imu_from_svo2(
    mcap: McapReader | str, metadata: Metadata | None = None
) -> dict[str, np.ndarray]:
    """Extract raw IMU data from SVO2 MCAP into a .npz file.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
        metadata: Optional pre-parsed metadata. If not provided, it will be
            extracted from the MCAP reader.

    Returns:
        A dictionary containing `timestamps`, `angular_velocity`, and
            `linear_acceleration` arrays.
    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return imu_from_svo2(make_reader(f), metadata=metadata)
    if metadata is None:
        metadata = Metadata.from_mcap(mcap)

    topic = f"Camera_SN{metadata.header.serial_number}/sensors"
    stream_iter = mcap.iter_messages(topics=[topic])

    timestamps = []
    angular_velocity = []
    linear_acceleration = []

    for _, _, msg in stream_iter:
        raw = base64.b64decode(json.loads(msg.data)["data"])
        imu = IMUData.from_raw_data(raw)
        timestamps.append(imu.timestamp)
        angular_velocity.append(imu.avel)
        linear_acceleration.append(imu.accel)

    return {
        "timestamps": np.array(timestamps, dtype=np.float64),
        "angular_velocity": np.array(angular_velocity, dtype=np.float32),
        "linear_acceleration": np.array(linear_acceleration, dtype=np.float32)
    }
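The returned dictionary can be written straight to a .npz archive with `np.savez`. A sketch with synthetic stand-in arrays for the three keys (the values here are invented, not real sensor output):

```python
import io
import numpy as np

# Synthetic stand-in for the dictionary imu_from_svo2 returns.
imu = {
    "timestamps": np.array([0.0, 0.005, 0.010], dtype=np.float64),
    "angular_velocity": np.zeros((3, 3), dtype=np.float32),
    "linear_acceleration": np.tile([0.0, 0.0, 9.81], (3, 1)).astype(np.float32),
}

buf = io.BytesIO()  # a file path works the same way
np.savez(buf, **imu)
buf.seek(0)
loaded = np.load(buf)
print(sorted(loaded.files))
```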

open_svo2.mp4_from_svo2

mp4_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> UInt32[ndarray, N]

Extract video stream from SVO2 MCAP into a standard MP4 container.

Parameters:

  • mcap (McapReader | str, required): file path to a svo2 mcap file or a McapReader handle.
  • output (str, required): file path to the output MP4 file.
  • metadata (Metadata | None, default None): Optional pre-parsed metadata. If not provided, it will be extracted from the MCAP reader.

Returns:

  • UInt32[ndarray, N]: Indices of the last keyframe for each frame, as recorded by the frame footers.

Source code in src/open_svo2/convert.py
def mp4_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> UInt32[np.ndarray, "N"]:
    """Extract video stream from SVO2 MCAP into a standard MP4 container.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
        output: file path to the output MP4 file.
        metadata: Optional pre-parsed metadata. If not provided, it will be
            extracted from the MCAP reader.

    Returns:
        Indices of the last keyframe for each frame, as recorded by the frame footers.
    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return mp4_from_svo2(make_reader(f), output, metadata=metadata)
    if metadata is None:
        metadata = Metadata.from_mcap(mcap)

    stream_iter = mcap.iter_messages(
        topics=[f"Camera_SN{metadata.header.serial_number}/side_by_side"])

    try:
        first_msg = next(stream_iter)
    except StopIteration:
        return np.zeros((), dtype=np.uint32)

    _, _, msg = first_msg
    _, frame_size = struct.unpack("<II", msg.data[:8])
    payload = msg.data[8 : 8 + frame_size]
    codec_name = detect_codec(payload)
    start_ts = FrameFooter.from_buffer_copy(
        msg.data[8 + frame_size:]).timestamp

    timestamps = []
    keyframes = []
    with av.open(output, mode='w', format='mp4') as container:
        stream = container.add_stream(codec_name, rate=metadata.header.fps)
        # header width covers one camera; the side-by-side stream is double
        stream.width = metadata.header.width * 2
        stream.height = metadata.header.height
        stream.pix_fmt = "yuv420p"
        stream.time_base = Fraction(1, 1_000_000)

        def message_generator():
            yield first_msg
            yield from stream_iter

        last_pts = -1
        for i, (_, _, msg) in enumerate(message_generator()):
            _, size = struct.unpack("<II", msg.data[:8])
            payload = msg.data[8 : 8 + size]
            footer = FrameFooter.from_buffer_copy(msg.data[8 + size:])
            timestamps.append(footer.timestamp)
            keyframes.append(footer.last_keyframe_index)

            packet = av.Packet(payload)
            pts_us = int(footer.timestamp - start_ts) // 1000

            # Enforce strict monotonicity for MP4
            if pts_us <= last_pts:
                logger.warning(
                    f"Non-monotonic timestamp at frame {i}: "
                    f"{pts_us} <= {last_pts}. Correcting.")
                pts_us = last_pts + 1

            last_pts = pts_us
            packet.pts = pts_us
            packet.stream = stream

            container.mux(packet)

    timestamps = np.array(timestamps, dtype=np.uint64)
    timestamps_meta = metadata.timestamps[
        f"Camera_SN{metadata.header.serial_number}/side_by_side"]
    _check_timestamps(timestamps_meta, timestamps)

    return np.array(keyframes, dtype=np.uint32)
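The monotonicity guard in the loop above can be isolated for clarity; a minimal sketch of the same correction (the function name `monotonic_pts` is hypothetical, not part of open_svo2):

```python
def monotonic_pts(timestamps_ns: list[int], start_ns: int) -> list[int]:
    """Convert nanosecond timestamps to strictly increasing microsecond
    pts values, mirroring the correction applied in mp4_from_svo2."""
    out = []
    last = -1
    for ts in timestamps_ns:
        pts = (ts - start_ns) // 1000   # ns -> µs
        if pts <= last:                 # duplicate or out-of-order stamp
            pts = last + 1              # nudge forward by 1 µs
        out.append(pts)
        last = pts
    return out
```

With the stream's time base set to 1 µs, strictly increasing pts values keep the MP4 muxer from rejecting packets.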

open_svo2.raw_from_svo2

raw_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> tuple[UInt64[ndarray, N + 1], Float64[ndarray, N], UInt32[ndarray, N]]

Extract raw video frames from SVO2 MCAP into a binary file.

Raw frames are concatenated into the output file; the resulting raw elementary stream should be readable with ffmpeg.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mcap | McapReader \| str | File path to an SVO2 MCAP file, or a McapReader handle. | required |
| output | str | File path to the output raw bitstream file. | required |
| metadata | Metadata \| None | Optional pre-parsed metadata. If not provided, it is extracted from the MCAP reader. | None |

Returns:

| Type | Description |
| --- | --- |
| UInt64[ndarray, N + 1] | Byte offsets of frame boundaries. |
| Float64[ndarray, N] | Timestamps in seconds. |
| UInt32[ndarray, N] | Per-frame index of the last keyframe, as recorded by each frame footer. |
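Because the boundary array has N + 1 entries, frame i spans bytes `offsets[i]:offsets[i + 1]` of the output file. A minimal sketch with toy data (`frame_bytes` is a hypothetical helper, not part of open_svo2):

```python
import numpy as np

def frame_bytes(raw: bytes, offsets: np.ndarray, i: int) -> bytes:
    """Slice frame i out of the concatenated bitstream."""
    return raw[offsets[i] : offsets[i + 1]]

offsets = np.array([0, 5, 9, 16], dtype=np.uint64)  # 3 toy frames
sizes = np.diff(offsets)                            # per-frame byte counts
```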

Source code in src/open_svo2/convert.py
def raw_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> tuple[
    UInt64[np.ndarray, "N+1"],
    Float64[np.ndarray, "N"],
    UInt32[np.ndarray, "N"]
]:
    """Extract raw video frames from SVO2 MCAP into a binary file.

    Raw frames are concatenated into the output file; the resulting raw
    elementary stream should be readable with ffmpeg.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
        output: file path to the output file.
        metadata: Optional pre-parsed metadata. If not provided, it will be
            extracted from the MCAP reader.

    Returns:
        Byte offsets of frame boundaries.
        Timestamps in seconds (Float64).
        Index of the last keyframe, as recorded by the frame footer.

    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return raw_from_svo2(make_reader(f), output, metadata)
    if metadata is None:
        metadata = Metadata.from_mcap(mcap)

    stream_iter = mcap.iter_messages(
        topics=[f"Camera_SN{metadata.header.serial_number}/side_by_side"])

    offsets = []
    timestamps = []
    keyframes = []
    byte_offset = 0

    with open(output, "wb") as f:
        for _, _, msg in stream_iter:
            _, size = struct.unpack("<II", msg.data[:8])
            payload = msg.data[8 : 8 + size]
            footer = FrameFooter.from_buffer_copy(msg.data[8 + size:])

            offsets.append(byte_offset)
            timestamps.append(footer.timestamp)
            keyframes.append(footer.last_keyframe_index)

            f.write(payload)
            byte_offset += len(payload)

    # Append the end-of-file boundary so `offsets` has N + 1 entries,
    # matching the declared return shape.
    offsets.append(byte_offset)

    timestamps_ns = np.array(timestamps, dtype=np.uint64)
    timestamps_meta = metadata.timestamps[
        f"Camera_SN{metadata.header.serial_number}/side_by_side"]
    _check_timestamps(timestamps_meta, timestamps_ns)

    return (
        np.array(offsets, dtype=np.uint64),
        timestamps_ns / 1e9,
        np.array(keyframes, dtype=np.uint32),
    )
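The returned second-resolution timestamps can be used to sanity-check the recording rate; a small sketch with synthetic ~30 fps footer stamps (no open_svo2 calls involved):

```python
import numpy as np

# Synthetic footer timestamps in nanoseconds at ~30 fps
ts_ns = np.arange(10, dtype=np.uint64) * 33_333_333
ts_s = ts_ns / 1e9                       # seconds, as raw_from_svo2 returns
fps = 1.0 / np.median(np.diff(ts_s))     # robust effective frame-rate estimate
```

Using the median of the inter-frame gaps makes the estimate robust to the occasional duplicated or delayed timestamp.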