
open_svo2

OpenSVO2: An open-source reverse-engineered interface for SVO2 files.

open_svo2.FrameFooter

Bases: Structure

Memory mapping for the SVO2 stereo frame footer (56 bytes, 12 fields).

Attributes:

  • width: Image width in pixels.
  • height: Image height in pixels.
  • _unknown_magic: Magic number (0x5c002c00).
  • _unknown_1: Unknown constant (1).
  • _unknown_2: Unknown constant (2).
  • _unknown_3: Unknown constant (-1).
  • timestamp: Timestamp in nanoseconds.
  • payload_size: Size of H.264/H.265 payload in bytes.
  • frame_type: 3 for key-frame, 0 for i-frame.
  • last_keyframe_index: Index of the last keyframe.
  • frame_id: Sequential frame index.
  • _unsure_keyframe_id: Possible keyframe ID.

Source code in src/open_svo2/metadata.py
class FrameFooter(Structure):
    """Memory mapping for the SVO2 stereo frame footer (56 bytes, 12 fields).

    Attributes:
        width: Image width in pixels.
        height: Image height in pixels.
        _unknown_magic: Magic number (0x5c002c00).
        _unknown_1: Unknown constant (1).
        _unknown_2: Unknown constant (2).
        _unknown_3: Unknown constant (-1).
        timestamp: Timestamp in nanoseconds.
        payload_size: Size of H.264/H.265 payload in bytes.
        frame_type: 3 for key-frame, 0 for i-frame.
        last_keyframe_index: Index of the last keyframe.
        frame_id: Sequential frame index.
        _unsure_keyframe_id: Possible keyframe ID.
    """

    _fields_ = [
        ("width", c_uint32),
        ("height", c_uint32),
        ("_unknown_magic", c_uint32),
        ("_unknown_1", c_int32),
        ("_unknown_2", c_int32),
        ("_unknown_3", c_int32),
        ("timestamp", c_uint64),
        ("payload_size", c_uint32),
        ("frame_type", c_int32),
        ("last_keyframe_index", c_int32),
        ("frame_id", c_uint32),
        ("_unsure_keyframe_id", c_int32),
    ]
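Since the layout is reverse-engineered, it can be verified independently. A minimal sketch of the same 56-byte mapping using only the standard library; the field names and the magic number come from the table above, while all other values are invented for illustration:

```python
import struct
from ctypes import Structure, c_uint32, c_int32, c_uint64, sizeof

class FooterDemo(Structure):
    """Standalone copy of the documented FrameFooter layout (not the library class)."""
    _fields_ = [
        ("width", c_uint32), ("height", c_uint32),
        ("_unknown_magic", c_uint32),
        ("_unknown_1", c_int32), ("_unknown_2", c_int32), ("_unknown_3", c_int32),
        ("timestamp", c_uint64),
        ("payload_size", c_uint32), ("frame_type", c_int32),
        ("last_keyframe_index", c_int32), ("frame_id", c_uint32),
        ("_unsure_keyframe_id", c_int32),
    ]

# The 12 fields sum to 52 bytes; the u64 member forces 8-byte alignment,
# so ctypes pads the struct to 56 bytes -- matching the documented size.
assert sizeof(FooterDemo) == 56

# Pack 12 little-endian fields plus 4 bytes of trailing padding ("4x").
raw = struct.pack(
    "<IIIiiiQIiiIi4x",
    1920, 1080, 0x5C002C00, 1, 2, -1,
    1_700_000_000_000_000_000, 4096, 3, 120, 121, 7)
footer = FooterDemo.from_buffer_copy(raw)
print(footer.width, footer.timestamp, footer.frame_type)
```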

open_svo2.Header

Bases: Structure

Memory mapping for the SVO2 binary header (128 bytes, 32 fields).

Field naming conventions

  • Confirmed fields are named directly
  • Unconfirmed fields are prefixed with '_unsure_'
  • Likely correct fields are prefixed with '_likely_'

Warning

The parsed transformation matrix does not match the stereo transformation values given by the Zed SDK. The exact meaning and relationship are currently unknown.

Attributes:

  • width: Image width in pixels (for a single camera).
  • height: Image height in pixels.
  • serial_number: Camera serial number (e.g., 40735594).
  • fps: Frames per second.
  • _unsure_frame_counter: Possibly frame index or counter.
  • _unsure_bit_depth: Bits per channel (typically 8).
  • _unsure_exposure_mode: Exposure control mode.
  • _likely_exposure_time: Likely exposure time (units unknown, observed: 1000).
  • _likely_camera_model: Camera model/SKU (e.g., 2001 = ZED 2).
  • _unsure_ts_sec: Timestamp seconds (often 0).
  • _unsure_ts_nsec: Timestamp nanoseconds (often 0).
  • _unsure_imu_status: IMU-related status flag.
  • w_scale: Scale factor (typically 1.0).
  • _likely_lens_id: Lens type identifier (observed: 5).
  • _unsure_isp_gain: ISP gain setting.
  • _unsure_isp_wb_r: White balance red channel.
  • _unsure_isp_wb_b: White balance blue channel.
  • _unsure_isp_gamma: Gamma correction value.
  • _likely_sync_status: Sync status flag (1 = synced?).
  • _unsure_padding: Padding or reserved field.

Source code in src/open_svo2/metadata.py
class Header(Structure):
    """Memory mapping for the SVO2 binary header (128 bytes, 32 fields).

    !!! info "Field naming conventions"

        - Confirmed fields are named directly
        - Unconfirmed fields are prefixed with '_unsure_'
        - Likely correct fields are prefixed with '_likely_'

    !!! warning

        The parsed transformation matrix does not match the stereo
        transformation values given by the Zed SDK. The exact meaning and
        relationship are currently unknown.

    Attributes:
        width: Image width in pixels (for a single camera).
        height: Image height in pixels.
        serial_number: Camera serial number (e.g., 40735594).
        fps: Frames per second.
        _unsure_frame_counter: Possibly frame index or counter.
        _unsure_bit_depth: Bits per channel (typically 8).
        _unsure_exposure_mode: Exposure control mode.
        _likely_exposure_time: Likely exposure time (units unknown, observed: 1000).
        _likely_camera_model: Camera model/SKU (e.g., 2001 = ZED 2).
        _unsure_ts_sec: Timestamp seconds (often 0).
        _unsure_ts_nsec: Timestamp nanoseconds (often 0).
        _unsure_imu_status: IMU-related status flag.
        w_scale: Scale factor (typically 1.0).
        _likely_lens_id: Lens type identifier (observed: 5).
        _unsure_isp_gain: ISP gain setting.
        _unsure_isp_wb_r: White balance red channel.
        _unsure_isp_wb_b: White balance blue channel.
        _unsure_isp_gamma: Gamma correction value.
        _likely_sync_status: Sync status flag (1 = synced?).
        _unsure_padding: Padding or reserved field.
    """

    _fields_ = [
        ("width", c_uint32),
        ("height", c_uint32),
        ("serial_number", c_uint32),
        ("fps", c_uint32),
        ("_unsure_frame_counter", c_uint32),
        ("_unsure_bit_depth", c_uint32),
        ("_unsure_exposure_mode", c_uint32),
        ("_likely_exposure_time", c_uint32),
        ("_likely_camera_model", c_uint32),

        ("r00", c_float), ("r01", c_float), ("r02", c_float), ("tx", c_float),
        ("r10", c_float), ("r11", c_float), ("r12", c_float), ("ty", c_float),
        ("r20", c_float), ("r21", c_float), ("r22", c_float), ("tz", c_float),

        ("_unsure_ts_sec", c_uint32),
        ("_unsure_ts_nsec", c_uint32),
        ("_unsure_imu_status", c_uint32),
        ("w_scale", c_float),

        ("_likely_lens_id", c_uint32),
        ("_unsure_isp_gain", c_uint32),
        ("_unsure_isp_wb_r", c_uint32),
        ("_unsure_isp_wb_b", c_uint32),
        ("_unsure_isp_gamma", c_uint32),
        ("_likely_sync_status", c_uint32),
        ("_unsure_padding", c_uint32),
    ]

    @classmethod
    def from_base64(cls, data: str) -> Self:
        """Create an SVO2Header instance from encoded base64."""
        decoded = base64.b64decode(data)
        if len(decoded) != sizeof(cls):
            raise ValueError(
                f"Data length {len(decoded)} does not match SVO2Header "
                f"(requires {sizeof(cls)} bytes).")
        return cls.from_buffer_copy(decoded)

from_base64 classmethod

from_base64(data: str) -> Self

Create a Header instance from a base64-encoded string.

Source code in src/open_svo2/metadata.py
@classmethod
def from_base64(cls, data: str) -> Self:
    """Create an SVO2Header instance from encoded base64."""
    decoded = base64.b64decode(data)
    if len(decoded) != sizeof(cls):
        raise ValueError(
            f"Data length {len(decoded)} does not match SVO2Header "
            f"(requires {sizeof(cls)} bytes).")
    return cls.from_buffer_copy(decoded)
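The decoding and strict length check can be sketched without the library, using `struct` with an equivalent 128-byte layout. The format string mirrors the `_fields_` order above; the helper name `decode_header` is illustrative, not part of the API:

```python
import base64
import struct

# 128-byte header: 9 u32, a 3x4 float32 transform (12 floats),
# 3 u32 (timestamps / IMU status), float w_scale, then 7 trailing u32s.
HEADER_FMT = "<9I12f3If7I"
assert struct.calcsize(HEADER_FMT) == 128

def decode_header(data: str) -> tuple:
    """Minimal stand-in for Header.from_base64 built on struct."""
    decoded = base64.b64decode(data)
    if len(decoded) != struct.calcsize(HEADER_FMT):
        raise ValueError(f"Expected 128 bytes, got {len(decoded)}")
    return struct.unpack(HEADER_FMT, decoded)

# Round-trip a synthetic header: width=1920, height=1080, everything else zeroed.
payload = struct.pack(HEADER_FMT, 1920, 1080, *([0] * 30))
fields = decode_header(base64.b64encode(payload).decode("ascii"))
print(fields[0], fields[1])  # width, height
```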

open_svo2.IMUData dataclass

Zed IMU data.

Attributes:

  • timestamp (float64): IMU measurement timestamp in seconds.
  • accel (Float32[ndarray, 3]): Linear acceleration in m/s^2, in the Zed camera coordinate frame, without calibration.
  • avel (Float32[ndarray, 3]): Angular velocity in deg/s, in the Zed camera coordinate frame, without calibration.

Source code in src/open_svo2/imu.py
@dataclass
class IMUData:
    """Zed IMU data.

    Attributes:
        timestamp: IMU measurement timestamp in seconds.
        accel: Linear acceleration in m/s^2, in the Zed camera coordinate
            frame, without calibration.
        avel: Angular velocity in deg/s, in the Zed camera coordinate frame,
            without calibration.
    """

    timestamp: np.float64
    accel: Float32[np.ndarray, "3"]
    avel: Float32[np.ndarray, "3"]

    @classmethod
    def from_raw_data(cls, raw_data: bytes) -> "IMUData":
        """Parse raw IMU data from the ZED SDK binary format."""
        timestamp_ns = np.frombuffer(
            raw_data, dtype=np.uint64, count=1, offset=0x010)[0]

        accel = np.frombuffer(
            raw_data, dtype=np.float32, count=3, offset=0x064)
        avel = np.frombuffer(
            raw_data, dtype=np.float32, count=3, offset=0x058)

        return cls(
            timestamp=np.float64(timestamp_ns) / 1e9, accel=accel, avel=avel)

from_raw_data classmethod

from_raw_data(raw_data: bytes) -> IMUData

Parse raw IMU data from the ZED SDK binary format.

Source code in src/open_svo2/imu.py
@classmethod
def from_raw_data(cls, raw_data: bytes) -> "IMUData":
    """Parse raw IMU data from the ZED SDK binary format."""
    timestamp_ns = np.frombuffer(
        raw_data, dtype=np.uint64, count=1, offset=0x010)[0]

    accel = np.frombuffer(
        raw_data, dtype=np.float32, count=3, offset=0x064)
    avel = np.frombuffer(
        raw_data, dtype=np.float32, count=3, offset=0x058)

    return cls(
        timestamp=np.float64(timestamp_ns) / 1e9, accel=accel, avel=avel)
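A self-contained illustration of the offsets used above (0x010 for the nanosecond timestamp, 0x058 for angular velocity, 0x064 for acceleration). The sensor values written into the synthetic buffer are made up:

```python
import numpy as np

# Build a synthetic record large enough for all documented offsets.
raw = bytearray(0x070)
raw[0x010:0x018] = np.uint64(2_000_000_000).tobytes()                    # 2 s in ns
raw[0x058:0x064] = np.array([0.1, 0.2, 0.3], dtype=np.float32).tobytes()   # avel
raw[0x064:0x070] = np.array([0.0, 0.0, 9.81], dtype=np.float32).tobytes()  # accel
raw = bytes(raw)

# Same parsing as IMUData.from_raw_data:
timestamp_ns = np.frombuffer(raw, dtype=np.uint64, count=1, offset=0x010)[0]
avel = np.frombuffer(raw, dtype=np.float32, count=3, offset=0x058)
accel = np.frombuffer(raw, dtype=np.float32, count=3, offset=0x064)
print(timestamp_ns / 1e9, accel[2])
```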

open_svo2.Intrinsics dataclass

Camera intrinsic parameters using the Brown-Conrady (OpenCV) model.

This dataclass represents the calibration parameters for a single camera in OpenCV-compatible format, ready to be passed directly to OpenCV functions.

  • The camera_matrix is the camera intrinsic matrix in the form:
    [[fx,  0, cx],
     [ 0, fy, cy],
     [ 0,  0,  1]]
    
    where fx/fy are focal lengths in pixels and cx/cy is the principal point.
  • The dist_coeffs array contains the distortion coefficients in the order (k1, k2, p1, p2, k3) following OpenCV's standard 5-parameter distortion model:
    • k1, k2, k3: Radial distortion coefficients (2nd, 4th, 6th order)
    • p1, p2: Tangential distortion coefficients

Attributes:

  • camera_matrix (Float64[ndarray, '3 3']): Camera intrinsic matrix in OpenCV format.
  • dist_coeffs (Float64[ndarray, 5]): Distortion coefficients in OpenCV order.

Notes
  • Compatible with cv2.undistort(), cv2.calibrateCamera(), etc.
  • Distortion coefficients are dimensionless and resolution-independent
  • Camera matrix scales linearly with image resolution
  • The distortion model follows OpenCV convention (Brown-Conrady model)
Source code in src/open_svo2/intrinsics.py
@dataclass
class Intrinsics:
    """Camera intrinsic parameters using the Brown-Conrady (OpenCV) model.

    This dataclass represents the calibration parameters for a single camera in
    OpenCV-compatible format, ready to be passed directly to OpenCV functions.

    - The `camera_matrix` is the camera intrinsic matrix in the form:
        ```
        [[fx,  0, cx],
         [ 0, fy, cy],
         [ 0,  0,  1]]
        ```
        where fx/fy are focal lengths in pixels and cx/cy is the principal
        point.
    - The `dist_coeffs` array contains the distortion coefficients in the order
        (k1, k2, p1, p2, k3) following OpenCV's standard 5-parameter distortion
        model:
        - k1, k2, k3: Radial distortion coefficients (2nd, 4th, 6th order)
        - p1, p2: Tangential distortion coefficients


    Attributes:
        camera_matrix: camera intrinsic matrix in OpenCV format.
        dist_coeffs: distortion coefficients in OpenCV order.

    Notes:
        - Compatible with cv2.undistort(), cv2.calibrateCamera(), etc.
        - Distortion coefficients are dimensionless and resolution-independent
        - Camera matrix scales linearly with image resolution
        - The distortion model follows OpenCV convention (Brown-Conrady model)
    """

    camera_matrix: Float64[np.ndarray, "3 3"]
    dist_coeffs: Float64[np.ndarray, "5"]

    @classmethod
    def from_config(cls, cfg: dict) -> Self:
        """Create Intrinsics from a parsed configuration dictionary.

        Args:
            cfg: Zed SDK sensor configuration dictionary.
        """
        camera_matrix = np.array([
            [cfg["fx"], 0.0, cfg["cx"]],
            [0.0, cfg["fy"], cfg["cy"]],
            [0.0, 0.0, 1.0]
        ], dtype=np.float64,)

        dist_coeffs = np.array(
            [cfg["k1"], cfg["k2"], cfg["p1"], cfg["p2"], cfg["k3"]],
            dtype=np.float64)

        return cls(camera_matrix=camera_matrix, dist_coeffs=dist_coeffs)

from_config classmethod

from_config(cfg: dict) -> Self

Create Intrinsics from a parsed configuration dictionary.

Parameters:

  • cfg (dict, required): Zed SDK sensor configuration dictionary.
Source code in src/open_svo2/intrinsics.py
@classmethod
def from_config(cls, cfg: dict) -> Self:
    """Create Intrinsics from a parsed configuration dictionary.

    Args:
        cfg: Zed SDK sensor configuration dictionary.
    """
    camera_matrix = np.array([
        [cfg["fx"], 0.0, cfg["cx"]],
        [0.0, cfg["fy"], cfg["cy"]],
        [0.0, 0.0, 1.0]
    ], dtype=np.float64,)

    dist_coeffs = np.array(
        [cfg["k1"], cfg["k2"], cfg["p1"], cfg["p2"], cfg["k3"]],
        dtype=np.float64)

    return cls(camera_matrix=camera_matrix, dist_coeffs=dist_coeffs)
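For reference, the same construction with a hypothetical calibration dictionary; the key names match those read by `from_config`, but the numeric values are invented:

```python
import numpy as np

# Hypothetical calibration values (illustrative only).
cfg = {"fx": 1400.0, "fy": 1400.0, "cx": 960.0, "cy": 540.0,
       "k1": -0.17, "k2": 0.02, "p1": 0.0, "p2": 0.0, "k3": 0.0}

camera_matrix = np.array([
    [cfg["fx"], 0.0, cfg["cx"]],
    [0.0, cfg["fy"], cfg["cy"]],
    [0.0, 0.0, 1.0]], dtype=np.float64)
dist_coeffs = np.array(
    [cfg["k1"], cfg["k2"], cfg["p1"], cfg["p2"], cfg["k3"]], dtype=np.float64)

# These arrays drop straight into OpenCV, e.g.
# cv2.undistort(img, camera_matrix, dist_coeffs).
print(camera_matrix.shape, dist_coeffs.shape)
```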

open_svo2.Metadata dataclass

SVO2 file metadata extracted from MCAP container.

Attributes:

  • imu_frequency (float): IMU sampling frequency in Hz (e.g., 200.0).
  • zed_sdk_version (str): Version of the ZED SDK used to create the file.
  • calib_acc_matrix1 (Float32[ndarray, '3 3']): 3x3 float32 matrix for accelerometer calibration.
  • calib_acc_matrix2 (Float32[ndarray, '3 3']): 3x3 float32 matrix for accelerometer calibration.
  • calib_gyro_matrix1 (Float32[ndarray, '3 3']): 3x3 float32 matrix for gyroscope calibration.
  • calib_gyro_matrix2 (Float32[ndarray, '3 3']): 3x3 float32 matrix for gyroscope calibration.
  • header (Header): Parsed SVO2Header.
  • version (str): SVO2 file format version string (e.g., "2.0.3").
  • channels (dict[str, int]): Mapping of topic names to channel IDs in the MCAP file.
  • timestamps (dict[str, UInt64[ndarray, '?N']]): Dictionary mapping topic names to arrays of uint64 timestamps (in nanoseconds since epoch) for each sensor reading.

Source code in src/open_svo2/metadata.py
@dataclass
class Metadata:
    """SVO2 file metadata extracted from MCAP container.

    Attributes:
        imu_frequency: IMU sampling frequency in Hz (e.g., 200.0).
        zed_sdk_version: Version of the ZED SDK used to create the file.
        calib_acc_matrix1: 3x3 float32 matrix for accelerometer calibration.
        calib_acc_matrix2: 3x3 float32 matrix for accelerometer calibration.
        calib_gyro_matrix1: 3x3 float32 matrix for gyroscope calibration.
        calib_gyro_matrix2: 3x3 float32 matrix for gyroscope calibration.
        header: Parsed SVO2Header.
        version: SVO2 file format version string (e.g., "2.0.3").
        channels: Mapping of topic names to channel IDs in the MCAP file.
        timestamps: Dictionary mapping topic names to arrays of uint64
            timestamps (in nanoseconds since epoch) for each sensor reading.
    """

    imu_frequency: float
    zed_sdk_version: str
    calib_acc_matrix1: Float32[np.ndarray, "3 3"]
    calib_acc_matrix2: Float32[np.ndarray, "3 3"]
    calib_gyro_matrix1: Float32[np.ndarray, "3 3"]
    calib_gyro_matrix2: Float32[np.ndarray, "3 3"]
    header: Header
    version: str
    channels: dict[str, int]
    timestamps: dict[str, UInt64[np.ndarray, "?N"]]

    @staticmethod
    def _read_json_msg(stream, topic: str = "") -> dict:
        _schema, _channel, msg = next(stream, (None, None, None))
        if msg is None:
            raise ValueError(f"No {topic} message found in the SVO2 file.")
        return json.loads(msg.data)

    @staticmethod
    def _get_raw_data(reader: McapReader):
        footer_stream = reader.iter_messages(topics=["svo_footer"])
        footer = Metadata._read_json_msg(footer_stream, topic="svo_footer")

        header_stream = reader.iter_messages(topics=["svo_header"])
        header = Metadata._read_json_msg(header_stream, topic="svo_header")

        return header, footer

    @classmethod
    def from_mcap(cls, mcap: McapReader | str) -> Self:
        """Extract metadata from the MCAP reader.

        Args:
            mcap: file path to a svo2 mcap file or a `McapReader` handle.
        """
        if isinstance(mcap, str):
            with open(mcap, "rb") as f:
                return cls.from_mcap(make_reader(f))

        summary = mcap.get_summary()
        if summary is None:
            raise ValueError("Failed to read summary from the SVO2 file.")
        summary_short = {v.topic: k for k, v in summary.channels.items()}

        header, footer = cls._get_raw_data(mcap)
        timestamps = {
            k: np.array(v, dtype=np.uint64)
            for k, v in footer.items()}
        decoded_header = Header.from_base64(header.get("header", ""))

        # Parse calibration data (each is 18 float32s = two 3x3 matrices)
        calib_acc_raw = base64.b64decode(header.get("Calib_acc", ""))
        calib_gyro_raw = base64.b64decode(header.get("Calib_gyro", ""))

        acc_floats = np.frombuffer(calib_acc_raw, dtype=np.float32)
        gyro_floats = np.frombuffer(calib_gyro_raw, dtype=np.float32)

        return cls(
            imu_frequency=header.get("imu_frequency_hz", 0.0),
            zed_sdk_version=header.get("zed_sdk_version", "unknown"),
            header=decoded_header,
            version=header.get("version", "unknown"),
            calib_acc_matrix1=acc_floats[:9].reshape(3, 3),
            calib_acc_matrix2=acc_floats[9:].reshape(3, 3),
            calib_gyro_matrix1=gyro_floats[:9].reshape(3, 3),
            calib_gyro_matrix2=gyro_floats[9:].reshape(3, 3),
            timestamps=timestamps,
            channels=summary_short
        )

    def consistency_check(self) -> None:
        """Check parsed metadata for consistency."""
        for channel in self.channels:
            if channel.startswith("Camera"):
                m = re.match(r"Camera_SN(\d+)/(.*)", channel)
                if m is None:
                    logger.warning(
                        f"Channel name has unexpected pattern: {channel}")
                elif int(m.group(1)) != self.header.serial_number:
                    logger.warning(
                        f"Serial number mismatch: channel {channel} "
                        f"vs {self.header.serial_number} (from header)")

consistency_check

consistency_check() -> None

Check parsed metadata for consistency.

Source code in src/open_svo2/metadata.py
def consistency_check(self) -> None:
    """Check parsed metadata for consistency."""
    for channel in self.channels:
        if channel.startswith("Camera"):
            m = re.match(r"Camera_SN(\d+)/(.*)", channel)
            if m is None:
                logger.warning(
                    f"Channel name has unexpected pattern: {channel}")
            elif int(m.group(1)) != self.header.serial_number:
                logger.warning(
                    f"Serial number mismatch: channel {channel} "
                    f"vs {self.header.serial_number} (from header)")

from_mcap classmethod

from_mcap(mcap: McapReader | str) -> Self

Extract metadata from the MCAP reader.

Parameters:

  • mcap (McapReader | str, required): file path to a svo2 mcap file or a McapReader handle.
Source code in src/open_svo2/metadata.py
@classmethod
def from_mcap(cls, mcap: McapReader | str) -> Self:
    """Extract metadata from the MCAP reader.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return cls.from_mcap(make_reader(f))

    summary = mcap.get_summary()
    if summary is None:
        raise ValueError("Failed to read summary from the SVO2 file.")
    summary_short = {v.topic: k for k, v in summary.channels.items()}

    header, footer = cls._get_raw_data(mcap)
    timestamps = {
        k: np.array(v, dtype=np.uint64)
        for k, v in footer.items()}
    decoded_header = Header.from_base64(header.get("header", ""))

    # Parse calibration data (each is 18 float32s = two 3x3 matrices)
    calib_acc_raw = base64.b64decode(header.get("Calib_acc", ""))
    calib_gyro_raw = base64.b64decode(header.get("Calib_gyro", ""))

    acc_floats = np.frombuffer(calib_acc_raw, dtype=np.float32)
    gyro_floats = np.frombuffer(calib_gyro_raw, dtype=np.float32)

    return cls(
        imu_frequency=header.get("imu_frequency_hz", 0.0),
        zed_sdk_version=header.get("zed_sdk_version", "unknown"),
        header=decoded_header,
        version=header.get("version", "unknown"),
        calib_acc_matrix1=acc_floats[:9].reshape(3, 3),
        calib_acc_matrix2=acc_floats[9:].reshape(3, 3),
        calib_gyro_matrix1=gyro_floats[:9].reshape(3, 3),
        calib_gyro_matrix2=gyro_floats[9:].reshape(3, 3),
        timestamps=timestamps,
        channels=summary_short
    )

open_svo2.StereoIntrinsics dataclass

Stereo camera pair parameters.

Info

Zed uses a convention where the left camera is transformed relative to the right camera, which is considered the reference frame.

Attributes:

  • left (Intrinsics): Intrinsics for the left camera.
  • right (Intrinsics): Intrinsics for the right camera.
  • baseline (float): Horizontal separation between cameras in mm.
  • ty (float): Translation offset in Y direction (vertical) in mm.
  • tz (float): Translation offset in Z direction (depth) in mm.
  • cv (float): Convergence angle in radians (angle at which optical axes converge).
  • rx (float): Rotation around X axis (pitch) in radians.
  • rz (float): Rotation around Z axis (roll) in radians.

Source code in src/open_svo2/intrinsics.py
@dataclass
class StereoIntrinsics:
    """Stereo camera pair parameters.

    !!! info

        Zed uses a convention where the left camera is transformed relative
        to the right camera, which is considered the reference frame.

    Attributes:
        left: Intrinsics for the left camera.
        right: Intrinsics for the right camera.
        baseline: Horizontal separation between cameras in mm.
        ty: Translation offset in Y direction (vertical) in mm.
        tz: Translation offset in Z direction (depth) in mm.
        cv: Convergence angle in radians (angle at which optical axes converge).
        rx: Rotation around X axis (pitch) in radians.
        rz: Rotation around Z axis (roll) in radians.
    """

    left: Intrinsics
    right: Intrinsics
    baseline: float
    ty: float
    tz: float
    cv: float
    rx: float
    rz: float

    @classmethod
    def from_config(
        cls, cfg: dict | str,
        mode: str | None = None, height: int | None = None
    ) -> Self:
        """Parse Zed SDK `sensor.conf` contents.

        Args:
            cfg: Zed SDK sensor configuration dictionary or path to dictionary.
            mode: Camera mode (e.g., `FHD1200|FHD|SVGA` for the Zed X).
            height: Image height in pixels, used to infer mode if mode is not
                provided. Must be one of {1200, 1080, 600} corresponding to
                modes {FHD1200, FHD, SVGA} respectively.
        """
        if isinstance(cfg, str):
            with open(cfg, "r") as f:
                cfg = toml.load(f)

        if mode is None:
            if height is None:
                raise ValueError("Either mode or height must be provided")
            mode = cls.infer_mode(height)

        try:
            left = Intrinsics.from_config(cfg[f"LEFT_CAM_{mode}"])
            right = Intrinsics.from_config(cfg[f"RIGHT_CAM_{mode}"])
        except KeyError as e:
            raise ValueError(
                f"Missing camera configuration for mode '{mode}': {e}") from e

        return cls(
            left=left, right=right,
            baseline=cfg["STEREO"]["Baseline"],
            ty=cfg["STEREO"]["TY"],
            tz=cfg["STEREO"]["TZ"],
            cv=cfg["STEREO"][f"CV_{mode}"],
            rx=cfg["STEREO"][f"RX_{mode}"],
            rz=cfg["STEREO"][f"RZ_{mode}"],
        )

    def as_dict(self) -> dict:
        """Convert StereoIntrinsics to a dictionary format."""
        return {
            "left": {
                "camera_matrix": self.left.camera_matrix.tolist(),
                "dist_coeffs": self.left.dist_coeffs.tolist(),
            },
            "right": {
                "camera_matrix": self.right.camera_matrix.tolist(),
                "dist_coeffs": self.right.dist_coeffs.tolist(),
            },
            "baseline": self.baseline,
            "ty": self.ty,
            "tz": self.tz,
            "cv": self.cv,
            "rx": self.rx,
            "rz": self.rz,
        }

    @staticmethod
    def infer_mode(height: int) -> str:
        """Infer Zed camera mode from image height."""
        if height == 1200:
            return "FHD1200"
        elif height == 1080:
            return "FHD"
        elif height == 600:
            return "SVGA"
        else:
            raise ValueError(f"Unrecognized image height: {height}")

as_dict

as_dict() -> dict

Convert StereoIntrinsics to a dictionary format.

Source code in src/open_svo2/intrinsics.py
def as_dict(self) -> dict:
    """Convert StereoIntrinsics to a dictionary format."""
    return {
        "left": {
            "camera_matrix": self.left.camera_matrix.tolist(),
            "dist_coeffs": self.left.dist_coeffs.tolist(),
        },
        "right": {
            "camera_matrix": self.right.camera_matrix.tolist(),
            "dist_coeffs": self.right.dist_coeffs.tolist(),
        },
        "baseline": self.baseline,
        "ty": self.ty,
        "tz": self.tz,
        "cv": self.cv,
        "rx": self.rx,
        "rz": self.rz,
    }

from_config classmethod

from_config(
    cfg: dict | str, mode: str | None = None, height: int | None = None
) -> Self

Parse Zed SDK sensor.conf contents.

Parameters:

  • cfg (dict | str, required): Zed SDK sensor configuration dictionary or path to dictionary.
  • mode (str | None, default None): Camera mode (e.g., FHD1200|FHD|SVGA for the Zed X).
  • height (int | None, default None): Image height in pixels, used to infer mode if mode is not provided. Must be one of {1200, 1080, 600} corresponding to modes {FHD1200, FHD, SVGA} respectively.
Source code in src/open_svo2/intrinsics.py
@classmethod
def from_config(
    cls, cfg: dict | str,
    mode: str | None = None, height: int | None = None
) -> Self:
    """Parse Zed SDK `sensor.conf` contents.

    Args:
        cfg: Zed SDK sensor configuration dictionary or path to dictionary.
        mode: Camera mode (e.g., `FHD1200|FHD|SVGA` for the Zed X).
        height: Image height in pixels, used to infer mode if mode is not
            provided. Must be one of {1200, 1080, 600} corresponding to
            modes {FHD1200, FHD, SVGA} respectively.
    """
    if isinstance(cfg, str):
        with open(cfg, "r") as f:
            cfg = toml.load(f)

    if mode is None:
        if height is None:
            raise ValueError("Either mode or height must be provided")
        mode = cls.infer_mode(height)

    try:
        left = Intrinsics.from_config(cfg[f"LEFT_CAM_{mode}"])
        right = Intrinsics.from_config(cfg[f"RIGHT_CAM_{mode}"])
    except KeyError as e:
        raise ValueError(
            f"Missing camera configuration for mode '{mode}': {e}") from e

    return cls(
        left=left, right=right,
        baseline=cfg["STEREO"]["Baseline"],
        ty=cfg["STEREO"]["TY"],
        tz=cfg["STEREO"]["TZ"],
        cv=cfg["STEREO"][f"CV_{mode}"],
        rx=cfg["STEREO"][f"RX_{mode}"],
        rz=cfg["STEREO"][f"RZ_{mode}"],
    )

infer_mode staticmethod

infer_mode(height: int) -> str

Infer Zed camera mode from image height.

Source code in src/open_svo2/intrinsics.py
@staticmethod
def infer_mode(height: int) -> str:
    """Infer Zed camera mode from image height."""
    if height == 1200:
        return "FHD1200"
    elif height == 1080:
        return "FHD"
    elif height == 600:
        return "SVGA"
    else:
        raise ValueError(f"Unrecognized image height: {height}")

open_svo2.imu_from_svo2

imu_from_svo2(
    mcap: McapReader | str, metadata: Metadata | None = None
) -> dict[str, ndarray]

Extract raw IMU data from an SVO2 MCAP file into arrays.

Parameters:

  • mcap (McapReader | str, required): file path to a svo2 mcap file or a McapReader handle.
  • metadata (Metadata | None, default None): Optional pre-parsed metadata. If not provided, it will be extracted from the MCAP reader.

Returns:

  • dict[str, ndarray]: A dictionary containing timestamps, angular_velocity, and linear_acceleration arrays.

Source code in src/open_svo2/convert.py
def imu_from_svo2(
    mcap: McapReader | str, metadata: Metadata | None = None
) -> dict[str, np.ndarray]:
    """Extract raw IMU data from SVO2 MCAP into a .npz file.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
        metadata: Optional pre-parsed metadata. If not provided, it will be
            extracted from the MCAP reader.

    Returns:
        A dictionary containing `timestamps`, `angular_velocity`, and
            `linear_acceleration` arrays.
    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return imu_from_svo2(make_reader(f), metadata=metadata)
    if metadata is None:
        metadata = Metadata.from_mcap(mcap)

    topic = f"Camera_SN{metadata.header.serial_number}/sensors"
    stream_iter = mcap.iter_messages(topics=[topic])

    timestamps = []
    angular_velocity = []
    linear_acceleration = []

    for _, _, msg in stream_iter:
        raw = base64.b64decode(json.loads(msg.data)["data"])
        imu = IMUData.from_raw_data(raw)
        timestamps.append(imu.timestamp)
        angular_velocity.append(imu.avel)
        linear_acceleration.append(imu.accel)

    return {
        "timestamps": np.array(timestamps, dtype=np.float64),
        "angular_velocity": np.array(angular_velocity, dtype=np.float32),
        "linear_acceleration": np.array(linear_acceleration, dtype=np.float32)
    }
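The returned dictionary can be written straight to a .npz archive with `np.savez`. A sketch with synthetic stand-in arrays for the three keys (the values here are invented, not real sensor output):

```python
import io
import numpy as np

# Synthetic stand-in for the dictionary imu_from_svo2 returns.
imu = {
    "timestamps": np.array([0.0, 0.005, 0.010], dtype=np.float64),
    "angular_velocity": np.zeros((3, 3), dtype=np.float32),
    "linear_acceleration": np.tile([0.0, 0.0, 9.81], (3, 1)).astype(np.float32),
}

buf = io.BytesIO()  # a file path works the same way
np.savez(buf, **imu)
buf.seek(0)
loaded = np.load(buf)
print(sorted(loaded.files))
```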

open_svo2.mp4_from_svo2

mp4_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> UInt32[ndarray, N]

Extract video stream from SVO2 MCAP into a standard MP4 container.

Parameters:

  • mcap (McapReader | str, required): file path to a svo2 mcap file or a McapReader handle.
  • output (str, required): file path to the output MP4 file.
  • metadata (Metadata | None, default None): Optional pre-parsed metadata. If not provided, it will be extracted from the MCAP reader.

Returns:

  • UInt32[ndarray, N]: Indices of the last keyframe for each frame, as recorded by the frame footers.

Source code in src/open_svo2/convert.py
def mp4_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> UInt32[np.ndarray, "N"]:
    """Extract video stream from SVO2 MCAP into a standard MP4 container.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
        output: file path to the output MP4 file.
        metadata: Optional pre-parsed metadata. If not provided, it will be
            extracted from the MCAP reader.

    Returns:
        Indices of the last keyframe for each frame, as recorded by the frame footers.
    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return mp4_from_svo2(make_reader(f), output, metadata=metadata)
    if metadata is None:
        metadata = Metadata.from_mcap(mcap)

    stream_iter = mcap.iter_messages(
        topics=[f"Camera_SN{metadata.header.serial_number}/side_by_side"])

    try:
        first_msg = next(stream_iter)
    except StopIteration:
        return np.zeros((), dtype=np.uint32)

    _, _, msg = first_msg
    _, frame_size = struct.unpack("<II", msg.data[:8])
    payload = msg.data[8 : 8 + frame_size]
    codec_name = detect_codec(payload)
    start_ts = FrameFooter.from_buffer_copy(
        msg.data[8 + frame_size:]).timestamp

    timestamps = []
    keyframes = []
    with av.open(output, mode='w', format='mp4') as container:
        stream = container.add_stream(codec_name, rate=metadata.header.fps)
        # header width covers one camera; the side-by-side stream is double
        stream.width = metadata.header.width * 2
        stream.height = metadata.header.height
        stream.pix_fmt = "yuv420p"
        stream.time_base = Fraction(1, 1_000_000)

        def message_generator():
            yield first_msg
            yield from stream_iter

        last_pts = -1
        for i, (_, _, msg) in enumerate(message_generator()):
            _, size = struct.unpack("<II", msg.data[:8])
            payload = msg.data[8 : 8 + size]
            footer = FrameFooter.from_buffer_copy(msg.data[8 + size:])
            timestamps.append(footer.timestamp)
            keyframes.append(footer.last_keyframe_index)

            packet = av.Packet(payload)
            pts_us = int(footer.timestamp - start_ts) // 1000

            # Enforce strict monotonicity for MP4
            if pts_us <= last_pts:
                logger.warning(
                    f"Non-monotonic timestamp at frame {i}: "
                    f"{pts_us} <= {last_pts}. Correcting.")
                pts_us = last_pts + 1

            last_pts = pts_us
            packet.pts = pts_us
            packet.stream = stream

            container.mux(packet)

    timestamps = np.array(timestamps, dtype=np.uint64)
    timestamps_meta = metadata.timestamps[
        f"Camera_SN{metadata.header.serial_number}/side_by_side"]
    _check_timestamps(timestamps_meta, timestamps)

    return np.array(keyframes, dtype=np.uint32)
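The monotonicity guard in the loop above can be isolated for clarity; a minimal sketch of the same correction (the function name `monotonic_pts` is hypothetical, not part of open_svo2):

```python
def monotonic_pts(timestamps_ns: list[int], start_ns: int) -> list[int]:
    """Convert nanosecond timestamps to strictly increasing microsecond
    pts values, mirroring the correction applied in mp4_from_svo2."""
    out = []
    last = -1
    for ts in timestamps_ns:
        pts = (ts - start_ns) // 1000   # ns -> µs
        if pts <= last:                 # duplicate or out-of-order stamp
            pts = last + 1              # nudge forward by 1 µs
        out.append(pts)
        last = pts
    return out
```

With the stream's time base set to 1 µs, strictly increasing pts values keep the MP4 muxer from rejecting packets.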

open_svo2.raw_from_svo2

raw_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> tuple[UInt64[ndarray, N + 1], Float64[ndarray, N], UInt32[ndarray, N]]

Extract raw video frames from SVO2 MCAP into a binary file.

Raw frames are concatenated into the output file; the resulting raw elementary stream should be readable with ffmpeg.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mcap | McapReader \| str | File path to an SVO2 MCAP file, or a McapReader handle. | required |
| output | str | File path to the output raw bitstream file. | required |
| metadata | Metadata \| None | Optional pre-parsed metadata. If not provided, it is extracted from the MCAP reader. | None |

Returns:

| Type | Description |
| --- | --- |
| UInt64[ndarray, N + 1] | Byte offsets of frame boundaries. |
| Float64[ndarray, N] | Timestamps in seconds. |
| UInt32[ndarray, N] | Per-frame index of the last keyframe, as recorded by each frame footer. |
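Because the boundary array has N + 1 entries, frame i spans bytes `offsets[i]:offsets[i + 1]` of the output file. A minimal sketch with toy data (`frame_bytes` is a hypothetical helper, not part of open_svo2):

```python
import numpy as np

def frame_bytes(raw: bytes, offsets: np.ndarray, i: int) -> bytes:
    """Slice frame i out of the concatenated bitstream."""
    return raw[offsets[i] : offsets[i + 1]]

offsets = np.array([0, 5, 9, 16], dtype=np.uint64)  # 3 toy frames
sizes = np.diff(offsets)                            # per-frame byte counts
```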

Source code in src/open_svo2/convert.py
def raw_from_svo2(
    mcap: McapReader | str, output: str, metadata: Metadata | None = None
) -> tuple[
    UInt64[np.ndarray, "N+1"],
    Float64[np.ndarray, "N"],
    UInt32[np.ndarray, "N"]
]:
    """Extract raw video frames from SVO2 MCAP into a binary file.

    Raw frames are concatenated into the output file; the resulting raw
    elementary stream should be readable with ffmpeg.

    Args:
        mcap: file path to a svo2 mcap file or a `McapReader` handle.
        output: file path to the output file.
        metadata: Optional pre-parsed metadata. If not provided, it will be
            extracted from the MCAP reader.

    Returns:
        Byte offsets of frame boundaries.
        Timestamps in seconds (Float64).
        Index of the last keyframe, as recorded by the frame footer.

    """
    if isinstance(mcap, str):
        with open(mcap, "rb") as f:
            return raw_from_svo2(make_reader(f), output, metadata)
    if metadata is None:
        metadata = Metadata.from_mcap(mcap)

    stream_iter = mcap.iter_messages(
        topics=[f"Camera_SN{metadata.header.serial_number}/side_by_side"])

    offsets = []
    timestamps = []
    keyframes = []
    byte_offset = 0

    with open(output, "wb") as f:
        for _, _, msg in stream_iter:
            _, size = struct.unpack("<II", msg.data[:8])
            payload = msg.data[8 : 8 + size]
            footer = FrameFooter.from_buffer_copy(msg.data[8 + size:])

            offsets.append(byte_offset)
            timestamps.append(footer.timestamp)
            keyframes.append(footer.last_keyframe_index)

            f.write(payload)
            byte_offset += len(payload)

    # Append the end-of-file boundary so `offsets` has N + 1 entries,
    # matching the declared return shape.
    offsets.append(byte_offset)

    timestamps_ns = np.array(timestamps, dtype=np.uint64)
    timestamps_meta = metadata.timestamps[
        f"Camera_SN{metadata.header.serial_number}/side_by_side"]
    _check_timestamps(timestamps_meta, timestamps_ns)

    return (
        np.array(offsets, dtype=np.uint64),
        timestamps_ns / 1e9,
        np.array(keyframes, dtype=np.uint32),
    )
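The returned second-resolution timestamps can be used to sanity-check the recording rate; a small sketch with synthetic ~30 fps footer stamps (no open_svo2 calls involved):

```python
import numpy as np

# Synthetic footer timestamps in nanoseconds at ~30 fps
ts_ns = np.arange(10, dtype=np.uint64) * 33_333_333
ts_s = ts_ns / 1e9                       # seconds, as raw_from_svo2 returns
fps = 1.0 / np.median(np.diff(ts_s))     # robust effective frame-rate estimate
```

Using the median of the inter-frame gaps makes the estimate robust to the occasional duplicated or delayed timestamp.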