Cameras and Sensors#

DexSuite can render camera observations directly from the simulator.

You can use cameras for:

  • Training vision policies (RGB, depth, segmentation, normals)

  • Debugging (save images, inspect viewpoints)

  • Teleoperation (wrist cameras are often the easiest view)

../_images/placeholder_env.svg

Where cameras show up in observations#

The environment observation has two top-level keys:

  • obs["state"]: robot + task state (torch tensors)

  • obs["cameras"]: camera outputs (torch tensors)

Camera outputs are nested by camera name, then by modality:

front_rgb = obs["cameras"]["front"]["rgb"]

If you disable cameras (cameras=None), then obs["cameras"] is an empty dict.

Quickstart (flat API)#

Enable a front static camera and a wrist camera:

import dexsuite as ds

env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=("front", "wrist"),
    modalities=("rgb", "depth"),
    render_mode=None,
)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

rgb = obs["cameras"]["front"]["rgb"]
depth = obs["cameras"]["front"]["depth"]

env.close()

Default behavior#

If you do not pass cameras=... to ds.make, DexSuite enables:

  • ("front", "wrist") cameras

  • ("rgb",) modality

Disable all cameras by passing cameras=None.

Camera names and presets#

Static camera presets live in:

  • Dexsuite/dexsuite/config/env_configs/cameras.yaml under static:

The commonly used ones are:

  • front

  • overhead

  • left_side and right_side

  • left_angled_front and right_angled_front

  • left_angled_back and right_angled_back

Dynamic cameras#

The flat API supports a built-in dynamic wrist camera name:

  • wrist: a camera attached to the gripper root link

For bimanual robots, wrist expands to:

  • left_wrist

  • right_wrist

The offsets for the wrist camera come from:

  • Dexsuite/dexsuite/config/env_configs/cameras.yaml under dynamic: wrist_cam

DexSuite picks the best available offsets for your gripper (or for integrated manipulators).

Modalities#

Modalities are selected with modalities=(...).

Supported modalities:

  • rgb (required)

  • depth

  • segmentation

  • normal

Shapes and dtypes#

DexSuite returns the raw Genesis camera outputs as torch tensors.

Let:

  • B = n_envs

  • H, W be the camera image height and width

Shapes:

  • Single env (B=1): - rgb and normal: (H, W, 3) - depth and segmentation: (H, W)

  • Batched (B>1): - rgb and normal: (B, H, W, 3) - depth and segmentation: (B, H, W)

Dtypes:

  • rgb: uint8 in [0, 255]

  • depth: float32 in meters (non-negative)

  • segmentation: integer IDs (int32)

  • normal: float32 in [-1, 1]

Custom cameras (component API)#

Use the component API when you want full control over camera placement and resolution.

Static camera example:

import dexsuite as ds
from dexsuite.options import CamerasOptions, StaticCamOptions

cameras = CamerasOptions(
    static={
        "my_front": StaticCamOptions(
            pos=(1.2, 0.0, 0.6),
            lookat=(0.4, 0.0, 0.2),
            fov=65.0,
            res=(320, 240),
        ),
    },
    dynamic={},
    modalities=("rgb",),
)

env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=cameras,
    render_mode=None,
)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()

Dynamic camera example (custom wrist offset):

import dexsuite as ds
from dexsuite.options import CamerasOptions, DynamicCamOptions

cameras = CamerasOptions(
    static={},
    dynamic={
        "wrist": DynamicCamOptions(
            pos_offset=(0.00, 0.10, -0.03),
            quat_offset=(1.0, 0.0, 0.0, 0.0),
            res=(224, 224),
        ),
    },
    modalities=("rgb",),
)

env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=cameras,
    render_mode=None,
)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()

Performance tips#

Camera rendering is often the largest cost in the simulation loop.

Common ways to keep things fast:

  • Disable cameras during pure state-based training: pass cameras=None.

  • Keep resolutions small (for example 224x224) when running large n_envs.

  • Prefer fewer cameras over many cameras when you are running in parallel.