Cameras and Sensors#

DexSuite can render camera observations directly from the simulator.

You can use cameras for:

Training vision policies (RGB, depth, segmentation, normals)
Debugging (save images, inspect viewpoints)
Teleoperation (wrist cameras are often the easiest view)

Where cameras show up in observations#

The environment observation has two top-level keys:

obs["state"]: robot + task state (torch tensors)
obs["cameras"]: camera outputs (torch tensors)

Camera outputs are nested by camera name, then by modality:

front_rgb = obs["cameras"]["front"]["rgb"]

If you disable cameras (cameras=None), then obs["cameras"] is an empty dict.
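
Because the layout is a plain nested dict, it can be walked generically. The sketch below flattens it into (camera, modality) keys. flatten_cameras is a hypothetical helper, not part of DexSuite, and the stand-in observation uses strings in place of tensors so the snippet runs without a simulator.

```python
def flatten_cameras(obs):
    """Flatten obs["cameras"] into {(camera_name, modality): output}."""
    flat = {}
    for cam_name, outputs in obs.get("cameras", {}).items():
        for modality, value in outputs.items():
            flat[(cam_name, modality)] = value
    return flat


# Stand-in observation with the documented layout (strings replace tensors).
fake_obs = {
    "state": {},
    "cameras": {
        "front": {"rgb": "front-rgb", "depth": "front-depth"},
        "wrist": {"rgb": "wrist-rgb"},
    },
}

flat = flatten_cameras(fake_obs)
print(sorted(flat))
```

With cameras disabled, the same helper returns an empty dict, so downstream code does not need a special case.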

Quickstart (flat API)#

Enable a front static camera and a wrist camera:

import dexsuite as ds

env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=("front", "wrist"),
    modalities=("rgb", "depth"),
    render_mode=None,
)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

rgb = obs["cameras"]["front"]["rgb"]
depth = obs["cameras"]["front"]["depth"]

env.close()

Default behavior#

If you do not pass cameras=... to ds.make, DexSuite enables:

("front", "wrist") cameras
("rgb",) modality

Disable all cameras by passing cameras=None.

Camera names and presets#

Static camera presets live in:

Dexsuite/dexsuite/config/env_configs/cameras.yaml under static:

The commonly used ones are:

front
overhead
left_side and right_side
left_angled_front and right_angled_front
left_angled_back and right_angled_back

Dynamic cameras#

The flat API supports one built-in dynamic camera name:

wrist: a camera attached to the gripper root link

For bimanual robots, wrist expands to:

left_wrist
right_wrist
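
The expansion described above can be pictured as a simple name rewrite. This is only an illustrative sketch of the documented behavior, not DexSuite's implementation; expand_camera_names and its bimanual flag are hypothetical names.

```python
def expand_camera_names(names, bimanual):
    """Expand the built-in "wrist" name the way the text describes.

    Hypothetical helper: on a bimanual robot "wrist" becomes
    "left_wrist" and "right_wrist"; every other name is kept as-is.
    """
    expanded = []
    for name in names:
        if name == "wrist" and bimanual:
            expanded.extend(["left_wrist", "right_wrist"])
        else:
            expanded.append(name)
    return tuple(expanded)


print(expand_camera_names(("front", "wrist"), bimanual=True))
```

Under this reading, the keys to expect in obs["cameras"] on a bimanual robot are left_wrist and right_wrist rather than wrist.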

The offsets for the wrist camera come from:

Dexsuite/dexsuite/config/env_configs/cameras.yaml under dynamic: wrist_cam

DexSuite picks the best available offsets for your gripper (or for integrated manipulators).

Modalities#

Modalities are selected with modalities=(...).

Supported modalities:

rgb (required)
depth
segmentation
normal
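
A modality tuple can be checked before it is passed to ds.make. The sketch below is a hypothetical pre-flight check, not a DexSuite function, and it assumes that "required" means rgb must always appear in the tuple.

```python
SUPPORTED_MODALITIES = ("rgb", "depth", "segmentation", "normal")


def check_modalities(modalities):
    """Raise ValueError for unknown modalities or a missing "rgb".

    Assumption: "rgb (required)" is read as "rgb must always be listed".
    """
    unknown = [m for m in modalities if m not in SUPPORTED_MODALITIES]
    if unknown:
        raise ValueError(f"unsupported modalities: {unknown}")
    if "rgb" not in modalities:
        raise ValueError('"rgb" must be included in modalities')
    return tuple(modalities)


print(check_modalities(("rgb", "depth")))
```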

Shapes and dtypes#

DexSuite returns the raw Genesis camera outputs as torch tensors.

Let:

B = n_envs
H, W be the camera image height and width

Shapes:

Single env (B=1):
    - rgb and normal: (H, W, 3)
    - depth and segmentation: (H, W)
Batched (B>1):
    - rgb and normal: (B, H, W, 3)
    - depth and segmentation: (B, H, W)

Dtypes:

rgb: uint8 in [0, 255]
depth: float32 in meters (non-negative)
segmentation: integer IDs (int32)
normal: float32 in [-1, 1]
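
The shape and dtype rules above are compact enough to encode as a lookup, which is handy for sanity checks in tests. expected_shape and the two tables are hypothetical names written from the lists above, not DexSuite APIs.

```python
# Trailing channel count per modality (None means no channel axis).
CHANNELS = {"rgb": 3, "normal": 3, "depth": None, "segmentation": None}

# Documented dtype per modality, as strings to avoid a torch dependency here.
DTYPES = {
    "rgb": "uint8",
    "depth": "float32",
    "segmentation": "int32",
    "normal": "float32",
}


def expected_shape(modality, n_envs, height, width):
    """Return the documented output shape for one camera and modality."""
    shape = (height, width)
    if CHANNELS[modality] is not None:
        shape = shape + (CHANNELS[modality],)
    if n_envs > 1:
        # The batch axis only appears when more than one env is simulated.
        shape = (n_envs,) + shape
    return shape


print(expected_shape("rgb", 1, 240, 320))
print(expected_shape("depth", 4, 240, 320))
```

In a real run, the result can be compared against tuple(obs["cameras"]["front"]["rgb"].shape). Note that the missing batch axis for a single env means code written for B>1 may need to add a leading axis itself.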

Custom cameras (component API)#

Use the component API when you want full control over camera placement and resolution.

Static camera example:

import dexsuite as ds
from dexsuite.options import CamerasOptions, StaticCamOptions

cameras = CamerasOptions(
    static={
        "my_front": StaticCamOptions(
            pos=(1.2, 0.0, 0.6),
            lookat=(0.4, 0.0, 0.2),
            fov=65.0,
            res=(320, 240),
        ),
    },
    dynamic={},
    modalities=("rgb",),
)

env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=cameras,
    render_mode=None,
)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()

Dynamic camera example (custom wrist offset):

import dexsuite as ds
from dexsuite.options import CamerasOptions, DynamicCamOptions

cameras = CamerasOptions(
    static={},
    dynamic={
        "wrist": DynamicCamOptions(
            pos_offset=(0.00, 0.10, -0.03),
            quat_offset=(1.0, 0.0, 0.0, 0.0),
            res=(224, 224),
        ),
    },
    modalities=("rgb",),
)

env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=cameras,
    render_mode=None,
)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
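
The quat_offset of (1.0, 0.0, 0.0, 0.0) in the example reads as the identity rotation if the quaternion is ordered (w, x, y, z). That ordering is an assumption here and should be confirmed against the DexSuite or Genesis documentation. Under that assumption, a small helper (hypothetical, not part of DexSuite) can build other offsets from an axis and an angle:

```python
import math


def axis_angle_to_quat(axis, angle_deg):
    """Return a unit quaternion for a rotation of angle_deg about axis.

    Assumption: components are ordered (w, x, y, z).
    """
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("axis must be non-zero")
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half) / norm  # also normalizes the axis
    return (math.cos(half), ax * s, ay * s, az * s)


# Zero rotation gives the identity quaternion used in the example above.
print(axis_angle_to_quat((0.0, 0.0, 1.0), 0.0))

# A 90 degree rotation about the z axis.
print(axis_angle_to_quat((0.0, 0.0, 1.0), 90.0))
```

The resulting tuple could then be passed as quat_offset in place of the identity value.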

Performance tips#

Camera rendering is often the largest cost in the simulation loop.

Common ways to keep things fast:

Disable cameras during pure state-based training: pass cameras=None.
Keep resolutions small (for example 224x224) when running with a large n_envs.
Use as few cameras as possible when running many environments in parallel.
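
To get a feel for how these choices scale, the size of the camera tensors returned per step can be estimated from the shapes and dtypes documented earlier. This is a back-of-the-envelope sketch: camera_bytes_per_step is a hypothetical helper, it counts only the output tensors (not renderer-internal buffers), and it assumes every camera uses the same resolution.

```python
# Bytes per pixel, from the documented channel counts and dtypes.
BYTES_PER_PIXEL = {
    "rgb": 3 * 1,           # 3 channels, uint8
    "depth": 1 * 4,         # 1 channel, float32
    "segmentation": 1 * 4,  # 1 channel, int32
    "normal": 3 * 4,        # 3 channels, float32
}


def camera_bytes_per_step(n_envs, n_cameras, height, width, modalities):
    """Estimate the total size of camera outputs for one env.step()."""
    per_pixel = sum(BYTES_PER_PIXEL[m] for m in modalities)
    return n_envs * n_cameras * height * width * per_pixel


total = camera_bytes_per_step(1024, 2, 224, 224, ("rgb", "depth"))
print(f"{total / 2**20:.0f} MiB per step")  # 686 MiB per step
```

Output size grows linearly in n_envs, camera count, and pixel count, so halving the resolution on each axis cuts it by a factor of four. Actual render time depends on the renderer and scene and is not modeled here.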