Cameras and Sensors#
DexSuite can render camera observations directly from the simulator.
You can use cameras for:
Training vision policies (RGB, depth, segmentation, normals)
Debugging (save images, inspect viewpoints)
Teleoperation (wrist cameras are often the easiest view)
Where cameras show up in observations#
The environment observation has two top-level keys:
obs["state"]: robot + task state (torch tensors)
obs["cameras"]: camera outputs (torch tensors)
Camera outputs are nested by camera name, then by modality:
front_rgb = obs["cameras"]["front"]["rgb"]
If you disable cameras (cameras=None), then obs["cameras"] is an empty dict.
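Because the layout is a plain two-level mapping, one loop can visit every camera output. The helper below is a minimal sketch, not part of DexSuite: `iter_camera_outputs` is a hypothetical name, and the stand-in observation uses strings where the real environment returns torch tensors.

```python
from typing import Any, Iterator, Mapping, Tuple


def iter_camera_outputs(obs: Mapping[str, Any]) -> Iterator[Tuple[str, str, Any]]:
    """Yield (camera_name, modality, output) for every camera output in obs."""
    # obs["cameras"] is an empty dict when cameras are disabled,
    # so the loop simply yields nothing in that case.
    for camera_name, outputs in obs["cameras"].items():
        for modality, output in outputs.items():
            yield camera_name, modality, output


# Stand-in observation: strings take the place of the torch tensors,
# and "state" is only a placeholder.
fake_obs = {
    "state": None,
    "cameras": {
        "front": {"rgb": "front-rgb", "depth": "front-depth"},
        "wrist": {"rgb": "wrist-rgb"},
    },
}

for camera_name, modality, output in iter_camera_outputs(fake_obs):
    print(camera_name, modality, output)
```

The same loop works unchanged when cameras are disabled, which avoids special-casing cameras=None in logging or debugging code.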
Quickstart (flat API)#
Enable a front static camera and a wrist camera:
import dexsuite as ds
env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=("front", "wrist"),
    modalities=("rgb", "depth"),
    render_mode=None,
)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
rgb = obs["cameras"]["front"]["rgb"]
depth = obs["cameras"]["front"]["depth"]
env.close()
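For the debugging use case, a frame can be written to disk without extra dependencies. The sketch below uses only the standard library and the binary PPM (P6) format; `write_ppm` is a hypothetical helper, not a DexSuite function, and the frame is a synthetic stand-in rather than a real camera output.

```python
import tempfile
from pathlib import Path


def write_ppm(path, rgb_bytes: bytes, width: int, height: int) -> None:
    """Write raw 8-bit RGB data (row-major, 3 bytes per pixel) as a binary PPM."""
    expected = width * height * 3
    if len(rgb_bytes) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(rgb_bytes)}")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    Path(path).write_bytes(header + rgb_bytes)


# Stand-in for one (H, W, 3) uint8 frame: a 4x2 solid red image.
# Assumption: for a real single-env frame, the bytes could come from
# rgb.cpu().numpy().tobytes(), with height, width = rgb.shape[:2].
width, height = 4, 2
red_frame = bytes([255, 0, 0]) * (width * height)

out_path = Path(tempfile.gettempdir()) / "front_rgb.ppm"
write_ppm(out_path, red_frame, width, height)
print("wrote", out_path)
```

Most image viewers open PPM files directly. For anything beyond quick inspection, a dedicated image library is likely the better choice.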
Default behavior#
If you do not pass cameras=... to ds.make, DexSuite enables:
("front", "wrist") cameras
("rgb",) modality
Disable all cameras by passing cameras=None.
Camera names and presets#
Static camera presets live in:
Dexsuite/dexsuite/config/env_configs/cameras.yaml under static:
The commonly used ones are:
front
overhead
left_side and right_side
left_angled_front and right_angled_front
left_angled_back and right_angled_back
Dynamic cameras#
The flat API supports a built-in dynamic wrist camera name:
wrist: a camera attached to the gripper root link
For bimanual robots, wrist expands to:
left_wrist
right_wrist
The offsets for the wrist camera come from:
Dexsuite/dexsuite/config/env_configs/cameras.yaml under dynamic: wrist_cam
DexSuite picks the best available offsets for your gripper (or for integrated manipulators).
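The bimanual expansion of wrist described above can be pictured with a small sketch. It only illustrates the naming rule and is not DexSuite's implementation: `expand_camera_names` and its `bimanual` flag are hypothetical, and the left-before-right ordering is an assumption.

```python
from typing import Iterable, Tuple


def expand_camera_names(names: Iterable[str], bimanual: bool) -> Tuple[str, ...]:
    """Illustrative only: replace "wrist" with per-arm names for bimanual robots."""
    expanded = []
    for name in names:
        if name == "wrist" and bimanual:
            # One wrist camera per arm (ordering is an assumption).
            expanded.extend(["left_wrist", "right_wrist"])
        else:
            expanded.append(name)
    return tuple(expanded)


print(expand_camera_names(("front", "wrist"), bimanual=False))
print(expand_camera_names(("front", "wrist"), bimanual=True))
```

Under this reading, a bimanual setup would expose its wrist views under the left_wrist and right_wrist camera names rather than under wrist.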
Modalities#
Modalities are selected with modalities=(...).
Supported modalities:
rgb (required)
depth
segmentation
normal
Shapes and dtypes#
DexSuite returns the raw Genesis camera outputs as torch tensors.
Let:
B = n_envs
H, W be the camera image height and width
Shapes:
Single env (B=1):
- rgb and normal: (H, W, 3)
- depth and segmentation: (H, W)
Batched (B>1):
- rgb and normal: (B, H, W, 3)
- depth and segmentation: (B, H, W)
Dtypes:
rgb: uint8 in [0, 255]
depth: float32 in meters (non-negative)
segmentation: integer IDs (int32)
normal: float32 in [-1, 1]
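The shape rules above are easy to get wrong when code has to handle both B=1 and B>1. The sketch below encodes them in a lookup table so a pipeline can check what it receives. `MODALITY_SPECS` and `expected_shape` are hypothetical names for illustration, not DexSuite APIs.

```python
from typing import Dict, Tuple

# Trailing (per-pixel) dimensions and dtype names, as documented in this section.
MODALITY_SPECS: Dict[str, Tuple[Tuple[int, ...], str]] = {
    "rgb": ((3,), "uint8"),
    "depth": ((), "float32"),
    "segmentation": ((), "int32"),
    "normal": ((3,), "float32"),
}


def expected_shape(modality: str, n_envs: int, height: int, width: int) -> Tuple[int, ...]:
    """Return the documented output shape for one camera and one modality."""
    channels, _dtype = MODALITY_SPECS[modality]
    # A single environment has no leading batch dimension.
    batch = () if n_envs == 1 else (n_envs,)
    return batch + (height, width) + channels


print(expected_shape("rgb", n_envs=1, height=240, width=320))
print(expected_shape("depth", n_envs=16, height=224, width=224))
```

A check such as comparing tuple(tensor.shape) against expected_shape(...) right after env.reset() can catch a forgotten batch dimension early.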
Custom cameras (component API)#
Use the component API when you want full control over camera placement and resolution.
Static camera example:
import dexsuite as ds
from dexsuite.options import CamerasOptions, StaticCamOptions
cameras = CamerasOptions(
    static={
        "my_front": StaticCamOptions(
            pos=(1.2, 0.0, 0.6),
            lookat=(0.4, 0.0, 0.2),
            fov=65.0,
            res=(320, 240),
        ),
    },
    dynamic={},
    modalities=("rgb",),
)
env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=cameras,
    render_mode=None,
)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
Dynamic camera example (custom wrist offset):
import dexsuite as ds
from dexsuite.options import CamerasOptions, DynamicCamOptions
cameras = CamerasOptions(
    static={},
    dynamic={
        "wrist": DynamicCamOptions(
            pos_offset=(0.00, 0.10, -0.03),
            quat_offset=(1.0, 0.0, 0.0, 0.0),
            res=(224, 224),
        ),
    },
    modalities=("rgb",),
)
env = ds.make(
    "lift",
    manipulator="franka",
    gripper="robotiq",
    arm_control="osc_pose",
    gripper_control="joint_position",
    cameras=cameras,
    render_mode=None,
)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
Performance tips#
Camera rendering is often the largest cost in the simulation loop.
Common ways to keep things fast:
Disable cameras during pure state-based training: pass cameras=None.
Keep resolutions small (for example 224x224) when running large n_envs.
Prefer fewer cameras over many cameras when you are running in parallel.
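To see how quickly these factors multiply, the sketch below estimates the size of the camera outputs produced per step from the documented shapes and dtypes. It is a rough back-of-the-envelope aid, not a DexSuite utility: `camera_bytes_per_step` is a hypothetical helper, it assumes every camera shares one resolution and one set of modalities, and it measures output size only, not rendering time.

```python
from typing import Iterable

# Bytes per pixel, derived from the documented shapes and dtypes.
BYTES_PER_PIXEL = {
    "rgb": 3 * 1,        # 3 channels x uint8
    "depth": 4,          # float32
    "segmentation": 4,   # int32
    "normal": 3 * 4,     # 3 channels x float32
}


def camera_bytes_per_step(
    n_envs: int, n_cameras: int, height: int, width: int, modalities: Iterable[str]
) -> int:
    """Estimate the total size in bytes of all camera outputs for one step."""
    per_pixel = sum(BYTES_PER_PIXEL[m] for m in modalities)
    return n_envs * n_cameras * height * width * per_pixel


total = camera_bytes_per_step(1024, 2, 224, 224, ("rgb", "depth"))
print(f"{total / 2**20:.0f} MiB per step")  # 686 MiB per step
```

Dropping the depth modality or one of the two cameras in this configuration shrinks the estimate proportionally, which is why the tips above target resolution, camera count, and n_envs.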