Cameras and Sensors
===================

DexSuite can render camera observations directly from the simulator. You can use cameras for:

- Training vision policies (RGB, depth, segmentation, normals)
- Debugging (save images, inspect viewpoints)
- Teleoperation (wrist cameras are often the easiest view)

.. image:: ../_static/placeholder_env.svg
   :width: 720

Where cameras show up in observations
----------------------------------------

The environment observation has two top-level keys:

- ``obs["state"]``: robot + task state (torch tensors)
- ``obs["cameras"]``: camera outputs (torch tensors)

Camera outputs are nested by camera name, then by modality:

.. code-block:: python

   front_rgb = obs["cameras"]["front"]["rgb"]

If you disable cameras (``cameras=None``), then ``obs["cameras"]`` is an empty dict.

Quickstart (flat API)
---------------------

Enable a front static camera and a wrist camera:

.. code-block:: python

   import dexsuite as ds

   env = ds.make(
       "lift",
       manipulator="franka",
       gripper="robotiq",
       arm_control="osc_pose",
       gripper_control="joint_position",
       cameras=("front", "wrist"),
       modalities=("rgb", "depth"),
       render_mode=None,
   )
   obs, info = env.reset()
   obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

   rgb = obs["cameras"]["front"]["rgb"]
   depth = obs["cameras"]["front"]["depth"]
   env.close()

Default behavior
~~~~~~~~~~~~~~~~

If you do not pass ``cameras=...`` to ``ds.make``, DexSuite enables:

- ``("front", "wrist")`` cameras
- ``("rgb",)`` modality

Disable all cameras by passing ``cameras=None``.
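
Because camera outputs are nested by camera name and then by modality, it can be
convenient to walk that structure generically, for example to log or save every
image without hard-coding camera names. The following is a minimal sketch, not
part of DexSuite: ``iter_camera_outputs`` is a hypothetical helper, and the mock
observation uses placeholder strings where a real environment would return torch
tensors. The camera keys in the mock are illustrative only.

.. code-block:: python

   from typing import Any, Dict, Iterator, Tuple


   def iter_camera_outputs(obs: Dict[str, Any]) -> Iterator[Tuple[str, str, Any]]:
       """Yield (camera_name, modality, value) for every camera output in obs.

       Hypothetical helper for illustration; it relies only on the documented
       nesting obs["cameras"][camera_name][modality].
       """
       # With cameras disabled, obs["cameras"] is an empty dict, so the loop
       # yields nothing.
       for cam_name, outputs in obs.get("cameras", {}).items():
           for modality, value in outputs.items():
               yield cam_name, modality, value


   # Stand-in observation with the documented layout. Real values would be
   # torch tensors; placeholder strings keep this sketch dependency-free.
   mock_obs = {
       "state": {},
       "cameras": {
           "front": {"rgb": "<front rgb>", "depth": "<front depth>"},
           "wrist": {"rgb": "<wrist rgb>", "depth": "<wrist depth>"},
       },
   }

   # Flatten to "camera/modality" keys, e.g. for logging or file names.
   flat = {f"{cam}/{mod}": value for cam, mod, value in iter_camera_outputs(mock_obs)}
   print(sorted(flat))

   # Cameras disabled: there is nothing to iterate over.
   print(list(iter_camera_outputs({"state": {}, "cameras": {}})))

The same loop should work unchanged on a real observation, since it never
inspects the values themselves.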

Camera names and presets
------------------------

Static camera presets live in:

- ``Dexsuite/dexsuite/config/env_configs/cameras.yaml`` under ``static:``

The commonly used ones are:

- ``front``
- ``overhead``
- ``left_side`` and ``right_side``
- ``left_angled_front`` and ``right_angled_front``
- ``left_angled_back`` and ``right_angled_back``

Dynamic cameras
~~~~~~~~~~~~~~~

The flat API supports a built-in dynamic wrist camera name:

- ``wrist``: a camera attached to the gripper root link

For bimanual robots, ``wrist`` expands to:

- ``left_wrist``
- ``right_wrist``

The offsets for the wrist camera come from:

- ``Dexsuite/dexsuite/config/env_configs/cameras.yaml`` under ``dynamic: wrist_cam``

DexSuite picks the best available offsets for your gripper (or for integrated manipulators).

Modalities
----------

Modalities are selected with ``modalities=(...)``.

Supported modalities:

- ``rgb`` (required)
- ``depth``
- ``segmentation``
- ``normal``

Shapes and dtypes
-----------------

DexSuite returns the raw Genesis camera outputs as torch tensors.

Let:

- ``B = n_envs``
- ``H, W`` be the camera image height and width

Shapes:

- Single env (``B=1``):

  - ``rgb`` and ``normal``: ``(H, W, 3)``
  - ``depth`` and ``segmentation``: ``(H, W)``

- Batched (``B>1``):

  - ``rgb`` and ``normal``: ``(B, H, W, 3)``
  - ``depth`` and ``segmentation``: ``(B, H, W)``

Dtypes:

- ``rgb``: ``uint8`` in ``[0, 255]``
- ``depth``: ``float32`` in meters (non-negative)
- ``segmentation``: integer IDs (``int32``)
- ``normal``: ``float32`` in ``[-1, 1]``

Custom cameras (component API)
------------------------------

Use the component API when you want full control over camera placement and resolution.

Static camera example:

.. code-block:: python

   import dexsuite as ds
   from dexsuite.options import CamerasOptions, StaticCamOptions

   cameras = CamerasOptions(
       static={
           "my_front": StaticCamOptions(
               pos=(1.2, 0.0, 0.6),
               lookat=(0.4, 0.0, 0.2),
               fov=65.0,
               res=(320, 240),
           ),
       },
       dynamic={},
       modalities=("rgb",),
   )

   env = ds.make(
       "lift",
       manipulator="franka",
       gripper="robotiq",
       arm_control="osc_pose",
       gripper_control="joint_position",
       cameras=cameras,
       render_mode=None,
   )
   obs, info = env.reset()
   obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
   env.close()

Dynamic camera example (custom wrist offset):

.. code-block:: python

   import dexsuite as ds
   from dexsuite.options import CamerasOptions, DynamicCamOptions

   cameras = CamerasOptions(
       static={},
       dynamic={
           "wrist": DynamicCamOptions(
               pos_offset=(0.00, 0.10, -0.03),
               quat_offset=(1.0, 0.0, 0.0, 0.0),
               res=(224, 224),
           ),
       },
       modalities=("rgb",),
   )

   env = ds.make(
       "lift",
       manipulator="franka",
       gripper="robotiq",
       arm_control="osc_pose",
       gripper_control="joint_position",
       cameras=cameras,
       render_mode=None,
   )
   obs, info = env.reset()
   obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
   env.close()

Performance tips
----------------

Camera rendering is often the largest cost in the simulation loop.

Common ways to keep things fast:

- Disable cameras during pure state-based training: pass ``cameras=None``.
- Keep resolutions small (for example 224x224) when running large ``n_envs``.
- Prefer fewer cameras over many cameras when you are running in parallel.

Related pages
-------------

- Environment builders (configure cameras interactively): :doc:`../getting_started/environment_builders`
- Teleoperation overview: :doc:`teleoperation`