Data Collection

The SO-101 is one of the most common data collection arms in the LeRobot community. This guide covers everything from hardware connections to recording episodes and pushing your dataset to HuggingFace.

Before Recording

Hardware Setup for Recording

The SO-101 data collection setup is simpler than CAN-bus arms — everything runs over USB. Here is what to connect.

📷 Workspace Camera

USB webcam pointed at the workspace from above or from the side. Mount it in a fixed position and do not move it between episodes. Verify with: ls /dev/video*

🔒 Wrist Camera (optional)

Small USB camera mounted on the end-effector. Adds a first-person view and requires a second USB port. LeRobot supports multi-camera sync.

🔌 Follower Arm (USB serial)

The arm that executes actions. Connect it via the USB servo controller. Verify the port with: ls /dev/ttyUSB*

👤 Leader Arm (USB serial)

A second SO-101 in compliance mode: move it by hand to drive the follower. Connect it on a second USB port. This gives the highest-quality demonstrations.

No ROS, no kernel drivers: Unlike CAN-bus setups, the SO-101 data collection stack runs entirely over USB serial. You can record on a MacBook or Windows laptop — no Ubuntu required.
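Because everything runs over plain USB serial, port discovery is the only platform-specific step. A minimal sketch of cross-platform port discovery (the helper name `find_serial_ports` is ours, not part of LeRobot):

```python
import glob
import sys

def find_serial_ports():
    """List candidate serial ports for the SO-101 on the current OS.

    On Linux the servo controller typically appears as /dev/ttyUSB* or
    /dev/ttyACM*; on macOS as /dev/tty.usbserial* or /dev/tty.usbmodem*.
    On Windows you would enumerate COM ports instead (e.g. with
    pyserial's serial.tools.list_ports, not shown here).
    """
    if sys.platform.startswith("linux"):
        patterns = ["/dev/ttyUSB*", "/dev/ttyACM*"]
    elif sys.platform == "darwin":
        patterns = ["/dev/tty.usbserial*", "/dev/tty.usbmodem*"]
    else:
        return []  # fall back to pyserial on Windows
    ports = []
    for pattern in patterns:
        ports.extend(sorted(glob.glob(pattern)))
    return ports
```

With both arms plugged in on Linux you should see two entries; pass the leader and follower ports explicitly to the recording command so they do not swap between sessions.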

Recording Workflow

Step-by-Step Recording Workflow

1. Verify calibration is current

Run calibration before each new session if the arm was disassembled or moved. See Software → Calibration.

python -m lerobot.scripts.control_robot \
  --robot.type=so101 --robot.port=/dev/ttyUSB0 \
  --control.type=calibrate

2. Verify camera feeds

python -c "
import cv2
for i in range(4):
    cap = cv2.VideoCapture(i)
    if cap.isOpened():
        print(f'Camera {i}: OK')
    cap.release()
"

3. Move the arm to home position

Place the follower arm at the home position (fully extended, end-effector pointing forward). Reset the leader arm to the same position before starting teleop.

4. Set up the task scene

Place objects in their consistent starting positions. Mark the table if needed — consistent initial conditions are critical for policy generalization.

5. Start LeRobot recording

python -m lerobot.scripts.control_robot \
  --robot.type=so101 \
  --robot.port=/dev/ttyUSB1 \
  --robot.leader_arms.main.type=so101 \
  --robot.leader_arms.main.port=/dev/ttyUSB0 \
  --control.type=record \
  --control.fps=30 \
  --control.repo_id=your-username/so101-pick-place-v1 \
  --control.num_episodes=50 \
  --control.single_task="Pick the red block and place it in the bin" \
  --control.warmup_time_s=3 \
  --control.reset_time_s=8

LeRobot prompts before each episode. During warmup you can adjust your grip on the leader arm before recording starts.

6. Review and replay episodes

python -m lerobot.scripts.visualize_dataset \
  --repo_id=your-username/so101-pick-place-v1 \
  --episode_index=0

Delete poor-quality episodes immediately. Check for dropped camera frames, erratic joint velocities, or incomplete task execution.

7. Push to HuggingFace Hub

huggingface-cli login
python -m lerobot.scripts.push_dataset_to_hub \
  --repo_id=your-username/so101-pick-place-v1

Dataset Format

SO-101 Dataset Format

The SO-101 uses the standard LeRobot / HuggingFace dataset format — identical schema to OpenArm, Koch, and other LeRobot arms. This means your datasets are directly compatible with the full LeRobot training ecosystem.

Episode data schema

Fields in each episode Parquet file:

observation.state       float32[6]   Joint positions in degrees (6 DOF: 5 joints + gripper)
observation.images.*    video path   Reference to a frame in the per-camera MP4 video file
action                  float32[6]   Target joint positions from the leader arm
timestamp               float64      Unix timestamp in seconds
frame_index             int64        Frame number within the episode
episode_index           int64        Episode number within the dataset
next.done               bool         True on the last frame of each episode
task_index              int64        Index into the task description lookup table
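The schema maps naturally onto a small record type, which is handy when writing your own QA scripts. A hypothetical sketch (`So101Frame` is our illustration, not a LeRobot class):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class So101Frame:
    """One row of an SO-101 episode, mirroring the Parquet schema above."""
    state: np.ndarray     # observation.state: float32[6], joint positions in degrees
    action: np.ndarray    # float32[6], target joint positions from the leader arm
    timestamp: float      # Unix timestamp in seconds
    frame_index: int      # frame number within the episode
    episode_index: int    # episode number within the dataset
    done: bool            # next.done: True on the episode's last frame
    task_index: int       # index into the task description lookup table
```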

SO-101 specific notes

The SO-101 action space uses joint positions in degrees (Feetech servo units), not radians. When mixing SO-101 and OpenArm datasets for cross-platform training, normalize both to radians first using the stats in meta/stats.json.
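That degrees-to-radians normalization is a one-liner. A minimal sketch (the function name is ours; remember that any per-joint statistics you reuse from meta/stats.json must be rescaled the same way):

```python
import numpy as np

DEG_TO_RAD = np.pi / 180.0

def so101_to_radians(vec_deg):
    # Convert a 6-DOF SO-101 state or action vector from degrees
    # (as stored in the dataset) to radians for cross-platform training.
    return np.asarray(vec_deg, dtype=np.float32) * DEG_TO_RAD
```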

Quality Assurance

Quality Checklist for Collected Data

Run through this after each recording session before pushing to the Hub.

  1. Episode lengths are consistent. Outlier-length episodes usually mean the operator paused, the gripper slipped, or recording was interrupted. Keep lengths within ±30% of the median.
  2. No servo velocity spikes. The STS3215 servos have limited bandwidth; sudden velocity spikes in observation.state indicate a serial bus dropout. Delete those episodes.
  3. Camera frames are aligned with joint data. Check that camera timestamps and joint timestamps are within 20 ms of each other. USB serial latency can cause drift over long recordings. Re-sync cameras every 100 episodes.
  4. Leader arm tracking was smooth. If the follower lagged noticeably during recording (due to USB serial latency), the action labels will be time-shifted from the observations. Replay to check.
  5. Task scene was consistent at the start of each episode. Objects should be in the same position and orientation. The SO-101's lower repeatability (vs. CAN arms) makes this especially important; variance in initial conditions hurts policy training.
  6. Gripper open/close is clearly recorded. The SO-101 gripper state is joint 6. Verify that grasp events show a clear joint position transition (open → closed) in the data, not a gradual drift.

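The first two checks are easy to automate. A sketch that flags episodes to delete, given in-memory arrays (`flag_bad_episodes` and the 400 deg/s velocity bound are our assumptions, not LeRobot API; tune the bound for your servos):

```python
import numpy as np

def flag_bad_episodes(episode_lengths, states, fps=30,
                      length_tol=0.30, vel_limit_dps=400.0):
    """Return indices of episodes that fail basic QA checks.

    episode_lengths: list of frame counts, one per episode.
    states: list of (T, 6) arrays of joint positions in degrees.
    """
    median_len = np.median(episode_lengths)
    bad = set()
    for i, (n, s) in enumerate(zip(episode_lengths, states)):
        # Check 1: episode length outside +/-30% of the median.
        if abs(n - median_len) > length_tol * median_len:
            bad.add(i)
        # Check 2: velocity spikes that suggest a serial bus dropout.
        vel = np.abs(np.diff(s, axis=0)) * fps  # degrees/second
        if vel.size and vel.max() > vel_limit_dps:
            bad.add(i)
    return sorted(bad)
```

Run this over each session before pushing; delete flagged episodes and replay the rest with visualize_dataset for the remaining manual checks.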
Next Step

Training a Policy from Your Dataset

Once your dataset passes quality checks, train ACT or Diffusion Policy with LeRobot.

Train ACT

python -m lerobot.scripts.train \
  --policy.type=act \
  --dataset.repo_id=your-username/so101-pick-place-v1 \
  --policy.chunk_size=100 \
  --training.num_epochs=5000 \
  --output_dir=outputs/act-so101-pick-place

Train Diffusion Policy

python -m lerobot.scripts.train \
  --policy.type=diffusion \
  --dataset.repo_id=your-username/so101-pick-place-v1 \
  --training.num_epochs=8000 \
  --output_dir=outputs/diffusion-so101-pick-place

Community datasets: The SO-101 has one of the largest community dataset collections in the LeRobot ecosystem. Before collecting your own data, check HuggingFace Hub for existing SO-101 datasets — you may be able to fine-tune from an existing base dataset and save recording time.

Dataset Ready? Start Training.

Push your dataset to HuggingFace and train ACT or Diffusion Policy.