Why ACT Excels at Bimanual Tasks
ACT (Action Chunking with Transformers) was originally developed specifically for bimanual manipulation research. Its core insight — that predicting sequences of future actions (chunks) rather than single-step actions reduces compounding error — is especially valuable for bimanual tasks, where a small error in one arm's trajectory can cause a cascading failure in the other arm's execution.
The action chunking mechanism effectively gives the policy a planning horizon. Instead of committing to a single joint command at each 50Hz timestep, ACT plans 100 steps ahead and smooths the execution. For a handoff task, this means the policy can "see" the approach of both arms toward the handoff point as part of a planned sequence, rather than reacting to each frame independently. Empirically, this halves the rate of mid-transfer failures compared to non-chunked approaches on bimanual datasets.
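The execution scheme described above can be sketched in a few lines. This is a minimal illustration with a random stand-in for the policy network, not the actual ACT implementation; the exponential-weight blending follows the ACT paper's temporal ensembling, but the decay constant `m=0.01` and the helper names here are assumptions.

```python
import numpy as np

CHUNK_SIZE = 100  # K: future actions predicted per inference call
ACTION_DIM = 14   # two 6-DOF arms + two grippers

def fake_policy(obs, t):
    """Stand-in for the ACT network: returns one (K, action_dim) chunk."""
    rng = np.random.default_rng(t)
    return rng.normal(size=(CHUNK_SIZE, ACTION_DIM))

predictions = {}  # start step t0 -> chunk covering steps t0 .. t0 + K - 1

def act_step(obs, t, m=0.01):
    """Temporal ensembling: blend every stored chunk that covers step t."""
    predictions[t] = fake_policy(obs, t)
    covering = sorted(t0 for t0 in predictions if 0 <= t - t0 < CHUNK_SIZE)
    # Oldest prediction comes first; ACT weights it highest: w_i = exp(-m * i).
    acts = np.stack([predictions[t0][t - t0] for t0 in covering])
    w = np.exp(-m * np.arange(len(covering)))
    return (w[:, None] * acts).sum(axis=0) / w.sum()
```

Because every timestep blends many overlapping chunk predictions, a single bad inference is averaged away rather than executed verbatim.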
One caution: ACT assumes the demonstrations in your dataset represent a consistent strategy. If different demos show fundamentally different ways of executing the handoff — a different initiating arm, a different handoff height — the CVAE component will struggle to encode a single style. Your 100 demos should all execute the same motion strategy.
Training Command
Add the `--device cuda` flag if you have a GPU. Cloud GPU options (Lambda Labs, Vast.ai) run about $0.50–1.50/hr for the hardware needed.
Reading Bimanual Training Curves
Bimanual training curves differ from single-arm in one important way: you have two action spaces, and the policy must learn to coordinate them. Watch for these patterns in your loss curves (view them in TensorBoard with `tensorboard --logdir ~/dk1-policies/`):
L_reconstruction (overall action loss)
Should decrease from ~3.0 to below 0.4 by 60,000 steps. A plateau above 0.7 after 40,000 steps indicates dataset quality issues — likely too much variance in the handoff timing or position.
L_kl (CVAE regularization)
Starts near 0 and rises slowly to 5–15. If it rises above 30, the CVAE is struggling to find a compact style embedding. This often means your demonstrations have too much behavioral diversity. Consider culling the bottom 20% least consistent demos and retraining.
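One way to operationalize "culling the bottom 20% least consistent demos" is to score each demo by its distance from the mean trajectory and drop the worst scorers. The metric below (mean L1 distance after crude length normalization) is an illustrative heuristic of my choosing, not the course's definition of consistency.

```python
import numpy as np

def cull_inconsistent(demos, keep_frac=0.8):
    """demos: list of (T_i, action_dim) arrays of joint commands.
    Returns indices of the most consistent keep_frac fraction."""
    L = min(len(d) for d in demos)
    # Resample every demo to a common length by index subsampling.
    idx = lambda d: np.linspace(0, len(d) - 1, L).astype(int)
    resampled = np.stack([d[idx(d)] for d in demos])
    mean_traj = resampled.mean(axis=0)
    # Score = mean absolute deviation from the average trajectory.
    scores = np.abs(resampled - mean_traj).mean(axis=(1, 2))
    keep = np.argsort(scores)[: int(len(demos) * keep_frac)]
    return sorted(keep.tolist())
```

In practice you would also want to eyeball the flagged demos before deleting them, since a demo can deviate from the mean for legitimate reasons (e.g. a different cube starting pose).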
Action error: left vs. right
If you enable per-arm action error logging (via the `training.log_per_action_dim=true` override), you will see separate loss curves for the left and right action dimensions. A large, persistent gap between the two indicates one arm's demonstrations are more consistent than the other's — review your Unit 4 quality checklist for the lagging arm.
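If your framework lacks a per-arm logging switch, the split is easy to compute yourself from the 14-dim action vector. The dimension layout below (left arm first, gripper last per arm) is an assumption; check your robot's action-space ordering before trusting the split.

```python
import numpy as np

# Assumed layout: dims 0-5 left joints, 6 left gripper,
# dims 7-12 right joints, 13 right gripper.
LEFT = slice(0, 7)
RIGHT = slice(7, 14)

def per_arm_l1(pred, target):
    """Mean absolute action error split by arm for (T, 14) chunks."""
    err = np.abs(pred - target)
    return err[:, LEFT].mean(), err[:, RIGHT].mean()
```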
Bimanual-Specific Hyperparameters
| Parameter | Default (single-arm) | DK1 Bimanual Recommended | Why |
|---|---|---|---|
| `action_dim` | 7 | 14 | Two 6-DOF arms + 2 grippers = 14 action dimensions |
| `chunk_size` | 100 | 100 | Same — action chunking is already well-suited to bimanual coordination timescales |
| `dim_feedforward` | 3200 | 3200 | No change needed — the larger action space is handled by the action head, not the transformer width |
| `num_steps` | 50000 | 80000 | Bimanual coordination requires more training steps to converge reliably; 80k is the practical minimum for 100 demos |
| `batch_size` | 32 | 16 | Reduced to fit the larger bimanual dataset samples (dual camera feeds) in GPU memory |
| `kl_weight` | 10 | 10 | Default works well; increase to 20 only if L_kl stays near zero after 30k steps (CVAE not learning) |
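Collected in one place, the table's recommended values look like this. The key names mirror the parameter column above; whether your trainer takes them as a dict, a dataclass, or Hydra overrides depends on your setup, so treat this as a sketch.

```python
# DK1 bimanual recommendations from the table above (names assumed
# to match your ACT trainer's config keys).
dk1_bimanual = {
    "action_dim": 14,        # 2 x (6 joints + 1 gripper)
    "chunk_size": 100,
    "dim_feedforward": 3200,
    "num_steps": 80_000,
    "batch_size": 16,
    "kl_weight": 10,
}

# Sanity check: the action space must cover both arms.
assert dk1_bimanual["action_dim"] == 2 * (6 + 1)
```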
Checkpoint Selection
Save checkpoints every 5,000 steps (`training.save_freq=5000`). Do not assume the final checkpoint is the best. Bimanual policies can overfit at high step counts — the policy learns to reproduce training demonstrations perfectly but loses generalization to the slight real-world variations you will encounter during evaluation.
Select the checkpoint at the step where L_reconstruction reached its minimum before starting to plateau or slightly increase. Usually this is in the 60,000–80,000 step range for 100-demo bimanual datasets. Deploy two checkpoints (the minimum-loss checkpoint and the final one) and compare their real-world performance in Unit 6.
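The selection rule above can be automated once you have exported step/loss pairs from TensorBoard. A light moving average damps single-step noise so one lucky batch does not win; the window size and the loss values in the test are made up for illustration.

```python
def best_checkpoint(losses, window=3):
    """losses: {step: L_reconstruction at that checkpoint}.
    Returns the step whose moving-average loss is lowest."""
    steps = sorted(losses)
    vals = [losses[s] for s in steps]
    # Trailing moving average over up to `window` checkpoints.
    smoothed = [
        sum(vals[max(0, i - window + 1): i + 1])
        / len(vals[max(0, i - window + 1): i + 1])
        for i in range(len(vals))
    ]
    return steps[min(range(len(steps)), key=smoothed.__getitem__)]
```

Run this on your exported curve, then still deploy both this checkpoint and the final one, as suggested above, and let real-world trials break the tie.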
Unit 5 Complete When...
- Training has completed 80,000 steps and checkpoints are saved at `~/dk1-policies/cube-handoff-v1/`.
- The final L_reconstruction value is below 0.5.
- You have identified your best checkpoint based on the loss curves.
- You understand why the L_kl curve behaves as it does in your run.
- You are ready to deploy to real hardware in Unit 6 — target success rate on the cube handoff is >60% (bimanual is harder than single-arm, and this is a strong first-run result).