LEGO Bot — EECS 106A Final Project

Introduction

Project Goals

Legos are snap-fit building blocks that can be used to build endless structures, figures, or mechanisms. As engineers, we all grew up playing and building with Legos almost every day, sparking our passion for making things. We wanted to build a robot that assembles Legos — a fitting demonstration of how far our engineering skills have come.

The goal of the project is for the robot to build any given structure from a standard set of Duplo Lego bricks. It should be able to detect these bricks anywhere on the table and place them in the correct position and order on a build plate. To achieve this goal, we built a website where a user can design a custom build from our set of Legos and send it to the robot. Once a build is submitted, the UR7e robotic arm plans the build order, finds all the required bricks, and then assembles the set.

Three main components:

An interactive website for a user to design a custom LEGO build from anywhere and send it to the robot
A computer vision pipeline to detect Duplo brick poses, colors, and sizes using a RealSense depth camera
A planning and IK node to generate robot trajectories that assemble the design brick by brick on the build plate

Why a Duplo Assembler?

We chose Duplo bricks because they are much larger than standard bricks and nearly the perfect size for our UR7e's gripper. Another goal of this project was to develop an interesting computer vision module, and Duplo bricks strike a good balance between simplicity and complexity for this problem. Every brick is rectangular and brightly colored, but the bricks come in over 10 different (often very similar) colors and can lie in any orientation. We also had to design a custom gripper that grips each brick by its studs, because side grips would interfere with neighboring bricks. Lastly, we thought this project would be a fun expression of our love of building and would make for an aesthetic result.

Real-world applications include stacking or packaging robots that must place many objects of different shapes and sizes correctly while also sorting them by color. A novel gripper attachment would also let a robot pick up oddly shaped items. Website-based control would let people remotely define how their items should be packed into a specific space, and quickly change how it is done. Workers would no longer need to rewrite code for every change. Instead, they could use a website to set each item's final pose and let the robot handle the rest.

Design

Design Criteria

Our system assumes bricks are placed stud-side up anywhere on a flat table within reach of the robotic arm, with both the bricks and the ArUco marker inside the camera's field of view. A successful implementation correctly picks up and places each brick according to the user's design, without knocking over neighbors or misaligning bricks on the build plate. It must also distinguish 10 brick colors under the weak, inconsistent lighting of the 106A lab, detect arbitrary yaw orientations (not just grid-aligned ones), and position bricks accurately enough to engage Duplo clutch power. We also assume the set contains no impossible Lego configurations or overhangs; our website enforces these constraints automatically.

Key Components:

UR7e robotic arm
Custom 3D printed stud grippers
RealSense Depth Camera
Set of Duplo Legos with Lego Baseplate
ArUco tag with 3D printed alignment jig

Design Choices

Gripper

The first major design choice for the gripper was to pick up the bricks by their studs. To let the robot place bricks next to each other, the gripper had to avoid any contact with neighboring bricks. Ideally, the robot would grip the studs from the inside, since this keeps the gripper's profile as small as possible, but doing so makes the positioning tolerances much tighter. We detect the bricks' starting locations with a sometimes noisy computer vision system. For that reason, we chose to clamp the studs from the outside like a vice, which relaxes the accuracy required of the vision pipeline.

Website

When we first conceptualized the project, we planned to have the robot scan an existing build and replicate it. We switched to a digital Lego build designed on a website for two reasons. 1) Reduced scope creep: narrowing our computer vision tasks let us build a more robust brick detection model, which vastly improved our assembly. 2) Real-world parallels: CAD-driven workflows are now standard in manufacturing, and we wanted to mirror that approach for Lego assembly.

Detection Node

The core challenge in our detection node was handling the 106A lab's weak and uneven lighting across 10 similar-looking colors. We first tried depth-only detection, but depth alone cannot distinguish brick colors or handle cases where bricks are stacked — height ambiguity made it unreliable. We then tried 2D color blob detection on the RGB image, which was fast but could not recover brick orientation or handle partial occlusions, and was highly sensitive to shadows and reflections on the brick surfaces.

We ultimately chose to do classification on 3D point cloud clusters colored in LAB color space. LAB separates luminance from the color channels, which made our color ranges significantly more robust to the harsh reflections and inconsistent shadows in the lab compared to HSV. To localize the build plate we use an ArUco marker as the primary method, running 6 detection attempts with different CLAHE preprocessing and threshold parameters to tolerate poor lighting conditions. Once the baseplate frame is locked, we transform the RealSense point cloud into baseplate coordinates and filter it to only the XY footprint of the build area — eliminating the robot arm and surrounding clutter before clustering. DBSCAN then separates individual bricks without requiring us to specify how many are present. For each cluster, shape and yaw are estimated via PCA on the XY extent of the points, which naturally handles off-axis orientations. Height is estimated from the 90th-percentile Z value and used to classify bricks as half, normal, or tall.
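The clustering step can be made concrete with a minimal from-scratch DBSCAN sketch in Python. It uses the ε = 12 mm and minimum-points = 10 settings given for the detection node and assumes points are (x, y, z) tuples in meters. It is illustrative only: the real node may rely on a library implementation. The brute-force neighbor search here is O(n²), so a KD-tree would be needed at RealSense point counts.

```python
import math
from collections import deque


def dbscan(points, eps=0.012, min_pts=10):
    """Label each (x, y, z) point with a cluster id; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # Brute-force radius query (includes the point itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
        cluster_id += 1
    return labels
```

A point that has too few neighbors and is not reachable from any dense core point keeps the label `-1`. This is what lets DBSCAN discard stray depth returns without being told how many bricks are present.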

Planning and Execution Node

The planning node has three main functions: handling state transitions between robot tasks, planning inverse kinematics for joint moves, and sending instructions to all other nodes. For inverse kinematics, we reused a trajectory model from Lab 5 that takes the current and goal joint positions and computes trajectories with MoveIt. This proved to be a limitation, because the seeding of these trajectories introduced significant variance in end-effector positions. For better precision, we should have implemented a PID controller or a Jacobian-based IK solver. For node communication, we mainly used ROS 2 topic publishers and subscribers, an approach that was simple and relatively clean. ROS 2 services were used for inverse kinematics calculations and gripper toggle calls.
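The planning node's state-transition role can be sketched as a small finite-state machine. The state names below are hypothetical. They mirror the task sequence this writeup describes: receive the build, plan the order, and scan the ArUco tag, then scan, pick, and place each brick. The real node's states and its ROS 2 topic and service wiring are not shown.

```python
from enum import Enum, auto


class State(Enum):
    RECEIVE_BUILD = auto()
    PLAN_ORDER = auto()
    SCAN_ARUCO = auto()
    SCAN_BRICK = auto()
    PICK = auto()
    PLACE = auto()
    DONE = auto()


def next_state(state, bricks_remaining):
    """Return the state that follows `state`, given how many bricks are still unplaced."""
    if state is State.RECEIVE_BUILD:
        return State.PLAN_ORDER
    if state is State.PLAN_ORDER:
        return State.SCAN_ARUCO
    if state in (State.SCAN_ARUCO, State.PLACE):
        # After localizing the plate or placing a brick, loop back for the next brick
        return State.SCAN_BRICK if bricks_remaining > 0 else State.DONE
    if state is State.SCAN_BRICK:
        return State.PICK
    if state is State.PICK:
        return State.PLACE
    return State.DONE


def run_build(num_bricks):
    """Step the machine through a whole build and return the visited states."""
    state = State.RECEIVE_BUILD
    trace = [state]
    placed = 0
    while state is not State.DONE:
        if state is State.PLACE:
            placed += 1  # the brick in hand has just been deposited
        state = next_state(state, num_bricks - placed)
        trace.append(state)
    return trace
```

Keeping the transition logic in one pure function makes it easy to test without a robot. The callbacks that actually trigger detection, IK, or the gripper would run on entry to each state.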

Implementation

Hardware

Our setup centers on the UR7e robotic arm, which provides the reach and precision needed to pick bricks from anywhere on the table and place them onto the build plate. A RealSense depth camera is mounted above the workspace, providing both the RGB image and the organized point cloud that the detection node relies on. The build surface is a standard 16×16 stud Duplo baseplate, fixed to the table with an ArUco marker jig 3D printed to sit flush at the corner — this gives the robot a repeatable coordinate frame for the build area at startup.
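Once the baseplate frame is known, mapping a brick's grid position to metric coordinates is simple arithmetic on the 16 mm Duplo stud pitch. The sketch below makes two hypothetical assumptions. First, baseplate_frame has its origin at the plate's outer corner, with X and Y running along the stud rows. Second, a brick is addressed by the covered stud with the lowest column and row indices. The project's actual frame conventions may differ.

```python
STUD_PITCH_M = 0.016  # Duplo stud pitch: 16 mm
PLATE_STUDS = 16      # 16x16 stud baseplate


def brick_center_in_baseplate(col, row, width, length, rotated=False):
    """Center (x, y) in meters, in the assumed baseplate_frame, of a width x length brick.

    (col, row) is the covered stud with the smallest indices; `rotated` turns the
    brick 90 degrees so its long side runs along X instead of Y.
    """
    span_x, span_y = (length, width) if rotated else (width, length)
    if col < 0 or row < 0 or col + span_x > PLATE_STUDS or row + span_y > PLATE_STUDS:
        raise ValueError("brick footprint falls off the baseplate")
    # Stud k spans [k * pitch, (k + 1) * pitch], so the footprint center is at col + span / 2
    x = (col + span_x / 2) * STUD_PITCH_M
    y = (row + span_y / 2) * STUD_PITCH_M
    return x, y
```

For example, a 2×4 brick at stud (0, 0) covers columns 0 to 1 and rows 0 to 3. Its center therefore lands at (16 mm, 32 mm) in this assumed frame.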

The most significant custom fabrication was the end-effector gripper. It is a 3D printed vice-clamp style tool that pinches the top studs of each brick from the outside. Because the gripper contacts only the studs and not the sides of the brick, it can lower onto a brick that is already surrounded by neighbors without colliding with them. Several iterations were needed to dial in the finger geometry and compliance — early prints were too rigid and would either miss the studs or crack them; the final design has a slight flex in the fingers to tolerate small positioning errors from the CV system.

Gripper holding a Duplo brick

Module 1 — Interactive Website and Slicing

The website is effectively our project's user interface. Through it, the user designs a Lego set and sends it to the robot, which then builds the set brick by brick. To the user, everything else is a black box. To us, it is the first of a long series of steps that bring the user's Lego set into the real world. When the user clicks the "Send to Robot" button, the website slices the build, much as slicing software does for a 3D print. The slicer orders the bricks by how high above the build plate their studs will be. It then orders bricks at the same height from left to right and back to front, and sends the sequence to the robot as a JSON file. The brick type, color, location, and orientation are all encoded in the JSON file, so the robot knows exactly which brick to pick up, in what order, and where to place it on the build plate.
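The slicer's ordering rule amounts to a sort with a compound key. This Python sketch uses hypothetical field names: `stud_z` for the height of a brick's studs above the plate, `col` increasing left to right, and `row` increasing back to front. The real website's JSON schema and implementation language may differ.

```python
import json


def slice_build(bricks):
    """Return the build as a JSON array ordered bottom-up, then left-to-right, then back-to-front."""
    # Hypothetical fields: stud_z (stud height above the plate), col (left -> right), row (back -> front)
    ordered = sorted(bricks, key=lambda b: (b["stud_z"], b["col"], b["row"]))
    return json.dumps(ordered)
```

Every other field on a brick (type, color, rotation) passes through unchanged. Because Python's `sorted` is stable, bricks that tie on all three keys keep the order in which the user placed them.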

Interactive builder demo — design a build and send it to the robot

Although the website is central to our project's functionality, we put fewer hours into it than into the other modules. The core purpose of this project was to exercise and deepen our knowledge of ROS 2 and robotics engineering, so making the website watertight wasn't high on our list of priorities. Once the website could track and place every type of brick available to us and correctly export an ordered list of bricks for the robot to read, we left it as it was. If you poke around the builder, you might notice some... peculiar behavior from some of the bricks, especially when you try to rotate some of the funkier shapes.

Module 2 — Brick Detection and Pose Estimation

The detection node takes the RealSense point cloud and RGB stream as input and outputs the 6-DOF pose, color, shape, and height of every brick visible on the table. The pipeline runs as follows:

Baseplate localization: On startup, the node attempts ArUco detection on the RGB image using 6 parameter combinations with varying CLAHE preprocessing and threshold windows to handle the weak, uneven lighting in the 106A lab. Once the marker is detected across 3 consecutive frames, the baseplate's position is averaged and locked as a static TF frame (baseplate_frame). If ArUco fails entirely, the node falls back to HSV segmentation of the green baseplate surface.
Point cloud filtering: The organized point cloud is transformed into baseplate_frame coordinates. Points are then filtered to only those within the XY footprint of the build area (plus a 20 mm margin) and within a Z band above the table surface but below the tallest possible brick. This eliminates the robot arm, table edges, and background clutter before any clustering.
Voxel downsampling & DBSCAN clustering: The filtered cloud is voxel-downsampled at 2 mm resolution for speed, then clustered with DBSCAN (ε = 12 mm, min points = 10). DBSCAN was chosen because it does not require specifying the number of bricks in advance and naturally rejects noise as unclustered points.
Per-cluster classification: Each cluster is classified independently. Color is determined by converting the cluster's point colors from BGR to LAB space and matching against pre-tuned LAB ranges for 11 colors — any cluster where fewer than 40% of points match a color is discarded as noise or glare. Shape and yaw are estimated by running PCA on the XY extent of the cluster: the long and short axes are divided by the 16 mm Duplo stud pitch to get stud counts, which are snapped to the four valid brick shapes (1×2, 2×2, 2×4, 2×6). The angle of the PCA long axis gives the brick's yaw, handling arbitrary off-grid rotations. Height is estimated from the 90th-percentile Z value of the cluster and classified as half, normal, or tall.
Pose publication: Each detected brick's centroid (median of top-surface points) and orientation are transformed into base_link frame and published as a PoseArray alongside a JSON metadata string containing color, shape, and height type for the planning node to consume.
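The shape, yaw, and height steps above can be sketched without any dependencies. The code computes the 2×2 XY covariance of a cluster and takes the major-axis angle in closed form. It then measures the extent along each principal axis, divides by the 16 mm stud pitch, and snaps to the nearest valid footprint. The nominal half, normal, and tall heights (9.6, 19.2, and 38.4 mm) are assumptions for illustration, not the node's tuned thresholds.

```python
import math

STUD_PITCH_M = 0.016
VALID_SHAPES = [(1, 2), (2, 2), (2, 4), (2, 6)]  # (short side, long side) in studs
# Assumed nominal top heights above the table, for illustration only
NOMINAL_HEIGHTS_M = {"half": 0.0096, "normal": 0.0192, "tall": 0.0384}


def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def estimate_shape_and_yaw(xy):
    """PCA on a cluster's XY points -> ((short, long) stud counts, yaw in radians)."""
    n = len(xy)
    mx = sum(p[0] for p in xy) / n
    my = sum(p[1] for p in xy) / n
    sxx = sum((p[0] - mx) ** 2 for p in xy) / n
    syy = sum((p[1] - my) ** 2 for p in xy) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in xy) / n
    # Closed-form angle of the covariance matrix's major eigenvector, in (-pi/2, pi/2]
    yaw = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(yaw), math.sin(yaw)
    # Extent of the cluster along the long and short principal axes
    along = [(p[0] - mx) * ux + (p[1] - my) * uy for p in xy]
    across = [-(p[0] - mx) * uy + (p[1] - my) * ux for p in xy]
    measured = ((max(across) - min(across)) / STUD_PITCH_M,
                (max(along) - min(along)) / STUD_PITCH_M)
    # Snap the measured stud counts to the closest valid Duplo footprint
    shape = min(VALID_SHAPES,
                key=lambda s: (s[0] - measured[0]) ** 2 + (s[1] - measured[1]) ** 2)
    return shape, yaw


def classify_height(z_values):
    """Classify brick height from the 90th-percentile Z of its points."""
    top = percentile(z_values, 90)
    return min(NOMINAL_HEIGHTS_M, key=lambda k: abs(NOMINAL_HEIGHTS_M[k] - top))
```

For a square 2×2 brick, the two covariance eigenvalues are nearly equal, so the computed yaw is dominated by noise. This fits with the extra trouble 2×2 bricks caused and with the decision to lock ambiguous yaws to a cardinal direction.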

Isolated brick point cloud after filtering — Filtered to baseplate footprint

Point cloud with detected brick poses — Detected poses overlaid

The most time-consuming part of this module was color tuning. The LAB color ranges for each brick had to be calibrated specifically for the 106A lab's lighting, because the same brick looks meaningfully different under different illumination. We built a separate HSV/LAB tuner tool that captures live frames and lets us interactively adjust the ranges until all 10 colors are reliably distinguished. The trickiest pairs were orange/red and light blue/mint. These required tightening the ranges significantly, at the cost of occasionally dropping detections on bricks hit by a strong specular reflection. We also noticed that the smaller 2×2 bricks were much more error-prone than larger bricks, because their small footprint provides fewer points for pose and centroid estimation.
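The per-cluster color vote can be sketched in three steps. Convert each point's 8-bit color to CIE L*a*b* (D65 white point), count how many points fall inside each color's box of L*, a*, and b* ranges, and reject the cluster if the winner holds less than 40% of the points. The ranges here are hypothetical placeholders, not the tuned lab values. The sketch also takes RGB rather than the node's BGR ordering. OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2LAB)` on 8-bit images rescales the output (L* to 0..255, with a* and b* offset by 128), so ranges tuned in OpenCV units would need converting first.

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB color to CIE L*a*b* (D65 white point)."""
    def linearize(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


# Placeholder (L*, a*, b*) boxes; real ranges must be tuned under the actual lighting
HYPOTHETICAL_RANGES = {
    "red": ((20, 65), (40, 90), (20, 80)),
    "yellow": ((70, 100), (-30, 10), (50, 100)),
    "blue": ((20, 60), (-20, 85), (-110, -30)),
}


def classify_color(rgb_points, ranges=HYPOTHETICAL_RANGES, min_fraction=0.4):
    """Vote each point into a color box; return the winner, or None if the vote is too weak."""
    if not rgb_points:
        return None
    counts = dict.fromkeys(ranges, 0)
    for r, g, b in rgb_points:
        lum, a_star, b_star = srgb_to_lab(r, g, b)
        for name, ((l_lo, l_hi), (a_lo, a_hi), (b_lo, b_hi)) in ranges.items():
            if l_lo <= lum <= l_hi and a_lo <= a_star <= a_hi and b_lo <= b_star <= b_hi:
                counts[name] += 1
                break  # first matching box wins
    best = max(counts, key=counts.get)
    # Clusters where fewer than min_fraction of points agree are treated as noise or glare
    return best if counts[best] >= min_fraction * len(rgb_points) else None
```

A patch blown out by specular glare converts to a high-L*, near-neutral color that matches no box. That is why tightening the ranges trades occasional missed detections for fewer misclassifications.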

Module 3 — Path Planning and Inverse Kinematics

The inverse kinematics node proved the most difficult module to get working reliably. When the planner needed to move the robot, we called MoveIt’s IK solver repeatedly until it found an acceptable solution, which prevented the robot from generating long or impossible trajectories. A few edge cases were hardcoded into the planner: before each move, we selected the brick pose requiring the least yaw rotation, and locked the yaw to the nearest cardinal direction when the orientation was ambiguous. We also defined a hardcoded safe home position that the robot returns to before large moves, preventing the IK planner from taking unsafe paths to distant positions that could endanger the build or nearby people.
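The yaw edge cases and the retry loop can be sketched as follows. A 1×2, 2×4, or 2×6 brick looks the same after a 180° turn, and a 2×2 brick after a 90° turn. The planner can therefore choose whichever equivalent yaw is closest to the gripper's current yaw. `solve_ik` is a hypothetical stand-in for the MoveIt IK call. The "acceptable" test (maximum single-joint travel) is an assumed criterion, not necessarily the one the project used.

```python
import math


def wrap_to_pi(angle):
    """Wrap an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def least_rotation_yaw(brick_yaw, gripper_yaw, symmetry=math.pi):
    """Pick the brick yaw, among its symmetric equivalents, that needs the least wrist rotation.

    symmetry is pi for 1x2, 2x4 and 2x6 bricks and pi / 2 for square 2x2 bricks.
    """
    delta = wrap_to_pi(brick_yaw - gripper_yaw)
    k = round(delta / symmetry)
    return gripper_yaw + delta - k * symmetry


def snap_to_cardinal(yaw):
    """Lock an ambiguous yaw to the nearest multiple of 90 degrees."""
    quarter = math.pi / 2
    return round(yaw / quarter) * quarter


def retry_ik(solve_ik, target_pose, current_joints, max_attempts=10, max_joint_travel=math.pi):
    """Call a hypothetical IK solver until it returns a solution near the current joints.

    solve_ik(target_pose) should return a list of joint angles, or None on failure.
    """
    for _ in range(max_attempts):
        solution = solve_ik(target_pose)
        if solution is None:
            continue
        # Reject solutions that would swing any joint too far (long, risky trajectories)
        travel = max(abs(q - q0) for q, q0 in zip(solution, current_joints))
        if travel <= max_joint_travel:
            return solution
    return None
```

For example, a 2×4 brick detected at 170° can be grasped at -10° instead, which saves the wrist a 180° turn.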

One of the worst bugs we encountered was in the interaction between the planning and inverse kinematics nodes. The planning node never shut down our computationally expensive CV node. Running CV alongside the inverse kinematics calculations caused our controller to lag, miss interpolation steps, and eventually crash the robot. We fixed this bug by disabling CV unless the robot is stationary.
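A minimal version of this fix is a gate that lets the detector run only after the arm has been still for several consecutive joint-state updates. The class below is a hypothetical sketch. It assumes joint velocities arrive as a list of floats in rad/s, for example from the `velocity` field of a `sensor_msgs/JointState` message. The threshold and settle count are illustrative values.

```python
class DetectionGate:
    """Hypothetical gate: allow detection only after the arm has been still for N updates."""

    def __init__(self, velocity_threshold=1e-3, settle_samples=5):
        self.velocity_threshold = velocity_threshold
        self.settle_samples = settle_samples
        self._still_count = 0

    def update(self, joint_velocities):
        """Feed the latest joint velocities (rad/s); return True if detection may run."""
        if all(abs(v) < self.velocity_threshold for v in joint_velocities):
            self._still_count += 1
        else:
            self._still_count = 0  # any motion resets the settle counter
        return self._still_count >= self.settle_samples
```

Requiring several still samples, rather than a single one, avoids starting a heavy detection pass during the brief near-zero velocity at the turnaround of a motion.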

System Pipeline

With each module developed, the end-to-end pipeline works as follows:

Receive New Build: The planning node polls JSONBin for the latest build plan — a JSON array of bricks ordered by the website's slicer.
Plan Build Order: The pre-sorted sequence is validated and loaded into memory. Bricks are already ordered bottom-up, left-to-right, back-to-front by the website slicer, so no re-sorting is needed on the robot side.
Scan ArUco Tag: The ArUco node detects the alignment marker on the baseplate jig, averages 30 pose samples, and publishes a locked baseplate_frame TF into the ROS 2 transform tree — establishing a fixed coordinate frame for the build area.
Scan For Next Brick: The brick detector node captures a RealSense depth frame, segments the point cloud to isolate the target brick by color and shape, and estimates its position and yaw relative to base_link.
Pick Up Brick: The IK node computes a pre-grasp approach trajectory, descends, closes the gripper around the brick studs in a vice-clamp grip, then lifts to a safe carry height.
Place Brick onto Build Plate: Using the baseplate_frame TF and the current step's grid position and rotation from the build plan, the IK node computes the target placement pose and deposits the brick onto the plate.
Repeat: Advance to the next brick in the sequence and return to Step 4 until all bricks are placed.
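Averaging the ArUco pose samples in step 3 has one subtlety: yaw cannot be averaged arithmetically, because samples near ±180° would average to roughly 0°. The minimal sketch below assumes the marker lies flat, so each sample reduces to (x, y, z, yaw), and uses a circular mean for the angle. A full 3D orientation would instead need a quaternion-averaging method.

```python
import math


def average_planar_poses(samples):
    """Average (x, y, z, yaw) pose samples, using a circular mean for yaw."""
    if not samples:
        raise ValueError("need at least one pose sample")
    n = len(samples)
    x = sum(s[0] for s in samples) / n
    y = sum(s[1] for s in samples) / n
    z = sum(s[2] for s in samples) / n
    # Sum unit vectors instead of raw angles so +179 deg and -179 deg average to 180 deg, not 0
    yaw = math.atan2(sum(math.sin(s[3]) for s in samples),
                     sum(math.cos(s[3]) for s in samples))
    return x, y, z, yaw
```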

Conclusion

Challenges Encountered

Noisy Camera Data: Centroid position error remained around 1 mm even after aggressive tuning, which was insufficient for reliable alignment
Inverse Kinematics Failures: MoveIt would often fail to find an IK solution and hang the robot
Adverse Lighting Conditions: Sometimes the lighting was too poor for reliable color classification
LEGO Tolerances: The press-fit tolerances of Lego bricks are too tight for a purely vision-based approach
Project Scope versus Time: Our project scope was very large given our limited time on a robot shared with many other groups

In this project, we demonstrated our ability to build a complex system that uses computer vision to assemble novel, user-designed Lego arrangements semi-reliably, all while integrating custom hardware we designed ourselves. We are proud of the final product. The challenges we faced gave us a new appreciation for how hard seemingly simple tasks can be and pushed our problem-solving skills to their limit. We think this project is a great example of our growth as builders: from assembling complex structures with Legos as kids to building a machine that assembles them for us.

Future Improvements

Jacobian-based or PID-controlled IK
The largest contributor to our 15% placement failure rate is variance in MoveIt's seeded IK solutions. Replacing the current motion planner with a Jacobian pseudoinverse solver or a Cartesian-space PID controller would give us deterministic, repeatable end-effector positions rather than relying on trajectory seeds that drift between runs.
Learned brick detection for small bricks
Our point cloud pipeline struggles with 2×2 bricks because their small footprint yields too few cluster points for reliable PCA and centroid estimation. Training a lightweight object detection model (e.g., YOLOv8) on RGB crops of each brick type would provide pixel-accurate center estimates that are robust even when point cloud density is low, and could replace our hand-tuned LAB color ranges with a model that generalizes across lighting conditions.
Wrist force/torque sensor for compliant insertion
Adding an F/T sensor at the wrist would allow the robot to detect when a brick stud has fully engaged the clutch power of the baseplate instead of relying solely on a fixed Z-depth plunge. Force-guided insertion could actively correct for small lateral misalignments during the press, directly addressing the unreliable press fits we observed without requiring tighter CV accuracy.
Scale down to standard Lego bricks
Duplo bricks were chosen because their size is well matched to the UR7e's gripper. Extending the system to standard Lego bricks would require a fully redesigned gripper, roughly 2× finer positioning tolerances (standard bricks use half of Duplo's 16 mm stud pitch), a higher-resolution camera or macro lens, and significantly more precise IK. Successfully assembling standard Lego sets would dramatically expand the catalog of buildable designs and demonstrate a much higher bar of manipulation precision.
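The first improvement above, a Jacobian pseudoinverse solver, can be illustrated on a hypothetical planar two-link arm with 0.4 m and 0.3 m links. Each iteration applies the damped least-squares step Δq = Jᵀ(JJᵀ + λ²I)⁻¹e, where e is the Cartesian position error. With a small λ this is essentially the pseudoinverse step, and the damping keeps it stable near singularities. A UR7e would need its full 6-joint Jacobian. For a given starting configuration, the result is deterministic, which is the repeatability property the improvement is after.

```python
import math


def fk(q, l1=0.4, l2=0.3):
    """Forward kinematics of a hypothetical planar 2-link arm."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y


def jacobian(q, l1=0.4, l2=0.3):
    """2x2 Jacobian d(x, y)/d(q1, q2) of the arm above."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def ik_pseudoinverse(target, q0, damping=1e-3, tol=1e-6, max_iters=200):
    """Iterate dq = J^T (J J^T + damping^2 I)^-1 e until the position error is below tol."""
    q = list(q0)
    for _ in range(max_iters):
        x, y = fk(q)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            return q
        J = jacobian(q)
        # A = J J^T + damping^2 I, a symmetric 2x2 matrix [[a, b], [b, d]]
        a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
        b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
        d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
        det = a * d - b * b
        # w = A^-1 e
        wx = (d * ex - b * ey) / det
        wy = (-b * ex + a * ey) / det
        # dq = J^T w
        q[0] += J[0][0] * wx + J[1][0] * wy
        q[1] += J[0][1] * wx + J[1][1] * wy
    return None
```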
