Team 6 — EECS 106A — UC Berkeley — Spring 2026
Nicholas Gellerman & Braden Wang & Scott Schuster & Joe Mattson
Legos are snap-fit building blocks that can be used to build endless structures, figures, or mechanisms. As engineers, we all grew up playing and building with Legos almost every day, sparking our passion for making things. We wanted to build a robot that assembles Legos — a fitting demonstration of how far our engineering skills have come.
The goal of the project is for the robot to build any given structure from a standard set of Duplo Lego bricks. It should detect these bricks anywhere on the table and place them in the correct position and order on a build plate. To achieve this goal, we built a custom website where a user can design a build from our set of Legos and send it to the robot. Once a build is submitted, the UR7e robotic arm plans the build order, finds all the required bricks, and assembles the set.
We chose Duplo bricks because they are much larger than normal bricks and well sized for our UR7e’s gripper. One goal of this project was to develop an interesting computer vision module, and we thought Legos struck the right balance of simplicity and complexity for this problem. While every brick is rectangular and brightly colored, the bricks come in 10 different (often very similar) colors and can sit at any orientation. We also had to design a custom gripper to grip each Lego by the studs because side grips would interfere with neighboring bricks. Lastly, we thought this project would be a fun expression of our love of building and would produce an aesthetically pleasing result.
Some real-world applications include stacking or packaging robots that need to correctly place objects of many different shapes and sizes while also sorting them by color. A novel gripper attachment would allow the robot to pick up oddly shaped items as well. Controlling the robot from a website could let people remotely define how they would like their items packed in a specific space, and change that layout quickly. Workers would not have to rewrite code by hand for every change; they could use a website to change the final pose of an item and let the robot handle the rest.
Our system assumes bricks are placed stud-side up anywhere on a flat table within reach of the robotic arm, and that both the bricks and the ArUco marker remain within the camera's field of view. A successful implementation means the robot correctly picks up and places each brick, following the user's design without knocking over neighbors or misaligning on the build plate. It must also distinguish 10 brick colors under the weak, inconsistent lighting of the 106A lab, detect arbitrary yaw orientations (not just grid-aligned), and position bricks accurately enough to engage Duplo clutch power. We also assume the requested build contains no impossible Lego configurations or overhangs; our website enforces these constraints automatically.
The first major design choice we made for the gripper was to pick up the bricks by the studs. To allow the robot to place bricks next to each other, the gripper had to avoid any contact with neighboring bricks. In an ideal world the robot would grip the studs from the inside, since this keeps the profile of the gripper as small as possible, but it also makes the movement tolerances that much tighter. Since we were detecting the starting locations of the bricks using a sometimes noisy computer vision system, we decided to grip the studs from the outside like a vice clamp, which relaxes the accuracy required of the computer vision.
When we first conceptualized the project, we planned to have the robot scan an existing build and replicate it. We decided to move to a digital Lego build designed on a website for several reasons: 1) Reduced scope creep — by narrowing our computer vision tasks we were able to create a more robust brick detection model, which vastly improved our assembly. 2) Real-world parallels — CAD-driven workflows are now standard in manufacturing, and we wanted to mirror that approach for Lego assembly.
The core challenge in our detection node was handling the 106A lab's weak and uneven lighting across 10 similar-looking colors. We first tried depth-only detection, but depth alone cannot distinguish brick colors or handle cases where bricks are stacked — height ambiguity made it unreliable. We then tried 2D color blob detection on the RGB image, which was fast but could not recover brick orientation or handle partial occlusions, and was highly sensitive to shadows and reflections on the brick surfaces.
We ultimately chose to do classification on 3D point cloud clusters colored in LAB color space. LAB separates luminance from the color channels, which made our color ranges significantly more robust to the harsh reflections and inconsistent shadows in the lab compared to HSV. To localize the build plate we use an ArUco marker as the primary method, running 6 detection attempts with different CLAHE preprocessing and threshold parameters to tolerate poor lighting conditions. Once the baseplate frame is locked, we transform the RealSense point cloud into baseplate coordinates and filter it to only the XY footprint of the build area — eliminating the robot arm and surrounding clutter before clustering. DBSCAN then separates individual bricks without requiring us to specify how many are present. For each cluster, shape and yaw are estimated via PCA on the XY extent of the points, which naturally handles off-axis orientations. Height is estimated from the 90th-percentile Z value and used to classify bricks as half, normal, or tall.
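The per-cluster shape estimate described above can be sketched as follows. This is a minimal illustration rather than the project's node code: the function names are hypothetical, the input is assumed to be an (N, 3) NumPy array of one cluster's points already expressed in baseplate coordinates (meters), and the height thresholds are placeholder values, not the tuned ones.

```python
import numpy as np


def estimate_yaw_and_height(points):
    """Estimate yaw (rad) and height (m) for one brick cluster.

    points: (N, 3) array in baseplate coordinates (assumed input format).
    """
    points = np.asarray(points, dtype=float)
    xy = points[:, :2]
    centered = xy - xy.mean(axis=0)

    # PCA on the XY extent: the eigenvector with the largest eigenvalue
    # points along the long side of the brick.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    major = eigvecs[:, np.argmax(eigvals)]
    yaw = np.arctan2(major[1], major[0])

    # The eigenvector sign is arbitrary and a rectangle is unchanged by a
    # 180 degree turn, so fold the angle into [-pi/2, pi/2).
    yaw = (yaw + np.pi / 2) % np.pi - np.pi / 2

    # The 90th percentile is less sensitive to stray high points than max().
    height = float(np.percentile(points[:, 2], 90))
    return float(yaw), height


def classify_height(height, half_max=0.014, normal_max=0.029):
    """Bucket a height into half / normal / tall (placeholder thresholds)."""
    if height < half_max:
        return "half"
    if height < normal_max:
        return "normal"
    return "tall"
```

Note that PCA yaw is only well defined for elongated footprints; for a square footprint the two eigenvalues are nearly equal and the major axis becomes unstable.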
The planning node has three main functions: handle state transitions between robot tasks, plan inverse kinematics for joint moves, and send instructions to all other nodes. For inverse kinematics we used a trajectory model from Lab 5 that took in current and goal joint positions to calculate trajectories using MoveIt. This proved to be a limitation, as the seeding of these trajectories introduced significant variance in end-effector positions. For better precision we should have implemented a PID controller or a Jacobian-based IK solver. For node communication we mainly used topic publishers and subscribers in the ROS2 environment. This approach was simple and relatively clean. ROS2 services were used for inverse kinematics calculations and toggle gripper calls.
Our setup centers on the UR7e robotic arm, which provides the reach and precision needed to pick bricks from anywhere on the table and place them onto the build plate. A RealSense depth camera is mounted above the workspace, providing both the RGB image and the organized point cloud that the detection node relies on. The build surface is a standard 16×16 stud Duplo baseplate, fixed to the table with an ArUco marker jig 3D printed to sit flush at the corner — this gives the robot a repeatable coordinate frame for the build area at startup.
The most significant custom fabrication was the end-effector gripper. It is a 3D printed vice-clamp style tool that pinches the top studs of each brick from the outside. Because the gripper contacts only the studs and not the sides of the brick, it can lower onto a brick that is already surrounded by neighbors without colliding with them. Several iterations were needed to dial in the finger geometry and compliance — early prints were too rigid and would either miss the studs or crack them; the final design has a slight flex in the fingers to tolerate small positioning errors from the CV system.
The website effectively functions as our project's user interface. Through the website, the user builds their Lego set and then just sends it to the robot, which then builds their set brick for brick. To the user, everything else is a black box. To us, it is the beginning of a long series of steps to get the user's Lego set into the real world. After the user clicks the "Send to Robot" button, the website slices the build, not dissimilar to how slicing software works for a 3D print. The slicer looks at what the user has built and orders each brick by how high above the build plate its studs will be. Then it orders them from left to right, back to front, before sending the sequence to the robot as a JSON file. The brick type, color, location, and orientation are all encoded in the JSON file so the robot knows exactly which brick to pick up, in which order, and where to place it on the build plate.
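The ordering rule described above can be sketched in a few lines. This is a hedged illustration in Python, not the website's actual slicer: the `Brick` fields, the JSON layout, and the choice to sort left-to-right before back-to-front within a layer are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Brick:
    # All field names are hypothetical; the real JSON schema is not shown.
    brick_type: str
    color: str
    x: int          # stud column, increasing left to right
    y: int          # stud row, increasing back to front
    stud_z: int     # height of the brick's studs above the build plate
    rotation: int   # degrees


def slice_build(bricks):
    """Return a JSON build plan with bricks in placement order."""
    # Lowest studs first, so nothing is placed before the bricks it rests on;
    # ties are broken left to right, then back to front.
    ordered = sorted(bricks, key=lambda b: (b.stud_z, b.x, b.y))
    steps = [{"step": i, **asdict(b)} for i, b in enumerate(ordered)]
    return json.dumps(steps)
```

Sorting on a single tuple key keeps the ordering deterministic, which means the same design always produces the same build plan.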
While the website was a core portion of the functionality of our project, we didn't put as many hours into it compared to the other modules. The core purpose of this project was still to exercise and further our knowledge in ROS 2 and robotics engineering, so making the website as water-tight as possible wasn't high on our list of priorities. As such, once the website was capable of keeping track of and placing every type of brick available to us and correctly exporting an ordered list of bricks for the robot to read, we left it as it was. If you go poke around the builder you might notice some... peculiar behaviour from some of the bricks, especially when you try to rotate some of the funkier shapes.
The detection node takes the RealSense point cloud and RGB stream as input and outputs the 6-DOF pose, color, shape, and height of every brick visible on the table. The pipeline runs as follows:
… baseplate_frame). If ArUco fails entirely, the node falls back to HSV segmentation of the green baseplate surface.
… baseplate_frame coordinates. Points are then filtered to only those within the XY footprint of the build area (plus a 20 mm margin) and within a Z band above the table surface but below the tallest possible brick. This eliminates the robot arm, table edges, and background clutter before any clustering.
… base_link frame and published as a PoseArray alongside a JSON metadata string containing color, shape, and height type for the planning node to consume.
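The cropping step in the pipeline above amounts to a box filter on the transformed point cloud. The sketch below is illustrative only: it assumes the baseplate frame has its origin at one plate corner with X and Y along the plate edges, takes the plate side as 0.256 m (16 studs at an assumed 16 mm pitch), and uses placeholder Z limits. Only the 20 mm margin comes from the description above.

```python
import numpy as np


def crop_to_build_area(points, size_x=0.256, size_y=0.256,
                       margin=0.020, z_min=0.005, z_max=0.060):
    """Keep points inside the build-area footprint and Z band.

    points: (N, 3) array in baseplate coordinates (meters).
    size_x, size_y, z_min, z_max are assumed values for illustration.
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    # XY footprint of the build area, grown by the margin on every side.
    in_xy = ((x >= -margin) & (x <= size_x + margin) &
             (y >= -margin) & (y <= size_y + margin))

    # Z band: above the surface, below the tallest possible brick.
    in_z = (z > z_min) & (z < z_max)

    return points[in_xy & in_z]
```

Filtering before clustering keeps the clustering input small and removes the arm and background, which would otherwise form large spurious clusters.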
The most time-consuming part of this module was color tuning. The LAB color ranges for each brick had to be calibrated against the 106A lab's lighting specifically, as the same brick looks meaningfully different under different illumination. We built a separate HSV/LAB tuner tool to capture live frames and interactively adjust the ranges until all 10 colors were reliably distinguished. The trickiest pairs were orange/red and light blue/mint, which required tightening the ranges significantly — at the cost of occasionally dropping detections on bricks hit by a strong specular reflection. We also noticed the smaller 2x2 bricks were much more error-prone than larger bricks because their small size provides less data for pose and centroid estimation.
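Range-based color classification of this kind can be sketched as a box test on a cluster's median LAB value. The ranges below are invented placeholders for two of the colors, not the calibrated values, and the sketch assumes the cluster's colors have already been converted to LAB. Returning `None` when no range (or more than one) matches is also an assumption, chosen to mirror the trade-off described above, where tight ranges occasionally drop a detection.

```python
import numpy as np

# Hypothetical (min, max) boxes in LAB; real values would come from tuning.
COLOR_RANGES = {
    "red":    ((20, 50, 20), (70, 90, 75)),
    "orange": ((40, 15, 40), (85, 49, 90)),
}


def classify_color(lab_values, ranges=COLOR_RANGES):
    """Return the color whose LAB box contains the cluster's median color.

    lab_values: (N, 3) array of per-point LAB colors for one cluster.
    Returns None when zero or several ranges match.
    """
    # The median is robust to a minority of points hit by specular glare.
    median = np.median(np.asarray(lab_values, dtype=float), axis=0)
    matches = [
        name for name, (lo, hi) in ranges.items()
        if np.all(median >= lo) and np.all(median <= hi)
    ]
    return matches[0] if len(matches) == 1 else None
```

Keeping the boxes non-overlapping for similar pairs such as orange and red makes the result unambiguous, at the cost of rejecting clusters whose median drifts outside every box.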
The inverse kinematics node proved the most difficult module to get working reliably. When the planner needed to move the robot, we called MoveIt’s IK solver repeatedly until it found an acceptable solution, which prevented the robot from generating long or impossible trajectories. A few edge cases were hardcoded into the planner: before each move, we selected the brick pose requiring the least yaw rotation, and locked the yaw to the nearest cardinal direction when the orientation was ambiguous. We also defined a hardcoded safe home position that the robot returns to before large moves, preventing the IK planner from taking unsafe paths to distant positions that could endanger the build or nearby people.
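The two yaw rules mentioned above can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that a rectangular brick can be grasped equally well at its detected yaw or at that yaw plus 180 degrees (pass `symmetry=math.pi / 2` for a square brick); how the planner actually decided an orientation was ambiguous is not shown here.

```python
import math


def wrap(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def least_rotation_yaw(brick_yaw, current_yaw, symmetry=math.pi):
    """Pick the equivalent grasp yaw closest to the gripper's current yaw."""
    count = round(2 * math.pi / symmetry)
    candidates = [wrap(brick_yaw + k * symmetry) for k in range(count)]
    return min(candidates, key=lambda c: abs(wrap(c - current_yaw)))


def snap_to_cardinal(yaw):
    """Lock a yaw to the nearest multiple of 90 degrees."""
    quarter = math.pi / 2
    return wrap(round(yaw / quarter) * quarter)
```

For example, a brick detected at 170 degrees with the gripper at 0 degrees is grasped at -10 degrees instead, which avoids a near half-turn of the wrist.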
One of the worst bugs we encountered was in the interaction between the planning and inverse kinematics nodes. The planning node never shut down our CV node, which was very computationally expensive. Running the CV node alongside the inverse kinematics calculations caused our controller to lag, miss interpolation steps, and eventually crash the robot. We fixed this bug by running CV only while the robot is stationary.
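One way to gate detection on robot motion is sketched below in plain Python, without ROS dependencies. It is an assumption-laden illustration: the class name, the velocity threshold, and the idea of deciding stationarity from joint velocities are all hypothetical, since the actual mechanism used to disable CV is not described.

```python
class DetectionGate:
    """Run an expensive detection function only while the robot is still."""

    def __init__(self, velocity_threshold=0.01):
        # Threshold in rad/s; an assumed value, to be tuned on hardware.
        self.velocity_threshold = velocity_threshold
        self.enabled = False

    def update(self, joint_velocities):
        """Refresh the gate from the latest joint velocities."""
        self.enabled = all(
            abs(v) < self.velocity_threshold for v in joint_velocities
        )
        return self.enabled

    def process(self, frame, detect_fn):
        """Call detect_fn(frame) if the robot is stationary, else skip."""
        if not self.enabled:
            return None
        return detect_fn(frame)
```

In a ROS 2 node, `update` would typically be called from a joint state subscriber callback and `process` from the camera callback, so that frames arriving during a move are dropped cheaply.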
With each module developed, the end-to-end pipeline works as follows:
… baseplate_frame TF into the ROS 2 transform tree, establishing a fixed coordinate frame for the build area.
… base_link.
… baseplate_frame TF and the current step's grid position and rotation from the build plan, the IK node computes the target placement pose and deposits the brick onto the plate.
Our system is able to build simple Lego sets. Placement succeeds about 85% of the time, limited by the tight tolerances of Lego bricks, while brick detection is close to 100%. Our slicer has never produced an invalid build plan. Overall, given our hardware constraints and current software architecture, we consider this result a success.
Our primary remaining challenge is picking and placing 2×2 bricks. There simply isn't enough point cloud data to reliably estimate the positions and poses of these small bricks. A different CV approach, such as learned detection, would likely yield better results. Reliable press fits on the build plate are also an ongoing challenge: the combined error from centroid detection and the ArUco tag transform is often just enough to prevent the studs from fully engaging.
In this project, we demonstrated our ability to build a complex system that uses computer vision to plan novel Lego arrangements and execute them semi-reliably, all while integrating custom hardware we designed ourselves. We are proud of the final product we produced. The challenges we faced gave us a new appreciation for how hard seemingly simple tasks can be and pushed our problem-solving skills to their limit. We think this project is a great example of our growth as builders — from assembling complex structures with Legos as kids to building a machine that assembles them for us.
The largest contributor to our 15% placement failure rate is variance in MoveIt's seeded IK solutions. Replacing the current motion planner with a Jacobian pseudoinverse solver or a Cartesian-space PID controller would give us deterministic, repeatable end-effector positions rather than relying on trajectory seeds that drift between runs.
Our point cloud pipeline struggles with 2×2 bricks because their small footprint yields too few cluster points for reliable PCA and centroid estimation. Training a lightweight object detection model (e.g., YOLOv8) on RGB crops of each brick type would provide pixel-accurate center estimates that are robust even when point cloud density is low, and could replace our hand-tuned LAB color ranges with a model that generalizes across lighting conditions.
Adding an F/T sensor at the wrist would allow the robot to detect when a brick stud has fully engaged the clutch power of the baseplate instead of relying solely on a fixed Z-depth plunge. Force-guided insertion could actively correct for small lateral misalignments during the press, directly addressing the unreliable press fits we observed without requiring tighter CV accuracy.
Duplo bricks were chosen because their size is well-matched to the UR7e's gripper, but extending the system to standard 1:1 Lego bricks would require a fully redesigned gripper with ~4× finer positioning tolerances, a higher-resolution camera or macro lens, and significantly more precise IK. Successfully assembling standard Lego sets would dramatically expand the catalog of buildable designs and demonstrate a much higher bar of manipulation precision.
4th-year mechanical engineering undergrad focusing on electric powertrains and thermals.
3rd-year mechanical engineering undergrad focusing on robotics, medical devices, systems engineering, and product design.
3rd-year mechanical engineering undergrad, minor in EECS, focusing on robot end effectors.
4th-year mechanical engineering undergrad focusing on electric powertrains and FEA modeling.
Design your own Duplo build below, then click Send to Robot to queue it for assembly in the 106A lab. Use Export Build Plan to download the JSON sequence locally.