Visual Odometry for DJI Tello Drone

A monocular visual odometry pipeline that can be deployed on a DJI Tello drone. For the implementation, please visit the GitHub repo for this project!

Visual Odometry Pipeline

The pipeline starts with calibrating the RGB camera located on the front of the drone. Captured images are converted to grayscale and fed to an ORB or Shi-Tomasi feature extractor. Features extracted from the previous frame are then matched in the current frame, yielding the image coordinates of the same set of features in both frames. These correspondences are used to solve the epipolar constraint equation for the essential matrix, which encodes the camera rotation and translation between the two frames. Finally, the per-frame transformation matrices are chained together to form the entire camera trajectory.

DJI Monocular Camera Calibration

In this project, 10 pictures of a chessboard of known dimensions (2'' by 2'' squares) are used. The image coordinates of the squares' corners are extracted with OpenCV's corner-detection function, the world-frame origin is set at the bottom-left corner of the chessboard, world coordinates are assigned to the extracted corners, and the camera intrinsic parameters are solved from these correspondences.

Feature Extraction

Feature points are identified in each image using OpenCV's ORB and Shi-Tomasi feature extractors, in preparation for feature tracking, which allows us to calculate the motion of feature points between frames.

Optical Flow

The next step is to track the movement of the same feature points between frames in order to establish correspondences. Lucas-Kanade optical flow is used to track the features' movement.

Recover Camera Poses and Construct Trajectory

Trajectory formed using visual odometry on a real DJI drone, as shown in the video above.

Odometry result on the KITTI dataset sequence 2.

After finding correspondences of feature points in two consecutive frames using LK optical flow as introduced above, the camera pose transformation between the two frames can be found by solving the epipolar constraint.
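Concretely, for a pair of corresponding points expressed in normalized (calibrated) image coordinates, the epipolar constraint is

```latex
\hat{\mathbf{x}}_2^{\top} E \,\hat{\mathbf{x}}_1 = 0,
\qquad E = [\mathbf{t}]_{\times} R,
```

where $\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2$ are the homogeneous normalized coordinates of the same point in the two frames, $R$ and $\mathbf{t}$ are the rotation and translation between the frames, and $[\mathbf{t}]_{\times}$ is the skew-symmetric matrix of $\mathbf{t}$. Each correspondence contributes one linear equation in the entries of $E$.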

Using the image coordinates of all matched feature points, the essential matrix E can be solved for as a least-squares problem. The next step is to decompose E using singular value decomposition (SVD) to recover the rotation R and translation t.

After obtaining the camera transformations between all consecutive frames, each R and t pair is assembled into an SE(3) transformation matrix, and multiplying these SE(3) matrices together gives the camera pose at each time step.
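The chaining step can be sketched with NumPy. The per-frame motion below (a constant yaw and forward step) is a hypothetical stand-in for the (R, t) pairs recoverPose would produce; with the convention that (R, t) maps points from frame k to frame k+1, the world pose is updated by the inverse of each relative transform, and a constant turn-and-advance motion traces a closed loop.

```python
import numpy as np

def to_se3(R, t):
    """Pack rotation R (3x3) and translation t (3,) into a 4x4 SE(3) matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical per-frame relative motion: yaw 10 degrees and move
# 1 unit forward along the camera's z axis at every step.
a = np.deg2rad(10)
R_step = np.array([[np.cos(a), 0, np.sin(a)],
                   [0, 1, 0],
                   [-np.sin(a), 0, np.cos(a)]])
t_step = np.array([0.0, 0.0, 1.0])

pose = np.eye(4)                     # world pose of the camera at frame 0
trajectory = [pose[:3, 3].copy()]
for _ in range(36):                  # 36 steps of 10 degrees = full circle
    pose = pose @ np.linalg.inv(to_se3(R_step, t_step))
    trajectory.append(pose[:3, 3].copy())

# The camera positions trace a closed 36-gon in the x-z plane,
# returning to the starting pose after a full turn.
print(np.round(trajectory[-1], 6))
```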