“We live in a 3D world, but when you take a picture, it records that world in a 2D image” – Tianfu Wu
North Carolina State University researchers developed a novel technique, called MonoCon, that improves the ability of artificial intelligence (AI) programmes to identify three-dimensional (3D) objects, and to determine how those objects relate to each other in space, using two-dimensional (2D) images.
Researchers including Xianpeng Liu and Tianfu Wu of North Carolina State University, and Nan Xue of Wuhan University, presented this technique in a research paper titled “Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection.”
As one potential use, the research could help the AI in autonomous vehicles navigate in relation to other vehicles using the 2D images received from an onboard camera. The work is not limited to the autonomous vehicle sector, however; it also has applications in robotics and manufacturing.
MonoCon builds on a substantial amount of existing work aimed at helping AI programs extract 3D data from 2D images. Many of these efforts train the AI by “showing” it 2D images and placing 3D bounding boxes around objects in the image. These boxes are cuboids, which have eight points – think of the corners on a shoebox. During training, the AI is given 3D coordinates for each of the box’s eight corners, so that the AI “understands” the height, width and length of the “bounding box,” as well as the distance between each of those corners and the camera.
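The geometry described above can be sketched in code. The snippet below (an illustration, not the authors' implementation) constructs the eight corner coordinates of a 3D bounding box from its centre, dimensions, and heading angle, in the KITTI-style convention of camera-space coordinates; the specific function name and argument layout are assumptions for this example.

```python
import numpy as np

def box_corners_3d(center, dims, yaw):
    """Return the eight corner coordinates (8x3) of a 3D bounding box.

    center: (x, y, z) of the box centre in camera coordinates
    dims:   (height, width, length) of the box
    yaw:    rotation around the vertical axis, in radians
    """
    h, w, l = dims
    # Corner offsets relative to the box centre -- a shoebox has 8 corners.
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2.0
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    corners = np.stack([x, y, z])                      # shape (3, 8)
    # Rotate around the vertical axis, then translate to the centre.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, 0, s],
                    [0, 1, 0],
                    [-s, 0, c]])
    return (rot @ corners).T + np.asarray(center)      # shape (8, 3)

# Example: a car-sized box 10 metres in front of the camera.
corners = box_corners_3d(center=(0.0, 1.0, 10.0),
                         dims=(1.5, 1.6, 3.9), yaw=0.0)
print(corners.shape)  # (8, 3)
```

During training, the network's job is effectively the inverse: given only the 2D image, recover the centre, dimensions, and orientation that generate these eight corners.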
The training technique uses this to teach the AI how to estimate the dimensions of each bounding box and instructs the AI to predict the distance between the camera and the object. After each prediction, the trainers “correct” the AI by giving it the right answers. Over time, this allows the AI to get better and better at identifying objects, placing them in a bounding box, and estimating the dimensions of the objects.
“What sets our work apart is how we train the AI, which builds on previous training techniques,” Wu says. “Like the previous efforts, we place objects in 3D bounding boxes while training the AI. However, in addition to asking the AI to predict the camera-to-object distance and the dimensions of the bounding boxes, we also ask the AI to predict the locations of each of the box’s eight points and its distance from the centre of the bounding box in two dimensions. We call this ‘auxiliary context,’ and we found that it helps the AI more accurately identify and predict 3D objects based on 2D images.
“The proposed method is motivated by a well-known theorem in measure theory, the Cramér–Wold theorem. It is also potentially applicable to other structured-output prediction tasks in computer vision.”
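As a sketch of the “auxiliary context” idea described above (an illustration under assumed names, not the paper's code): project the eight 3D corners and the box centre into the image using the camera intrinsics, then use each corner's 2D offset from the projected centre as an extra regression target during training.

```python
import numpy as np

def auxiliary_context_targets(corners_3d, center_3d, K):
    """Project 3D box corners and centre into the image, and return the
    corners' 2D pixel locations plus their offsets from the projected centre.

    corners_3d: (8, 3) corner coordinates in camera space
    center_3d:  (3,)   box centre in camera space
    K:          (3, 3) camera intrinsic matrix
    """
    pts = np.vstack([corners_3d, center_3d])    # (9, 3) points to project
    proj = (K @ pts.T).T                        # pinhole projection
    uv = proj[:, :2] / proj[:, 2:3]             # perspective divide
    corners_2d, center_2d = uv[:8], uv[8]
    offsets = corners_2d - center_2d            # auxiliary regression targets
    return corners_2d, offsets

# Illustrative KITTI-like intrinsics (focal length and principal point).
K = np.array([[721.5,   0.0, 609.6],
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])

# An axis-aligned 2 m cube, 10 m in front of the camera, for illustration.
center = np.array([0.0, 1.0, 10.0])
cube_offsets = np.array([[sx, sy, sz] for sx in (-1, 1)
                                      for sy in (-1, 1)
                                      for sz in (-1, 1)], dtype=float)
corners_2d, offsets = auxiliary_context_targets(center + cube_offsets,
                                                center, K)
print(corners_2d.shape, offsets.shape)  # (8, 2) (8, 2)
```

These per-corner 2D offsets are the kind of extra supervision signal the quote refers to: the network still only has to predict 3D boxes at test time, but learning the projected corners alongside them regularises the main task.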
The researchers tested MonoCon on KITTI, a widely used benchmark data set for autonomous-driving perception.
Source: indiaai.gov.in