Davide Scaramuzza, you have won the IEEE Kiyo Tomiyasu Award. What does this honor mean for you personally and for your career as a researcher?
Davide Scaramuzza: The IEEE Kiyo Tomiyasu Award is an IEEE Technical Field Award. It was established in 2001 to recognize outstanding early- to mid-career contributions to technologies holding the promise of innovative applications. It’s one of the most prestigious career awards given by the IEEE, the Institute of Electrical and Electronics Engineers. This is only the third time the award has gone to a robotics researcher, with the previous two recipients being researchers at MIT and UC Berkeley. I therefore consider this award a testament to the excellent research and technology transfer my team members have carried out during my 12-year career at the University of Zurich, and to our impact on both fundamental studies and real-world applications.
Can you tell us more about the specific contributions that led to you receiving this award?
The award recognizes two key contributions: agile visual navigation of micro drones and low-latency robust perception with event cameras.
Regarding the former, in 2009, when I was fresh off my PhD at ETH Zurich, we showed for the first time that a small drone with four motors, or quadcopter, could fly all by itself, without GPS, by using a single onboard camera and an inertial sensor. This result marked the beginning of what is now known as vision-based drone navigation. This work inspired the navigation algorithms of the NASA Mars helicopter and many other drone companies. Later, when I became a professor at UZH, I started to work on the “agile” visual navigation of microdrones. Until then, only expert human pilots could fully exploit these aircraft’s capabilities. By augmenting the traditional robotic control architecture with learning-based methods, we showed that a drone could, for the first time, fly as agilely as expert pilots, including performing acrobatic maneuvers and dodging obstacles at high speed. These works are unique because the neural network models were trained entirely in simulation. This was a first, and it inspired many other works beyond aerial robotics, such as in ground robotics and robot manipulation. Finally, in June 2022, our AI drone, Swift, challenged and beat two drone-racing world champions in the first AI-vs-human race, a result published in Nature.
And what is the other key contribution?
In terms of low-latency robust perception with event cameras: standard cameras are the preferred sensors for cars and drones, but they suffer from a bandwidth-latency tradeoff. Higher framerates reduce perceptual latency but increase bandwidth demands, while lower framerates save bandwidth at the cost of missing vital scene dynamics due to the increased perceptual latency. All this limits a vehicle’s ability to perform well at high speeds, thereby decreasing its safety. Event cameras are bio-inspired sensors that overcome these problems. First commercialized by UZH spinoff IniVation in 2008, they are focal-plane sensors that asynchronously output per-pixel brightness changes with microsecond resolution and only a fraction of the bandwidth of standard cameras. They also feature sub-millisecond latency at all speeds. However, standard computer-vision algorithms can’t be applied directly, because the output is asynchronous rather than frame-based. My team contributed low-latency, robust perception algorithms using event cameras and popularized these sensors in the robotics and computer vision communities.
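To make the contrast with frame-based cameras concrete, here is a minimal illustrative sketch (not from the interview, and not any specific camera’s API): an event camera’s output can be thought of as a stream of (timestamp, x, y, polarity) events, and one simple, commonly used workaround for frame-based algorithms is to accumulate events over a short time window into an image-like array. The synthetic data and the helper function below are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative sketch: an event camera emits a sparse, asynchronous stream of
# (t, x, y, polarity) events instead of full frames at a fixed rate.
# We fabricate a tiny synthetic stream for an 8x8-pixel sensor.
rng = np.random.default_rng(0)
num_events = 200
events = np.zeros(num_events,
                  dtype=[("t", "f8"), ("x", "i4"), ("y", "i4"), ("p", "i1")])
events["t"] = np.sort(rng.uniform(0.0, 0.01, num_events))  # seconds; real sensors resolve microseconds
events["x"] = rng.integers(0, 8, num_events)
events["y"] = rng.integers(0, 8, num_events)
events["p"] = rng.choice([-1, 1], num_events)              # brightness went up (+1) or down (-1)

def accumulate_events(events, height, width, t_start, t_end):
    """Sum event polarities per pixel over a short time window.

    This is one simplified, assumed way to turn the asynchronous stream into a
    frame-like representation that standard computer-vision code can consume.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    window = events[(events["t"] >= t_start) & (events["t"] < t_end)]
    np.add.at(frame, (window["y"], window["x"]), window["p"])
    return frame

# Accumulate a 1-millisecond slice of the stream into an image-like array.
print(accumulate_events(events, height=8, width=8, t_start=0.0, t_end=0.001))
```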
Why is this research important, and what practical applications does it enable?
One of the most exciting things about research is that it can inspire applications in fields far beyond those it was originally conceived for. Our research results have been transferred to our lab’s spinoffs and to products from many other companies, from drones to automobiles, professional photo cameras, augmented- and virtual-reality headsets, and mobile devices. Our first spinoff, Zurich-Eye, became Facebook-Meta Zurich and developed the head-tracking software and hardware of the world-leading virtual-reality headset Meta Quest. Another spinoff, Fotokite, builds tethered drones for first responders, and a third, SUIND, builds autonomous drones for precision agriculture.
Let’s dig into the details a bit. Regarding our research on visual navigation: a rotorcraft is the only robot that, without the ability to see, can’t remain still in flight unless it uses GPS. A legged robot or an automobile can afford not to see and still stand still; its feet or wheels are already on the ground. But a drone that doesn’t see can’t hover! This is because it needs to observe landmarks in the environment to avoid crashing. To understand what I mean, try to stay balanced on the tip of one foot with your eyes closed. You won’t manage, and neither can a drone. The ability to see is not only essential to moving around; it is also critical to staying balanced. Our research is important because it improves aircraft perception and safety: until not too long ago, drones used only GPS to navigate, whereas they can now navigate where GPS is not available, which is critical in logistics, inspection, mobility, and search-and-rescue missions.
So you worked to refine visual perception?
Over the past years, we have worked on refining our drones’ vision algorithms, making them more accurate and faster so that they can run on the drone’s onboard computer, whose computing power is extremely constrained. What we didn’t expect is that this would be invaluable for augmented- and virtual-reality (AR/VR) applications, where the visual positioning software that processes the headset’s onboard cameras and inertial sensors must run fast and robustly on the onboard computer. Interestingly, the AR/VR industry uses the same sensors and hardware we were using for our drones.
Another big surprise was that our low-latency sense-and-avoid algorithms based on event cameras would be of interest to automotive companies. Over the past three years, we have worked with top-tier companies to explore the use of event cameras to increase the safety of drivers and other traffic participants.
Why is it important to push the boundaries of autonomous flight? What new opportunities are opening up as these technologies evolve?
Making autonomous drones faster means extending their productivity. This is crucial in logistics, inspection, mobility, search and rescue, and disaster mitigation. For example, drones are faster than ambulances in delivering blood samples and medicines to rural villages in Rwanda and Ghana. The use of fast drones is also being explored to prevent or suppress wildfires, a global concern; the idea is to direct drones to a location pinpointed by sensors spread in a forest, extinguish the fire while it’s still small, and then fly back to the base station.
The big problem in extending a drone’s productivity is endurance, that is, how long a drone can fly before it runs out of battery. A small rotorcraft’s battery typically lasts only 20 to 30 minutes. While new batteries are being developed, we have proposed a different idea: to fly faster. Indeed, a drone that flies faster can cover longer distances, since, counter-intuitively, a rotorcraft consumes less power in forward flight than it does when hovering. However, this is only valid up to a certain speed, called the optimal speed, beyond which flying faster no longer pays off. The optimal speed depends on the drone type, the trajectory, the task at hand, and the wind. Currently, all commercial rotorcraft fly much slower than the optimal speed, but we know future drones will reach it. It’s simply necessary if you want to increase their productivity, from delivering more cargo to performing more inspections on one battery charge. However, to fly fast, a few challenges must be overcome, such as latency, motion blur, and aerodynamic effects. Our research on agile visual navigation and low-latency perception with event cameras aims to address these challenges.
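As a toy illustration of this trade-off (assumed numbers only, not the lab’s model or measured data), one can combine a fixed battery energy with a simplified power-versus-speed curve: the distance covered per charge, speed times endurance, then peaks at a nonzero “optimal speed”.

```python
import numpy as np

# Toy illustration with assumed numbers (not measured data): why a rotorcraft
# can cover more ground per battery charge by flying forward, up to an
# "optimal speed" beyond which aerodynamic drag makes faster flight wasteful.

E_j = 80.0 * 3600.0                 # assumed usable battery energy: 80 Wh in joules

def power_watts(v):
    """Assumed, simplified power-vs-speed curve for a small quadrotor.

    Hovering (v = 0) is expensive; moderate forward speed reduces the power
    needed to stay airborne; at high speed drag dominates. All coefficients
    are made up purely for illustration.
    """
    hover_power = 250.0
    forward_flight_saving = 120.0 * (1.0 - np.exp(-v / 6.0))
    drag_power = 0.08 * v**3
    return hover_power - forward_flight_saving + drag_power

speeds = np.linspace(0.1, 30.0, 300)          # candidate cruise speeds, m/s
endurance_s = E_j / power_watts(speeds)       # how long the battery lasts at each speed
range_km = speeds * endurance_s / 1000.0      # distance covered on one charge

best = np.argmax(range_km)
print(f"Toy optimal speed ~ {speeds[best]:.1f} m/s, "
      f"range ~ {range_km[best]:.1f} km, "
      f"endurance ~ {endurance_s[best] / 60:.1f} min")
```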
What advances in AI and robotics are needed to further improve the autonomy of drones?
Some of the impressive feats of our research were made possible by deep neural networks. However, neural networks suffer from limited interpretability and require a lot of data to be trained. The former is a problem studied by the broader machine-learning community. To overcome the latter, we resorted to learning in simulation with zero or minimal fine-tuning in the real world. However, simulation can only capture so much of reality. This simulation-to-reality gap exists because it is very difficult to model a robot, its environment, and its physics, and because it is impossible to simulate all the nuances of the real world. The alternative would be to learn directly from interactions with the environment, as humans and animals do. This is what we are currently aiming for.
What further innovations can we expect to see in the field of autonomous flight?
The research community is investigating new materials and designs, including shape-morphing designs, to improve the agility and efficiency of autonomous flight. New sensors and methods are also being explored to extend the range of sense-and-avoid capabilities, which are key for cargo deliveries. New positioning and communication methods are also being developed for drone swarms, which are key in search-and-rescue missions and disaster mitigation. Another key innovation is aerial manipulators: drones equipped with arms for inspecting and maintaining infrastructure like bridges and power lines, tasks that are risky for human workers operating at height.
What projects are you currently pursuing?
We’ve just finished a four-year European project on infrastructure inspection using fast drones. Hundreds of thousands of bridges are at risk of collapse worldwide, and over 100,000 km of power lines in Europe are seldom inspected. A few months ago, we started a new EU project on fast inspection of container ships. Container ships transport most of our food and goods worldwide, but they must be inspected regularly to avoid delays. Drones can make an impact here too, so this new project is looking into new methods to shorten inspection times and improve inspection quality. In parallel, we are also working on the next generation of VR/AR headsets and on automotive perception.
Drones are currently a hot topic in the media, especially in connection with their use in war zones. How do you deal with the problem that findings from your research can be used in such contexts?
As a researcher who uses drones as a research platform, I am deeply aware of the dual-use nature of our innovations. While our work primarily aims to enhance drones' safety, efficiency and utility in civilian applications such as automotive, inspection, logistics, mobility, agriculture, disaster mitigation, and search and rescue, I recognize that the same findings could be applied in military contexts. Unfortunately, this is a problem that affects all robotics and AI research as a whole.
To address this, I advocate for and engage in ethical research practices. This involves clear communication of the intended use cases of our research and actively participating in discussions on regulations and guidelines that govern the use of AI and drone technologies. Additionally, my team and I work closely with policymakers, ethicists and industry leaders to ensure that the benefits of our research are maximized while the potential for harm is minimized.
It is also crucial to foster public discourse about the ethical use of technology. By participating in educational outreach and policy-making efforts, we aim to influence the responsible development and deployment of drone technologies globally.