Ann Arbor tech startup Voxel51 has gained huge traction over the past year for its advances in computer vision. Through its AI system, Voxel51 extracts insights from video at scale, offering solutions for detecting road obstructions, accidents, and more. In 2019, the company was selected as one of the top five tech AI startups in the U.S. by TechCrunch. We spoke with Voxel51 CEO and cofounder Jason Corso about his 20 years in the computer vision industry and how his company’s AI system is helping improve road safety in fleet management.

Can you tell us how Voxel51 got started?
I’m actually a full-time professor of electrical engineering and computer science at the University of Michigan in Ann Arbor. In 2014, I had a student named Brian Moore, who is now my cofounder at Voxel51. In 2016, we were discussing cloud-based services for computer vision. When looking at larger companies like Google and Amazon, we saw a limitation in their ability to process video.
The two of us formed a small LLC and wrote a government grant proposal for a video analytics platform. We wanted to see if it was technologically possible to build something like this at scale. We ended up winning the grant and spent the next 18 months building out that underlying video processing platform. Then in 2018, once we decided the technology was viable, we built a business around it. My whole career has been in video understanding research, and I wanted to see what impact that research could have when brought into practice.
How is Voxel51 making a difference in the computer vision industry?

We see ourselves really as an enabler. Our platform lets customers run our video understanding algorithms, or their own, across large volumes of video at scale. Most other players in this field aren’t focused on that kind of enablement; they tend to be content providers. So when we work with a customer, we take no ownership of the data they bring to the platform. They’re just giving us license to process their data, and they retain ownership of the results as well.
On the technology side, some of the differentiation comes from the fact that we are specifically video-first. Rather than looking at individual frames separately and then trying to track objects through the entire video, our video-first algorithm looks at chunks of the video at once and is able to overcome local ambiguities. So if I see a car in five frames and in the sixth frame it’s half occluded by a bus, our algorithm already knows it’s the same car. A frame-based or image-based algorithm often makes those types of mistakes.
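To make that contrast concrete for readers, here is a toy sketch in Python. It is not Voxel51’s actual algorithm; the detection tuples and the window size are invented purely to show why reasoning over a chunk of frames survives a brief occlusion that breaks a frame-by-frame tracker.

```python
# Illustrative toy only, not Voxel51's algorithm. Each frame holds one
# detection as (label, confidence); None means the object was occluded.
frames = [("car", 0.95), ("car", 0.93), ("car", 0.96),
          ("car", 0.94), ("car", 0.92), (None, 0.0),   # frame 6: occluded by a bus
          ("car", 0.95)]

def frame_by_frame_track(frames):
    """Naive per-frame association: any missed detection cuts the track."""
    tracks, current = [], []
    for det, conf in frames:
        if det is None:
            if current:
                tracks.append(current)  # occlusion splits the track
            current = []
        else:
            current.append((det, conf))
    if current:
        tracks.append(current)
    return tracks

def chunked_track(frames, window=3):
    """Chunk-based association: gaps shorter than the window are bridged,
    so a single occluded frame does not split the track."""
    tracks, current, gap = [], [], 0
    for det, conf in frames:
        if det is None:
            gap += 1
            if gap >= window and current:  # only a long gap ends the track
                tracks.append(current)
                current, gap = [], 0
        else:
            current.append((det, conf))
            gap = 0
    if current:
        tracks.append(current)
    return tracks

print(len(frame_by_frame_track(frames)))  # 2: the occlusion splits the car
print(len(chunked_track(frames)))         # 1: the chunk bridges the gap
```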
Secondly, our underlying API (application programming interface), the way customers use our technology, is also built specifically for video. Typically, a user has some data, like an image, sends it through an API, and waits for the results. That works for images or very short videos. But when you get to real-world videos that last 10 minutes, an hour, or are live streamed, the results take far too long to process that way. So we really had to build our platform to scale horizontally.
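For readers curious what a video-oriented API pattern looks like, here is a minimal sketch of the submit-and-poll style Corso describes. The base URL, endpoint names, and payload fields are all hypothetical, not Voxel51’s real API; the point is only that long video is handled as an asynchronous job rather than a single request-response call.

```python
# Hypothetical submit-and-poll sketch; endpoints and fields are invented.
import time
import requests

API = "https://api.example.com/v1"  # placeholder base URL

def analyze_video(video_url, token):
    # A synchronous call would time out on hour-long or live video, so the
    # job is submitted first and its results are fetched later.
    resp = requests.post(
        f"{API}/jobs",
        json={"video_url": video_url, "analytics": ["vehicles", "road_signs"]},
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Poll until the horizontally scaled backend finishes the job.
    while True:
        status = requests.get(
            f"{API}/jobs/{job_id}",
            headers={"Authorization": f"Bearer {token}"},
        ).json()
        if status["state"] in ("done", "failed"):
            return status
        time.sleep(5)
```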
So how is this technology applied in the real world?

We’re bringing these capabilities primarily into the automotive sector, where companies are collecting a lot of video from many vehicles on the road. Each company has these videos uploaded into a data center, whether cloud-based or on-premises. Some of that data is usable in some way by their scientists, usability engineers, and so on. But right now, the only way they can find usable data is to search by metadata like the GPS location of a video, the weather, or the time of day.
Where we come into play is that we process all that video for them and then provide a rich, semantic index of its content. What we call first-order analytics is detecting whether a vehicle is a sedan or a Hyundai, as well as detecting road signs and painted markings. Second-order analytics that add a lot of value in the automotive space are things like tailgating or near-miss pedestrian accidents. From a safety point of view, those are the most relevant across the research and engineering side of the automotive sector.
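A toy illustration of the difference between metadata search and a semantic index, with invented field names and labels (these are not Voxel51’s data structures):

```python
# Invented records: "gps"/"time" are metadata, "index" is the semantic index.
videos = [
    {"id": "a1", "gps": (42.28, -83.74), "time": "14:02",
     "index": {"vehicles": ["sedan", "truck"], "events": []}},
    {"id": "b7", "gps": (42.30, -83.71), "time": "08:15",
     "index": {"vehicles": ["sedan"], "events": ["tailgating"]}},
    {"id": "c3", "gps": (42.25, -83.70), "time": "17:40",
     "index": {"vehicles": ["suv"], "events": ["near_miss_pedestrian"]}},
]

# Metadata search can only answer "what was recorded at 8 a.m.?"...
morning = [v["id"] for v in videos if v["time"].startswith("08")]

# ...while the semantic index answers "where did a safety event happen?"
safety = [v["id"] for v in videos if v["index"]["events"]]

print(morning)  # ['b7']
print(safety)   # ['b7', 'c3']
```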
How is this tech improving safety in the automotive sector?
In fleet management scenarios, you have a vehicle, a driver, and a dash cam monitoring the road. Whenever the vehicle does something sudden, like a hard brake or going over a bump, that’s captured by the inertial sensors and the clip gets sent to the cloud. As I understand it, our customers are getting hundreds of thousands of these a day. That’s too much video for a human to look at. So our very concrete value add is to automatically triage the videos: if it was just a bump and the truck was going straight down the road, no human has to look at it. That’s the case 50 percent or more of the time. These results are used to improve safety and training in different scenarios, as well as compliance and liability.
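Here is a minimal sketch of that triage idea, again with invented field names rather than Voxel51’s actual schema: sensor-triggered clips come in, and only those whose video analysis surfaced a safety event are queued for human review.

```python
# Invented example clips: "trigger" is the inertial event, "events" is what
# the video analysis found. Empty events means nothing notable happened.
clips = [
    {"id": 1, "trigger": "hard_brake", "events": ["tailgating"]},
    {"id": 2, "trigger": "bump", "events": []},        # just a pothole
    {"id": 3, "trigger": "hard_brake", "events": []},  # clean stop
    {"id": 4, "trigger": "swerve", "events": ["near_miss_pedestrian"]},
]

needs_review = [c for c in clips if c["events"]]
auto_cleared = [c for c in clips if not c["events"]]

# In the scenario Corso describes, half or more of clips are auto-cleared,
# so no human ever has to watch them.
print(f"review: {[c['id'] for c in needs_review]}, "
      f"cleared: {len(auto_cleared)} of {len(clips)}")
```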
Can you tell me a little bit about your life before Voxel51?

Part of the reason Voxel51 exists is that I’ve been an entrepreneur my entire life. Before college, I was a programmer at a database company in Manhattan. I received my PhD in computer science from Johns Hopkins in 2005, and my first faculty position was in 2007. My research group has been funded across different problem spaces by the National Science Foundation, the Army Research Office, and the National Institutes of Health.
The goal of my entire career has been to figure out how we can build algorithms and mathematical models that have the visual intelligence humans do and can learn over time. After twelve or so years as a faculty member, I got kind of an itch. I had written a lot of papers, thought about what those ideas could become, and didn’t want to hand it all off to other researchers. I wanted to create it myself.
What was it that drove you to create this program?
I think we all have a responsibility to use our abilities for the betterment of our environment. For example, the grant program we initially applied to is a public safety grant program. It was all about helping first responders get to locations faster. We worked with the Baltimore City police, which has a city watch program. They had roughly 800 cameras and only about eight retired officers watching them. That meant each person had to watch 100 cameras at a time. The whole program is really about preventative engagement.
Now if there’s a pipe bursting, they can get officers there quickly. Or if somebody falls in the harbor, they can send rescue personnel over. At the very beginning, Voxel51 started as a way of applying our expertise so it could bring value to society. I think what we’re doing right now, more broadly, remains faithful to that.
Do you feel like Ann Arbor is a great place for tech startups?

Absolutely. When I first moved here around six years ago, I didn’t see much tech startup activity. Now startups are everywhere you look. We occupy the third floor of the Kerrytown Market building, and just below us is another tech startup. Out the window right now I can see a robotic food delivery service. There’s a lot of talent around here, and we also have access to people in Detroit and the whole southeast Michigan area. There are many resources, too. I’m part of what’s called the Ann Arbor Entrepreneur’s Fund, where a group of entrepreneurs comes together once a month to share war stories and help each other out. Ann Arbor is a booming tech startup hub.
What do you think about Voxel51’s mission to improve road safety? Let us know down in the comments.
This article was originally published on GREY Journal.