What is synthetic data?
Synthetic data is a technique or a set of techniques for generating data where not enough data exists to train Artificial Intelligence (AI). Synthetic data takes a lot of forms. There is text or tabular data, which is used for things like patient health records. Then there is data comprising synthetic images and videos, which is what we specialize in.
Can you tell us about your recent contract with AFWERX?
AFWERX is a US Air Force program designed to foster innovation. A lot of technology is being developed by commercial companies, and AFWERX allows the Air Force to bring this innovation to the force and test it out. Through our contract with AFWERX, we bring in our RAIC (Rapid Automatic Image Categorization) technology that allows us to use synthetic data generation at the backend to build AI models quickly. By synthetic data generation I mean you can actually grow more data through generative AI. RAIC allows us to look at image data — imagery captured by satellite or airplane — to efficiently develop AI models.
When building an AI model, you typically have to go through a very long process of labeling the data. Then you need to train the model, validate it and eventually deploy it. The problem is that this process, specifically labeling, can take several months. Some of the big labeling companies have a large number of people labeling data and putting boxes around specific objects of interest. Even then it takes a long time. RAIC allows you to forgo this process. In a single click, you can pick an object that you want to identify, and then build an AI model with about 60 seconds of human time. This will allow people to build an AI model when they need it, rather than something that they can only use six months later. With RAIC, we can identify objects much more quickly and find them on large maps.
How will technologies like AI impact geospatial and Earth Observation industries?
The way that Earth Observation used to be done years ago, and to some extent is still done, is that humans would look at the imagery and find patterns. This makes a lot of sense, because humans are very good at this job. But the problem is that there is a huge amount of data coming in on a daily basis. I believe that it has become impossible for humans to look through all of that data and analyze it the way that we used to. So AI, ML, synthetic data, and high-speed, clustered computing all come together to take what is an intractable task for humans and make it manageable.
How critical is it to have a sovereign and secure data environment?
This is something we think about a lot. Data security means that, whether the data is geospatial, medical, or defense-related, you don’t want to share it. However, in each of these industries, a lot of people are sending data out to external groups that then use it for human labeling and building AI models. This is another area where synthetic data can provide a lot of value. For example, we can share synthetic medical data and not have to worry about data security because none of that data is from real people. It’s the same in the geospatial industry. Synthetic data can be used to train models so that we don’t have to share actual data from some of the satellites or give away information about location or resolution.
You have recently received the Rolex National Geographic Explorer of the Year award for your work in nature conservation. How are you using geospatial technologies in this effort?
We have used geospatial in a couple of different ways for nature conservation. We have done a lot of scanning from fixed-wing aircraft and used geospatial imagery for conservation work in the Democratic Republic of Congo, as well as in Botswana. In those cases, we were using data to count animals and look for signs of poachers, for example by finding areas where there is environmental disruption or where illegal charcoal manufacture is underway. We also carried out helicopter-based LIDAR scanning of Mt. Everest to look at how the vegetation is growing higher up in the mountains due to climate change. These conservation use cases are similar to how one would use geospatial data for commercial purposes.
How do you foresee the Geo-Intelligence community’s future?
I think the information coming from the geo-intelligence space is going to increase radically over the next few years. It is incredible to see companies like Maxar putting up satellites that can cover multiple locations on the earth multiple times a day. There is a whole bunch of processing you can do with that revisit frequency. So, I think geospatial intelligence is going to become much more common, and we are going to use it to understand everything from real estate values to defense preparedness.
But the place where it is going to be used the most is with modern AI algorithms, which are growing in size, with more and more parameters. We are moving from numbers like 20M tunable parameters to billions of tunable parameters, and tuning all of these parameters takes more and more data. Synthetic data can provide the data needed for these tunable parameters. We are now at a point where we are able to grow synthetic data to enable really high-performing geospatial AI.
What other sectors are you looking to get into in the near future?
One area that we are starting to work in a lot is healthcare. It’s actually a very similar application to geospatial. We are looking at microscope slides of human tissue from brain tumors, and these slides are basically maps. You have this huge amount of data about a tissue sample, and we are able to grow synthetic data to increase the training data size. So, we are working on that with a couple of hospitals across the US, led by the University of Michigan.
In addition to defense and geospatial, another industry that we will grow into is security. One of the things that we can do with synthetic data is generate a near-infinite number of scenarios to better detect anomalies. So, for example, with enough data, your security camera might be able to tell you if your garage is on fire, which is something that it isn’t trained for today.
What would you say to people who fear that AI will take away their jobs?
At Synthetaic, we almost always end up creating AI decision support tools, so this is never a risk for us. As humans, we are so incredible at matching patterns and making decisions. The best-case scenario is AI and humans working together. Of course, it’s a real risk that AI is going to take jobs away in some industries where processes are automated, and we are already seeing that. But I think we are a long way away from AI literally taking away our jobs. In the near future, it will be more about AI making our jobs better, the way computers have.
What challenges do you see in the geospatial AI industry in terms of policy or collaboration with users?
One of the challenges that we see in geospatial AI is that often people only hear AI and think that it’s going to solve all problems. However, we feel that at this stage AI can only do what humans have been doing for years. It does so more quickly, yes, but these are still the same core tasks. A lot of times in the industry, we still need to educate people about this. AI doesn’t give you superpowers. We still run into the need to set expectations about where AI is right now for object detection and analytics. Another area that is really challenging as far as geospatial intelligence goes is that running AI across vast areas, which is what people get excited about, takes a lot of processing and imagery. High-resolution geospatial data can be really expensive to acquire and analyze, especially across massive areas. Running 30cm data across 5000 sq. km sounds like a good idea until you run the data cost, so that’s something else we need to educate people about. I think the cost will come down as we have more and more satellites in orbit.
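To give a rough sense of the scale behind the 30cm over 5000 sq. km example, here is a minimal back-of-the-envelope sketch. Only the resolution and the area come from the interview. The band count, bytes per sample, and price per sq. km are hypothetical placeholders chosen for illustration, not real vendor figures, and the function names are made up for this sketch.

```python
# Back-of-the-envelope sizing for high-resolution imagery over a large area.
# From the interview: 30 cm resolution, 5000 sq. km area.
# Hypothetical (assumed for illustration only): 3 bands, 1 byte per sample,
# and a price of 20.0 currency units per sq. km.


def pixel_count(area_sq_km: float, gsd_m: float) -> float:
    """Pixels needed to cover an area at a given ground sample distance."""
    area_sq_m = area_sq_km * 1_000_000  # 1 sq. km = 1,000,000 sq. m
    return area_sq_m / (gsd_m ** 2)     # each pixel covers gsd_m x gsd_m


def raw_size_gb(pixels: float, bands: int = 3, bytes_per_sample: int = 1) -> float:
    """Uncompressed size in gigabytes (10^9 bytes) for the given pixel count."""
    return pixels * bands * bytes_per_sample / 1e9


def acquisition_cost(area_sq_km: float, price_per_sq_km: float) -> float:
    """Total acquisition cost at a flat, assumed price per sq. km."""
    return area_sq_km * price_per_sq_km


pixels = pixel_count(area_sq_km=5000, gsd_m=0.30)
size_gb = raw_size_gb(pixels)                              # assumed 3 bands, 8-bit
cost = acquisition_cost(5000, price_per_sq_km=20.0)        # assumed price

print(f"pixels: {pixels:.3e}")
print(f"uncompressed size: {size_gb:.1f} GB")
print(f"cost at the assumed price: {cost:,.0f}")
```

Under these assumptions, a single pass over the area is on the order of 5.6 × 10^10 pixels, and that is before repeated collections over time or any processing cost, which is the part of the bill people tend to overlook.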
Is there anything more that you would like to add?
The only thing that I would like to add is that this is an exciting time for geospatial intelligence and AI professionals. Geospatial is such a natural fit for AI and for synthetic data. I think in ten years, this field is going to look extraordinarily different based on the additional imagery and the development of AI.
Source: geospatialworld.net