Listed here are some of the most important recent research articles: a hand-curated, chronological list of current AI and data science advances, each linked to a more in-depth article.
Controllable Image Generation
Deep generative models can synthesize high-resolution images that look photorealistic, but for many applications this is not enough: content creation also needs to be controllable. Several recent works attempt to disentangle the underlying factors of variation in the data, but most of them operate only in two dimensions and ignore that our world is three-dimensional.
The researchers present GIRAFFE, a new method for generating images that we can control. Their main idea is to incorporate a compositional 3D scene representation into the generative model. By representing scenes as compositional generative neural feature fields, the model can disentangle individual objects from the background and infer their shape and appearance without explicit supervision.
Moreover, combining this scene representation with a neural rendering pipeline yields a fast image synthesis model with realistic output. Their tests showed that the model can disentangle individual objects and allows translating and rotating them in the scene, as well as changing the camera pose.
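The compositional idea above can be sketched as a density-weighted average of per-object feature fields: each object contributes a volume density and a feature vector at every 3D point, and the scene combines them. The snippet below is a simplified illustration in NumPy; the function name, array shapes, and the epsilon guard are our assumptions, not the paper's actual code.

```python
import numpy as np

def composite_fields(densities, features):
    """Density-weighted composition of per-object feature fields
    (a simplified sketch of GIRAFFE's composition operator).

    densities: (N_obj, N_pts) array of volume densities sigma_i(x)
    features:  (N_obj, N_pts, C) array of feature vectors f_i(x)
    Returns the composite density and feature at each point.
    """
    # Total density is the sum of per-object densities.
    sigma = densities.sum(axis=0)                        # (N_pts,)
    # Features are averaged, weighted by each object's density.
    weighted = (densities[..., None] * features).sum(0)  # (N_pts, C)
    f = weighted / np.maximum(sigma[:, None], 1e-8)      # avoid div by zero
    return sigma, f
```

At a point where only one object has non-zero density, the composite feature reduces to that object's feature, which is what lets objects be edited independently.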
Evaluating Large Language Models
Here, the researchers introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and examine how well it can write Python code. A distinct production version of Codex powers GitHub Copilot.
In addition, the researchers investigated whether large language models can be trained to produce functionally correct code bodies from natural-language docstrings. By fine-tuning GPT on code from GitHub, their models performed well on a set of human-written problems with difficulty comparable to easy interview questions. They further improved performance by training on a distribution closer to the evaluation set and by drawing more samples per problem.
Furthermore, the researchers found it straightforward to train a model on the reverse task of generating docstrings from code bodies, and these models had similar performance profiles. On HumanEval, a new test set the researchers created to measure how well programs can be synthesized from docstrings, their model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. A closer look at the model reveals some limitations: it struggles with docstrings that describe long chains of operations and with binding operations to variables. Lastly, the researchers discuss the possible effects of powerful code generation technologies on safety, security, and the economy.
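The HumanEval figures above are reported with the pass@k metric from the Codex paper: generate n samples per problem, count the number c that pass the unit tests, and estimate the probability that at least one of k randomly drawn samples is correct. The paper gives a numerically stable, unbiased estimator, which can be written as:

```python
import numpy as np

def pass_at_k(n, c, k):
    """Unbiased estimator of pass@k = 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: number of samples that pass the unit tests
    k: number of samples the metric draws
    """
    if n - c < k:
        # Every size-k draw must contain at least one correct sample.
        return 1.0
    # Product form of 1 - C(n-c, k) / C(n, k), avoiding large factorials.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```

With k = 1 this is simply the fraction of problems where a single sample passes, which matches the headline 28.8% figure's setting.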
Recognizing People in Photos
Recognizing people in user-generated content is challenging because the content varies enormously from one user to the next. People can appear at any scale, under any lighting, in any pose, and with any facial expression, and images may come from any camera. When someone wants to see all of their photos of a particular person, they need a complete knowledge graph that includes photos where the person is not posing.
This is especially true for pictures of dynamic scenes, like a child popping a bubble or people raising a glass to toast. Ensuring that the results are fair is another challenge and an essential requirement for automatic person recognition: the researchers want everyone, regardless of skin colour, age, or gender, to have the same experience.
This latest improvement, available in Photos on iOS 15, makes person recognition much more robust. Using private, on-device machine learning, the system can correctly identify people in extreme poses, with accessories, or with partially covered faces. It can even match people whose faces are not visible at all by relying on visual cues beyond the face.
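At a high level, recognition systems like this compare embedding vectors of a detected person against a gallery of known people. Apple's actual pipeline is not public, so the sketch below is purely illustrative: the embeddings, the threshold value, and the function names are all our assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_person(query_emb, gallery, threshold=0.6):
    """Return the gallery identity whose best embedding exceeds the
    threshold, or None if no identity matches.

    gallery: dict mapping a person's name to a list of embeddings.
    """
    best_name, best_score = None, threshold
    for name, embeddings in gallery.items():
        score = max(cosine_similarity(query_emb, e) for e in embeddings)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Keeping several embeddings per person (different poses, lighting, accessories) is one simple way a gallery can stay robust to the variation described above.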
Source: indiaai.gov.in