“2022 seems to be a turning point for computer vision and NLP in many respects”, says Data & AI expert Robert Meisner. Meisner is Lead Product Owner at ERGO Technology & Services and responsible for the AI Factory. As he is also a lover of good cinema, the developer has taken a look at what impact the latest achievements in the field of artificial intelligence will have on the future of the film industry.
2022 seems to be a turning point for computer vision and NLP in many respects. It is not without reason that a large part of this year's ERGO Tech Trend Radar has been devoted to trends and solutions in the area of Data & AI. According to Stanford's AI Index report, private investment made up the largest share of global AI investment in 2021 (totalling around $93.5 billion), followed by mergers and acquisitions (around $72 billion), public offerings (around $9.5 billion), and minority stakes (around $1.3 billion). Private investment more than doubled compared to 2020, the most significant year-on-year increase since 2014.
Such investment dynamics allow companies and scientific institutions to undertake riskier and bolder research and development projects. The areas I watch closely are computer vision and natural language processing (NLP). It is worth noting that these are also areas of enormous interest for ERGO. With the help of the AI Factory platform, ERGO's data scientists today are building models of better quality than those available on the market.
If you want to find out more about AI Factory and ERGO's AI use cases, select the “Data & AI” section in the Tech Trend Radar application and read about the following trends and innovations:
Taking advantage of the holiday period, let me discuss a topic not directly related to our work 😀 As a lover of good cinema, let me envision the impact of the latest achievements in the field of AI on the future of the film industry.
The first viral project that everyone should know about is DALL·E 2. DALL·E 2 was developed by OpenAI to generate digital images from natural language descriptions and was trained on hundreds of millions of captioned images from the internet. What is mind-blowing is that it can create original, realistic images and art from a text description. What’s more, it can even combine concepts, attributes and styles. I am the proud owner of a Border Collie mix and a Russian Blue cat, so I decided to use them for “my” artwork. While my first experiments with a cosmic nebula style were unsuccessful, the next ones surpassed my expectations (despite some errors).
You can see more examples of images generated by DALL·E 2 here:
Does this mean the end of art as we know it? Certainly, DALL·E 2 is a technology that may be considered disruptive in the art world. Another one that, in my opinion, gives even more photorealistic results is Imagen by Google.
Interestingly, as of July 2022, access to both models is restricted to pre-selected beta users primarily due to ethical and safety concerns. I won’t go into the issues here because this topic deserves a separate post.
So far, generating 3D scenes has proved trickier due to the sheer computing power required. But here, too, we are seeing some progress, if not a breakthrough: this year at Stanford University in California, Eric Ryan Chan and his colleagues created the EG3D computer model. It uses a machine learning algorithm called a generative adversarial network (GAN) to generate faces in high resolution together with an underlying geometric structure.
Tools like this could eventually help CGI artists or software developers working on game content.
What else do you need besides beautiful scenes to create a memorable movie? Of course, each film must have a script and a good plot. Can AI write an exciting and believable script? Yes, it can. And this statement is so 2020 because that's when GPT-3 (Generative Pre-trained Transformer 3) was released by Silicon Valley's OpenAI. Since then, people have used GPT-3 to write many novels, and some have even been published.
To name a few:
AI-generated stories can be engaging and surprising! The first version of AI Dungeon (sometimes called AI Dungeon Classic), built on the GPT-2 model, was designed and created by Nick Walton of Brigham Young University's “Perception, Control, and Cognition” deep learning laboratory in March 2019 during a hackathon. It is a text-adventure game that uses AI to create a unique storyline in response to player decisions and actions.
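For the curious, the basic loop behind such a game can be sketched in a few lines of Python. This is only a toy illustration and not how AI Dungeon is actually implemented: `toy_model`, `TWISTS` and `play` are hypothetical names, and the "model" is a stand-in that picks a canned sentence where a real game would call a large language model with the story so far.

```python
import random
from typing import List

# Canned story beats. ASSUMPTION: a real game would generate text with a
# language model instead of choosing from a fixed list like this one.
TWISTS = [
    "A door creaks open somewhere behind you.",
    "The ground trembles and a hidden path appears.",
    "A stranger steps out of the shadows and nods.",
]


def toy_model(history: List[str], action: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a language model.

    A real model would condition on the full history; this toy version
    only uses its length to number the turn.
    """
    turn = len(history)  # the opening line counts as entry 0
    return f"Turn {turn}: You {action}. {rng.choice(TWISTS)}"


def play(opening: str, actions: List[str], seed: int = 0) -> List[str]:
    """Run the story loop: each player action is answered with a new beat."""
    rng = random.Random(seed)
    history = [opening]
    for action in actions:
        # The growing history is what lets the story react to earlier choices.
        history.append(toy_model(history, action, rng))
    return history


if __name__ == "__main__":
    for line in play("You wake up in a forest.", ["light a torch", "walk north"]):
        print(line)
```

The point of the sketch is the shape of the loop: the player's action and the story so far go in, a new piece of story comes out and is appended to the history, so two players making different choices end up with different stories.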
Are there other models like GPT-3? Yes: Gopher, Chinchilla, PaLM, and most recently, BLOOM (arguably the podium of large language models), to name a few.
So now, with the help of AI, we can prepare a film script and generate its storyboard. However, that alone is not enough to create an Oscar-worthy concept, as we will still need models that can animate whole scenes and combine them into a logical sequence. And even when we have these technologies in our hands, we will face a new set of problems, such as analysing and mitigating harmful algorithmic bias.
So what does the future of cinema look like to me? Film production costs will become drastically lower. The film industry will become accessible to filmmakers without huge budgets or film studios behind them. I imagine a director, screenwriter, cinematographer, and even an actor (see deepfakes) all in one, and it could be you.
Starting from a short description of the plot, you will be able to generate a good-quality script with the help of AI. Based on that, you will generate individual scenes, and again use AI to fine-tune and detail each one. It will take weeks, not months or years, to produce a professional film. Cinema will become interactive, and no two stories told on screen will be the same.
Even though the technologies mentioned above are not enough to generate a full-length movie, they are a good start and a fantastic inspiration for data scientists, data engineers, machine learning engineers, AI scholars and academics. With their help, in the years to come, we will be surprised again with innovations that we haven’t even dreamed of today.
Text: Robert Meisner, Lead Product Owner at ERGO Technology & Services