Technology In Films: A Look Through The Ages

Prof P. J. Narayanan, Director, IIITH, walks us through the technological developments that have taken place in the movie making process over the last few decades, while giving us a glimpse of what is yet to come.

CIOL Bureau
We’ve witnessed a digital transformation in almost every aspect of life; just as digital transactions have replaced cash in the monetary world, the filmmaking world is no different. There too, we moved from the analog to the digital world. One no longer carries boxes of reels to theatres for screening; content is unlocked, perhaps by a password, and streamed at the click of a button. In this way, movies have been transformed, albeit with the ultimate goal of entertaining people. From the first commercial movie created by the Lumière brothers in 1895 to the first entirely computer-animated film, Toy Story, in 1995 (a good 100 years later), we now have a whole spectrum: live action, VFX, CGI, digital content mixed with actual footage and so on.
Today’s movies are far more sophisticated in terms of graphics and other effects than what was possible in 1995. Interestingly, when these films started appearing, technologists thought that they would make all the movies and that there was no need for directors and other creative folks. In the end, however, movies are all about the stories and the creativity involved, not about the graphics or the visual effects. While the latter may attract some people for some time, a movie needs to engage all the people all of the time. So fundamentally, movies are about the stories they tell. Good visuals enhance the impact, of course, while bad visuals can detract from it.

Plot Matters

If you look at Pixar, Disney, DreamWorks and others, not every movie they made went fully digital. They instead went for movies with strong and unusual storylines that tell you about yourself in ways that a real movie cannot. Take the movie Inside Out, for example. It is beautiful in the way it analyses and visualises emotions. You can’t make that movie with real actors and a real camera. Avatar and Avatar 2 tried to move such movies into the human world, but theirs was still very much a make-believe world, because it is difficult to make such movies with people who look like you. There is a big gap between the technology that computer graphics uses to generate these visuals and what can be shot with a good camera. That is one reason why 3D movies come back every few years: the novelty factor attracts a few more people to the theatres. But if the storyline is not gripping enough, the movie is not going to succeed just because of 3D stereoscopy.

Computer Graphics vs. Computer Vision

As far as technologies go, computer vision and computer graphics are the two fields related to movies, and they are complementary to each other. Computer vision is about taking a photograph of a room and telling how many people are in it, what they are doing, what kind of clothes they are wearing – the kind of questions that humans would ask about a scene. To answer these questions, computer vision needs to reduce the level of detail to an extent. While facial hair and wrinkles are real, computer vision would like to wish away such things because fine details can hinder recognition. Computer graphics, on the other hand, has the reverse problem: it generates realistic images of a virtual world represented in a computer. It has a long history of 50-60 years. Those of you familiar with computer games will know that games have very good graphics but lack the realism that a camera can capture. Broad details and a cartoonish feel are easy to generate; the fine details require a lot of work.
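To make the contrast concrete, here is a minimal sketch of the vision direction, using OpenCV’s stock pedestrian detector to answer “how many people are in the room?” (the file name room.jpg is a hypothetical placeholder); graphics runs the opposite way, from a scene description to pixels.

```python
import cv2

# Computer vision: image in, description out.
img = cv2.imread("room.jpg")  # hypothetical photograph of a scene

# OpenCV ships a pre-trained HOG + linear-SVM pedestrian detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Scan the image at multiple scales for person-shaped regions.
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))
print(f"People detected in the scene: {len(boxes)}")

# Draw the detections back onto the photograph.
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("room_annotated.jpg", img)

# Computer graphics would invert this arrow: start from a 3D scene
# description (geometry, materials, lights) and render the pixels.
```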
A while ago, I was in conversation with Edwin Catmull, the former President of Pixar, for ACM India about the process of making movies at Pixar. He said it is very tedious; movies like Soul and Elio are more expensive to create and produce today and take more time rather than less. The creative directors need to get the details just right, and getting them right takes a lot of effort and manual tweaking of the graphics system. There are open-source low-end tools, but at the same time there are sophisticated ones that can manipulate and create the required effects. In the end, though, the process remains tedious.

Image-based Modelling

How does one automate the process of graphics creation? The computer vision folks were in this game a long time ago. It began with trying to create a 3D model out of an image of a real object, say a toy, or an image of a room. Back in 1995, I was part of a project where we created a studio by mounting 51 cameras on a dome. These 51 B&W regular industry-grade cameras were connected to 51 VCRs. One of the hardest problems was using the remote in such a way that all the cameras would turn on or off together. Each tape then had to be taken out and digitised frame by frame before the actual computer vision processing could be performed. We spent 2-3 years building something that had not been done before: crude 3D models generated by the computer vision system of 1995, with endless possibilities. This was done for a three-person basketball game, but the original players could be placed in any other setting, the match could be watched from any seat in the stadium, from any point of view, and so on.
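The geometric core of such image-based modelling is simple in principle: once the cameras are calibrated, a point matched across two or more views can be triangulated in 3D. A toy sketch with OpenCV follows; the projection matrices and pixel coordinates are made-up illustrative values, not data from the 1995 system.

```python
import numpy as np
import cv2

# Triangulate one 3D point from two calibrated views.
# P1, P2 are 3x4 camera projection matrices (placeholder values:
# two unit-focal-length cameras one unit apart along x).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted camera

# Matched pixel coordinates of the same scene point in each view
# (2xN arrays; here N = 1, values invented for illustration).
pt1 = np.array([[0.5], [0.2]])
pt2 = np.array([[0.3], [0.2]])

# cv2.triangulatePoints returns homogeneous 4xN coordinates.
X_h = cv2.triangulatePoints(P1, P2, pt1, pt2)
X = (X_h[:3] / X_h[3]).ravel()
print("Recovered 3D point:", X)

# A full studio repeats this for millions of matches across 51 views,
# then fuses the points into surfaces; that fusion was the expensive
# part in 1995.
```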
There were 7-8 research groups in the world that started setting up similar studios. Cameras got better, algorithms improved, and researchers got better ideas too. In 2016, Microsoft Research unveiled Fusion4D, which made possible real-time capture of challenging scenes and their playback. Twenty years later, it is possible to fuse characters from two different virtual worlds. This sort of multi-view Kinect capture setup has been built inexpensively in one of our own labs. Kinect was a motion-sensing input device released by Microsoft in 2010. The devices generally contained RGB cameras, and infrared projectors and detectors that could map depth through either structured light or time-of-flight calculations, which was in turn used to perform real-time gesture recognition and body skeletal detection, among other capabilities. They also contained microphones that could be used for speech recognition and voice control.
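The depth maps such devices produce convert directly into 3D points through the pinhole camera model, which is what makes an inexpensive multi-Kinect rig a capture studio. A minimal sketch of that back-projection; the intrinsics fx, fy, cx, cy are illustrative ballpark values, not official Kinect calibration.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in metres) into an Nx3 point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    v, u = np.indices(depth.shape)       # pixel row/column grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]            # drop pixels with no depth reading

# Illustrative intrinsics in the ballpark of a 640x480 depth sensor.
fx = fy = 525.0
cx, cy = 319.5, 239.5

# Fake depth frame standing in for a real capture: a flat wall 2 m away.
depth = np.full((480, 640), 2.0)
cloud = depth_to_point_cloud(depth, fx, fy, cx, cy)
print(cloud.shape)                       # (307200, 3)
```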

Computer Vision In Recent Years

In 2023, a notable conversation between Mark Zuckerberg and Lex Fridman took place in the Metaverse, where they both appeared as photorealistic avatars. They were miles apart but appeared to be talking in person. It required each of them to sit in a studio for a full-body scan, so that the nuances of how their faces reacted to light and other nitty-gritties were captured and processed accordingly. Capturing in 3D has since become something of a hobby for computer vision people, even if the quality is not professional. Then came a new method called NeRF, or neural radiance field: a neural network that can reconstruct complex 3D scenes from a partial set of 2D images. This year’s edition of CVPR saw the Best Paper Award go to a similar concept – the Visual Geometry Grounded Transformer – that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps and 3D point tracks, from one, a few, or hundreds of its views.
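At its core, a NeRF is a small neural network that maps a 3D position and viewing direction to a colour and a density, which are then composited along camera rays. A heavily simplified PyTorch sketch of that idea; the layer sizes, sampling scheme and positional encoding are pared down for illustration and do not match the published architecture.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Maps (x, y, z, view direction) -> (RGB colour, density sigma)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # 3 colour channels + density
        )

    def forward(self, xyz, viewdir):
        out = self.mlp(torch.cat([xyz, viewdir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])    # colours in [0, 1]
        sigma = torch.relu(out[..., 3])      # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, n_samples=64, far=4.0):
    """Volume-render one ray: sample points, query the MLP, composite."""
    t = torch.linspace(0.05, far, n_samples)
    pts = origin + t[:, None] * direction    # sample points along the ray
    dirs = direction.expand(n_samples, 3)
    rgb, sigma = model(pts, dirs)
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-sigma * delta)  # opacity of each sample
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans                  # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)  # composited pixel colour

# Training compares rendered pixels against the input photographs and
# backpropagates; once trained, any new viewpoint can be rendered.
```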
For researchers in computer vision, the focus is on getting the quality better and bringing the processing time down to a few seconds. While there is a great deal of potential for everyday common use, it is unlikely to be accepted by film professionals soon; it is good for quick mockups or visualisation, but not for production quality. When we were doing virtualised reality, the dream was to offer control of the camera to the viewer. It is easy to understand watching a basketball game from any seat in the stadium, but can you do something creative like that in the movies? Today, of course, GenAI is stealing the show. I asked it to create an image of a professor in his 60s, of Indian descent, on the podium at IIITH delivering a talk on ‘Technology in cinema’, and it did! Similarly, you can use it to create short, creative and realistic clips from mere text prompts for fun. All of this is good, but if you want to make a movie where the storyline has to be consistent, with people looking the same throughout, it has a long way to go.

AI-Driven Dubbing

Based on research out of our own lab (the Centre for Visual Information Technology) at IIITH, we now have a startup – Sync Labs, by Prof. CV Jawahar’s students – that can generate high-quality speech from lip movements alone. I gave them a picture of myself, and they found a short recent video clip of me online. From that, they created a video of me speaking in different languages: Hindi, Malayalam and English. This is just to show that there are several aspects to film making. For one, film making is an enterprise, and a lot of technology can be used to make the process faster. Large digital assets are created for each movie, and there is a lot of footage too. How can we manage them efficiently? While these questions may not involve image or sound generation, they are very much a part of the process and probably far more tractable in a predictable way from the academic side.
 
Modern technology, including AI, has a big impact on films. While a lot of impressive developments have taken place in the last 2-3 decades, the future surely has more in store. Every new tool will enhance the movie making process; the standards go up and the bar shifts higher with every enhancement. However, these tools are best looked at as tools to enhance productivity and to channel creativity differently. They should not be viewed as tools to replace people, because AI is unlikely to create movies of its own that will change the industry. The story, and the creative expression of that story, will remain at the heart of the movie making process. AI and any other technology can only enhance it in a way that we all like.