When OpenAI unveiled ChatGPT, a chatbot built on a large language model, in November 2022, few suspected the enormous impact it would have: never before has the term "game changer" been more justified for a technological development. It has long been clear that this was more than passing hype. But there are still problems that generative AI tools like ChatGPT & Co. urgently need to solve.
In the past nine months, the number of AI tools has exploded. ChatGPT is not only extraordinarily successful itself, but has also acted as a door-opener for a new range of products, although many AI tools already existed before. The number of websites with an .ai top-level domain alone has almost doubled since ChatGPT was launched - and the trend is rising sharply.
Many AI tools have already gone through several stages of development and have become more versatile as a result. ChatGPT, for example, made a big leap with the upgrade of its language model to GPT-4 (Generative Pre-trained Transformer 4). GPT-3 has 175 billion parameters; OpenAI has not disclosed the size of its successor, and the frequently quoted figure of 100 trillion parameters is unconfirmed.
In addition, subscribers to the paid ChatGPT Plus plan have for some time been able to use numerous plug-ins to expand the range of functions according to their own needs. This finally eliminated what was probably the biggest point of criticism from the launch phase, because these plug-ins also give ChatGPT access to current information from the web.
There has been no really loud criticism since then. Nevertheless, there are some points of criticism that we should reflect on:
AI applications have also achieved a breakthrough because they run on special high-performance computers. In contrast to conventional servers, these primarily use graphics processing units (GPUs) instead of classic central processing units (CPUs). The advantage is easily explained: a GPU, a component also found in every conventional computer, can execute a large number of operations in parallel. In layman's terms, GPUs are multitasking talents.
If AI applications were to run on CPU-based systems instead, response times would hardly be adequate. Yet it is precisely the real-time responses that account for a large part of the enthusiasm for AI tools. All this computing power requires a great deal of electricity. Even more energy-intensive than day-to-day operation, however, is the training of LLMs. There are no official figures on the energy consumption of the GPT-4 model, but estimates put it at more than 1,200 megawatt hours. To put this in perspective, that would be enough to supply about 120 households with electricity for a year.
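The household comparison can be checked with a quick calculation. The following is a minimal sketch, not an official methodology: the constant and function names are illustrative, and the only inputs are the two estimates quoted above. It derives the annual consumption per household that the comparison implies.

```python
# Figures quoted in the text (estimates, not official numbers).
TRAINING_ENERGY_MWH = 1_200   # estimated energy used for training
HOUSEHOLDS = 120              # households supplied for one year


def implied_household_kwh(total_mwh: float, households: int) -> float:
    """Annual consumption per household (in kWh) implied by the comparison."""
    return total_mwh * 1_000 / households  # 1 MWh = 1,000 kWh


print(implied_household_kwh(TRAINING_ENERGY_MWH, HOUSEHOLDS))
```

The comparison therefore assumes about 10,000 kWh per household and year, which is roughly in line with average electricity use in US households.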
Even with a high share of renewable energy, the carbon footprint is likely to be significant. The data centres are located in the USA and are partly powered by renewable energies. However, renewables accounted for only 21.5 per cent of total US electricity generation (as of 2022). In addition, the many GPUs add to the carbon footprint, as their production is energy-intensive.
The high energy consumption brings with it another problem:
Assuming that the worldwide conversion of electricity generation to renewable energies continues, the CO2 problem should ease in the next few years. However, this does not apply to a consequential problem: high computing power not only consumes a lot of electricity, but also a lot of water for cooling the servers.
Scientists at the Universities of California Riverside and Texas Arlington have calculated in a study that an average chat with ChatGPT, in which between 25 and 50 questions are answered, causes half a litre of water to evaporate. Admittedly, that doesn't sound like much, but with an estimated 100+ million users and 1.6 billion monthly visits, the cooling water consumption adds up enormously.
Here, too, the GPU chips deployed en masse play an important secondary role. Their production also consumes a lot of water, especially for rinsing the silicon wafers. TSMC in Taiwan, the largest chip manufacturer, alone uses 150,000 cubic metres of water every day. To put that in perspective, that is enough to fill 750 standard swimming pools.
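The swimming pool comparison depends on what counts as a "standard" pool. The short sketch below works backwards from the two figures in the text to the pool size they imply. The Olympic pool volume is an added assumption for comparison (50 m x 25 m x 2 m), and all names are illustrative.

```python
# Figures quoted in the text.
DAILY_WATER_M3 = 150_000   # daily water use in cubic metres
POOLS = 750                # number of pools in the comparison

# Assumption for comparison only: Olympic pool of 50 m x 25 m x 2 m.
OLYMPIC_POOL_M3 = 50 * 25 * 2


def implied_pool_volume(daily_m3: float, pools: int) -> float:
    """Volume per pool (in cubic metres) implied by the comparison."""
    return daily_m3 / pools


def pools_filled(daily_m3: float, pool_m3: float) -> float:
    """Number of pools of a given volume that the daily water use would fill."""
    return daily_m3 / pool_m3


print(implied_pool_volume(DAILY_WATER_M3, POOLS))     # cubic metres per pool
print(pools_filled(DAILY_WATER_M3, OLYMPIC_POOL_M3))  # Olympic-size pools
```

The comparison thus assumes pools of 200 cubic metres each. Measured against the Olympic-size pool assumed here, the same daily amount would fill 60 pools.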
In contrast to the energy problem, the high water consumption will become even more problematic in the future. Climate change will have a significant impact on water as a resource. Already today, we are confronted with water shortages and rising prices in many regions. A more sustainable use of water therefore seems inevitable.
A completely different problem is the quality of the output. The update from GPT-3.5 to GPT-4.0 initially brought a noticeable jump in quality. Since the GPT models continue to be adjusted after their initial training, changes over time are inevitable. One might assume, however, that the quality would only ever improve.
In the case of ChatGPT, however, this assumption cannot be confirmed. Scientists at Stanford University and the University of California, Berkeley, have shown in a study that both GPT-3.5 and GPT-4.0 have changed considerably in a relatively short time after their release. In some areas, both models even performed significantly worse than when they were introduced.
Particularly striking are the results for mathematical test questions, which have unambiguous answers. In March 2023, GPT-4.0 still identified prime numbers correctly 84 per cent of the time; by June, the rate had dropped to just 51.1 per cent. The quality of generated program code also deteriorated massively in the same period: while more than 50 per cent of the code snippets were directly executable at the beginning, the figure subsequently fell to 10 per cent.
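Prime number questions are well suited to such comparisons because a short deterministic program provides the ground truth. The sketch below is not the study's evaluation code: the function names and the sample answers are made up for illustration. It checks primality by trial division and scores a set of yes/no answers against it.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Deterministic primality check by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd divisors up to the integer square root need to be tested.
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def accuracy(answers: dict[int, bool]) -> float:
    """Share of answers ("is n prime?") that match the ground truth."""
    correct = sum(is_prime(n) == claimed for n, claimed in answers.items())
    return correct / len(answers)


# Hypothetical model answers: number -> "the model says it is prime".
sample_answers = {17: True, 21: True, 97: True, 91: False}
print(accuracy(sample_answers))  # 0.75 (21 is wrongly called prime)
```

Running the same fixed question set against a model at different points in time and comparing the resulting rates is one simple way to make changes of this kind visible.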
Experts had expected this phenomenon, known as "AI drift", but not at this speed and to this extent. The reasons for this are complex and need to be researched further. What is clear, however, is that the inclusion of user input represents a dilution of the training data. In addition, certain instructions could lead to a shift in the model's priorities. For example, the instruction "write for a conservative audience" could create a bias (distortion of reality).
The US scientists also urge continuous monitoring of LLMs because they expect another problem as early as the next generation: AI-generated content itself becomes part of the training data. If it contains biases and errors, the quality of the models could collapse.
Despite all the enthusiasm about the impressive performance of ChatGPT & Co., we should use generative AI very consciously and intelligently. The enormous hunger for resources and the qualitative limitations make unrestricted use seem unwise, at least for the time being.
Text: Falk Hedemann