Is ChatGPT the solution to everything?

    Published: May 12, 2026

    Key Points in Brief

    In essence, nothing has changed: given large amounts of data and enormous computing power, deep neural networks (deep learning) demonstrate impressive capabilities. They can seemingly understand images, audio, video, and text, or several of these simultaneously in a multimodal fashion, with what appears to be superhuman ability, and can create original works. ChatGPT in particular has generated great enthusiasm around the world in a very short time. For many people, this chatbot is currently the epitome of artificial intelligence.

     

    Is ChatGPT the solution to everything? We would like to provide an overview that simplifies the introduction to the topic and supports further consideration and discussion.

    History and relevant components of ChatGPT

    The technological basis of ChatGPT is Large Language Models (LLMs). These predict the probability of word sequences, sentences, and other pieces of text.
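    Concretely, such a model assigns a probability to a word sequence via the chain rule: the probability of each word given its predecessors. A minimal sketch with a hand-made bigram table (all probabilities below are invented for illustration):

```python
# Toy language model: a bigram table stands in for a trained model.
# "<s>" marks the start of a sequence.
bigram_prob = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sleeps"): 0.6,
}

def sequence_probability(words):
    """P(w1..wn) approximated by the chain rule over bigrams."""
    prob = 1.0
    for prev, word in zip(["<s>"] + words, words):
        prob *= bigram_prob.get((prev, word), 0.0)
    return prob

print(sequence_probability(["the", "cat", "sleeps"]))  # 0.5 * 0.2 * 0.6 ≈ 0.06
```

    A real LLM replaces the table with a deep neural network conditioned on the whole preceding context, but the basic question it answers is the same: how probable is this text?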

    LLMs can not only support basic activities in Natural Language Processing (NLP) (e.g. concept recognition, negation recognition, relation recognition), but can also perform more complex activities such as:


    1. Classification of a text (e.g. the author's emotional tone)

    2. Translation of a text (e.g. from one language to another)

    3. Completion of a text (e.g. to answer a question)

    LLMs can thus resemble pre-processing pipelines, databases, recommender systems, virtual agents, etc., with the impressive extra that they can be operated intuitively through natural language.

    LLMs are a further development of artificial neural networks. Artificial neural networks were invented back in 1943 and are motivated by the structure of the human brain. An artificial neuron has any number of numerical inputs, calculates a mathematical function, and delivers a numerical output. The mathematical function is described by weights (parameters): a weighted sum of the inputs followed by an activation function (Russell & Norvig, 2002). See also Figure 1.

    Figure 1: ChatGPT – representation of a neuron

     

    Neurons can perform calculations between input and output, i.e. solve problems (e.g. the logical functions A and B / A or B). The problem-solving ability of a single such neuron is limited, however: the problems "A and B" and "A or B" can be solved, but not "A exclusive-or B", because its four input combinations cannot be separated by a straight line (linear separability).
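    This limitation can be made concrete with a small sketch: a single neuron with hand-chosen weights (the values below are illustrative) computes AND and OR, while no choice of weights and bias reproduces XOR:

```python
def neuron(weights, bias, inputs):
    """A single artificial neuron: weighted sum plus a threshold."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0

# Weights and biases chosen by hand for illustration:
AND = lambda a, b: neuron([1, 1], -1.5, [a, b])
OR  = lambda a, b: neuron([1, 1], -0.5, [a, b])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))

# No single (weights, bias) pair can reproduce XOR: the inputs that
# should output 1 cannot be separated from the rest by one line.
```

    For XOR, a hidden layer and a non-linear activation are needed, which is exactly what the deep networks described next provide.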

    This is why deep neural networks are usually used; they consist of:

    1. Many layers of interconnected neurons (hidden layers)

    2. Non-linear activation functions

    3. Training procedures for learning the weights

    In the simplest case, the neurons in a lower layer are fully connected to the neurons in the layer above, and information flows only forward (feedforward layer).

    A training procedure assigns initial weights, compares the calculated output with the desired output on sample data, and then adjusts each weight depending on its influence on the error (Rumelhart et al., 1986).
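    The procedure can be sketched for a single linear neuron (the target function and training data below are invented for illustration): compute the output, measure the error, and nudge each weight in proportion to its influence:

```python
import random

# A single linear neuron learns y = 2*x1 + 3*x2 by gradient descent.
random.seed(0)
w = [random.random(), random.random()]  # initial weights
data = [([1, 0], 2), ([0, 1], 3), ([1, 1], 5), ([2, 1], 7)]

lr = 0.05  # learning rate
for epoch in range(200):
    for x, target in data:
        out = w[0] * x[0] + w[1] * x[1]   # calculated output
        error = out - target               # compare with desired output
        # Adjust each weight by its influence on the squared error:
        w[0] -= lr * error * x[0]
        w[1] -= lr * error * x[1]

print(w)  # converges towards [2.0, 3.0]
```

    Real training (backpropagation) does the same thing through many layers, propagating each weight's influence on the error backwards through the network.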

    Traditional NLP applications used words in a very "one-dimensional" way: a word was either present in a sentence or text ("1") or not ("0").

    With deep learning, words can be represented as numerical vectors (embeddings) with many dimensions that describe their semantics. In such vector spaces, words with similar meanings lie close together. In addition, relative properties emerge: for example, if you subtract the vector for "woman" from the vector for "queen" and add the vector for "man", you obtain (approximately) the vector for "king" (Mikolov et al., 2013). See also Figure 2.
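    The arithmetic can be sketched with hand-made toy vectors (real embeddings have hundreds of learned dimensions; the three dimensions below, loosely "royalty", "masculinity", "femininity", are invented for illustration):

```python
# Toy word vectors, purely illustrative:
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
}

def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Similarity of two vectors by the angle between them."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / (norm(u) * norm(v))

# queen - woman + man should land closest to king:
result = add(sub(vectors["queen"], vectors["woman"]), vectors["man"])
best = max(vectors, key=lambda word: cosine(result, vectors[word]))
print(best)  # king
```

    Compared with the traditional "1"/"0" representation, such vectors carry graded semantic information that the network can compute with.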

    Figure 2: ChatGPT – word vectors

     

    Not only words, but also letters or parts of words can be represented as vectors (tokens); special algorithms (tokenizers) split texts accordingly. Word sequences, sentences, documents, and arbitrary texts can also be represented as sets of vectors (encoder), and the probability of subsequent text can then be calculated (decoder). The context of a word (preceding and following words) can be taken into account (attention). The architecture of the so-called transformer is particularly efficient due to its parallel computability (Vaswani et al., 2017).
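    The attention mechanism at the heart of the transformer can be sketched as follows: each token vector is compared with every other token vector, and a weighted mixture is produced. For simplicity, queries, keys, and values are identical here and the learned projection matrices are omitted:

```python
import math

def softmax(xs):
    """Turn scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(tokens):
    """Simplified self-attention: compare each token with all tokens
    (scaled dot product), then mix the token vectors accordingly."""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens))
                        for i in range(d)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token vectors
for row in attention(tokens):
    print(row)
```

    Because every token's mixture can be computed independently, this step parallelizes well, which is a large part of the transformer's efficiency.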

    Language models that can calculate the probability of texts can thus be trained directly on any collection of texts (self-supervised, because no specific manual annotation is needed). Training is carried out, for example, from left to right (Radford et al., 2018) or bidirectionally (Devlin et al., 2018). Input words are converted into word vectors and compared with other relevant words before/after the respective word in a deep neural network consisting in particular of so-called attention layers. A new representation is then obtained in each case, which can be used further depending on the application, e.g. to translate a word into another language or to predict the next word in a generated text. See also Figure 3.

    Figure 3: ChatGPT – transformer building block

     

    Deep learning has thus shown impressive results in text-comprehension tasks, similar to those it had previously achieved on images (Ruder, 2018).

    These language models can then be used to create new texts by iteratively asking the model for the most probable next word (generative artificial intelligence).
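    This generation loop can be sketched with a toy next-word table (all probabilities are invented; real models condition on the whole preceding context, not just the last word):

```python
# Toy next-word distributions; "end" terminates the text.
next_word_prob = {
    "<s>":    {"the": 0.6, "a": 0.4},
    "the":    {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "cat":    {"sleeps": 0.7, "end": 0.3},
    "dog":    {"barks": 0.8, "end": 0.2},
    "sleeps": {"end": 1.0},
    "barks":  {"end": 1.0},
}

def generate(max_words=10):
    """Greedy generation: repeatedly pick the most probable next word."""
    word, text = "<s>", []
    for _ in range(max_words):
        word = max(next_word_prob[word], key=next_word_prob[word].get)
        if word == "end":
            break
        text.append(word)
    return " ".join(text)

print(generate())  # "the cat sleeps"
```

    Real systems usually sample from the distribution instead of always taking the maximum, which is one reason repeated identical inputs can produce different outputs.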

    In addition, these language models are an effective basis (pretraining) for transferring learned semantics to other areas (transfer learning) and for improving specific skills through targeted training, i.e. subsequent refinement of the parameters (fine-tuning). This targeted learning usually takes place with manually annotated data (supervised). The more text data (amount of data) and the more thorough the training (computing power), the more pronounced these properties become (scaling). LLMs can then perform NLP activities with hardly any contextual information: just a few preceding words (prompts) suffice (Brown et al., 2020).
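    Prompting can be sketched as simple text construction: a few solved examples precede the new input, and the model is expected to continue the pattern (the task and labels below are invented for illustration):

```python
# Few-shot prompt construction for a toy sentiment task.
examples = [
    ("The service was wonderful.", "positive"),
    ("I waited an hour and nobody came.", "negative"),
]

def build_prompt(examples, new_input):
    """Concatenate solved examples, then the new input, and leave the
    label open for the model to complete."""
    lines = [f"Text: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Text: {new_input}\nSentiment:")
    return "\n\n".join(lines)

print(build_prompt(examples, "Great food, friendly staff."))
```

    The model's completion of the final "Sentiment:" line is then taken as the classification, with no task-specific fine-tuning required.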

    LLMs such as GPT-2 / 3.5 / 4 (OpenAI), LLaMA (Meta), BERT (Google), and Alpaca (Stanford) are trained, exchanged, and refined more or less openly by various institutions and are used as a basis for further refinement in various NLP applications (foundation language models).

    The open language model LLaMA was trained in its largest version, with 65 billion parameters (weights), on 1.4 trillion tokens (word parts) using 2,048 A100 GPUs from NVIDIA, each with 80 GB of graphics memory, in 21 days. Meta estimates that development and training consumed a total of 2,638 MWh of energy.

    Many LLMs, such as those in ChatGPT, are also trained to understand instructions and conversations (instruction tuning), with considerable manual effort.

    This is done in two ways. First, pairs of input (request) and desired output (response) are created manually for supervised training. Second, another language model evaluates (rewards) responses to requests. This reward model in turn requires manually created pairs of input (request + response) and evaluation (or a ranking when there are several possible responses). The second training stage can then run automatically: queries are selected at random, the responses are evaluated by the reward model, and the results are used to further train the large language model (RLHF, Reinforcement Learning from Human Feedback).
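    The role of the reward model can be sketched in heavily simplified form: candidate responses to randomly chosen queries are scored, and the best-scored pairs would then drive the further training of the language model (all functions below are toy stand-ins, not real models):

```python
import random

random.seed(1)

def language_model(query):
    """Stand-in: returns several candidate responses for a query."""
    return [f"{query} -> answer {i}" for i in range(3)]

def reward_model(query, response):
    """Stand-in: scores a response (invented heuristic, not a real model)."""
    return len(response) % 5 + random.random()

queries = ["How do I reset my password?", "What is an LLM?"]

training_pairs = []
for query in random.sample(queries, k=len(queries)):  # random selection
    candidates = language_model(query)
    best = max(candidates, key=lambda r: reward_model(query, r))
    training_pairs.append((query, best))  # would feed the RL update step

for pair in training_pairs:
    print(pair)
```

    The actual reinforcement-learning update that adjusts the language model's weights is omitted here; the sketch only shows how the reward model automates the evaluation step that would otherwise require human raters.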

     


    Strengths and weaknesses of LLMs

    A comprehensive picture of the strengths and weaknesses of LLMs will only emerge over time.

    Some strengths are already clear at this point, such as: 

    1. Intuitive operation through purely textual input (in any language). Language models have revolutionized the way we communicate with machines.
    2. Impressive ability to automatically understand the meaning of texts (at least seemingly in the sense of a "Turing test" when imitating human intelligence).
    3. Impressive ability to generate coherent texts on a given topic that are also (most of the time) factually accurate, meaningful, and comprehensible to users.

    Equally, some weaknesses are already clear: 

    1. High effort required to store targeted information. The properties of a database or knowledge base are lacking for this purpose: a formal query language, a formal output format, and deterministic outputs (repeated identical inputs may produce different outputs, as texts are interpreted and generated based on probabilities). This task will continue to be reserved for dedicated systems such as databases, search engines, data warehouses, and knowledge bases.
    2. Unreliable or even harmful outputs. Worse than varying outputs, incorrect, biased, or otherwise undesirable responses are possible when faced with unforeseen or atypical inputs. Control mechanisms, ongoing testing, and similar measures are necessary.
    3. A limited context that is taken into account during interpretation and generation. For example, GPT-3 can consider a context window of 2,048 word fragments. The longer the context, the more it may be necessary to organize iterative calls to large language models through higher-level logic.
    4. Like other machine learning methods, LLMs are difficult to interpret. Explainability, as found in logical reasoning, and corresponding causal inference are hard to capture in the required probabilities. Other technologies, such as knowledge graphs, are better suited for this purpose and can be used in a complementary manner.
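    The third weakness, the limited context window, is often handled with higher-level logic such as chunking: splitting a long text into overlapping pieces that each fit the window. A minimal sketch (counting is simplified to whitespace words; real systems count tokenizer output):

```python
def chunk_text(words, window=2048, overlap=128):
    """Split a word list into overlapping chunks of at most `window`
    words; the overlap preserves some context across chunk borders."""
    chunks, start = [], 0
    while start < len(words):
        chunks.append(words[start:start + window])
        if start + window >= len(words):
            break
        start += window - overlap
    return chunks

words = ["token"] * 5000  # a document longer than the context window
chunks = chunk_text(words, window=2048, overlap=128)
print(len(chunks), [len(c) for c in chunks])  # 3 [2048, 2048, 1160]
```

    Each chunk is then sent to the model separately, and the partial results are combined by the surrounding application logic.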


    Use cases with (and without) added value

    Here too, it will only be possible to evaluate the benefits of large language models in specific use cases over time.

    Some use cases with potential for LLMs:

    Basically, any NLP application can benefit from LLMs, for example chatbots, voicebots, and virtual agents. Virtual agents often consist of separate components for natural language processing (NLP), dialog management, a knowledge base, and natural language generation (NLG). Large language models, especially when trained on specific tasks, interactions, or conversations, promise simpler architectures, all-encompassing text understanding, and intuitive interactions with humans.

    It remains to be seen how well this approach will prove itself in applications:

    Can explicit information be extracted from text data (possibly even combined with images, audio, video) with higher quality, with a higher level of detail, with less effort, and used for automation?

    Can hidden information, including knowledge, be recognized earlier and used with added value?

    Can activities be carried out faster, more reliably, more comprehensibly, etc. in the interaction between humans and machines? Perhaps even across disciplines, workflows, or processes, and across existing media breaks (from voice to text, to video, and back to voice)?

    There is plenty of under-used text data, untapped knowledge, and suboptimal workflows and processes in industry, public authorities, medicine, science, and beyond.

    As an example, some questions from medical research and development projects at Empolis:

    Informing patients about examinations and treatments takes up a lot of time and resources, and often too many of the patients' questions remain unanswered.

    In diagnostic disciplines such as radiology, diagnostic reports are either written or dictated in free text or questionnaires are filled out. Depending on user preference, more structured information or coherent texts are required to support the fast and error-free exchange of information between medical staff.

    Many decisions on aftercare options such as rehab, home care, etc. are made too late, delaying discharges and the optimal further treatment of patients.

     

    Important considerations when applying LLMs are:

    1. An adequate amount of text data (in multiple languages if necessary)
    2. Experts/users for annotating data
    3. Access to appropriate computing power
    4. Operational processes for testing, deployments, updates, etc.

    If these conditions are not met, the sustainable added value of LLMs is likely to be low.

    It still needs to be evaluated in more detail how well general-purpose LLMs can be used directly and where their limits lie. These limits can be shifted through refinement and targeted training; this is where the trade-off between cost and benefit becomes relevant.

    Promising combinations of LLM with other technologies

    Access to LLMs (on-premise or in the cloud) and their refinement are facilitated by various providers (e.g. Hugging Face, spaCy), platforms, and the open-source commitment of many people (Patel et al., 2023).

    Using LLMs in your own applications can become complex. The range of offerings is almost impossible to keep track of. Depending on the provider, customer data (whether from training or usage) is in turn used to further develop the language models. Given the effort involved in storing targeted information, it must be carefully evaluated whether that effort is worthwhile. Quality optimization, testing, etc. are essential and must be carefully planned and implemented.

    ChatGPT is another impressive example of the power of deep learning in natural language processing (NLP). Are ChatGPT and other LLMs the solution to everything? No, because the starting point for the application of deep learning has not changed: Large amounts of data, ideally elaborately curated by many people, and great computing power are required for impressive results.

    LLMs then enable far more intelligent systems: not in isolation, however, but in combination with other systems, from databases, knowledge bases, and test procedures to knowledge graphs.

     

    Acknowledgements

    Parts of this work were supported by the BMWi in the project MyReportCheck (FKZ: ZF4513702TS8) and by the BMBF in the projects DIAMED (FKZ: 16SV8322), KIPA (FKZ: 16SV9045), and KIAFlex (FKZ: 16SV9004). Many thanks to Marian Thull, Isabel Schmittel, Máté Maros, Ralph Traphöner, Krzysztof Janiszewski, and Nico Hermeling for their valuable feedback on a draft of the blog article.

    References

    Russell, S., & Norvig, P. (2002). Artificial Intelligence: A Modern Approach (2nd ed.). Prentice Hall.

    Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. 1st International Conference on Learning Representations, ICLR 2013 – Workshop Track Proceedings, 1–12.

    Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

    Radford, A., et al. (2018). Improving language understanding by generative pre-training.

    Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.

    Devlin, J., et al. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

    Brown, T., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

     

    Empolis