Healthcare analytics in medicine
Published: March 13, 2026
Every day, millions of medical records are generated worldwide, for example as findings or doctor's letters. Each report is read and processed by up to 32 people. Once treatment has been completed, the record ends up in an archive, which even in 2018 often exists only as paper files. Digital data turns into analog files or, at best, into impenetrable, unstructured mountains of data. Once archived, this information disappears from view and is retrieved and analyzed, if at all, only by medical students working on a dissertation.
The question inevitably arises as to what else could be done with this data. What information is hidden in this treasure trove of data and what insights and possibly instructions for action could be derived from it? How can diagnosis and treatment be improved? What framework conditions does data protection set?
Data analyses from "small" to "medium" to "large"
If you take a small approach to this task, you would probably analyze a single finding or doctor's letter. Any additional insights gained would affect the treatment of the patient in question. However, it is quite unlikely that new, hidden insights can be found in an individual doctor's letter or finding. In all likelihood, the doctor has named everything that needs to be named. If the doctor has forgotten to note something or has overlooked something, even the best machine cannot deduce it. It is possible, however, to support doctors in preparing findings and to prompt them if they inadvertently omit facts that are usually mentioned.
If one broadens the horizon, i.e. thinks on a medium scale, and places the individual findings in relation to the patient's medical history, new findings may well emerge. With certain prior knowledge from the patient's medical history, the treatment derived from a finding may appear completely different. As we will see later, it is unfortunately not so easy to record, let alone analyze, a patient's medical history without gaps, even with the best will in the world.
If you look at the medical data with a view to the big picture, e.g. all the radiological findings of cranial CT scans in a clinic over time, you can retrospectively compare the reason for each examination, i.e. the indication, with the assessment of its results. It may then become apparent that, for certain indications, the vast majority of examinations turned out not to have been necessary. Analyzing large amounts of data (big data) over time also makes it possible, among other things, to understand the information without direct knowledge of personal data. In this case, anonymizing findings or doctor's letters does not usually stand in the way of gaining knowledge.
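The comparison of indication and assessment described here can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: the records, the field values and the function name `unremarkable_share` are invented, and the data is assumed to be anonymized and already reduced to one indication and one assessment per examination.

```python
from collections import defaultdict

# Hypothetical, already anonymized records: (indication, assessment).
# All values are invented for illustration only.
reports = [
    ("headache", "no pathological finding"),
    ("headache", "no pathological finding"),
    ("headache", "intracranial bleeding"),
    ("fall on head", "skull fracture"),
    ("fall on head", "no pathological finding"),
]

def unremarkable_share(records, normal_label="no pathological finding"):
    """Return, per indication, the share of examinations without a finding."""
    totals = defaultdict(int)
    normals = defaultdict(int)
    for indication, assessment in records:
        totals[indication] += 1
        if assessment == normal_label:
            normals[indication] += 1
    return {ind: normals[ind] / totals[ind] for ind in totals}

shares = unremarkable_share(reports)
```

An indication with a very high share of unremarkable results would be a candidate for a closer look at whether the examination is ordered too readily. No patient identifier is needed for this kind of evaluation.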
The disease as a case
In everyday medical practice, examinations are viewed as individual, self-contained cases. Even the medical history of a single patient is usually not available in a single file, but is spread across specialists and hospitals. You might think that, at least in theory, it should be possible to bring this data together using a health insurance ID. However, if a person with statutory health insurance receives private treatment or is admitted to an emergency room after an accident at work, the resulting cases are of a completely different nature. In the first case, health insurance plays no role; in the second, the employers' liability insurance association is the primary point of contact.
Hospital information systems (HIS) sometimes allow a patient's data records to be merged, but only at the explicit request of the user. Often enough, an emergency department has no time at all to check whether the patient already exists in the HIS and whether the new case can be assigned to this existing patient or whether a new record has to be created. From conversations with doctors, we know that new cases are usually created and that they are merged into one patient only very rarely, if at all.
Since the billing of inpatient and, in some cases, day-patient treatment has been based on flat rates per case under the G-DRG (German Diagnosis Related Groups) classification system since 2004, hospital operators often do not even need to merge a patient's individual cases. The focus is solely on the current case; from a business point of view, the history is of secondary importance.
The road to an electronic patient file (ePA) is still a long one. In March 2017, a study by Bitkom and the Bavarian TelemedAllianz found that 66% of Germans would use an ePA. Six months later, an AOK survey of people with statutory health insurance revealed that as many as 78% of respondents would use an ePA and 77% of respondents would ideally also like to decide for themselves which doctors have access to the data in their ePA. However, the reality is still different. In a recent survey, Bitkom found that 47% of doctors still use pen and paper for their correspondence (e.g. doctor's letters) and 34% even keep a patient file in paper form.
The E-Health Act passed in 2015 names the electronic patient file as a central component of the telematics infrastructure. From 2019, the ePA should therefore be available to those with statutory health insurance. However, only doctors will then have access to it. It should be noted that the legislator has not yet provided any specific details on what cross-institutional patient files could look like. The main effect of this uncertainty is that a whole range of players in the healthcare sector are working on the ePA and developing their own solutions.
The realization remains that it is currently difficult or even impossible to analyze the medical history of individual insured persons as a whole and thus provide decision support for new cases concerning the same patient.
Think big - the retrospective view of data
As we have seen above, the targeted processing of patient data (e.g. in the form of an ePA) has been virtually impossible to date. It is not yet possible to predict what the general conditions will be in a few years' time. For the time being, the anonymous processing of large volumes of data is therefore the obvious option.
The retrospective analysis of data is a promising way of deriving recommendations for future action from past findings. With the help of the Empolis Healthcare Analytics Cloud and its built-in Natural Language Processing (NLP), it is possible, for example, to analyze a hospital's radiological findings from recent years en bloc and, based on this, to carry out statistical evaluations with the help of suitable tools. This then makes it possible to answer specific questions, such as: "How has the range of indications changed?", "Are there certain frequencies at certain times of the year?", "Have the assessments for certain indications changed over time?", "What conclusions and recommendations for action can be derived from this?"
Medical texts are annotated with specialist terms in the Healthcare Analytics Cloud. RadLex, a controlled terminology for radiology, is used for this purpose. It unifies and supplements other standards and lexicons. The annotation also covers additional information such as gender, the existence of a preliminary examination or the administration of contrast media.
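Once findings carry standardized terms, a question such as the one about seasonal frequencies comes down to simple counting. The following Python sketch shows the idea under stated assumptions: the entries, the indication labels and the function name `monthly_counts` are hypothetical, and each entry is assumed to consist only of an examination date and an annotated indication.

```python
from collections import Counter
from datetime import date

# Hypothetical anonymized entries: (examination date, annotated indication).
exams = [
    (date(2017, 1, 12), "suspected pneumonia"),
    (date(2017, 1, 30), "suspected pneumonia"),
    (date(2017, 7, 4), "suspected pneumonia"),
    (date(2018, 1, 9), "suspected pneumonia"),
    (date(2018, 7, 21), "fracture"),
]

def monthly_counts(entries, indication):
    """Count examinations for one indication per calendar month, pooled over all years."""
    return Counter(d.month for d, ind in entries if ind == indication)

pneumonia_by_month = monthly_counts(exams, "suspected pneumonia")
```

Pooling by calendar month across years makes recurring seasonal peaks visible; grouping by year instead would show how the range of indications shifts over time.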
Think small - support for quality assurance and reporting
Although the really big approaches, such as an ePA, are still a long way off, NLP can already be used in everyday medical practice. If the aim is not a retrospective view but simpler further processing of medical texts, individual findings, laboratory reports, doctor's letters etc. can be indexed with the Healthcare Analytics Cloud in real time while they are being written. Texts enriched in this way can be processed more easily and in a more targeted manner and, above all, can be found again when required. Two use cases that this makes possible are described below.
RadLex annotation for quality assurance
Suitable annotation of findings using RadLex can be used to establish prospective quality assurance through completeness checks. If the system the doctor is working with applies rules to the facts recognized in the text, it can check, for example, whether a related fact was also mentioned when a certain symptom was mentioned: "The patient complains of shortness of breath. Were the lungs examined for shadows?", "A preliminary examination was mentioned, but no date was given" or "Contrast medium was given, but the type of examination is CT instead of CTA."
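Such completeness rules can be sketched as simple checks on the recognized facts. The Python example below is only an illustration: the fact labels are invented placeholders and not real RadLex identifiers, the function name `check_completeness` is hypothetical, and the annotation step that produces the facts is assumed to have happened already.

```python
def check_completeness(facts):
    """Return hints for facts that are usually mentioned together.

    `facts` maps a recognized fact (illustrative label, not a RadLex ID)
    to its value, as an annotator might deliver it for one finding.
    """
    hints = []
    if "shortness_of_breath" in facts and "lung_assessment" not in facts:
        hints.append("Shortness of breath mentioned: were the lungs examined for shadows?")
    if "preliminary_examination" in facts and "preliminary_examination_date" not in facts:
        hints.append("A preliminary examination was mentioned, but no date was given.")
    if "contrast_medium" in facts and facts.get("examination_type") == "CT":
        hints.append("Contrast medium was given, but the type of examination is CT instead of CTA.")
    return hints

# Facts recognized in one hypothetical finding.
finding = {
    "shortness_of_breath": True,
    "contrast_medium": True,
    "examination_type": "CT",
}
hints = check_completeness(finding)
```

The hints are phrased as questions or notes rather than errors, because the decision stays with the doctor; the system only points out what is usually mentioned.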
ICD and OPS coding for billing support
Another way of using NLP is to determine possible ICD or OPS codes when creating a document in order to facilitate subsequent processing and billing. This is interesting against the background of the G-DRG, as this in turn is based on ICD-10-GM and OPS. The allocation of a case-based fee according to G-DRG to a case is based on the findings and case data created during the course of treatment. Diagnoses and procedures are coded by specialists using the ICD-10-GM and OPS medical classifications and submitted for billing. By supporting coding with intelligent systems, coding specialists are relieved, standard cases are processed and submitted for billing more quickly and the billing of complicated cases is made easier. The Healthcare Analytics Cloud acts as a kind of first reader that never gets tired and works at a consistently high quality.
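The idea of a "first reader" that proposes codes can be illustrated with a deliberately naive Python sketch. It is not how the Healthcare Analytics Cloud works: the lookup table, the function name `suggest_codes` and the sample sentence are assumptions, and the two codes are given as examples only and should be checked against the official ICD-10-GM catalog before any real use.

```python
# Tiny illustrative lookup; a real system would rely on NLP and the full catalogs.
TERM_TO_ICD = {
    "pneumonia": "J18.9",   # pneumonia, unspecified
    "concussion": "S06.0",  # concussion
}

def suggest_codes(text):
    """Return candidate ICD codes whose trigger term occurs in the text."""
    lowered = text.lower()
    return sorted({code for term, code in TERM_TO_ICD.items() if term in lowered})

suggestions = suggest_codes("Chest X-ray: findings consistent with pneumonia on the right.")
```

Plain substring matching like this ignores negation ("no sign of pneumonia") and context, which is exactly why the suggestions remain candidates for a coding specialist to confirm.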
Data protection: important but not an obstacle
Medical data is usually also personal data. It is therefore not surprising that the discussion surrounding the processing of medical data is strongly characterized by calls for high data protection standards. In conjunction with the GDPR, this demand, which is valid in itself, is causing uncertainty on the part of data producers. What is allowed? What must be taken into account? Which data may be processed where and how?
The analysis results of the Empolis Healthcare Analytics Cloud are always anonymous as a matter of principle, even if the original data was not necessarily anonymous. The extracted data are general medical facts that are never unique to a patient, either individually or in combination, and are therefore not identifiable. It is therefore not possible to draw conclusions about individual persons.
In general terms, the answer is yes: medical and therefore personal data can be processed by service providers. Given the GDPR and any special provisions of national data protection law, there are a few points to bear in mind, which we will discuss in more detail in a later article.
Conclusion
Immense amounts of medical data are produced worldwide every day, and in most cases they are used purely for the treatment of the respective patient. Further use for quality assurance, research, teaching and improving patient care takes place only very rarely, if at all, especially as it is difficult to integrate the different data sources. The electronic patient record is intended to solve some of these problems, but it is not yet in sight, at least here in Europe, apart from a few isolated solutions run jointly by hospitals and health insurance companies. Nevertheless, the existing medical data and the new data produced every day are a valuable treasure that can be made accessible. So let's start small and establish NLP solutions on your data, for example to improve quality, facilitate reporting and billing or make recommendations for the future.
Using Natural Language Processing in the Empolis Healthcare Analytics Cloud, we can offer you decision-making support, whether prospectively when processing current case data or retrospectively when analyzing your data archive.
Data protection plays a major role in this context, but contrary to popular opinion, it is no reason to rule out the processing of medical texts on site or in the cloud. It can be fully taken into account by means of suitable contracts (a data transfer agreement or a data processing agreement) or through support with the anonymization of findings.
The Perfect Solution for you
We look forward to a non-binding consultation and will be happy to work with you to determine which product provides the greatest value for your needs. Let’s make better decisions together, faster.