Automatización de la codificación y selección de causas de muerte en Perú: estudio descriptivo, 2016-2019

Vargas-Herrera, Javier; Miki, Janet; Wong, Liliana López; Monzón, Jorge Miranda; Villanueva, Rodolfo

doi:10.1590/s2237-96222023000300005.en

Epidemiologia e Serviços de Saúde

Print version ISSN 1679-4974; On-line version ISSN 2237-9622

Epidemiol. Serv. Saúde vol.32 no.3 Brasília 2023 Epub 05-Sep-2023

http://dx.doi.org/10.1590/s2237-96222023000300005.en

RESEARCH NOTE

Automated coding and selection of causes of death in Peru: a descriptive study, 2016-2019

Automatización de la codificación y selección de causas de muerte en Perú: estudio descriptivo, 2016-2019

Javier Vargas-Herrera (orcid: 0000-0002-1910-602X)¹, Janet Miki (orcid: 0000-0002-6894-8346)², Liliana López Wong (orcid: 0000-0001-5737-209X)³, Jorge Miranda Monzón (orcid: 0000-0003-1212-8223)³, Rodolfo Villanueva (orcid: 0000-0003-4945-2158)⁴

¹Universidad Nacional Mayor de San Marcos, Unidad de Telesalud, Lima, Peru

²Vital Strategies, Programa de Registro Civil y Estadísticas Vitales, Lima, Peru

³Ministerio de Salud del Perú, Oficina General de Tecnologías de la Información, Lima, Peru

⁴Universidad Alas Peruanas, Escuela de Ingeniería de Sistemas e Informática, Lima, Peru

Study contributions

Main results

The software performed well in the automatic selection of the underlying cause of death, with the proportion of death certificates assigned an underlying cause increasing from 69.6% in 2016 to 78.8% in 2019. This result was correlated with the use of online death certificates by physicians.

Implications for services

Automatic coding and selection of causes of death improve productivity and timeliness of information, contributing to the quality of the country’s information system.

Perspectives

It is necessary to analyze the agreement between the medical terms in the software dictionaries used in South American countries in order to improve standardization and comparability of information on causes of death.

Keywords: Causes of Death; Mortality Records; International Classification of Diseases; Health Information Systems; Information Technology; Descriptive Epidemiology

Abstract

Objective:

to describe software performance in the automatic selection of the underlying cause of death in Peru, between 2016 and 2019.

Methods:

this was a descriptive study on the software performance in the automated selection of the underlying cause of death over the years (chi-square test for trend) and the correlation between the type of death certificate and software performance (correlation coefficient and coefficient of determination).

Results:

a total of 446,217 death certificates were analyzed; the proportion of death certificates with the underlying cause of death increased from 69.6% in 2016 to 78.8% in 2019 (p-value < 0.001); a direct linear correlation was observed between electronic death certificates and software performance (correlation coefficient = 0.95; R² = 0.89).

Conclusion:

the software showed good performance in the automatic selection of the underlying cause of death, with a significant increase between 2016 and 2019.

Resumen

Objetivo:

describir el desempeño de un software en la selección automática de la causa básica de muerte en Perú, entre 2016 y 2019.

Métodos:

estudio descriptivo de la tendencia del desempeño de un software para seleccionar la causa básica de muerte a través de los años (chi cuadrado de tendencia) y la correlación entre los certificados de defunción electrónicos y el desempeño del software (coeficientes de correlación y determinación).

Resultados:

se analizaron 446.217 certificados; la proporción de certificados con causa básica de muerte aumentó de 69,6% en 2016 a 78,8% en 2019 (p-valor < 0,001); se observó una correlación lineal directa entre certificados electrónicos y el desempeño del software (coeficiente de correlación = 0,95; R² = 0,89).

Conclusión:

el software presentó un buen desempeño en la selección de la causa básica de muerte y aumentó significativamente entre 2016 y 2019.

Palabras-clave: Causas de Muerte; Registros de Mortalidad; Clasificación Internacional de Enfermedades; Sistemas de Información en Salud; Tecnologías de la Información; Epidemiología Descriptiva

INTRODUCTION

Mortality information is useful for measuring the impact of health interventions.¹ The World Health Organization (WHO) has established a set of guidelines and rules for the coding and selection of the underlying cause of death, aimed at medical certification; however, there are concerns about the reliability of this coding.²⁻⁵ Inter-coder agreement is higher in countries where specific coder engagement and retention policies have been implemented, reaching nearly 80%.²,⁶ Similarly, automated coding software achieves comparable performance.

Incorrect coding and selection of the underlying cause of death affect data quality and its comparability between countries. Software for automating these tasks is increasingly being adopted by countries,⁷⁻¹⁰ indicating a trend toward the use of artificial intelligence in this process.¹¹ In Latin America, such software has been implemented in Mexico, Brazil, Chile, Colombia and Peru.¹²

Specifically, in Peru, the Sistema Informático Nacional de Defunciones (SINADEF) was implemented in 2017, enabling death certificates to be completed via electronic forms in real time and improving the quality and timeliness of notifications.¹³,¹⁴ The Peruvian Ministry of Health (MINSA) also decided to adopt the Iris software, developed by an international consortium led by the German Institute of Medical Information and Documentation, which uses an algorithm based on the rules of the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10), in order to automate the coding and selection of the underlying cause of death.¹⁵

The objective of this research note was to describe software performance in the automatic selection of the underlying cause of death in Peru between 2016 and 2019.

METHODS

Study design

This was a descriptive study on software performance in the automatic selection of the underlying cause of death in Peru, between 2016 and 2019. This performance was defined as the software’s ability to obtain the underlying cause of death.

Setting

Until 2016, all deaths were documented on paper-based death certificates, transcribed into a desktop software called Vital Events and submitted to MINSA as files. In 2017, the Web-based SINADEF was implemented. This system allows death certificates to be registered in two ways: either typed directly online by physicians, or transcribed from paper-based formats. In 2018, the Iris software was adopted, and mortality databases from 2016 onward have been processed using this application to determine the underlying causes of death. The Iris dictionary was adapted with 12,246 medical terms in natural language, using the causes of death directly filled in by doctors as a reference.

Participants

This study included deaths that occurred in Peru between 2016 and 2019.¹⁶ Undeclared deaths and those that were not available at the time of data processing using the software were excluded.

Variables

The variables investigated were as follows: processed death certificate (with underlying cause of death; without underlying cause of death); recorded medical terms (with ICD-10 code; without ICD-10 code); type of error on the death certificate rejected by the software (syntax; code; system); type of death certificate (paper-based format; online); and year of death (2016 to 2019).

Data sources and measurement

The data source comprised death certificate databases covering the period from 2016 to 2019, provided by the MINSA in spreadsheet format. The data were processed using Iris on the following dates: 2016 mortality database on 06/01/2018; 2017 database on 04/26/2019; 2018 database on 06/20/2020; and 2019 database on 06/22/2021.

Bias control

Mortality database records underwent quality control to remove any potential duplicate records or modify records with inconsistent data.

Statistical methods

The variables obtained after processing using Iris were presented in simple frequency distribution tables. The trend analysis was performed using the chi-square test for trend. The Iris performance index (number of death certificates with underlying cause of death divided by the total number of death certificates) and the Iris performance index in ICD-10 coding (number of medical terms with ICD-10 codes divided by the total number of medical terms) were considered dependent variables; and the independent variable was the year of death. Pearson’s correlation coefficient and the coefficient of determination (R²) were used to analyze the linear correlation between the type of death certificate and Iris performance. The significance level used was 5%. Microsoft Excel® 2016 software was used for the analyses.
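The performance index and the trend test described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the analyses were run in Microsoft Excel, and the text does not say which formulation of the chi-square test for trend was used. The sketch assumes the Cochran-Armitage form with equally spaced year scores (0 to 3), takes its counts from Table 1, and uses hypothetical function names.

```python
import math

# Counts from Table 1: certificates with an underlying cause of death
# and total processed certificates, by year of death.
YEARS = [2016, 2017, 2018, 2019]
WITH_UCOD = [67697, 86848, 79979, 95405]
TOTAL = [97241, 121024, 106905, 121047]


def performance_index(with_cause, total):
    """Share of certificates for which an underlying cause was obtained."""
    return with_cause / total


def chi_square_trend(successes, totals, scores):
    """Cochran-Armitage chi-square for linear trend (1 degree of freedom).

    Assumed formulation; the source only names a 'chi-square test for trend'.
    """
    n = sum(totals)
    p = sum(successes) / n
    # Score-weighted deviation of observed from expected successes.
    t = sum(x * (r - m * p) for x, r, m in zip(scores, successes, totals))
    sum_x = sum(m * x for x, m in zip(scores, totals))
    sum_x2 = sum(m * x * x for x, m in zip(scores, totals))
    variance = p * (1 - p) * (sum_x2 - sum_x ** 2 / n)
    chi2 = t * t / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


indices = [performance_index(r, m) for r, m in zip(WITH_UCOD, TOTAL)]
chi2, p_value = chi_square_trend(WITH_UCOD, TOTAL, scores=[0, 1, 2, 3])

for year, idx in zip(YEARS, indices):
    print(f"{year}: {idx:.1%}")
print(f"chi-square for trend = {chi2:.1f}; p < 0.001: {p_value < 0.001}")
```

The same function applies to the ICD-10 coding index by passing the counts of medical terms with ICD-10 codes and the total number of medical terms instead.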

Ethical aspects

The study was based on the analysis of variables included in the mortality databases of the MINSA, also available on the National Open Data Platform https://www.datosabiertos.gob.pe/, which do not contain information that would allow the identification of deceased individuals.

RESULTS

Between 2016 and 2019, a total of 446,217 deaths of residents in all regions of Peru, recorded in the MINSA mortality database, were analyzed. This amount corresponded to 67% of the estimated deaths for the study period. Deaths that were not registered on the mortality system at the time of processing were excluded (Figure 1).

Figure 1 Selection process of studied deaths, Peru, 2016-2019.

The software performance index increased progressively, with the percentage of processed death certificates with an underlying cause of death rising from 69.6% in 2016 to 78.8% in 2019 (p-value < 0.001) (Table 1).

Table 1 Software performance in the selection of the underlying cause of death and death certificate processing (n = 446,217), Peru, 2016-2019

Software performance	2016		2017		2018		2019		p-valueᵃ
	N	%	N	%	N	%	N	%
Processed death certificates									< 0.001
With underlying cause of death	67,697	69.6	86,848	71.8	79,979	74.8	95,405	78.8
Without underlying cause of death	29,542	30.4	34,174	28.2	26,926	25.2	24,653	21.2
Total	97,241	100.0	121,024	100.0	106,905	100.0	121,047	100.0
Processed medical terms									< 0.001
With ICD-10 codes	222,446	87.2	321,904	89.8	299,988	91.1	346,635	92.7
Without ICD-10 codes	32,641	12.8	36,623	10.2	29,401	8.9	27,297	7.3
Total	255,087	100.0	358,527	100.0	329,389	100.0	373,932	100.0
Death certificates rejected by type of error									< 0.001
Syntax error	-	-	744	2.3	545	2.1	582	2.4
Code error	26,819	90.8	30,337	94.0	24,616	94.9	23,130	93.8
System error	2,725	9.2	1,185	3.7	790	3.0	941	3.8
Total	29,544	100.0	32,266	100.0	25,951	100.0	24,653	100.0
Type of death certificate									< 0.001
Paper-based format	96,605	100.0	85,986	71.0	32,227	30.1	25,397	21.0
Online	636	0.0	35,038	29.0	74,678	69.9	95,650	79.0
Total	97,241	100.0	121,024	100.0	106,905	100.0	121,047	100.0

a) Chi-square test for trend.

The software performance index in ICD-10 coding also showed an increasing trend, rising from 87.2% in 2016 to 92.7% in 2019 (p-value < 0.001). However, coding errors accounted for the highest proportion of errors in the records that the software failed to process (Table 1).

A direct linear correlation was observed between the proportion of death certificates directly filled out by physicians on SINADEF and the Iris performance: Pearson’s correlation coefficient = 0.95; R² = 0.89 (Figure 2).

Notes: Pearson’s linear correlation = 0.95; Coefficient of determination R² = 0.89.

Figure 2 Correlation between the proportion of death certificates produced in electronic format and the software performance index in the selection of the underlying cause of death, Peru, 2016-2019
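The correlation summarized in Figure 2 can be approximated from the yearly counts in Table 1. The sketch below is illustrative only: it assumes the two series are the yearly share of online certificates and the yearly performance index, both recomputed from Table 1 counts rather than taken from the authors' spreadsheet, and the helper name `pearson_r` is hypothetical.

```python
import math

# Counts from Table 1, by year of death (2016 to 2019).
ONLINE = [636, 35038, 74678, 95650]
WITH_UCOD = [67697, 86848, 79979, 95405]
TOTAL = [97241, 121024, 106905, 121047]


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Yearly proportion of certificates typed online by physicians.
online_share = [o / t for o, t in zip(ONLINE, TOTAL)]
# Yearly proportion of certificates with an underlying cause of death.
performance = [w / t for w, t in zip(WITH_UCOD, TOTAL)]

r = pearson_r(online_share, performance)
r_squared = r ** 2  # coefficient of determination for a simple linear fit

print(f"Pearson r = {r:.2f}; R^2 = {r_squared:.2f}")
```

Because the series has only four yearly points, each observation weighs heavily on the coefficient, so small differences from the reported values may arise from rounding or from how the yearly proportions were computed.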

DISCUSSION

During the study period, software performance increased due to progress in its ability to assign ICD-10 codes to the terms used by physicians for reporting causes of death. There was a correlation between the proportion of declarations directly filled out by physicians and software performance. The implementation of SINADEF played an important role in this process, enabling the development of a dictionary adapted to the Peruvian context and contributing an increasing number of medical terms each year. Studies have shown that training physicians in filling out death certificates improves the quality of the data they record,¹⁷,¹⁸ which may in turn improve software performance.

One limitation of this study lies in the lack of an analysis of the agreement between the software and the application of rules by experienced coders. In Peru, this type of analysis is difficult because, before the implementation of SINADEF, most death certificates were coded by employees without formal training in ICD-10. A second limitation is related to the fact that, in this study, the Iris performance was not analyzed by sociodemographic variables or disease group.

The highest performance of the software was around 80%. A study conducted in São Paulo in 2010, with a sample of 666 deaths and aimed at testing the software’s Portuguese dictionary, found a performance of 95%.¹⁹ In the Netherlands, a study on the implementation of an automated coding system, with data from 134,262 deaths that occurred in 2009, found an increase in performance from 17% in the first batch to 69% in the last batch, after a series of improvements in the dictionary.²⁰ In Spain, a study to assess the impact of automating cause of death records on mortality in the autonomous community (geopolitical macro-region) of Navarra, based on 5,060 deaths that occurred in 2014, identified a performance of 90%.²¹ When the use of Iris was evaluated in a small sample of deaths in Burkina Faso, a performance of 90% was found.¹⁰

In this study, automatic coding of medical terms for causes of death was 93%, while in Italy, in 2016, this proportion was 78%.²²

Most of the errors that led the software to reject a death certificate occurred during coding: typos, spelling errors, or errors with unusual characters.¹⁹ However, the software also faces challenges in accurately coding external causes of death, because forensic medical examiners use a wide range of causes of death, which affects the efficiency of the dictionary.²⁰

There is a global movement towards automated selection of the underlying cause of death. Nearly all countries in the European Union use Iris. In Latin America, the software is being implemented in several countries. In Brazil, it has been integrated into a mobile application for doctors, aiming to improve the completion of the Death Certificate (DC).²³,²⁴

It can be concluded that there is a trend of improvement in the performance of the software for selecting the underlying cause of death in Peru. This improvement seems to be associated with the implementation of the SINADEF and the optimization of the dictionary of medical terms. Further studies on Iris are needed to assess the impact of the software on mortality statistics. Taking into consideration that its implementation in the region will enhance data comparability, it is necessary to study the agreement between the medical terms of the dictionaries used in South American countries.

REFERENCES

1. Suthar AB, Khalifa A, Yin S, Wenz K, Ma Fat D, Mills SL, et al. Evaluation of approaches to strengthen civil registration and vital statistics systems: a systematic review and synthesis of policies in 25 countries. PLoS Med. 2019;16(9):e1002929. doi: 10.1371/journal.pmed.1002929

2. Antini C, Rajs D, Muñoz-Quezada MT, Mondaca BAL, Heiss G. Reliability of cause of death coding: an international comparison. Cad Saude Publica. 2015;31(7):1473-82. doi: 10.1590/0102-311X00099814

3. Минаева А, Вайсман К. The peculiarities of coding and the determination of the primary cause of death from the diseases induced by the human immunodeficiency virus in accordance with ICD-10. Sud Med Ekspert. 2015;58(2):27-9. doi: 10.17116/sudmed201558227-29

4. Winkler V, Ott JJ, Becher H. Reliability of coding causes of death with ICD-10 in Germany. Int J Public Health. 2010;55(1):43-8. doi: 10.1007/s00038-009-0053-7

5. Gamage USH, Adair T, Mikkelsen L, Mahesh PKB, Hart J, Chowdhury H, et al. The impact of errors in medical certification on the accuracy of the underlying cause of death. PLoS One. 2021;16(11):e0259667. doi: 10.1371/journal.pone.0259667

6. Harteloh P, Bruin K, Kardaun J. The reliability of cause-of-death coding in The Netherlands. Eur J Epidemiol. 2010;25(8):531-8. doi: 10.1007/s10654-010-9445-5

7. Eckert O. Electronic coding of death certificates. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2019;62(12):1468-75. doi: 10.1007/s00103-019-03045-2

8. Rey G. Death certificate data in France: production process and main types of analyses. Rev Med Interne. 2016;37(10):685-93. doi: 10.1016/j.revmed.2016.01.011

9. Barro SG, Rey G, Staccin P. Study of the usability of an automated coding software for causes of death in an African context. Stud Health Technol Inform. 2019;264:1978-9. doi: 10.3233/SHTI190743

10. Rey G, Bounebache K, Rondet C. Causes of deaths data, linkages and big data perspectives. J Forensic Leg Med. 2018;57:37-40. doi: 10.1016/j.jflm.2016.12.004

11. Falissard L, Morgand C, Roussel S, Imbaud C, Ghosn W, Bounebache K, et al. A deep artificial neural network-based model for prediction of underlying cause of death from death certificates: algorithm development and validation. JMIR Med Inform. 2020;8(4):e17125. doi: 10.2196/17125

12. Pan American Health Organization. Sistema de Codificación Automatizada de causa de muerte “Iris” [Internet]. Washington: Pan American Health Organization; 2015 [cited 2023 May 29]. Available from: https://www3.paho.org/relacsis/index.php/es/biblioteca/1126-documentos-sistema-de-codificacion-automatizada-de-causa-de-muerte-iris/

13. Vargas-Herrera J, Ruiz KP, Nuñez GG, Ohno JM, Pérez-Lu JE, Huarcaya WV, et al. Resultados preliminares del fortalecimiento del sistema informático nacional de defunciones. Rev Peru Med Exp Salud Publica. 2018;35(3):505-14. doi: 10.17843/rpmesp.2018.353.3913

14. Vargas-Herrera J, Monzón MJ, Wong LL, Ohno JM. La cobertura de muertes con certificación médica en el Perú, 2012-2019. An Fac med. 2022;83(2):123-9. doi: 10.15381/anales.v83i2.23011

15. Iris Institute. Iris user reference manual V5.8.1S2 [Internet]. Bonn: Iris Institute; 2022 [updated 2022 Feb 3; cited 2023 Feb 15]. Available from: https://www.bfarm.de/SharedDocs/Downloads/EN/Code-Systems/iris-institute/manuals/iris-user-reference-manual-v5-8-1s2_pdf.html;jsessionid=A6DF42A5C4F65BD129823DB7BCDE50E5.internet281?nn=922496&cms_dlConfirm=true&cms_calledFromDoc=922496

16. Instituto Nacional de Estadística e Informática. Boletín Especial nº 24 - Perú: estimaciones y proyecciones de la población nacional, por año calendario y edad simple, 1950-2050. Lima: Instituto Nacional de Estadística e Informática; 2019 [cited 2023 May 23]. Available from: https://www.inei.gob.pe/media/MenuRecursivo/publicaciones_digitales/Est/Lib1681/

17. Miki J, Rampatige R, Richards N, Adair T, Cortez-Escalante J, Vargas-Herrera J. Saving lives through certifying deaths: assessing the impact of two interventions to improve cause of death data in Perú. BMC Public Health. 2018;18(1):1329. doi: 10.1186/s12889-018-6264-1

18. Vargas-Herrera J, Meneses G, Cortez-Escalante J. Physicians’ perceptions as predictors of the future use of the national death information system in Peru: cross-sectional study. J Med Internet Res. 2022;24(8):e34858. doi: 10.2196/34858

19. Martins RC, Buchalla CM. Automatic coding and selection of causes of death: an adaptation of Iris software for using in Brazil. Rev Bras Epidemiol. 2015;18(4):883-93. doi: 10.1590/1980-5497201500040016

20. Harteloh P. The implementation of an automated coding system for cause-of-death statistics. Inform Health Soc Care. 2020;45(1):1-14. doi: 10.1080/17538157.2018.1496092

21. Floristán YF, Osinaga JD, Prieto JC, Perez JA, Moreno-Iribas C. Coding Causes of Death with IRIS Software. Impact in Navarre Mortality Statistic. Rev Esp Salud Publica. 2016;90:e1-e9.

22. Orsi C, Navarra S, Frova L, Grande E, Marchetti S, Pappagallo M, et al. Impact of the implementation of ICD-10 2016 version and Iris software on mortality statistics in Italy. Epidemiol Prev. 2019;43(2-3):161-70. doi: 10.19191/EP19.2-3.P161.055

23. Suárez LC. Primer bienio de estadísticas de mortalidad con el codificador automático Iris de causas de muerte. Gac Sanit. 2018;32(1):5-7. doi: 10.1016/j.gaceta.2016.11.009

24. Ishitani LH, Cunha CCD, Ladeira RM, Corrêa PRL, Santos MRD, Rego MAS, et al. Evaluation of a smartphone application to improve medical certification of the cause of death. Rev Bras Epidemiol. 2019;22(Suppl 3):e190014.supl.3. doi: 10.1590/1980-549720190014.supl.3

ACKNOWLEDGEMENT We would like to express our gratitude to Dr. Benjamin Clapham from Vital Strategies for his support in developing this article, and Dr. Juan Cortez-Escalante and Dr. Jaqueline García de Almeida Ballestero for reviewing the translation of this article into Portuguese.

Received: March 29, 2023; Accepted: July 07, 2023

Correspondence: Javier Vargas-Herrera. E-mail: jvargash@unmsm.edu.pe

AUTHOR CONTRIBUTIONS

Vargas-Herrera J and Miki J collaborated with the study conception and design, analysis and interpretation of the results, drafting and critical reviewing of the manuscript content. Wong LL, Monzón JM and Villanueva R collaborated with data analysis and interpretation, and critical reviewing of the manuscript content. All authors have approved the final version of the manuscript and declared themselves to be responsible for all aspects of the work, including ensuring its accuracy and integrity.

CONFLICTS OF INTEREST

The authors declare they have no conflicts of interest.

Associate editor:

Taís Freire Galvão - https://orcid.org/0000-0003-2072-4834

This is an open-access article distributed under the terms of the Creative Commons Attribution License