USING DEEP CONVOLUTIONAL NEURAL NETWORKS WITH SELF-TAUGHT WORD EMBEDDINGS TO PERFORM CLINICAL CODING
Keywords:
Learning systems, supervised learning, unsupervised learning, machine learning, natural language processingAbstract
Clinical coding represents the transposition of clinical findings and diagnostics into codes contained in the International Classification of Diseases (ICD). This represents a very important task for the standardization of disease diagnoses and payment of clinical bills. To perform such task, hospitals assign the role of “clinical coder” to the person responsible for reading the whole clinical documentation and assigning the ICD codes accordingly. This task, however, is very time-consuming and the uncertainty that is related to natural language can introduce mistakes in coding. It is also known that wrong coding can lead to delays in paying process, and in some cases financial and legal disruption. The objective of this research is to propose a model to automate Clinical Coding by using clinical discharge summaries. These texts, written in Brazilian Portuguese, were transformed into word embeddings and then fed into a classifier based on a Deep Convolutional Neural Net. Given the imbalance in data, we’ve trained and tested the model using a stratified k-fold approach (k = 10) with cost-sensitive learning, obtaining on our best model an average F-score of 0.97 with standard deviation of 0.04. We also tested the model against a balanced augmented database, obtaining 82,9% of final accuracy. These results show that our model outperforms some of the recent models developed for similar tasks. Since we have not taught the algorithm any rules of language or coding, these results suggest that clinical coding can be automated by Deep Learning based approaches that uses self-taught word embeddings.