Improved verification of suicidal ideation and suicide attempt through natural language processing

This study presents a scalable natural language processing (NLP) approach that takes as input a list of text expressions describing a clinical outcome of interest (the outcome query), processes all clinical notes in the electronic health record (EHR), and computes a relevance score for each patient whose notes contain text expressions from the input query. The output of this NLP system is a ranked list of patients as potential cases of the outcome of interest, with the most relevant patients at the top of the list. All methods were performed in accordance with relevant guidelines and regulations. The study was approved by the Institutional Review Board (IRB) at Vanderbilt University Medical Center (VUMC) with waiver of consent (IRB #151156).

Clinical population

The clinical data used in this study were extracted from the Synthetic Derivative, a research-oriented data repository containing a de-identified version of VUMC’s electronic health records16. As of December 2021, this repository stored more than 200 million notes for more than 3.4 million patients. Specific data elements extracted from the Synthetic Derivative include clinical notes, psychiatric forms, demographic data, and International Classification of Diseases, 9th/10th Revision, Clinical Modification (ICD-9/10-CM) billing codes.

A data-driven approach to guide the selection of suicide query terms

We relied on a data-driven approach to automatically extract text expressions describing suicidal ideation and suicide attempt. Similar to our previous work13, we used word2vec from Google to recursively expand an initial list of two related keywords, “suicide” and “suicidal”. In short, we first trained the skip-gram model of word2vec17 on 10 million notes randomly sampled from the Synthetic Derivative to learn word embeddings for each word in the note collection. Preprocessing of these notes included tokenization, conversion of tokens to lowercase, and exclusion of low-frequency tokens and punctuation. To train the model, we used a vector dimension of 100 and context window sizes of 5 and 15. Next, we computed the cosine similarity between the seed embeddings and the embeddings of all non-seed words, and selected the top-ranked words as new seed words and potential candidates for suicide query terms. Finally, we manually reviewed the generated seed list to propose queries for the two suicide outcomes.
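The expansion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the embeddings are already available as a plain dictionary, scores each candidate by its maximum cosine similarity to any current seed (the source does not say how similarities to multiple seeds are combined), and uses hypothetical names (`expand_seeds`, `top_n`, `rounds`) and toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_seeds(embeddings, seeds, top_n=1, rounds=2):
    """Repeatedly add the top_n non-seed words closest to the current seeds.

    Assumption: a candidate's score is its best similarity to any seed.
    Every seed must be present in `embeddings`.
    """
    seeds = list(seeds)
    for _ in range(rounds):
        scores = {
            word: max(cosine(vec, embeddings[s]) for s in seeds)
            for word, vec in embeddings.items()
            if word not in seeds
        }
        ranked = sorted(scores, key=scores.get, reverse=True)
        if not ranked:
            break
        seeds.extend(ranked[:top_n])
    return seeds


# Toy 3-dimensional vectors (hypothetical; real embeddings had 100 dimensions).
toy_embeddings = {
    "suicide": [1.0, 0.0, 0.0],
    "suicidal": [0.9, 0.1, 0.0],
    "overdose": [0.8, 0.3, 0.0],
    "ideation": [0.7, 0.2, 0.1],
    "fracture": [0.0, 1.0, 0.0],
    "aspirin": [0.0, 0.0, 1.0],
}
candidates = expand_seeds(toy_embeddings, ["suicide", "suicidal"])
print(candidates)
```

The expanded list is only a set of candidates; as in the text, a manual review decides which terms become query terms.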

Retrieval of suicidal ideation and suicide attempt

We applied an information retrieval model to rank patients according to their relevance to each suicide query generated in the previous step. The system architecture was designed as a vector space model in which input queries and patients were represented as multidimensional vectors of words or word expressions. Here, each patient vector was extracted from a patient-level document that aggregated all of the patient’s notes. A patient’s relevance to a suicide outcome was measured as the similarity between the corresponding patient vector and the suicide query vector using the term frequency-inverse document frequency (TF-IDF) weighting scheme. Specifically, for the similarity between a suicide query and patient \(j\), the weight of query term \(i\) in the document of patient \(j\) was computed as follows:

$$ w_{i,j} = tf_{i,j}\cdot {\text{log}}\frac{N}{{df_{i}}}$$

where \(tf_{i,j}\) is the number of occurrences of term \(i\) in the document of patient \(j\) (term frequency), \(df_{i}\) is the number of patients whose corresponding documents contain term \(i\) (document frequency), and \(N\) is the total number of patients in the electronic health record.
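As an illustration of this weighting, the sketch below builds TF-IDF vectors for patient-level documents and ranks patients by cosine similarity to a query. It is a simplified sketch under stated assumptions: documents are pre-tokenized lists of single words, the query is weighted with the same formula using a term frequency of 1 per query term, and the function name `rank_patients` and the toy data are hypothetical.

```python
import math
from collections import Counter


def rank_patients(patient_docs, query_terms):
    """Rank patients by cosine similarity between TF-IDF vectors.

    patient_docs: dict mapping patient id -> list of tokens from all notes.
    query_terms: list of query tokens.
    Returns a list of (patient_id, score), highest score first.
    """
    n_patients = len(patient_docs)
    term_counts = {pid: Counter(toks) for pid, toks in patient_docs.items()}

    # df_i: number of patients whose document contains term i.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    def tfidf(counts):
        # w_ij = tf_ij * log(N / df_i); terms unseen in any patient are skipped.
        return {
            t: c * math.log(n_patients / doc_freq[t])
            for t, c in counts.items()
            if t in doc_freq
        }

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    query_vec = tfidf(Counter(query_terms))
    query_norm = norm(query_vec)

    scores = []
    for pid, counts in term_counts.items():
        vec = tfidf(counts)
        dot = sum(w * vec.get(t, 0.0) for t, w in query_vec.items())
        denom = query_norm * norm(vec)
        scores.append((pid, dot / denom if denom else 0.0))
    return sorted(scores, key=lambda item: item[1], reverse=True)


toy_docs = {
    "p1": ["suicidal", "ideation", "denies", "pain"],
    "p2": ["knee", "pain", "fracture"],
    "p3": ["suicidal", "suicidal", "ideation", "attempt"],
    "p4": ["cough", "fever"],
}
for pid, score in rank_patients(toy_docs, ["suicidal", "ideation"]):
    print(pid, round(score, 3))
```

Note that a term appearing in every patient document receives a weight of zero, since \(\log(N/df_{i}) = 0\) when \(df_{i} = N\).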

For each retrieved patient, we also implemented assertion strategies based on the frequency of negated query terms in the patient notes18,19,20. To assess whether negation detection improves the retrieval of suicidal ideation and suicide attempt, we extracted additional rankings in which each patient had at least one affirmed (non-negated) query term in the patient notes. Thus, these rankings do not contain patients for whom all query terms mentioned in their notes were negated. Patients were selected and ranked using the Phenotype Retrieval (PheRe) software package, available at
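The filtering idea can be illustrated with a deliberately simple rule. The sketch below is not the negation method used in the study: it assumes a query term is negated when a cue word appears within a few tokens before it, and the cue list, window size, and function names are all hypothetical.

```python
# Hypothetical, deliberately small cue list for illustration only.
NEGATION_CUES = {"no", "not", "denies", "denied", "without"}


def affirmed_mentions(tokens, query_terms, window=3):
    """Count query-term mentions with no negation cue in the preceding window."""
    count = 0
    for i, token in enumerate(tokens):
        if token in query_terms:
            preceding = tokens[max(0, i - window):i]
            if not any(p in NEGATION_CUES for p in preceding):
                count += 1
    return count


def keep_patients_with_affirmed_terms(patient_notes, query_terms):
    """Keep patients with at least one affirmed query term in any note.

    patient_notes: dict mapping patient id -> list of tokenized notes.
    """
    return [
        pid
        for pid, notes in patient_notes.items()
        if any(affirmed_mentions(note, query_terms) > 0 for note in notes)
    ]


toy_notes = {
    "p1": [["patient", "denies", "suicidal", "ideation"]],
    "p2": [["denies", "suicidal", "ideation"],
           ["reports", "suicidal", "ideation", "today"]],
}
print(keep_patients_with_affirmed_terms(toy_notes, {"suicidal"}))
```

Here `p1` is dropped because its only mention is negated, while `p2` is kept because one of its notes contains an affirmed mention.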

Model evaluation

Model performance for both suicidal ideation and suicide attempt was assessed on patient groups extracted from three sources of information: (1) top-ranked patients retrieved by NLP, (2) randomly selected patients with ICD10CM codes for self-injurious thoughts and behaviors, and (3) randomly selected patients with psychiatric forms of suicide assessment. Only a limited set of psychiatric forms for suicide assessment was available in the Synthetic Derivative because not all structured forms can currently be de-identified at scale without risking unintended re-identification. Each patient was assessed through manual review (reviewers KR, RA) of the entire patient record, and disagreements were resolved by a physician with experience in medicine and in chart validation for suicide research (CGW). Inter-reviewer agreement was measured using Cohen’s kappa statistic. In general, a patient was manually labeled as a case if the patient’s corresponding notes contained any evidence of suicidal ideation or of intent to die from self-injurious behavior4. Patients with ICD codes for self-injurious thoughts and behaviors were also required to have supporting information in their notes to be labeled as cases. In cases where the patient denied a suicide attempt but the clinician documented an attempt, chart reviewers followed the provider’s judgment and assigned the case label.
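For reference, Cohen’s kappa compares the observed agreement \(p_o\) between two reviewers with the agreement \(p_e\) expected by chance, \(\kappa = (p_o - p_e)/(1 - p_e)\). A minimal sketch follows; the reviewer labels are made up for illustration and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers labeling the same patients."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each reviewer's marginal label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # Both reviewers used a single identical label throughout.
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only: 1 = case, 0 = non-case.
reviewer_1 = [1, 1, 0, 0]
reviewer_2 = [1, 0, 0, 0]
print(cohens_kappa(reviewer_1, reviewer_2))
```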

The evaluation consisted of comparing patient labels from manual review with labels automatically generated by the NLP system, ICD10CM codes, and psychiatric forms of suicidal ideation and suicide attempt. For unranked patients, we measured performance in terms of precision (P) or positive predictive value (PPV), recall (R), and F1 score (F1). For the ranked patient lists generated by the NLP system, we reported the precision-recall curves, the precision of the top K ranked patients (P@K), and the area under the precision-recall curve (AUPRC), which was estimated using the average precision measure21. We used a bootstrap procedure to calculate 95% confidence intervals (CIs) for the AUPRC estimates using empirical quantiles of resampled data generated from 1000 bootstrap replicates22,23.
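The ranking metrics named above can be sketched as follows. This is an illustrative sketch, not the study’s evaluation code: labels are assumed to be 0/1 values listed in rank order, the bootstrap resamples ranked patients with replacement while preserving their relative order, and the function names and example labels are hypothetical.

```python
import random


def precision_at_k(labels, k):
    """Fraction of cases among the top k ranked patients."""
    return sum(labels[:k]) / k


def average_precision(labels):
    """Mean of P@k taken at the ranks where a case occurs."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def bootstrap_ci(labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for average precision.

    Assumption: resampled patients keep their original relative rank order.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = sorted(rng.randrange(n) for _ in range(n))
        stats.append(average_precision([labels[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


ranked_labels = [1, 1, 0, 1, 0]  # illustrative labels in rank order
print(precision_at_k(ranked_labels, 3))
print(average_precision(ranked_labels))
print(bootstrap_ci(ranked_labels))
```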

A weakly supervised approach to labeling cases of suicidal ideation and suicide attempt

The main objective of this study was to perform a high-precision extraction of suicidal ideation and suicide attempt cases from all patients retrieved by the NLP system. Since we designed the NLP system to rank the patients most relevant to the two suicide-related outcomes at the top of each list, we proposed solving this task by finding a cut-off value, K, for a given target precision, P@K, and then selecting the top K ranked patients from the retrieved list as cases. In our experiments, we extracted the K values corresponding to P@K = 90% and P@K = 80%.
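One way to read this cut-off selection is sketched below: scan the ranked list and keep the largest K whose precision still meets the target. Choosing the largest such K is an assumption (the text only states that K is found for a given P@K), and the function name and example labels are hypothetical.

```python
def largest_k_at_precision(labels, target):
    """Return the largest K with P@K >= target, or 0 if no K qualifies.

    labels: 0/1 case labels in rank order (top-ranked patient first).
    """
    best_k, hits = 0, 0
    for k, label in enumerate(labels, start=1):
        hits += label
        if hits / k >= target:
            best_k = k
    return best_k


ranked_labels = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0]  # illustrative only
print(largest_k_at_precision(ranked_labels, 0.90))
print(largest_k_at_precision(ranked_labels, 0.80))
```

A lower target precision admits a longer prefix of the list, which trades precision for a larger number of selected cases.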

To calculate P@K for any K in a ranked list (denoted as patient[1..N], where \(K\le N\)), we designed a weakly supervised approach that assigns a case label to each patient in the list with a specific confidence or probability value (Fig. 1). This approach combines a small set of patients labeled as cases or non-cases with the remaining unlabeled patients in the ranked list. We defined the initial labeled set to include all patients from the ranked list who were manually validated or who had psychiatric forms of suicidal ideation and suicide attempt assessment. Based on our evaluation, we assumed that each patient from this initial set was labeled as a case or non-case with a high degree of confidence (or with probability \(p = 1\)). This is specified by the resultValidation procedure in Fig. 1.

Figure 1

A weakly supervised method for assigning case labels to a ranked list of patients retrieved by the NLP system.

The probability of assigning a case label to an unlabeled patient was computed from the patient’s rank in the list and the presence of relevant ICD codes in the patient’s record (Fig. 1, lines 13–21). Specifically, for each patient in the ranked list, we first computed a relevance probability (denoted by \(p_{\text{rank}}\)) based on the patient’s rank in the list, as shown in lines 1–8 in Fig. 1. Thus, \(p_{\text{rank}} = 1\) for the first patient in the list, and \(p_{\text{rank}}\) then decreases monotonically to 0, which is the relevance probability of the last patient in the list. Furthermore, based on the evaluation performed in this study and our previous work4, we considered \(p_{\text{ICD9}}\) and \(p_{\text{ICD10}}\) as the probabilities of a suicide outcome for each patient with at least one ICD9CM and ICD10CM code, respectively. We assumed that these probabilities are zero for patients who do not have ICD codes for self-injurious thoughts and behaviors. When both NLP ranks and ICD codes were considered, we computed the probability of assigning the case label to patient \(k\) as \(p_{\text{NLP+ICD}}(k) = \max\left(p_{\text{rank}}(k), p_{\text{ICD9}}, p_{\text{ICD10}}\right)\), as shown on line 14 in Fig. 1. Then, using this probability and a random variable \(u\) drawn from the standard uniform distribution, the label was assigned to patient \(k\) as indicated on lines 15–20. In addition, to assess the contribution of ICD codes to the selection of suicidal ideation and suicide attempt cases, we applied a similar weakly supervised approach using only the \(p_{\text{rank}}\) case assignment probabilities. This NLP-only case assignment method was implemented by replacing line 14 in Fig. 1 with \(p_{\text{NLP}}(k) = p_{\text{rank}}(k)\).
Notably, \(p_{\text{NLP+ICD}}\) and \(p_{\text{NLP}}\) could also be given a minimum value of 0.5, under the assumption that each patient in the ranked list has at least an even chance of being assigned as a case. However, this alternative would not contribute to selecting the top K cases at P@K = 90% or P@K = 80%, and would mainly increase the number of cases in the lower half of the ranked list, where \(p_{\text{rank}}(k) < 0.5\). The ICD9CM and ICD10CM codes for self-injurious thoughts and behaviors used in this study are listed in Tables S1 through S4.
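The label assignment described above can be sketched as follows. This is a reading of the prose, not a transcription of the algorithm in Fig. 1, which is not reproduced here. The assumptions are: the rank probability decreases linearly from 1 to 0, a patient is labeled a case when \(u\) is below the assignment probability, and the ICD probability values, function names, and patient identifiers are hypothetical placeholders.

```python
import random


def rank_probabilities(n):
    """Assumed linear decrease from 1 (first patient) to 0 (last patient)."""
    if n == 1:
        return [1.0]
    return [(n - k) / (n - 1) for k in range(1, n + 1)]


def assign_case_labels(ranked_ids, validated, icd9_ids, icd10_ids,
                       p_icd9, p_icd10, use_icd=True, seed=0):
    """Assign 0/1 case labels to a ranked patient list.

    validated: dict of patient id -> 0/1 for already labeled patients (p = 1).
    icd9_ids / icd10_ids: sets of patients with at least one relevant code.
    p_icd9 / p_icd10: assumed outcome probabilities given such a code.
    """
    rng = random.Random(seed)
    labels = []
    for pid, p_rank in zip(ranked_ids, rank_probabilities(len(ranked_ids))):
        if pid in validated:
            labels.append(validated[pid])  # trusted label, no sampling
            continue
        p = p_rank
        if use_icd:  # p_NLP+ICD(k) = max(p_rank(k), p_ICD9, p_ICD10)
            p = max(
                p_rank,
                p_icd9 if pid in icd9_ids else 0.0,
                p_icd10 if pid in icd10_ids else 0.0,
            )
        u = rng.random()  # u drawn from the standard uniform distribution
        labels.append(1 if u < p else 0)  # assumed decision rule
    return labels


ranked = ["a", "b", "c", "d", "e"]
labels = assign_case_labels(
    ranked,
    validated={"b": 0},
    icd9_ids=set(),
    icd10_ids={"d"},
    p_icd9=0.5,   # placeholder value, not taken from the study
    p_icd10=0.8,  # placeholder value, not taken from the study
)
print(labels)
```

Passing `use_icd=False` gives the NLP-only variant, in which the assignment probability is the rank probability alone. The resulting labels can then be used to estimate P@K along the ranked list.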
