RT Journal Article
SR Electronic
T1 Speech disturbances in schizophrenia: assessing cross-linguistic generalizability of NLP automated measures of coherence
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2022.03.28.22272995
DO 10.1101/2022.03.28.22272995
A1 Parola, Alberto
A1 Lin, Jessica Mary
A1 Simonsen, Arndis
A1 Bliksted, Vibeke
A1 Zhou, Yuan
A1 Wang, Huiling
A1 Inoue, Lana
A1 Koelkebeck, Katja
A1 Fusaroli, Riccardo
YR 2022
UL http://medrxiv.org/content/early/2022/04/21/2022.03.28.22272995.abstract
AB Introduction: Language disorders, disorganized and incoherent speech in particular, are distinctive features of schizophrenia. Natural language processing (NLP) offers automated measures of incoherent speech as promising markers for schizophrenia. However, the scientific and clinical impact of NLP markers depends on their generalizability across contexts, samples, and languages, which we systematically assessed in the present study relying on a large, novel, cross-linguistic corpus. Methods: We collected a Danish (DK), German (GE), and Chinese (CH) cross-linguistic dataset involving transcripts from 187 participants with schizophrenia (111 DK, 25 GE, 51 CH) and 200 matched controls (129 DK, 29 GE, 42 CH) performing the Animated Triangle task. Fourteen previously published NLP coherence measures were calculated, and between-group differences and associations with symptoms were tested for cross-linguistic generalizability. Results: One coherence measure robustly generalized across samples and languages. We found several language-specific effects, some of which partially replicated previous findings (lower coherence in German and Chinese patients), while others did not (higher coherence in Danish patients).
We found several associations between symptoms and measures of coherence, but the effects were generally inconsistent across languages and rating scales. Conclusions: Using a cumulative approach, we have shown that NLP findings of reduced semantic coherence in schizophrenia have limited generalizability across different languages, samples, and measures. We argue that several factors, such as sociodemographic and clinical heterogeneity, cross-linguistic variation, and the different NLP measures reflecting different clinical aspects, may be responsible for this variability. Future studies should take this variability into account in order to develop effective clinical applications targeting different patient populations. Competing Interest Statement: Riccardo Fusaroli has been a paid consultant on related but not overlapping topics for Roche. The other authors have no real or potential conflicts of interest that could have influenced the research. Funding Statement: A.P. is supported by a Marie Skłodowska-Curie Actions H2020-MSCA-IF-2018 grant (ID: 832518, Project: MOVES). A.S. is supported by the Carlsberg Foundation.
The project was supported by seed funding from the Interacting Minds Centre. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: (1) the study was approved by The Central Denmark Region Committees on Biomedical Research Ethics and the Danish Data Protection Agency; (2) the study was approved by the Ethics Committee of the University of Muenster and the State Chamber of Physicians of Nordrhein-Westphalia; (3) the project was approved by the Ethics Committee of Renmin Hospital of Wuhan University and the Institutional Review Board of the Institute of Psychology, the Chinese Academy of Sciences. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The original speech transcripts cannot be shared as they are considered identifiable data, in line with our consent forms and current data privacy regulations.
https://osf.io/8btp6/