Abstract
Introduction Language disorders – disorganized and incoherent speech in particular - are distinctive features of schizophrenia. Natural language processing (NLP) offers automated measures of incoherent speech as promising markers for schizophrenia. However, the scientific and clinical impact of NLP markers depends on their generalizability across contexts, samples, and languages, which we systematically assessed in the present study relying on a large, novel, cross-linguistic corpus.
Methods We collected a Danish (DK), German (GE), and Chinese (CH) cross-linguistic dataset involving transcripts from 187 participants with schizophrenia (111DK, 25GE, 51CH) and 200 matched controls (129DK, 29GE, 42CH) performing the Animated Triangle task. Fourteen previously published NLP coherence measures were calculated, and between-groups differences and association with symptoms were tested for cross-linguistic generalizability.
Results One coherence measure robustly generalized across samples and languages. We found several language-specific effects, some of which partially replicated previous findings (lower coherence in German and Chinese patients), while others did not (higher coherence in Danish patients). We found several associations between symptoms and measures of coherence, but the effects were generally inconsistent across languages and rating scales.
Conclusions Using a cumulative approach, we have shown that NLP findings of reduced semantic coherence in schizophrenia have limited generalizability across different languages, samples, and measures. We argue that several factors such as sociodemographic and clinical heterogeneity, cross-linguistic variation, and the different NLP measures reflecting different clinical aspects may be responsible for this variability. Future studies should take this variability into account in order to develop effective clinical applications targeting different patient populations.
Competing Interest Statement
Riccardo Fusaroli has been a paid consultant on related but not overlapping topics for Roche. The other authors have no real or potential conflicts of interest that could have influenced the research.
Funding Statement
A.P is supported by a Marie Skłodowska-Curie Actions, H2020-MSCA-IF-2018 grant (ID: 832518, Project: MOVES). A.S is supported by the Carlsberg Foundation. The project was supported by seed funding from the Interacting Minds center.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
- The study was approved by The Central Denmark Region Committees on Biomedical Research Ethics and the Danish Data Protection Agency - The study was approved by the Ethics Committee of the University of Muenster and the State Chamber of Physicians of Nordrhein-Westphalia - The project was approved by the Ethics committee of Renmin Hospital of Wuhan University and the Institutional Review Board of the Institute of Psychology, the Chinese Academy of Sciences
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The original speech transcripts cannot be shared as they are considered identifiable data, in line with our consent forms and current data privacy regulations.