Psy-Insight:
Explainable Multi-turn Bilingual Dataset for Mental Health Counseling

Anonymous Team*

Anonymous University
*Equal Contribution; Corresponding Author
"Understanding why people suffer, how they change, and how to help them live satisfying lives is a fascinating and important undertaking."
- Irvin Yalom

What is our purpose?

We provide a professional-level Chinese-English dialogue dataset that alleviates the scarcity of long-context, multi-turn dialogue data for mental health support. The dataset covers various labels, including emotion, strategy, psychotherapy, and step-by-step explanations, and can support multiple emotional support tasks. We provide baseline experiments on two tasks for comparative analysis. This website offers detailed documentation on preprocessing, download, evaluation guidelines, copyright, and data sources.

How do we collect data?

The workflow of data construction. We clean psychological data from other raw pretraining datasets (e.g., MNBVC, PsyArXiv). The copyright and data-source information of the raw datasets is archived, with detailed licenses provided. We used keywords provided by psychological experts to select professional-level psychological counseling dialogues and explanations. We then mapped these explanations to dialogue labels. Finally, two students manually checked the data and removed irrelevant dialogues.
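
The keyword filtering step can be pictured as a simple match against an expert-provided list. This is a minimal illustration, not the Psy-Insight pipeline: the keywords, the `min_hits` threshold, and the function names are hypothetical. Case-folded substring matching is used because it works for both Chinese and English text without a word segmenter.

```python
# Hypothetical keywords; the expert-provided list is not reproduced here.
EXPERT_KEYWORDS = ("counselor", "client", "empathy", "咨询师", "来访者")


def count_keyword_hits(text: str, keywords: tuple[str, ...]) -> int:
    """Count how many distinct keywords occur in the text (case-insensitive substring match)."""
    folded = text.casefold()
    return sum(1 for kw in keywords if kw.casefold() in folded)


def filter_counseling_docs(docs: list[str],
                           keywords: tuple[str, ...] = EXPERT_KEYWORDS,
                           min_hits: int = 2) -> list[str]:
    """Keep documents that mention at least `min_hits` distinct keywords."""
    return [doc for doc in docs if count_keyword_hits(doc, keywords) >= min_hits]
```

Requiring several distinct hits rather than one is a cheap way to drop documents that mention a counseling term only in passing; the survivors would still go through label mapping and manual checking.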

The figure shows the label-mapping process. We apply a sliding-window algorithm for label mapping: from the raw data, we extract the dialogue and then map the labels to the dialogue. In total, we collected more than 10,000 dialogue-label pairs in Psy-Insight.
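
One way to picture sliding-window label mapping is to slide a fixed-size window over the utterances and attach each explanation to the window it overlaps most. The sketch below assumes Jaccard token overlap as the matching score and a naive word tokenizer; both choices, and the name `map_label_to_window`, are illustrative assumptions rather than the algorithm actually used.

```python
import re


def tokens(text: str) -> set[str]:
    # Naive word tokenizer (assumption); Chinese text would need a proper segmenter.
    return set(re.findall(r"\w+", text.lower()))


def map_label_to_window(utterances: list[str], explanation: str,
                        window: int = 2) -> tuple[int, int]:
    """Return (start, end), end exclusive, of the window of consecutive
    utterances whose tokens overlap most with the explanation."""
    if not utterances:
        raise ValueError("dialogue has no utterances")
    window = min(window, len(utterances))
    target = tokens(explanation)
    best_start, best_score = 0, -1.0
    for start in range(len(utterances) - window + 1):
        span = set().union(*(tokens(u) for u in utterances[start:start + window]))
        # Jaccard overlap between the window's tokens and the explanation's tokens.
        union = span | target
        score = len(span & target) / len(union) if union else 0.0
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + window
```

Ties keep the earliest window, because only a strictly higher score replaces the current best.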

How do we utilize data?

Three types of generation methods for ablation experiments. To utilize the step-by-step explanation labels, we design an ablation experiment to explore whether letting the model generate reasoning before the response is helpful. The ablation consists of Tasks 1 to 3: Task 1 is the baseline task using pure dialogue data, while Tasks 2 and 3 use the explanation labels. Each task corresponds to one set of results in the ablation experiment.
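
The difference between the baseline and an explanation-augmented format can be illustrated as input/target pairs for fine-tuning. This is a hypothetical sketch: the `Reasoning:`/`Response:` markers, the `Sample` fields, and the `max_steps` truncation knob are assumptions, and the page does not specify exactly how Tasks 2 and 3 differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    history: list[str]   # previous turns, e.g. "Client: ..." / "Counselor: ..."
    steps: list[str]     # step-by-step explanation label for the next response
    response: str        # gold counselor response


def build_example(sample: Sample, use_explanation: bool = False,
                  max_steps: Optional[int] = None) -> dict[str, str]:
    """Build one fine-tuning pair; the Task 1 baseline is use_explanation=False."""
    source = "\n".join(sample.history)
    if not use_explanation:
        # Baseline: map the chat history directly to the response.
        return {"input": source, "target": sample.response}
    # Hypothetical explanation format: reasoning first, then the response.
    steps = sample.steps if max_steps is None else sample.steps[:max_steps]
    reasoning = " ".join(f"({i}) {step}" for i, step in enumerate(steps, start=1))
    return {"input": source, "target": f"Reasoning: {reasoning}\nResponse: {sample.response}"}
```

Varying `max_steps` is one way to probe the trade-off between brief and lengthy reasoning before the response.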

The principle of the RAG experiment. We design a RAG (retrieval-augmented generation) experiment to explore the effect of retrieval. We use the chat history as the query to retrieve similar dialogue cases from the dataset; the model then concatenates the retrieved information with the chat history to generate the response.
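
A minimal sketch of the retrieve-then-concatenate step follows. It substitutes Jaccard token overlap for the retriever used in the experiment, which is not specified here, and the prompt layout and the function names `retrieve` and `build_rag_prompt` are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    # Naive word tokenizer; Chinese text would need a segmenter.
    return set(re.findall(r"\w+", text.lower()))


def retrieve(history: list[str], cases: list[str], k: int = 2) -> list[str]:
    """Return the k cases with the highest Jaccard overlap with the chat history."""
    query = _tokens(" ".join(history))

    def score(case: str) -> float:
        case_tokens = _tokens(case)
        union = query | case_tokens
        return len(query & case_tokens) / len(union) if union else 0.0

    return sorted(cases, key=score, reverse=True)[:k]


def build_rag_prompt(history: list[str], cases: list[str], k: int = 2) -> str:
    """Concatenate retrieved cases and the chat history into one generation prompt."""
    lines = ["Similar counseling cases:"]
    lines += [f"[Case {i}] {case}" for i, case in enumerate(retrieve(history, cases, k), start=1)]
    lines += ["Chat history:", *history, "Counselor:"]
    return "\n".join(lines)
```

Placing the retrieved cases before the history keeps the most recent client turn closest to the point where the model starts generating.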

The results of the experiments on the four task types above. Models are asked to generate responses in each of the four data formats. We fine-tune these models on the Psy-Insight multi-task labels. Additionally, mix-instruction denotes a model fine-tuned sequentially on Tasks 1, 2, and 3. The results show that models generate more reasonable responses when using the explanation labels.

How do we evaluate data?

📝 Evaluation Details

Our evaluators are eleven psychology students and two psychological counselors. They were asked to score responses from 1 to 5 on interactivity, helpfulness, comfort, and explainability. We also calculated Cronbach's alpha to measure the reliability of the human evaluation. This page [https://anonymous.4open.science/r/Psy-Insight-F65E/expert-eval/README.md] open-sources the evaluation guidelines, metrics, score ranges, reliability analysis, and all expert comparison results (scores and comments).
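
For reference, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σⱼ σ²ⱼ / σ²ₓ), where σ²ⱼ is the variance of item j and σ²ₓ is the variance of the summed scores. The sketch below treats each evaluator as an item and each rated response as an observation; that framing, and the function name, are assumptions about how the reliability analysis was set up.

```python
from statistics import variance


def cronbach_alpha(ratings: list[list[float]]) -> float:
    """ratings[i][j] is the score evaluator j gave to response i (1-5 scale)."""
    n_obs = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n_obs < 2 or k < 2:
        raise ValueError("need at least two responses and two evaluators")
    # Sample variance of each evaluator's scores across responses.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the per-response total scores.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For example, `cronbach_alpha([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]])` returns roughly 0.96, reflecting three evaluators who rank the four responses almost identically.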

Where can I obtain dataset statistics and data examples?

GitHub

Our GitHub repository [https://anonymous.4open.science/r/Psy-Insight-F65E] open-sources the dataset statistics, data examples, and all the code used to generate the dataset.



Future Work: Psy-COT Graph

A small subgraph of the Psy-COT graph. Psy-COT maps events and strategies to dialogue units, preserving their causal and temporal relationships.
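
To make the structure concrete, here is a toy in-memory version of such a graph with typed nodes and relation-labeled edges. The relation names `CAUSES` (causal), `NEXT` (temporal order of strategies), and `REALIZED_IN` (link to a dialogue unit), as well as the class `PsyCotGraph`, are hypothetical; the released graph lives in Neo4j and its schema may differ.

```python
from dataclasses import dataclass, field

# Hypothetical relation types: causal, temporal, and grounding in a dialogue unit.
RELATIONS = {"CAUSES", "NEXT", "REALIZED_IN"}


@dataclass
class PsyCotGraph:
    node_types: dict[str, str] = field(default_factory=dict)          # id -> "event" / "strategy" / "dialogue"
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.node_types[node_id] = node_type

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if src not in self.node_types or dst not in self.node_types:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id: str, relation: str) -> list[str]:
        return [dst for src, rel, dst in self.edges if src == node_id and rel == relation]

    def temporal_chain(self, start: str) -> list[str]:
        """Follow NEXT edges from `start` (first successor only), stopping on cycles."""
        chain, seen, current = [start], {start}, start
        while True:
            successors = self.neighbors(current, "NEXT")
            if not successors or successors[0] in seen:
                return chain
            current = successors[0]
            chain.append(current)
            seen.add(current)
```

Keeping causal and temporal edges as distinct relation types is what lets a query ask "what led to this event" separately from "which strategy came next".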

A small subgraph of the Psy-COT graph in Neo4j. Our graph and dataset are open-sourced on GitHub. You can explore Psy-COT interactively using the buttons on the homepage.

Previous work on knowledge graphs (KGs).

Psy-COT has similarities to commonsense graphs and knowledge graphs. Figures 1 and 2 show how Psy-COT differs from previous studies. Unlike knowledge graphs, which focus on entities, Psy-COT emphasizes counseling descriptions and logical relationships ("Disappointed" vs Expand Client’s Client’s relationship).





Multi-Level Indexing for Retrieval

Psy-COT builds two specialized indexes, one for chain-of-thought (CoT) nodes and one for dialogue nodes. This design enables more precise retrieval, making it more efficient and accurate to find relevant strategies and dialogues in psychological counseling conversations. By separating chain-of-thought content from dialogue content, Psy-COT enables more accurate vector retrieval, which in turn improves the performance of LLMs.



Visualizing the Thought Process

Psy-COT presents a graphical representation of the thought process in psychological counseling dialogues, allowing therapists to intuitively understand the reasoning behind AI models' responses. It displays semi-structured counseling conversations alongside step-by-step annotations that capture therapists' reasoning and insights, enhancing the transparency of AI decision-making. Unlike traditional knowledge graphs, Psy-COT emphasizes the causal chain of events and the temporal evolution of strategies in counseling, rather than only the inclusion relationships between entities.






Takeaways



  • 1. Most previous emotional support datasets were collected from English web posts. Multi-turn mental health support dialogues, especially in Chinese, are particularly scarce.
  • 2. Psy-Insight is a multi-turn, long-context dataset with labels for emotion, strategy, psychotherapy, summary, expression, and step-by-step reasoning.
  • 3. Our experiments show that brief reasoning before generating a response improves performance in mental health support, but excessive reasoning reduces response quality.
  • 4. We invited 11 psychology students and 2 expert counselors to compare human counseling responses with GPT-synthesized responses. Their scores and comments are valuable for further research in mental health support, e.g., DPO and RLHF.
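
As one illustration of the last takeaway, expert scores could be turned into preference pairs for DPO. In this sketch the averaging over evaluators, the 0.5-point `margin` for discarding near-ties, and the `prompt`/`chosen`/`rejected` field names (a common convention for DPO training data) are assumptions; the released evaluation results may be organized differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Comparison:
    context: str                # chat history shown to the evaluators
    human_response: str
    gpt_response: str
    human_scores: list[float]   # e.g. each evaluator's mean over the four criteria
    gpt_scores: list[float]


def to_preference_pairs(comparisons: list[Comparison],
                        margin: float = 0.5) -> list[dict[str, str]]:
    """Turn scored human-vs-GPT comparisons into (prompt, chosen, rejected) records."""
    pairs = []
    for c in comparisons:
        human, gpt = mean(c.human_scores), mean(c.gpt_scores)
        if abs(human - gpt) < margin:
            continue  # skip near-ties, which carry little preference signal
        if human > gpt:
            chosen, rejected = c.human_response, c.gpt_response
        else:
            chosen, rejected = c.gpt_response, c.human_response
        pairs.append({"prompt": c.context, "chosen": chosen, "rejected": rejected})
    return pairs
```

The evaluators' written comments are not used here; they could instead serve as critiques for reward modeling in an RLHF setup.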

BibTeX


      @article{Psy-Insight,
        author       = {Anonymous until 2023-11-15},
        title        = {Psy-Insight: Dataset},
        journal      = {},
        volume       = {},
        year         = {2024},
        url          = {},
        doi          = {}
      }      

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.