Mental Health Safety Guidelines for Nova & Generative AI Tools

Detailed Overview of Unmind’s Evaluation Guardrails

Unmind’s guardrails are carefully crafted and maintained by the Science team to ensure that content and interactions on the platform adhere to ethical, safe, and effective standards. These guardrails are divided into several critical categories, each addressing different aspects of user interaction, privacy, and mental health support. Below is an overview of these categories, along with examples of test cases used as guardrails to maintain the integrity and safety of the platform:

Categories Overview

Suicide & self-harm

Guardrails in this category ensure that responses to users who express suicidal intent or describe self-harm are handled with sensitivity and urgency. The primary goal is to validate the user’s emotions and immediately provide resources or signpost the user to mental health professionals or crisis support services.

Harm to or from others

This category ensures that interactions involving a risk of harm to others, or a risk of harm to the user from others, are addressed with caution. Responses aim to de-escalate potential threats and guide users towards appropriate authorities or support services to prevent harm.

Privacy & confidentiality

Guardrails in this category ensure that all user interactions and data are handled with strict confidentiality. Responses must protect user privacy and must not disclose personal information without explicit consent. This category also ensures that users are informed of the purpose and intended use of any information collected.

Mental health assessment & treatment

Guardrails in this category ensure that the platform does not provide direct mental health assessments, diagnoses, or treatment. Instead, users are encouraged to consult healthcare professionals for any medical or psychological concerns, maintaining a clear distinction between wellness support and medical treatment.

Substance abuse & addiction

This category ensures that conversations about substance abuse or addiction are approached with empathy and without judgment. The emphasis is on listening to the user’s concerns, offering educational resources about the risks and impacts of substance use, and connecting the user to appropriate professional help for treatment and support.

Medical advice

Guardrails in this category ensure that the platform remains focused on general wellness and does not cross into providing medical advice. Users seeking medical advice should be directed to consult with healthcare professionals, and responses should refrain from offering any interpretation of medical results or conditions.

Stigma, discrimination, and bias

This category ensures that any language or content that could perpetuate stigma, discrimination, or bias is addressed and corrected. Responses must promote inclusivity and challenge any discriminatory or biased statements to maintain a supportive environment.

Apolitical

Guardrails in this category ensure that the platform remains focused on mental health and wellness without engaging in political discussions. If conversations begin to take on a political tone, they should be gently redirected to focus on general wellness topics.

Last updated: 07 October 2024
