Feasibility study towards AI-based identification of adolescent mental health discourse on social media

Senior Data Scientist, Informatics


Overview


The overall goal of the project was to extend a previous feasibility study on AI-based identification of mental health discourse among adolescents on social media. Previous work, carried out in collaboration with UNICEF, established the feasibility of the topical component, that is, detecting social media posts relating to mental health. The age component proved harder, because it is far more difficult to attribute age accurately from text alone. The difficulty is especially pronounced when the topic is fixed, since topic is one of the key contextual cues in author profiling.


The gap that the EMH Seed Fund project aimed to address was two-fold:

  1. Lack of relevant, social media

  2. Feasibility assessment of detection of discourse


The hope was that, should the results be promising, the project would provide a proof of concept for a tool offering a timely, population-level lens on youth mental health. Such a tool, combined with traditional social research methods, could aid prioritisation efforts, e.g. at UNICEF. It could also provide a basis for novel research at the intersection of data mining, social science and psychology, e.g. examining temporality, network effects or geography.


Outcomes


The key outputs from the project are:

  • An age-annotated dataset of social media posts, based on a snapshot of data from the selected platform, Reddit, together with the computational pipeline for automated age annotation (specific to the platform).

  • An assessment of the performance of two popular lightweight large language models (LLMs), GPT-3.5 and Mistral Instruct v0.2, under a few hand-crafted prompts (the typical and most straightforward manner of interacting with LLMs).

  • Summary of work and results in PowerPoint:
    Technical report for internal consumption & developers, e.g. follow-up work
    (Version for the broader audience in discussions with UNICEF)
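
To give a sense of what automated age annotation can look like, here is a minimal sketch in Python. It is not the project's pipeline, whose rules are not described here. It assumes, purely for illustration, that age is annotated from explicit self-disclosure in the post text (e.g. "I'm 16 years old" or "(17F)"). The patterns, the function names `extract_age` and `is_adolescent`, and the 13 to 19 age band are all hypothetical choices.

```python
import re
from typing import Optional

# Hypothetical self-disclosure patterns (assumption, not the project's rules).
AGE_PATTERNS = [
    # "I'm 16 years old", "I am 15 yo", "I'm 17 y/o" (a suffix is required,
    # so that phrases like "I'm 20 minutes late" are not matched)
    re.compile(r"\bI(?:['’]m| am)\s+(\d{1,2})\s*(?:years?\s+old|yo|y/o)\b",
               re.IGNORECASE),
    # "(17F)", "(16 M)": an age followed by a gender letter, in parentheses
    re.compile(r"\((\d{1,2})\s*[MF]\)", re.IGNORECASE),
]


def extract_age(text: str) -> Optional[int]:
    """Return the first plausible self-disclosed age in the text, or None."""
    for pattern in AGE_PATTERNS:
        match = pattern.search(text)
        if match:
            age = int(match.group(1))
            if 10 <= age <= 99:  # discard implausible values
                return age
    return None


def is_adolescent(age: int, low: int = 13, high: int = 19) -> bool:
    """Check membership of an assumed adolescent age band (hypothetical bounds)."""
    return low <= age <= high


if __name__ == "__main__":
    for post in ["I'm 16 years old and can't sleep",
                 "(17F) need advice",
                 "I'm 20 minutes late"]:
        print(repr(post), "->", extract_age(post))
```

Rules of this kind favour precision over coverage: most posts contain no explicit age statement and would simply remain unlabelled.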


The age-annotated dataset and pipeline are a minor contribution, intended as a foundation for any further development of the overarching approach; they may also be of interest to computational NLP communities. The main result is the evaluation of the feasibility of LLM-based identification of adolescent mental health discourse. In terms of the reported metrics, the performance of the LLMs is mixed, with different prompting configurations exhibiting different trade-offs. However, given that any tool developed would not be the sole decision aid, these results show promise for a big-data approach to monitoring social media for adolescent mental health discourse.
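
The kind of trade-off between prompting configurations mentioned above can be illustrated with a small Python sketch. The labels and predictions below are made-up toy data, not project results, and the names `binary_metrics`, `strict_prompt` and `lenient_prompt` are hypothetical. The sketch only shows how precision, recall and F1 can pull in different directions for two prompts evaluated on the same posts.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Metrics(precision, recall, f1)


# Toy data (assumption): 1 = adolescent mental health post, 0 = anything else.
labels = [1, 1, 1, 0, 0, 0]
strict_prompt = [1, 0, 0, 0, 0, 0]    # flags few posts: no false alarms, many misses
lenient_prompt = [1, 1, 1, 1, 1, 0]   # flags many posts: no misses, some false alarms

if __name__ == "__main__":
    print("strict :", binary_metrics(labels, strict_prompt))
    print("lenient:", binary_metrics(labels, lenient_prompt))
```

Which configuration is preferable depends on how the output is used: a tool feeding into human review alongside other evidence may tolerate lower precision in exchange for higher recall.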


Future Directions


The planned next step is a presentation to UNICEF and, possibly, further collaboration to move beyond feasibility assessment and proof of concept towards operationalisation. It is important to note that operationalisation may be difficult in the near term, following recent changes to several social media platforms.


In parallel with the project, the main researcher supervised an MSc student working on another direction that extends the original work. Together with the results obtained here, that work may lead to a publication.


Finally, the overall positive assessment of feasibility opens the door to further downstream research. Several directions are possible, e.g. temporality or network effects, as already mentioned. Of particular interest to the research group are novel and advanced data mining approaches to building a causal understanding of the drivers and progression of mental health issues from unstructured data. We are particularly interested in the Wellcome Mental Health Award "Transforming early intervention for anxiety, depression and psychosis in young people", but applying would require a larger team with complementary expertise.
