'Investigating Chatbot Usage on Individuals' Perception and Behavior' (AsPredicted #197755)
Author(s) Mengying Fang (MIT Media Lab) - catfang@mit.edu Auren Liu (MIT Media Lab) - rliu34@mit.edu Eunhae Lee (MIT Media Lab/EECS/IDM) - eunhae@mit.edu Valdemar Danry (MIT Media Lab) - vdanry@mit.edu Pat Pataranutaporn (MIT Media Lab) - patpat@mit.edu
Pre-registered on 2024/11/05 - 11:39 AM (PT)
1) Have any data been collected for this study already? No, no data have been collected for this study yet.
2) What's the main question being asked or hypothesis being tested in this study? Q1: Will users of an engaging voice-based AI chatbot experience different levels of loneliness, socialization, emotional dependence, and addictive use of AI chatbots compared to users of a text-based AI chatbot and users of a less engaging voice-based AI chatbot?
Q2: Will engaging in personal tasks with an AI chatbot result in different levels of loneliness, socialization, emotional dependence, and addictive use of AI chatbots compared to engaging in non-personal tasks and open-ended tasks with an AI chatbot?
3) Describe the key dependent variable(s) specifying how they will be measured. Our key dependent variables (DV) will be:
Loneliness (ULS-8, weekly, Likert 1-4),
Socialization (LSNS-6, weekly, Likert 0-5),
Emotional Dependence (ADS-9 scale, weekly, Likert 1-5),
Addictive Use (PCUS, weekly, Likert 1-5).
Independent variables (IV) will include:
Model conditions (1-9, discrete) - includes variations in modality (text, engaging voice, non-engaging voice) and task (personal, non-personal, open-ended)
Week number (1-4, discrete),
Controls:
Age (18-65, discrete),
User gender (categorical)
4) How many and which conditions will participants be assigned to? We will use a 3-by-3 factorial design with three AI interaction modes and three task categories. In all conditions, participants will interact with a chatbot at least once a day over 4 weeks, receive a randomly selected task from their assigned task category, and complete questionnaires. At the beginning of the experiment, each participant will be randomly assigned (between subjects) to one of three chatbot modes: (1) text-based (control), (2) voice-based, emotionally less engaging, or (3) voice-based, emotionally engaging. Each participant will also be randomly assigned at the beginning of the experiment to one of three task categories: (1) open-ended task, (2) non-personal task, or (3) personal task.
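The assignment described above can be sketched as follows. This is a minimal illustration, not the study's actual randomization procedure: the condition labels, the `assign_conditions` helper, and the choice of balanced (blocked) allocation are all assumptions; only the 3-by-3 structure and the target of 112 participants per condition (item 7) come from this pre-registration.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical labels for the three modes and three task categories.
MODES = ["text", "voice_less_engaging", "voice_engaging"]
TASKS = ["open_ended", "non_personal", "personal"]
PER_CONDITION = 112  # target cell size stated in item 7


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to the 9 cells while keeping cells balanced.

    Balanced allocation is an assumption; the pre-registration only says
    that assignment is random.
    """
    cells = list(product(MODES, TASKS))
    # Repeat the 9 cells often enough to cover the roster, then truncate.
    repeats = -(-len(participant_ids) // len(cells))  # ceiling division
    slots = (cells * repeats)[: len(participant_ids)]
    rng = random.Random(seed)
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


roster = [f"P{i:04d}" for i in range(PER_CONDITION * len(MODES) * len(TASKS))]
assignment = assign_conditions(roster)
cell_sizes = Counter(assignment.values())
```

With a full roster of 1008 hypothetical IDs, every one of the 9 cells ends up with exactly 112 participants.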
5) Specify exactly which analyses you will conduct to examine the main question/hypothesis. To examine the main hypotheses, we will fit a mixed-effects model for each dependent variable: loneliness, socialization, emotional dependence, and addictive use.
Each model will include fixed effects for the interaction mode (0=text, 1=less engaging voice, 2=engaging voice, Q1) or the task category (0=open-ended task, 1=non-personal task, 2=personal task, Q2). We will also run models that include interaction terms between the interaction mode and the task category, to test the main hypotheses.
We will account for individual differences and repeated measures over time by including participant ID as a random effect.
Each model will also include the number of exchanged messages as a control variable. We will then rerun the main models with the control variables listed in item 3 (age and user gender).
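One way the models above could be specified is sketched below, assuming a long-format table with one row per participant per week. The column names (`mode`, `task`, `week`, `n_messages`, `participant_id`), the inclusion of week as a fixed effect, and the use of the statsmodels library are assumptions for illustration; the pre-registration does not name a software package or an exact model formula.

```python
DVS = ["loneliness", "socialization", "emotional_dependence", "addictive_use"]


def build_formula(dv, factor=None, interaction=False):
    """Return a patsy-style formula string for one dependent variable.

    factor: "mode" (Q1) or "task" (Q2); ignored when interaction=True.
    """
    if interaction:
        fixed = "C(mode) * C(task)"  # both main effects plus their interaction
    else:
        fixed = f"C({factor})"
    # week and n_messages enter as covariates (an assumed specification)
    return f"{dv} ~ {fixed} + week + n_messages"


def fit_random_intercept(data, formula):
    """Fit a linear mixed model with a random intercept per participant."""
    # Third-party dependency, imported lazily so the formula helper
    # can be used without statsmodels installed.
    import statsmodels.formula.api as smf

    model = smf.mixedlm(formula, data, groups=data["participant_id"])
    return model.fit()


q1_formulas = [build_formula(dv, factor="mode") for dv in DVS]
q2_formulas = [build_formula(dv, factor="task") for dv in DVS]
interaction_formulas = [build_formula(dv, interaction=True) for dv in DVS]
```

Passing `groups=data["participant_id"]` gives each participant a random intercept, which is one common way to account for repeated weekly measures from the same person.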
6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations. We will exclude participants who:
Fail to complete the daily task for 3 consecutive days within any one week of the 4-week study
Send fewer than 10 messages on average per session
Fill out the daily survey with no or minimal interaction with the chatbot
Do not complete the pre-survey and/or post-survey within 72 hours
Do not complete the weekly surveys within 72 hours
Do not adhere to their assigned interaction medium (text mode vs. voice mode)
Inclusion criteria:
US-based
Over 18
Fluent in English
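Two of the exclusion rules in item 6 (the 3-consecutive-day rule and the 10-message average) could be checked as sketched below. The data layout, the function names, and the decisions to evaluate missed-day streaks separately within each week and to exclude participants with no sessions at all are assumptions made for illustration.

```python
def has_missed_streak(completed_days, streak=3):
    """Return True if `streak` consecutive days in one week were missed.

    completed_days: list of booleans, one per day of a single week.
    """
    run = 0
    for done in completed_days:
        run = 0 if done else run + 1
        if run >= streak:
            return True
    return False


def should_exclude(weeks, messages_per_session, min_avg_messages=10):
    """Apply the missed-day and message-count exclusion rules.

    weeks: list of per-week lists of booleans (daily task completed or not).
    messages_per_session: number of messages the participant sent per session.
    """
    # Streaks are checked per week, so a streak spanning two weeks
    # is not counted (an assumed reading of the rule).
    if any(has_missed_streak(week) for week in weeks):
        return True
    if not messages_per_session:
        return True  # assumed: no sessions counts as no interaction
    average = sum(messages_per_session) / len(messages_per_session)
    return average < min_avg_messages
```

For example, a participant who completes every daily task but averages fewer than 10 messages per session would be excluded by the second rule.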
7) How many observations will be collected or what will determine sample size? No need to justify decision, but be precise about exactly how the number will be determined. The study will consist of 112 participants per condition (1008 in total).
8) Anything else you would like to pre-register? (e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?) We will conduct moderation analysis between the moderator variables and key dependent variables. For each moderating variable, we will re-run the main analysis model with the addition of the z-scored moderator and all interactions. We will examine the 2-way interaction between the key variable, all independent variables, and the moderators.
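The moderation step described above can be sketched as follows. The helper names, the column-name convention (`<moderator>_z`), the use of the population standard deviation, and the exact formula are assumptions; the pre-registration only states that the z-scored moderator and all interactions are added to the main analysis model.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumed choice).
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot z-score a constant variable")
    return [(v - mu) / sd for v in values]


def moderation_formula(dv, moderator):
    """Formula adding a z-scored moderator and all of its interactions."""
    # `*` expands to all main effects and all interaction terms.
    return f"{dv} ~ C(mode) * C(task) * {moderator}_z + week + n_messages"


self_esteem_z = z_score([1, 2, 3, 4, 5])  # toy values, not study data
formula = moderation_formula("loneliness", "self_esteem")
```

Z-scoring the moderator first puts its coefficient on a standard-deviation scale, so the main effects are evaluated at the moderator's mean.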
In addition to testing the main hypotheses, we will run exploratory analyses on the following variables:
Cognitive Trust (CogT1-5, Likert 1-7),
Affective Trust (AffT1-5, Likert 1-7),
(Perceived) Artificial Empathy (Likert 1-7),
State Empathy towards AI (State Empathy Scale, Likert 1-5),
Interpersonal attraction (IAS, Likert 1-5),
Humanness and Perceived Intelligence (Likert 1-5),
Satisfaction (NPS, Likert 1-10),
Conversation Quality (Likert 1-5),
Self-esteem (RSES, Likert 1-4),
Emotional vulnerability (EVS, Likert 1-4),
AI Attitude Scale (Likert 1-10),
AI Literacy (PAILQ-6, Likert 1-7),
Living condition (discrete),
Alexithymia (TAS-20, Likert 1-5),
Personality (BFI-10, Likert 1-5),
Attachment style (Adult Attachment Scale, Likert 1-5),
Frequency of various chatbot platform usage (Likert 1-5),
AI Gender (0=male, 1=female) as an interaction with the voice conditions,
User-AI gender alignment (0=different, 1=same) as an interaction with the voice conditions,
Chatbot usage during study (number of interaction turns normalized within modality),
Emotional mirroring (emotion2vec, delta emotion between user and AI, 0-1 continuous; VADER delta sentiment, -1 to 1, continuous),
AI Sentiment (VADER, -1 to 1, continuous),
AI Affectionate Language Use (emotion2vec+large, 0-1 continuous),
Human state mood (delta Valence and Arousal, 1-7, discrete),
Human conversation Sentiment (VADER, -1 to 1, continuous),
Human Affectionate Language Use (emotion2vec+large, 0-1 continuous),
Human-AI Session Length (minutes, continuous),
Human-AI total started sessions (discrete),
Topic Occurrence (0%-100%, continuous),
Audio features (GeMAPS, loudness, variation of loudness, pitch, variation of pitch, rate of speech),
Keywords (emotion, reflective, self-pronouns) (discrete),
Total word count (discrete).
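As an illustration of how the last two text-based variables could be computed, the sketch below counts total words and self-pronouns in a participant's messages. The tokenizer, the pronoun list, and the function names are assumptions; the pre-registration does not specify the keyword lists or the counting procedure, and the emotion and reflective keyword categories are not covered here.

```python
import re
from collections import Counter

# Illustrative list only; the study's actual keyword lists are not specified.
SELF_PRONOUNS = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def text_features(messages):
    """Return total word count and self-pronoun count for a list of messages."""
    tokens = [tok for msg in messages for tok in tokenize(msg)]
    counts = Counter(tokens)
    return {
        "total_word_count": len(tokens),
        "self_pronoun_count": sum(counts[p] for p in SELF_PRONOUNS),
    }


# Toy messages, not study data.
features = text_features(["I think my day went well.", "It made me happy!"])
```

Because contractions such as "I'm" are kept as single tokens, this simple tokenizer would not count them as self-pronouns; a fuller implementation would need to handle them.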
Other researchers involved in this project are:
Lama Ahmad, OpenAI, lama@openai.com
Jason Phang, OpenAI, jasonphang@openai.com
Michael Lampe, OpenAI, lampe@openai.com
Sandhini Agarwal, OpenAI, sandhini@openai.com