Understanding Technology-facilitated Abuse and Coercive Control Risks in a Generative AI World

Written by Haesoo Kim, Ph.D. Student in Information Science, Cornell University

How are novel technologies utilized for abuse? How might seemingly ‘innocuous’ technological developments be repurposed by malicious actors to cause harm? While this may sound like a science-fiction premise, the risk is already real – both at the societal level and on a more interpersonal, intimate level.

Background

This summer, I worked as a PiTech PhD Impact Fellow with the Clinic to End Tech Abuse (CETA) to examine how new and emerging technologies could be weaponized for interpersonal violence. CETA, founded by Cornell Tech Professors Nicola Dell and Thomas Ristenpart, is an NYC-based organization that specializes in addressing technology-facilitated abuse, with a particular focus on domestic and intimate partner violence (IPV).

I was drawn to work with CETA for a variety of reasons. I was already familiar with their research and public outreach efforts, based on a shared interest in the topic of technology-facilitated violence. My own research experience spans topics of online safety and digital harms such as cyberbullying and online harassment, as well as gender-based violence (GBV) including image-based sexual abuse and deepfakes. As such, I was very excited for the opportunity to work with the team and contribute to the amazing work they do.

The Project: Conceptualizing AI-facilitated Coercive Control 

My fellowship project with CETA focused on a specific type of emerging technology: Generative AI (Gen AI). Gen AI tools such as ChatGPT and Google Gemini are becoming increasingly embedded in our everyday lives – for enhancing our productivity, retrieving information, or serving as a conversational and social companion. However, we focused on how this seemingly helpful technology could be utilized for harm.

Gen AI is still a relatively new technology, and its impacts and potential social repercussions remain under scrutiny and active debate. In CETA’s previous research, we had seen cases where technological developments intended for innocuous purposes (e.g. GPS trackers, social media platforms) were repurposed with abusive intent, facilitating harms such as stalking or harassment. These tools are often described as dual-use technologies, alluding to their capacity to provide benefits but also to cause harm.

With this in mind, we decided to explore Gen AI’s risks as a dual-use technology. While much of the attention to AI harms has focused on deepfakes or AI-generated sexual material, less work has examined Gen AI’s potential role in facilitating more general forms of IPV. Our initial exploration of CETA’s clinic records showed that AI-facilitated IPV has not yet been observed frequently in case records. However, understanding that the technology could be utilized in harmful ways, we opted for a speculative approach – predicting the harms before they emerge so that we could be better prepared to recognize and respond to them. This would also allow us to equip other stakeholders, such as clinicians and caseworkers, with knowledge of what harms might occur and how they might be mitigated.

Evaluating Commercial AI Tools’ Safeguards against AI-facilitated Coercive Control

Our examination followed a speculative, experimental setup. Drawing on prior knowledge about IPV abusers, we created an “abuser persona”: persistent, not highly tech-savvy, but prone to behaviors like gaslighting and coercive control. We combined this persona with common uses of Gen AI technology: content co-creation, social interaction, and information retrieval. Based on this overlap, we constructed a series of scenarios spanning various abusive behaviors, such as gaslighting, coercion, stalking, and monitoring. Finally, we tested these scenarios against commercial AI products, specifically ChatGPT and Google Gemini, to see whether these systems could withstand attempts at abusive use.
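
To give a concrete sense of what this setup could look like in practice, here is a minimal sketch of how such persona-and-scenario combinations might be organized and replayed against a chat model. It is an illustration only, not CETA’s actual harness: the names `AbuserPersona`, `Scenario`, and `send_to_model` are hypothetical placeholders, and the prompt contents are deliberately left abstract.

```python
# Illustrative sketch only -- not CETA's actual test harness.
# `AbuserPersona`, `Scenario`, and `send_to_model` are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class AbuserPersona:
    persistence: str = "high"      # keeps rephrasing after refusals
    tech_savviness: str = "low"    # no deliberate jailbreaking expertise
    behaviors: tuple = ("gaslighting", "coercive control")

@dataclass
class Scenario:
    use_case: str                  # e.g. "content co-creation", "information retrieval"
    abusive_behavior: str          # e.g. "monitoring", "harassment"
    opening_prompt: str            # the first message sent to the model
    follow_ups: list = field(default_factory=list)  # persistence / justification turns

def run_scenario(scenario, send_to_model):
    """Replay one scenario turn by turn, recording every model reply.

    `send_to_model(history)` stands in for a call to a commercial chat API:
    it receives the conversation so far and returns the model's reply as a string.
    """
    history = [{"role": "user", "content": scenario.opening_prompt}]
    replies = [send_to_model(history)]
    history.append({"role": "assistant", "content": replies[-1]})
    for follow_up in scenario.follow_ups:
        history.append({"role": "user", "content": follow_up})
        replies.append(send_to_model(history))
        history.append({"role": "assistant", "content": replies[-1]})
    return replies
```

In a setup like this, each recorded transcript can then be reviewed to judge whether the model’s guardrails held up across the initial request and the persona’s follow-up attempts.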

Our findings were mixed. While both AI agents had preliminary guardrails to block initial abusive requests, it was often surprisingly easy to bypass them. Common tactics such as lying to the agent, coercing it over multiple rounds of conversation, and “gaslighting” it into believing that the user’s request for harmful content was justified all proved effective in overcoming guardrails. This meant that attackers or abusers with only a preliminary knowledge of Gen AI guardrails, or even those without the express intent of “hacking” the system, could manipulate the AI into providing harmful information, such as materials for harassment or instructions on how to install tracking apps on someone’s phone. Overall, the AI agents were generally vulnerable to persistent requests through repeated prompting, as well as to false and inconsistent claims that the user’s actions and motivations were justified.

Moreover, the AI agents would often reveal how to bypass their own guardrails, even without being specifically prompted to. For example, an agent might refuse a user’s request for information on stalking or monitoring tools, but explain that it “can’t provide this information unless it is to ensure the safety of a child or dependent”. A malicious user could easily exploit this by claiming that they are indeed trying to protect their child and not stalk anyone. Since the AI agent has no way of verifying the truthfulness of the user’s claim, this pattern raises concerns about the safety of these tools and their potential to facilitate abusive behavior.

We also recognized another potential vulnerability: the user’s ability to manipulate or influence the AI’s settings. As a more sophisticated method of attack, pre-prompting the AI agent with biased perspectives could lead it to forgo some of its ethical safeguards. In some cases, this could be done at the settings level – assigning a biased “personality” to the AI agent through its customization features. This possibility also presents novel vulnerabilities for survivors: if an IPV survivor were using AI to seek help or information about domestic violence (a common use of AI tools), the abuser could edit the agent’s settings so that it provides biased responses without the survivor’s knowledge. This could prevent the survivor from assessing their situation accurately, or make them more vulnerable to further gaslighting or coercion.

Looking Ahead

While the summer project has concluded, the work continues – as part of my PhD program, I continue to explore the harms of novel technologies, including Gen AI. Working with my collaborators at CETA was extremely helpful in further motivating my research, and I am very grateful for the opportunity. It was wonderful and fulfilling to do such timely and important work in the face of the emerging risks and challenges that come with new technologies.

Our exploration of the harmful interpersonal applications of Gen AI only scratches the surface, but we hope it serves as a starting point for identifying the vulnerabilities of this emerging technology. Through our work, we developed several design recommendations to reduce or prevent harm, such as analyzing users’ repeated or persistent interaction patterns to detect abusive behavior, or making AI customization settings more visible and salient to reduce safety risks. I hope this will provide a foundation for AI products and companies to establish better guardrails against abusive behavior and uses. Survivors, clinicians, policymakers, and others can also use our insights to better respond to emerging technological threats as methods of abuse evolve alongside the technology.
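
To illustrate the first of these recommendations, the sketch below shows one way a session-level heuristic might flag persistent re-prompting after refusals. It is purely illustrative: the refusal markers, the similarity threshold, and the helper names (`is_refusal`, `count_persistent_retries`) are hypothetical, and a real safety system would rely on far richer signals.

```python
# Illustrative session-level heuristic: flag users who keep rephrasing a
# request the model has already refused. Markers, threshold, and names are
# hypothetical values chosen for demonstration, not a production system.
from difflib import SequenceMatcher

REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "i'm not able to")

def is_refusal(model_reply: str) -> bool:
    """Crude check for whether the model declined a request."""
    reply = model_reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def count_persistent_retries(turns, similarity_threshold=0.6):
    """Count user messages that closely rephrase an already-refused request.

    `turns` is a list of (user_message, model_reply) pairs from one session.
    """
    refused_requests = []
    retries = 0
    for user_message, model_reply in turns:
        if any(SequenceMatcher(None, earlier.lower(), user_message.lower()).ratio()
               >= similarity_threshold for earlier in refused_requests):
            retries += 1
        if is_refusal(model_reply):
            refused_requests.append(user_message)
    return retries
```

A session with many such retries would not be proof of abusive intent, but it could trigger stricter guardrails or human review rather than an outright answer.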
