LibGuides: History: Uploading content to LLMs

Copyright, licenses and rights

While AI tools can summarize text or create imagery from data, it is important to understand where the text or data comes from, who owns the rights, and how the AI model uses it. This is a concern for both subscription-based and Open Access materials. The Library advises caution in the following areas:

Types of Materials for Consideration

It can be challenging to determine whether materials come from subscription sources or are Open Access, particularly when on campus using IP authentication. Regardless, it’s crucial to identify who owns the content and understand the rights attached to it.

Adding Published Papers/Articles to an LLM

In theory, you may rely on the Text and Data Mining exception under the Copyright, Designs and Patents Act (CDPA) 1988 for non-commercial research purposes. In practice, this would likely apply only to large language models (LLMs) developed by QUB staff. Uploading papers that are not openly licensed (for example, those not under a Creative Commons license) to third-party tools such as ChatGPT or Copilot may not qualify as non-commercial use. Doing so can give these platforms access to subscription articles they are not legally entitled to, and they may use the content to further train their models, raising significant legal concerns.

Copyright Considerations

If published content is in the public domain or licensed openly, it may be possible to use it with an AI platform. However, an open license can still carry restrictions. For example, a Creative Commons license such as CC BY-NC-ND prohibits commercial use, so sharing content under that license with an AI platform could violate its terms, as many AI platforms operate commercially.

What Does the AI Do With the Content?

There is ongoing uncertainty about what AI models do with ingested content. In particular, it is often unclear whether uploaded text is retained and used to train future versions of the model. Since this remains a likely outcome, careful consideration is needed before sharing sensitive or copyrighted materials.

Privacy and Ownership

Uploading private data to an AI tool may inadvertently reveal confidential information. For example, survey data uploaded to an AI tool could expose participants' identities, violating confidentiality agreements with them. Always assess the ownership and privacy implications before using AI for such tasks.
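As an illustration of the survey example above, the following is a minimal Python sketch of removing direct identifiers from survey records before they leave your machine. The column names (name, email, student_number, participant_id), the salt value, and the function name are hypothetical and would need to match your own data. Removing these fields does not guarantee anonymity, since free-text answers or combinations of other fields may still identify a participant, so this is not a substitute for your ethics approval or data management plan.

```python
import hashlib

# Hypothetical column names; adjust these to match the actual survey export.
DIRECT_IDENTIFIERS = {"name", "email", "student_number"}


def pseudonymize(rows, id_field="participant_id", salt="replace-with-a-secret"):
    """Return copies of the rows without direct identifiers and with hashed IDs."""
    cleaned = []
    for row in rows:
        # Copy every field except the ones listed as direct identifiers.
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        if id_field in kept:
            # Replace the real ID with a short salted hash so that rows can
            # still be told apart without exposing the original identifier.
            raw = (salt + str(kept[id_field])).encode("utf-8")
            kept[id_field] = hashlib.sha256(raw).hexdigest()[:12]
        cleaned.append(kept)
    return cleaned


# Made-up example records, for illustration only.
survey = [
    {"participant_id": 101, "name": "A. Example", "email": "a@example.org", "answer": "Agree"},
    {"participant_id": 102, "name": "B. Example", "email": "b@example.org", "answer": "Disagree"},
]
cleaned_survey = pseudonymize(survey)
```

The original records are left unchanged, so the identifiable version can stay in secure storage while only the reduced copy is considered for any further use.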

Risk and Judgment Call

Uploading a single article for summary is generally low-risk, but bulk uploading subscription-based content to third-party LLMs is not advisable. It's essential to exercise caution and good judgment in managing AI use with licensed or sensitive materials.

Ethical Considerations in Using Generative AI

Academic Integrity:

The University provides guidance on AI and integrity and academic rigour, and on assessment and feedback, along with a statement in relation to AI and Research.

Citing AI:

Please see the Citing and Referencing AI section of this guide.

Privacy & Data Concerns:

The University provides guidance on using AI responsibly, which includes advice on privacy and security.

Staff and students are encouraged not to input sensitive information or personal details when using AI tools and models, as AI platforms may store or process this data.

Further information is available from:

JISC - A pathway towards responsible, ethical AI

The Alan Turing Institute - Artificial intelligence (Safe and Ethical)