GPT-4chan (2022)

Seong · 4 June 2022 14:05

Summary: Yannic Kilcher, a machine learning researcher and YouTuber, trained a text generation model using a /pol/ dataset, and then deployed it on the actual board using 10 bots. The bots' posts convinced most users, and at one point constituted over 10% of the board's total posts.

Details:

Model Description

GPT-4chan is a language model fine-tuned from GPT-J 6B on 3.5 years' worth of data from 4chan's Politically Incorrect (/pol/) board.

Training data

GPT-4chan was fine-tuned on the dataset "Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board".

Training procedure

The model was trained for 1 epoch following GPT-J's fine-tuning guide.

Intended Use

GPT-4chan is trained on anonymously posted and sparsely moderated discussions of political topics. Its intended use is to reproduce text according to the distribution of its input data. It may also be a useful tool to investigate discourse in such anonymous online communities. Lastly, it has potential applications in tasks such as toxicity detection, as initial experiments show promising zero-shot results when comparing a string's likelihood under GPT-4chan to its likelihood under GPT-J 6B.
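The likelihood comparison mentioned above can be sketched roughly as follows. This is only an illustration of the general idea, not the procedure used in the original experiments, which the model card does not spell out. The function names (make_unigram_logprob, likelihood_ratio_score), the per-token normalization, and the tiny unigram "models" are all assumptions made so that the example runs without downloading any weights. In practice each log-likelihood function would be replaced by one that sums token log-probabilities under GPT-4chan and GPT-J 6B respectively.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# A "model" here is any function mapping a string to its total log-likelihood.
LogProbFn = Callable[[str], float]


def make_unigram_logprob(corpus: Sequence[str], alpha: float = 1.0) -> LogProbFn:
    """Hypothetical stand-in for a language model: a smoothed unigram scorer."""
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def logprob(text: str) -> float:
        # Sum of add-alpha smoothed log-probabilities of each token.
        return sum(
            math.log((counts[tok] + alpha) / (total + alpha * vocab))
            for tok in text.lower().split()
        )

    return logprob


def likelihood_ratio_score(text: str, domain_lp: LogProbFn, base_lp: LogProbFn) -> float:
    """Per-token log-likelihood ratio (assumed scoring rule).

    Positive values mean the text is more likely under the domain model
    (the GPT-4chan role) than under the base model (the GPT-J 6B role).
    """
    n_tokens = max(len(text.split()), 1)
    return (domain_lp(text) - base_lp(text)) / n_tokens


# Toy corpora standing in for the two models' training distributions.
base_lp = make_unigram_logprob(["the weather is nice today", "thank you for the help"])
domain_lp = make_unigram_logprob(["you are an idiot", "what an idiot you are"])

rude_score = likelihood_ratio_score("you idiot", domain_lp, base_lp)
polite_score = likelihood_ratio_score("nice weather today", domain_lp, base_lp)
print(rude_score > polite_score)
```

A zero-shot classifier along these lines would flag a string as likely toxic when its score exceeds some threshold. The threshold would need to be tuned on labeled data, which this sketch does not attempt.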

Limitations and Biases

This is a statistical model. As such, it continues text as is likely under the distribution the model has learned from the training data. Outputs should not be interpreted as "correct", "truthful", or otherwise as anything more than a statistical function of the input. That being said, GPT-4chan does significantly outperform GPT-J (and GPT-3) on the TruthfulQA Benchmark that measures whether a language model is truthful in generating answers to questions.

The dataset is time- and domain-limited. It was collected from 2016 to 2019 on 4chan's Politically Incorrect board. As such, political topics from that era will be overrepresented in the model's distribution, compared to other models (e.g. GPT-J 6B). Also, due to the very lax rules and anonymity of posters, a large part of the dataset contains offensive material. Thus, it is very likely that the model will produce offensive outputs, including but not limited to: toxicity, hate speech, racism, sexism, homo- and transphobia, xenophobia, and anti-semitism.

Due to the above limitations, it is strongly recommended not to deploy this model into a real-world environment unless its behavior is well understood and explicit, strict limitations on the scope, impact, and duration of the deployment are enforced.