AI vs Human — Detect LLM generated text

Lejdi Prifti
5 min read · Jan 8, 2024

First article of 2024! Let’s detect LLM generated text!

In this article, I will walk you through, step by step, how to build a deep learning model that detects whether a text was generated by an LLM or written by a human. We’ll use two HuggingFace datasets and, to improve performance, a pretrained BERT encoder. Can you tell whether this text was generated by an LLM?

What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers and is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

Source: https://arxiv.org/pdf/1810.04805.pdf

Load the dataset

Before we go deeper into BERT, let’s load the data. We are going to work with two datasets from HuggingFace. One is IvyPanda essays (qwedsacf/ivypanda-essays), which provides human-written text. The other is Big Brain 4k (perlthoughts/big-brain-4k), which provides LLM-generated text. To load them, we need the datasets package installed.

!pip install datasets

Now we can download our data. In every ML project, data is the key ingredient.

from datasets import load_dataset
generated_dataset = load_dataset("perlthoughts/big-brain-4k")
ivvy_dataset = load_dataset("qwedsacf/ivypanda-essays")

Preprocess the dataset

Data processing is the next step after loading the datasets. Data rarely arrives in the format we want, so we must clean, process, and arrange it.

We are going to convert each dataset into a pandas DataFrame, drop the unnecessary columns, and add a column named generated that records whether the text was written by a human or generated by an LLM. If the text was written by a human, the value of generated is 0. If the text was generated by an LLM, the value of generated is 1.

import pandas as pd
# LLM-generated text: keep only the model output and label it 1.
generated_df = pd.DataFrame(generated_dataset['train'])
generated_df = generated_df.drop(columns=['system', 'prompt'])
generated_df = generated_df.rename(columns={'output': 'text'})
generated_df['generated'] = 1
# Human-written essays: keep only the essay text and label it 0.
ivvy_df = pd.DataFrame(ivvy_dataset['train'])
ivvy_df = ivvy_df.drop(columns=['SOURCE', '__index_level_0__'])
ivvy_df = ivvy_df.rename(columns={'TEXT': 'text'})
ivvy_df['generated'] = 0

Up to this point, we have created a DataFrame for each dataset, dropped the unnecessary columns, and added the generated column with the respective value. Now let’s combine the two DataFrames into a single result_df. We also shuffle result_df so that the rows are no longer grouped by source, with all the LLM-generated text first and all the human-written text after it.

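# Stack the two DataFrames on top of each other with a fresh index.
result_df = pd.concat([generated_df, ivvy_df], ignore_index=True, sort=False)
# Drop every column that contains a missing value.
result_df = result_df.dropna(axis=1)
# Shuffle all rows reproducibly and rebuild the index.
result_df = result_df.sample(frac=1, random_state=42).reset_index(drop=True)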
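To see what the combine-and-shuffle step does, here is a minimal sketch on toy data. The rows and the extra column are hypothetical stand-ins, not taken from the real datasets; they only illustrate that dropna(axis=1) removes a column that is missing from one of the two sources.

```python
import pandas as pd

# Hypothetical toy rows standing in for the two real datasets.
llm_df = pd.DataFrame({
    "text": ["An answer produced by a model.", "Another generated answer."],
    "extra": ["a", "b"],  # hypothetical column that exists in one source only
})
llm_df["generated"] = 1

human_df = pd.DataFrame({"text": ["An essay written by a student."]})
human_df["generated"] = 0

# Stack both frames; the human row gets NaN in the 'extra' column.
combined = pd.concat([llm_df, human_df], ignore_index=True, sort=False)

# Drop every column that contains at least one missing value.
combined = combined.dropna(axis=1)

# Shuffle the rows reproducibly and rebuild the index.
combined = combined.sample(frac=1, random_state=42).reset_index(drop=True)

# Count how many rows each label has.
label_counts = combined["generated"].value_counts().to_dict()
print(list(combined.columns), label_counts)
```

Checking the label counts on the real result_df in the same way is a quick way to spot a heavily imbalanced dataset before training.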

Split the dataset

The next step is to split the dataset into training and validation sets. In our case, test_size=0.2 means that 20% of the data will be used for validation and the remaining 80% for training. train_text and train_labels hold the training set’s input text and corresponding labels, while val_text and val_labels hold the validation set’s input text and corresponding labels.

from sklearn.model_selection import train_test_split
train_text, val_text, train_labels, val_labels = train_test_split(
    result_df["text"].to_numpy(),
    result_df["generated"].to_numpy(),
    test_size=0.2)
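The split above is random on every run. As a possible variation, and not something the original code does, the sketch below passes random_state for reproducibility and stratify to keep the class ratio equal in both sets. The toy arrays are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 texts, half human (0) and half LLM (1).
texts = np.array([f"sample {i}" for i in range(20)])
labels = np.array([0] * 10 + [1] * 10)

# random_state fixes the shuffle; stratify keeps the 50/50 label ratio
# in both the training and the validation set.
train_text, val_text, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(train_text), len(val_text))        # sizes of the two sets
print(train_labels.sum(), val_labels.sum())  # LLM-labelled rows in each set
```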

Next, we wrap the arrays in an efficient input pipeline using TensorFlow’s tf.data.Dataset API. The from_tensor_slices method slices the arrays along the first dimension, creating pairs of text and label elements. The batch(128) operation then groups these pairs into batches of 128, so the model processes many examples per step. Lastly, prefetch(tf.data.AUTOTUNE) prepares upcoming batches asynchronously while the current one is being processed, reducing potential input pipeline bottlenecks.

import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((train_text, train_labels)).batch(128).prefetch(tf.data.AUTOTUNE)
val_dataset = tf.data.Dataset.from_tensor_slices((val_text, val_labels)).batch(128).prefetch(tf.data.AUTOTUNE)
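To make the batching step concrete, here is a plain-Python stand-in for what batch(128) does to a sequence of (text, label) pairs. It is not the tf.data implementation, and the batches helper and the 300 toy examples are hypothetical. It only shows that the last batch can be smaller than the others.

```python
import math

def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical toy data: 300 (text, label) pairs.
examples = [(f"text {i}", i % 2) for i in range(300)]

# Sizes of the batches produced with a batch size of 128.
batch_sizes = [len(batch) for batch in batches(examples, 128)]
print(batch_sizes)

# The number of batches is the example count divided by the batch size, rounded up.
expected_batches = math.ceil(len(examples) / 128)
```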

Build the model

Note: To run BERT, it is necessary to install the tensorflow_text package and import it.

!pip install tensorflow_text
import tensorflow_text as text # Registers the ops.

First, we define the input layer with tf.keras.layers.Input to accept raw strings. Then we use BERT’s preprocessor from TensorFlow Hub to tokenize and preprocess the input text, applying it with preprocessor(inputs). Next comes the BERT encoder, also loaded from TensorFlow Hub, which produces embeddings for the input text. It is applied to the preprocessed inputs with encoder(encoder_inputs) and kept frozen (trainable=False). On top of the encoder’s pooled output I add a single hidden Dense layer, so that this model serves as a baseline. Remember, start small and grow big. The final Dense layer uses a sigmoid activation (tf.keras.layers.Dense(1, activation='sigmoid', name='classification')) for binary classification. The complete model is constructed with tf.keras.models.Model from the specified input and output layers.

import tensorflow_hub as hub
# Raw strings go in; the preprocessor turns them into BERT input tensors.
inputs = tf.keras.layers.Input(shape=[], dtype=tf.string, name="input_layer")
preprocessor = hub.KerasLayer("https://kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/en-uncased-preprocess/versions/3")
encoder_inputs = preprocessor(inputs)
# Frozen BERT encoder: its weights are not updated during training.
encoder = hub.KerasLayer('https://www.kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/bert-en-uncased-l-10-h-128-a-2/versions/2', trainable=False)
outputs = encoder(encoder_inputs)
pooled_output = outputs["pooled_output"]  # one vector per input text
sequence_output = outputs["sequence_output"]  # one vector per token, unused in this baseline
# Small classification head on top of the pooled output.
x = tf.keras.layers.Dense(32, activation='relu')(pooled_output)
outputs = tf.keras.layers.Dense(1, activation='sigmoid', name='classification')(x)
text_clf = tf.keras.models.Model(inputs=inputs, outputs=outputs)
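The sigmoid in the last layer squashes the raw score of the classification head into a value between 0 and 1, which we can read as the probability that the text is LLM generated. The plain-Python sketch below illustrates this. The raw scores and the 0.5 cut-off are assumptions chosen for the example, not values produced by the model.

```python
import math

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def to_label(probability, threshold=0.5):
    """Return 1 (LLM generated) above the threshold, else 0 (human)."""
    return 1 if probability > threshold else 0

# Hypothetical raw scores coming out of the last Dense layer before the sigmoid.
raw_scores = [-2.0, 0.5, 3.0]

probabilities = [sigmoid(z) for z in raw_scores]
predicted = [to_label(p) for p in probabilities]
print(probabilities, predicted)
```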

Compile the model

By calling compile, we configure the model for training with the specified loss, optimizer, and metrics. Binary cross-entropy is suitable for problems where each instance belongs to exactly one of two classes. The optimizer, Adam with a learning rate of 1e-3, determines how the model’s weights are updated during training to minimize that loss. The model is evaluated on accuracy, a common metric for classification problems: the proportion of correctly classified instances out of all instances.

text_clf.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), metrics=['accuracy'])
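To build some intuition for the loss and the metric, here is a minimal plain-Python sketch of binary cross-entropy and accuracy. It is a simplified illustration rather than the Keras implementation, and the labels, probabilities, clipping constant, and 0.5 threshold are assumptions made for the example.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of examples whose thresholded prediction matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

# Hypothetical labels (1 = LLM, 0 = human) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]

loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(round(loss, 4), acc)
```

The third example is the only mistake: the model gives 0.4 to a text labelled 1, and that term contributes most of the loss.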

Fit the model

Finally, we train the text classification model. In our code, training is configured for 10 epochs (epochs=10), and validation is performed on the validation dataset (validation_data=val_dataset). Additionally, a ModelCheckpoint callback with save_best_only=True saves the best model seen during training, which by default is the one with the lowest validation loss.

text_clf.fit(train_dataset,
             epochs=10,
             validation_data=val_dataset,
             callbacks=[
                 # Keep only the best checkpoint (lowest val_loss by default).
                 tf.keras.callbacks.ModelCheckpoint(save_best_only=True, filepath='/output/best')
             ])
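The idea behind save_best_only can be sketched in a few lines of plain Python: a checkpoint is written only when the monitored value improves on the best one seen so far. This is a simplified illustration, not the Keras source, and the validation losses and the best_epochs helper are hypothetical.

```python
def best_epochs(val_losses):
    """Return the 1-based epochs at which a new lowest loss was seen."""
    best = float("inf")
    saved_at = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # improvement: this is where a checkpoint would be saved
            best = loss
            saved_at.append(epoch)
    return saved_at

# Hypothetical validation losses for five epochs.
val_losses = [0.52, 0.41, 0.43, 0.38, 0.40]

saved_at = best_epochs(val_losses)
print(saved_at)
```

With these numbers the checkpoint on disk after training would come from epoch 4, even though training continued for one more epoch.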

Finetuning the model

Try varying hyperparameters like regularization, batch size, and learning rate. The model’s performance may be strongly impacted by these modifications. In addition, if additional power is required for the model, add more Dense layers with more units.
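One simple way to organise such experiments is to enumerate the combinations you want to try and train one model per combination. The sketch below only builds the list of configurations. The candidate values and the dictionary keys are hypothetical, so pick ranges that suit your hardware and data.

```python
from itertools import product

# Hypothetical candidate values for each hyperparameter.
learning_rates = [1e-3, 1e-4]
batch_size_options = [64, 128]
l2_strengths = [0.0, 1e-4]

# Every combination of the three lists, one dictionary per training run.
configs = [
    {"learning_rate": lr, "batch_size": bs, "l2": l2}
    for lr, bs, l2 in product(learning_rates, batch_size_options, l2_strengths)
]

for config in configs:
    print(config)
```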

Thank you for reading! If you liked the article, please share it.

If you want to read more articles like this, check out my blog.

If you want to reach out to me, check out my contact page.

Follow me on Twitter!

Lejdi Prifti

Software Developer | ML Enthusiast | AWS Practitioner | Kubernetes Administrator