AI vs Human — Detect LLM generated text
First article of 2024! Let’s detect LLM generated text!
In this article, I will walk you step by step through building a deep learning model that detects whether a text was generated by an LLM or written by a human. I'll point you to two Hugging Face datasets, and to improve our performance, we'll apply BERT. Can you detect if this text was generated by an LLM or not?
What is BERT?
BERT stands for Bidirectional Encoder Representations from Transformers and is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
Source: https://arxiv.org/pdf/1810.04805.pdf
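To make "conditioning on both left and right context" a bit more concrete, here is a minimal sketch in plain Python. It is not BERT itself: it only compares which positions a token may attend to in a bidirectional encoder and in a left-to-right language model. The function names and the toy sentence are made up for illustration.

```python
def bidirectional_mask(n):
    """Every position may attend to every position (BERT-style encoder)."""
    return [[1] * n for _ in range(n)]


def left_to_right_mask(n):
    """Position i may attend only to positions 0..i (left-to-right model)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


tokens = ["the", "cat", "sat", "down"]  # toy sentence, illustration only
bi_mask = bidirectional_mask(len(tokens))
causal_mask = left_to_right_mask(len(tokens))

# Context visible to the token "cat" (row 1) under each scheme.
visible_bi = [t for t, m in zip(tokens, bi_mask[1]) if m]
visible_causal = [t for t, m in zip(tokens, causal_mask[1]) if m]

print(visible_bi)      # ['the', 'cat', 'sat', 'down']
print(visible_causal)  # ['the', 'cat']
```

In the bidirectional case, "cat" sees the words on both sides of it, which is the property the quote above describes.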
Load the dataset
Before we go deeper into BERT, let's start by loading the datasets. We are going to work with two datasets from Hugging Face. One of them is IvyPanda essays, which will provide us with human-written text. The other is Big Brain 4k, which offers LLM-generated text. To load the datasets, we need to have the datasets package installed.
!pip install datasets
Now we can download our data. In every ML project, data is the key ingredient.
from datasets import load_dataset
generated_dataset = load_dataset("perlthoughts/big-brain-4k")
ivvy_dataset = load_dataset("qwedsacf/ivypanda-essays")
Preprocess the dataset
Data processing is the next step after loading the datasets. Not all data arrives in the format we would like, so we must clean, process, and arrange it.
We are going to convert the datasets into DataFrames, drop the unnecessary columns, and add a column named generated to indicate whether the text was written by a human or generated by an LLM. If the text is written by a human, the value of generated is 0. If the text is generated by an LLM, the value of generated is 1.
import pandas as pd
# LLM-generated text: keep only the model output and label it 1
generated_df = pd.DataFrame(generated_dataset['train'])
generated_df = generated_df.drop(columns=['system', 'prompt'])
generated_df = generated_df.rename(columns={'output': 'text'})
generated_df['generated'] = 1
# Human-written essays: keep only the essay text and label it 0
ivvy_df = pd.DataFrame(ivvy_dataset['train'])
ivvy_df = ivvy_df.drop(columns=['SOURCE', '__index_level_0__'])
ivvy_df = ivvy_df.rename(columns={'TEXT': 'text'})
ivvy_df['generated'] = 0
Up to this point, we have created a DataFrame for each of the datasets. We have dropped the unnecessary columns and added the generated column with the respective value. Now let's combine the two DataFrames into a single result_df. In addition, we shuffle result_df so that the rows are no longer grouped by source (all generated texts first, then all human texts).
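# Stack the two DataFrames on top of each other
result_df = pd.concat([generated_df, ivvy_df], ignore_index=True, sort=False)
# Drop every column that contains at least one missing value
result_df = result_df.dropna(axis=1)
# Shuffle all rows with a fixed seed and rebuild the index
result_df = result_df.sample(frac=1, random_state=42).reset_index(drop=True)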
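The label-then-shuffle idea can be sketched without pandas. The snippet below is a toy illustration with made-up texts, and label_and_shuffle is a hypothetical helper, not part of the code above. It also counts the labels, a cheap sanity check worth running on the real result_df to see how balanced the two classes are.

```python
import random
from collections import Counter


def label_and_shuffle(human_texts, llm_texts, seed=42):
    """Attach 0 to human texts and 1 to LLM texts, then shuffle reproducibly."""
    rows = [(text, 0) for text in human_texts] + [(text, 1) for text in llm_texts]
    rng = random.Random(seed)  # fixed seed, similar in spirit to random_state=42
    rng.shuffle(rows)
    return rows


# Made-up examples, for illustration only.
rows = label_and_shuffle(["essay a", "essay b", "essay c"], ["answer x", "answer y"])
class_counts = Counter(label for _, label in rows)
print(class_counts[0], class_counts[1])  # 3 2
```

If one class heavily outnumbers the other, accuracy alone can look better than the model really is, so it helps to know these counts before training.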
Split the dataset
The next step is to split the dataset into training and validation sets. In our case, test_size=0.2 means that 20% of the data will be used for validation, and the remaining 80% will be used for training. train_text and train_labels represent the training set's input text and corresponding labels, while val_text and val_labels represent the validation set's input text and corresponding labels.
from sklearn.model_selection import train_test_split
train_text, val_text, train_labels, val_labels = train_test_split(
    result_df["text"].to_numpy(),
    result_df["generated"].to_numpy(),
    test_size=0.2)
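To show what an 80/20 split does, here is a small pure-Python sketch. It is a variation on the call above, not a copy of it: it splits each class separately so both sets keep the same label ratio. stratified_split is a hypothetical helper written for this example, and the data is made up. In scikit-learn, the stratify argument of train_test_split serves a similar purpose.

```python
import random


def stratified_split(texts, labels, test_size=0.2, seed=42):
    """Split texts/labels so each class contributes test_size of its rows to validation."""
    rng = random.Random(seed)
    indices_by_label = {}
    for index, label in enumerate(labels):
        indices_by_label.setdefault(label, []).append(index)

    train_idx, val_idx = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * test_size)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    train = [(texts[i], labels[i]) for i in train_idx]
    val = [(texts[i], labels[i]) for i in val_idx]
    return train, val


# Toy data: 10 human texts (label 0) and 10 generated texts (label 1).
texts = [f"text {i}" for i in range(20)]
labels = [0] * 10 + [1] * 10
train, val = stratified_split(texts, labels)
print(len(train), len(val))  # 16 4
```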
Moreover, we build a more efficient input pipeline with TensorFlow's tf.data.Dataset API. The from_tensor_slices method slices the arrays along the first dimension, creating pairs of text and label elements. Subsequently, batch(128) groups these pairs into batches of size 128, which makes training more efficient. Lastly, prefetch(tf.data.AUTOTUNE) prepares upcoming batches asynchronously while the current one is being processed, reducing potential input pipeline bottlenecks.
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((train_text, train_labels)).batch(128).prefetch(tf.data.AUTOTUNE)
val_dataset = tf.data.Dataset.from_tensor_slices((val_text, val_labels)).batch(128).prefetch(tf.data.AUTOTUNE)
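As a rough mental model of batching, the sketch below groups a list of (text, label) pairs into chunks of 128 in plain Python. It is not how tf.data is implemented; it only shows the grouping, including the smaller final batch you get when the dataset size is not a multiple of the batch size. The helper name and the 300 toy pairs are made up.

```python
def batch(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 300 toy (text, label) pairs, for illustration only.
pairs = [(f"text {i}", i % 2) for i in range(300)]
batch_sizes = [len(chunk) for chunk in batch(pairs, 128)]
print(batch_sizes)  # [128, 128, 44]
```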
Build the model
Note: To run BERT, it is necessary to install the tensorflow_text package and import it.
!pip install tensorflow_text
import tensorflow_text as text # Registers the ops.
First, we define the input layer using tf.keras.layers.Input to handle text data. Then, we use BERT's preprocessor from TensorFlow Hub to tokenize and preprocess the input text. The preprocessor is applied to the input using preprocessor(inputs). Next, we create the BERT encoder layer, which uses the BERT encoder from TensorFlow Hub to obtain embeddings for the input text. The encoder is applied to the preprocessed inputs using encoder(encoder_inputs), and because it is loaded with trainable=False, its weights stay frozen during training. I added a single hidden Dense layer on top so that this model serves as a baseline. Remember, start small and grow big. The final Dense layer uses a sigmoid activation function (tf.keras.layers.Dense(1, activation='sigmoid', name='classification')) for binary classification. The complete model is constructed using tf.keras.models.Model with the specified input and output layers.
import tensorflow_hub as hub
# Raw strings go in; the preprocessor turns them into BERT inputs
inputs = tf.keras.layers.Input(shape=[], dtype=tf.string, name="input_layer")
preprocessor = hub.KerasLayer("https://kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/en-uncased-preprocess/versions/3")
encoder_inputs = preprocessor(inputs)
# Frozen BERT encoder (trainable=False): only the layers added below are trained
encoder = hub.KerasLayer('https://www.kaggle.com/models/tensorflow/bert/frameworks/TensorFlow2/variations/bert-en-uncased-l-10-h-128-a-2/versions/2', trainable=False)
outputs = encoder(encoder_inputs)
pooled_output = outputs["pooled_output"]  # one vector per input text
sequence_output = outputs["sequence_output"]  # one vector per token (not used in this baseline)
# Small classification head on top of the pooled representation
x = tf.keras.layers.Dense(32, activation='relu')(pooled_output)
outputs = tf.keras.layers.Dense(1, activation='sigmoid', name='classification')(x)
text_clf = tf.keras.models.Model(inputs=inputs, outputs=outputs)
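Since the encoder is frozen, only the two Dense layers are trained. The sketch below counts their parameters by hand. It assumes the pooled output has 128 dimensions, which I am reading off the "h-128" part of the encoder name rather than from a printed model summary. Calling text_clf.summary() is the way to confirm it.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input/unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


pooled_dim = 128  # assumption: hidden size taken from "h-128" in the encoder name

hidden_layer = dense_params(pooled_dim, 32)  # Dense(32, activation='relu')
output_layer = dense_params(32, 1)           # Dense(1, activation='sigmoid')
trainable_total = hidden_layer + output_layer

print(hidden_layer, output_layer, trainable_total)  # 4128 33 4161
```

A head this small trains quickly, which is part of why it makes a reasonable baseline.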
Compile the model
By calling compile, we configure the model for training with the specified loss, optimizer, and metrics. Binary cross-entropy is suitable for problems where each instance belongs to exactly one of two classes. The optimizer, Adam with a learning rate of 1e-3, determines how the model's weights are updated during training to minimize the defined loss. The model will be evaluated based on accuracy, which is a common metric for classification problems. Accuracy measures the proportion of correctly classified instances out of the total instances.
text_clf.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), metrics=['accuracy'])
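To see what the loss and the metric measure, here is a hand-rolled version of both in plain Python. It is a sketch for intuition, not the Keras implementation. The labels and probabilities are made up, and the 0.5 decision threshold is the usual default for binary accuracy.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of examples whose thresholded prediction matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


# Made-up labels (1 = generated, 0 = human) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.1]

loss = binary_cross_entropy(labels, probs)
acc = accuracy(labels, probs)
print(round(loss, 4))
print(acc)  # 0.75
```

The third example is the interesting one: the model gives a generated text a probability of only 0.4, so it is counted as wrong by accuracy and also contributes the largest share of the loss.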
Fit the model
Finally, we train the text classification model. In our code, training is configured for 10 epochs (epochs=10), and validation is performed on the validation dataset (validation_data=val_dataset). Additionally, a ModelCheckpoint callback is specified to save the best model during training.
text_clf.fit(train_dataset,
             epochs=10,
             validation_data=val_dataset,
             callbacks=[
                 tf.keras.callbacks.ModelCheckpoint(save_best_only=True, filepath='/output/best')
             ])
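With save_best_only=True and no other arguments, ModelCheckpoint monitors val_loss and writes a checkpoint only when that value improves. The sketch below mimics that decision rule in plain Python. The validation losses are made-up numbers, not results from this model, and epochs_that_save is a hypothetical helper.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which the validation loss reaches a new minimum."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved


# Made-up validation losses for five epochs, for illustration only.
made_up_val_losses = [0.60, 0.50, 0.55, 0.40, 0.45]
saved_epochs = epochs_that_save(made_up_val_losses)
print(saved_epochs)  # [1, 2, 4]
```

In this toy run, the file on disk after training would hold the weights from epoch 4, even though training continued to epoch 5.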
Finetuning the model
Try varying hyperparameters like regularization, batch size, and learning rate. These changes can have a strong impact on the model's performance. In addition, if the model needs more capacity, add more Dense layers with more units.
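One simple way to organize such experiments is a small grid of configurations that you train one by one and compare on validation accuracy. The sketch below only builds the grid. The specific values are hypothetical starting points, not tuned recommendations, and dropout stands in for "regularization" here as one possible choice.

```python
from itertools import product

# Hypothetical candidate values, for illustration only.
learning_rates = [1e-3, 3e-4, 1e-4]
batch_sizes = [32, 64, 128]
dropout_rates = [0.0, 0.2]

grid = [
    {"learning_rate": lr, "batch_size": bs, "dropout": dr}
    for lr, bs, dr in product(learning_rates, batch_sizes, dropout_rates)
]

print(len(grid))  # 18
print(grid[0])    # {'learning_rate': 0.001, 'batch_size': 32, 'dropout': 0.0}
```

Each dictionary would then drive one training run, for example by passing learning_rate to the Adam optimizer and batch_size to the batch call in the input pipeline.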
Thank you for reading! If you liked the article, please share it.