Detection of texts generated by LLM. Is it possible to differentiate between… | by Sergei Savvov | September 2023



The proposed watermarking strategy lowers the probability of specific words appearing in LLM outputs, essentially creating a “to avoid” list. If a text contains these low-probability words, it is likely human-written, because the watermarked LLM is steered away from those terms.
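As a toy sketch of this idea (the avoid list, threshold, and words below are invented for illustration, not taken from any real watermarking scheme), a detector could simply count how many avoided words appear:

```python
# Sketch of the "to avoid" list idea: if a text uses words the watermarked
# LLM was steered away from, it is more likely human-written.
# AVOID_LIST and the threshold are made-up values for illustration only.

AVOID_LIST = {"lugubrious", "perspicacious", "sesquipedalian"}

def likely_human(text: str, threshold: int = 1) -> bool:
    """Return True if the text uses at least `threshold` avoided words."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return len(words & AVOID_LIST) >= threshold

print(likely_human("Her lugubrious tone betrayed the news."))  # True
print(likely_human("The model wrote a cheerful summary."))     # False
```

A real scheme would work at the token level with calibrated probabilities rather than a hand-written word list.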

The top-k overlay using the GLTR visualization tool. There is a notable difference between the two texts.

Post-hoc watermark

Post-hoc watermarks insert a hidden message into the LLM-generated text for later verification. To check the watermark, we extract this hidden message from the text in question. These watermarking methods mainly fall into two categories: rule-based and neural-based.

Inference-time watermark

This method modifies word selection during the decoding phase. The model produces a probability distribution for the next word, and the watermark is embedded by adjusting this selection process. Specifically, a hash code generated from the previous token partitions the vocabulary into “green list” and “red list” words, and the next token is chosen from the green list.

In the illustration below, a random seed is generated by hashing the previously predicted token “a”, dividing the entire vocabulary into a “green list” and a “red list”. The next token, “all-in”, is then chosen from the green list.

Inference time watermark illustration, source
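A heavily simplified sketch of this scheme in Python (the toy vocabulary, greedy selection, and 50/50 green/red split are illustrative choices, not the exact parameters of the published method):

```python
import hashlib
import random

# Toy vocabulary; a real model's vocabulary has tens of thousands of tokens.
VOCAB = ["all-in", "bet", "fold", "call", "raise", "check"]

def split_vocab(prev_token: str):
    """Hash the previous token into a seed and split the vocabulary
    into a green half and a red half, as in the illustration."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])  # (green, red)

def sample_next(prev_token: str, probs: dict) -> str:
    """Pick the next token, restricted to the green list (greedy and
    hard-filtered here for simplicity; softer variants only boost
    green-list probabilities instead of banning red-list words)."""
    green, _ = split_vocab(prev_token)
    allowed = {t: p for t, p in probs.items() if t in green}
    return max(allowed, key=allowed.get)

def green_fraction(tokens: list) -> float:
    """Detector side: re-derive each green list and count green tokens.
    A fraction near 1.0 over a long text is strong evidence of the watermark."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:])
               if cur in split_vocab(prev)[0])
    return hits / max(len(tokens) - 1, 1)

probs = {t: 1 / len(VOCAB) for t in VOCAB}
tokens = ["bet"]
for _ in range(4):
    tokens.append(sample_next(tokens[-1], probs))
print(tokens, green_fraction(tokens))  # every step lands on a green-list token
```

The detector needs only the hashing scheme, not the model itself, which is what makes verification cheap.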


Implementing this approach assumes that users are working with a modified model, which is unlikely. Additionally, if the avoid list is made public, anyone could manually add these words to AI-generated text to evade detection.

You can try generating watermarked text with the Hugging Face Space Gradio demo, or consult the GitHub repository to run the Python scripts on your own machine.


The core of this approach is to build a binary classifier. The task is similar to traditional machine learning problems, where model accuracy depends on the variety of the data and the quality of the feature set.
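As a minimal illustration of the feature side (the two stylometric features below are common but arbitrary choices; a real detector would learn weights over many such features from labeled human and AI texts):

```python
import re

def features(text: str) -> dict:
    """Two toy stylometric features a binary classifier might consume:
    average sentence length and lexical diversity (type-token ratio)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

print(features("Hello world. Hello again!"))
```

A classifier (logistic regression, gradient boosting, or a fine-tuned transformer) would then map such feature vectors to a human/AI label.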

I’ll delve deeper into several interesting implementations.

1. DetectGPT

DetectGPT calculates the log-probabilities of the tokens in a text, where each token’s probability is conditioned on the tokens before it. Summing these log-probabilities (equivalently, multiplying the conditional probabilities) gives the joint probability of the text.

The method then perturbs the text and compares the probabilities. If the probability of the perturbed text is significantly lower than that of the original, the original text was probably generated by an AI. If the probabilities are similar, the text is likely human-written.

Schematic representation of the DetectGPT method
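The comparison logic can be sketched as follows. In the real method, `log_prob` is the scoring LLM and `perturb` is a mask-filling model (the paper uses T5); here both are left as stand-in callables supplied by the caller:

```python
import random

def detectgpt_score(log_prob, perturb, text: str, n: int = 20, seed: int = 0) -> float:
    """DetectGPT's core statistic: how far the text's log-probability sits
    above the average log-probability of slightly perturbed versions.
    `log_prob` and `perturb` stand in for the scoring LLM and the
    mask-filling perturbation model used in the actual method."""
    rng = random.Random(seed)
    perturbed = [perturb(text, rng) for _ in range(n)]
    mean_perturbed = sum(log_prob(p) for p in perturbed) / n
    # A large positive gap suggests the text sits at a local maximum of the
    # model's probability, i.e. it is likely AI-generated.
    return log_prob(text) - mean_perturbed
```

In practice the score is compared against a threshold calibrated on known human-written and AI-generated samples.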

Paper & Demo (for GPT-2) & GitHub

2. GPTZero

GPTZero demo

GPTZero is a linear regression model designed to estimate text perplexity. Perplexity is linked to the log-probability of the text, much as in DetectGPT: it is the exponential of the negative average log-probability of the text. Lower perplexity indicates more predictable text, and LLMs effectively minimize perplexity by maximizing text probability.
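That formula is easy to compute directly from per-token log-probabilities (the values below are made up for illustration):

```python
import math

def perplexity(token_log_probs: list) -> float:
    """Perplexity = exp(-(1/N) * sum of per-token log-probabilities)."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Made-up per-token log-probs: the predictable text scores lower perplexity.
predictable = [-0.1, -0.2, -0.1, -0.3]
surprising = [-2.0, -3.5, -1.8, -2.7]
print(perplexity(predictable))  # ≈ 1.19
print(perplexity(surprising))   # ≈ 12.18
```

A detector like GPTZero compares such scores against what is typical for human writing, which tends to be more surprising to the model.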

Unlike previous methods, GPTZero does not label text as AI-generated or not. It provides a perplexity score for comparative analysis.

Blog & GPTZero demo & Open source version

3. ZipPy

ZipPy uses the Lempel-Ziv-Markov chain algorithm (LZMA) compression ratio to measure the novelty/perplexity of input samples against a small corpus (< 100 KB) of AI-generated text.

In the past, compression was used as a simple anomaly-detection system. By running network event logs through a compression algorithm and examining the resulting compression ratio, one can assess how novel or anomalous the input is. A significantly novel input yields a lower compression ratio, serving as an anomaly alert; conversely, recurring background event traffic, already accounted for in the dictionary, produces a higher compression ratio.
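A minimal sketch of the compression idea using Python’s standard `lzma` module (the corpus and sample below are placeholders; ZipPy’s actual corpus, ratios, and thresholds differ):

```python
import lzma

def compression_ratio(data: bytes) -> float:
    """Original size / compressed size: higher means more redundancy."""
    return len(data) / len(lzma.compress(data))

def novelty_delta(corpus: bytes, sample: bytes) -> float:
    """Change in compression ratio when `sample` is appended to the corpus.
    Text that resembles the corpus keeps the ratio high; novel text drags
    it down, which is the anomaly signal described above."""
    return compression_ratio(corpus + sample) - compression_ratio(corpus)

# Placeholder corpus: ZipPy would use ~100 KB of real AI-generated text.
corpus = b"the quick brown fox jumps over the lazy dog. " * 50
print(novelty_delta(corpus, corpus[:45]))  # familiar text: ratio holds up
print(novelty_delta(corpus, b"Zx9 qK2 vvB plm 77 j0e LqW rT5 zz 1234"))
```

Applied to LLM detection, a sample that compresses well against an AI-generated corpus resembles that corpus and is flagged as likely machine-written.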

Blog & GitHub

4. OpenAI (no longer available)

On January 31, OpenAI launched a new tool to detect AI-generated text. According to the site’s description, the model is a fine-tuned GPT variant that performs binary classification, trained on a dataset of both human-written and AI-written text passages.

However, the service is no longer available at the moment:

What happened: OpenAI updated its original blog post, stating that it had suspended the classifier due to its low prediction accuracy:

As of July 20, 2023, the AI classifier is no longer available due to its low accuracy rate.


The main limitation of black-box methods is their rapid degradation: the algorithm must constantly be improved to keep up with new models. This problem is particularly evident in light of OpenAI’s recent decision to shut down its detector.


I decided to conduct my own evaluation of how well some services detect generated text. To do this, I generated and paraphrased text using different models (GPT-3.5, GPT-4, LLaMA-2 70B, QuillBot) and compiled a dataset containing 10 samples for each category:

  • Story: “Write me a short story about…”
  • Paraphrase: “Rewrite and paraphrase the story…”
  • Email: “Write me a quick email to Paul and ask him…”
  • ELI5: “Explain to me like a little child what…”

I was amazed by the results; only a few services managed to accurately identify the generated text:

Accuracy of paid services for generated text detection

If you want to explore further research in this area, I suggest reading the article “LLM Generated Text Detection in Computer Science Education”, which offers a thorough comparison of these services. Here’s what the article had to say:

Our results show that CopyLeaks is the most accurate LLM-generated text detector, GPTKit is the best LLM-generated text detector for reducing false positives, and GLTR is the most resilient LLM-generated text detector… Finally, note that all LLM-generated text detectors are less accurate with code and with languages other than English, and after the use of paraphrasing tools (such as QuillBot).

Here is the full list of services currently available:

Related Articles:

  1. Can we detect AI-generated text? by Salvatore Raieli
  2. Checking facts generated by the LLM
  3. The science of detecting texts generated by LLM
  4. On detecting whether text was generated by a human or an AI language model
  5. LLM Generated Text Detection in Computer Science Education

