No training / finetuning
Does ClauseBase use my data for training AI models?
Short answer: NO
ClauseBase is not in the business of creating its own AI models, so we do not use your data at all for AI model training or finetuning, and we have no plans to change this.
The only LLM-related processing we perform with your data is so-called "prompt engineering" in the background: we pass carefully selected fragments of your data to the LLM in order to get an answer.
Long answer
What we currently do at ClauseBase, and will continue to extend in the future, is allow you to store your legal data on our platform and then feed a subset of that data to the LLM engine as part of the prompt that gets submitted.
For example, if you have stored 100 fragments of text about corporate liability, you can take a subset of those fragments (e.g., only the ones that deal with the personal liability of directors), feed them to the LLM and ask it to draft a new text based on that input.
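To give a feel for what this looks like in practice, here is a minimal, illustrative sketch in Python. It is not ClauseBase's actual implementation: the fragments, the selection criterion and the call_llm() placeholder are invented for illustration only. The point it shows is that only the hand-picked fragments end up inside the prompt text that is sent to the LLM.

```python
# Minimal sketch of "prompt engineering": only a hand-picked subset of stored
# fragments is pasted into the prompt text that is sent to the LLM.
# NOTE: the fragments, the selection keyword and call_llm() are illustrative
# placeholders, not ClauseBase's actual data or API.

stored_fragments = [
    {"topic": "personal liability of directors", "text": "Directors shall be personally liable for ..."},
    {"topic": "corporate liability", "text": "The company shall indemnify ..."},
    # ... in practice: the customer's own clause library
]

def build_prompt(instruction: str, selected: list[dict]) -> str:
    """Concatenate the end-user's instruction with only the selected fragments."""
    context = "\n\n".join(f"- {frag['text']}" for frag in selected)
    return (
        "Draft a new clause based solely on the following fragments:\n\n"
        f"{context}\n\n"
        f"Instruction: {instruction}"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for the vendor's LLM API: the prompt is processed once to
    produce an answer, and nothing is stored or learned from it by the model."""
    return "<answer returned by the LLM>"

# The end-user selects a subset (e.g. only fragments about directors' liability) ...
selection = [f for f in stored_fragments if "directors" in f["topic"]]
# ... and only that subset ever reaches the LLM, inside the prompt text.
answer = call_llm(build_prompt("Draft a clause on directors' personal liability.", selection))
print(answer)
```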
This process of "prompt engineering", where we selectively feed data to the LLM through text in a prompt, is technically completely different from "training" a model (or its lightweight alternative, "finetuning").
Prompt feeding is limited to a carefully selected subset of your data, typically a few thousand words (quality first: less is often more). Conversely, "training" a model usually involves all your data, typically millions of words (quantity first: the more the better); similarly, "finetuning" a model involves significant parts of your data.
Prompt feeding does not leave any "residue" in the LLM: once the answer is returned, the LLM immediately forgets both the answer and the information fed into it, due to the technical way in which an LLM operates (i.e., adding information to an LLM requires retraining or finetuning; an LLM retains no memory of a prompt once the request is complete). In addition, the LLM vendor essentially guarantees that the LLM will forget about the prompt it was fed. Conversely, the very goal of model training & finetuning is to create a new, permanent file (a "model").
Prompt feeding is constrained to what the customer itself uploaded in the past: the legal data of customer X will never be used for the prompt feeding of customer Y. Conversely, given the enormous resources required for model training, the goal of model training & finetuning is to reuse content across customers as much as possible.
Prompt feeding puts the end-user in control: it only happens when the end-user asks the LLM a question, and it requires the end-user to make a selection, so that only a subset of all data is fed to the LLM. Conversely, model training & finetuning works silently in the background, takes all existing data, and distils a trained model from it.
Read more
The difference between "prompt engineering", finetuning and training is in no way special or specific to the ClauseBase platform. You can read more about it at the following locations:
Updates
2nd April 2025:
replaced references to GPT-4o with generic references to "the LLM"
explained that, due to the way they technically operate, LLMs retain no memory of a prompt once a request is complete