No training / finetuning

Does ClauseBase use my data for training AI models?

Short answer: NO

ClauseBase is not in the business of creating its own AI model, so we do not use your data at all for AI modelling, training or finetuning, and we have no plans to do so.

The only LLM-related processing we perform on your data is so-called "prompt engineering" in the background: passing carefully selected fragments of your data to the LLM in order to get an answer.

Long answer

What we currently do at ClauseBase, and will extend in the future, is allow you to store your legal data on our platform and then feed a subset of that data to Microsoft’s GPT-4 engine, as part of the prompt that gets submitted.

For example, if you have stored 100 fragments of text about corporate liability, you can take a subset of those fragments (e.g., only the ones that discuss the personal liability of directors), feed that subset to GPT-4, and ask it to draft a new text based on that input.

This process of "prompt engineering", where we selectively feed data to GPT-4 as text in a prompt, is technically completely different from "training" a model (or its lightweight alternative, "finetuning").

  • Prompt feeding is limited to a carefully selected subset of your data, typically a few thousand words (quality first: less is often more). Conversely, "training" a model usually involves all your data, typically millions of words (quantity first: the more the better); similarly, "finetuning" a model involves significant parts of your data.

  • Prompt feeding does not leave any "residue" in the LLM: once the answer is returned by GPT-4, Microsoft essentially guarantees that GPT-4 forgets the prompt it was fed. Conversely, the very goal of model training & finetuning is to create a new, permanent file (a "model").

  • Prompt feeding is constrained to what each customer uploaded in the past: the legal data of customer X will never be used for prompt feeding on behalf of customer Y. Conversely, given the enormous resources required for model training, the goal of training & finetuning is to reuse content across customers as much as possible.

  • Prompt feeding puts the end-user in control: it only happens when the end-user asks GPT-4 a question, and it then requires the end-user to make a selection, so that only a subset of all data is fed to GPT-4. Conversely, model training & finetuning works silently in the background, takes all existing data, and distils a trained model from it.
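
To make the distinction concrete, below is a minimal, purely illustrative sketch of what prompt feeding looks like under the hood. It assumes the openai Python package and uses made-up fragment text and parameter names; it is not ClauseBase's actual code. The key point is that the selected fragments only exist inside a single request, and no model is trained, finetuned or otherwise changed.

```python
# Minimal sketch of "prompt feeding": a user-selected subset of stored clause
# fragments is placed into a single prompt and sent to the LLM. Nothing is
# trained or finetuned; the fragments only live inside this one request.
# All names below are illustrative, not ClauseBase's internal API.
from openai import OpenAI  # assumes the `openai` Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical user selection: only fragments about directors' personal liability.
selected_fragments = [
    "Directors shall not be personally liable for ...",
    "The personal liability of a director is limited to ...",
]

prompt = (
    "Using only the clause fragments below, draft a new clause on the "
    "personal liability of directors.\n\n"
    + "\n\n".join(
        f"Fragment {i + 1}:\n{text}" for i, text in enumerate(selected_fragments)
    )
)

response = client.chat.completions.create(
    model="gpt-4",  # or an Azure OpenAI deployment name
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```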

Read more

The difference between "prompt engineering", finetuning and training is in no way special or specific to the ClauseBase platform. You can read more about it at the following locations:
