Usage in detail

How customer data gets passed to the LLM

The ClauseBase platform makes use of LLMs in very diverse situations. Because the use of LLMs involves quite some concerns from legal teams and compliance officers, we describe each individual usage of LLMs within ClauseBuddy, including the customer data that gets sent to the LLM.

Note that for each of the situations described below, ClauseBase merely passes along the relevant data to the LLM. In other words, the ClauseBase platform does not store the customer data that it passes to the LLM, except if the user would afterwards deliberately store the results in ClauseBuddy's database.

Moreover (except for some administrative metadata for invoicing purposes) ClauseBase does not monitor any such data, does not want to monitor any such data, is not interested in monitoring any of it, does not know what passes through its system, let alone use it in any way. The customer is itself responsible for making sure that its users act responsibly.

For example, if the user selects two pages of highly confidential text from a share purchase agreement in an MS Word document, and asks ClauseBuddy to summarise that text, then ClauseBuddy will merely pass along that text to the LLM, and present the summary to the user. It is then up to the user to decide what to do with the text — e.g., the user may decide to copy that summary into some new Word document, or perhaps even to store the summary as a new "clause" in a ClauseBuddy library. However, except for what the user decides to do, ClauseBuddy does not remember either the initially selected text or the summary.

Mid March 2025, ClauseBase was granted the abuse monitoring exception by Microsoft, meaning that Microsoft will not store any content submitted to the LLMs, not even for "abuse monitoring" (e.g., to check whether users are looking for tips to hide a body in their garden, or request recipes to build bombs).

Accordingly, when using the LLMs hosted by Microsoft, it is no longer the case that Microsoft keeps a copy of the instruction for a limited period of time, for fraud prevent purposes or to help with authority requests.

For each of the data flows described below, the scenario is basically as follows:

The embedded browser in which ClauseBuddy is running sends some data (e.g., the selected text, the currently opened document, or some other uploaded file) to the ClauseBase-server.
The ClauseBase-server logs some basic metadata (timestamp of the request, requesting user / customer, size of the request), to be used only for administrative purposes such as invoicing.
The ClauseBase-server sends the request to the LLM, usually unmodified but sometimes with some additional information that had to be processed by the ClauseBase-server (e.g., when a DOCX-file was passed on, then it may be the case that that DOCX-file got split into clauses or otherwise converted in another format that allows for easier consumption by the LLM).
The LLM handles the request and sends back its answer to the ClauseBase-server. Other than the limited administrative information the LLM forgets about the request and the answer it formulated.
The ClauseBase-server passes along the reply to ClauseBuddy, usually unmodified but sometimes augmented with some information that can only be handled by a server-environment (e.g., assemble pieces of text into a PDF or DOCX-file).
ClauseBuddy shows the information on the screen.

Clause9

Automatic filename for a clause

Users can instruct Clause9 to automatically draft the filename of a clause on the basis of the clause's contents. No data other than the current clause's contents gets sent to the LLM, in the currently selected language.

ClauseBuddy

Semantic vectors & reranking

Truffle Hunt, AutoSuggest and the Quality Library all rely upon semantic vectors and reranking when storing & processing clauses.

This means that the textual content of clauses gets converted into mathematical vectors (currently 1024-dimensions) that then get stored in a dedicated database table, to facilitate later "smart searches" that can then be requested by end-users, e.g. to retrieve a clause talking about "termination" even when the user would enter a query relating to "stopping the contract".

When performing a smart search, the ClauseBase servers also automatically "rerank" retrieved clauses based on their legal content. This means that the clauses get sent to a dedicated re-ranking server, which reorders them based on their semantic information. This ensures that users get more intelligent search results then what can be achieved by traditional search technology (that mostly relies on the relative frequencies of words).

Both the conversion to semantic vectors and the reranking make use of technologies that are also used within LLM operations. Even so, this does not involve the use of any LLM. Furthermore, for reasons of cost, data sovereignty, control and speed, ClauseBase selected open-source semantic databases and self-hosts the re-ranker and semantic conversions, on dedicated GPU-driven servers operated by ClauseBase. In other words, for none of these operations any outside service ever gets contacted.

"Process results with AI"

In Truffle Hunt, AutoSuggest and the Quality Library's Browse module there is a brain-like button to post-process search results with an LLM.

Essentially, this button takes the current visible clauses and submits them to an LLM for further intelligent processing.

Obviously, those clauses get submitted to the LLM. Note that in practice, due to capacity constraints and time constraints, only the top 100 clauses actually get submitted to the LLM.
The user may also optionally submit an additional prompt with further explanations.
The LLM will then filter and/or reorder the clauses, and return the internal IDs of the filtered/reordered clauses. For example, when 25 clauses would be submitted together with the user's request to "only retain clauses dealing with outsourcing" then the LLM will return the IDs of the clauses that meet that constraint.
The LLM will send back those results to the ClauseBase-server, which will forward the results to the end-user's ClauseBuddy-instance. Other than a minimal log with a timestamp and the fact that the operation took place, the ClauseBase-server will not retain any information.

Doc Chat

The Doc Chat module allows users to interactively "interrogate" an opened document — or one or more other PDF/DOCX files — with the help of LLMs.

With respect to data transmissions, Doc Chat acts as follows

Be aware that LLMs have no short-term memory. This means that the entire chat conversation — i.e., both the user's own prompts, the LLM's answers and (where relevant, see the next bullets) the selected documents — will get resent to the LLM with each and every new question submitted to the LLM. In other words, what gets submitted to the LLM with each question naturally gets longer and longer.
By default — for reasons of cost, speed and data protection — ClauseBuddy will only submit the currently selected text (if any) to the LLM. However, the LLM is requested to signal that it requires the entire document when the question asked by the user cannot be answered solely using the selected text and the chat conversation. Accordingly, the document will frequently (but not necessarily) get sent in its entirety to the LLM with follow-up questions.
Note that it is also possible that the user selects multiple documents, instead of only the currently opened document, e.g. to ask questions about a contract and its amendments and/or annexes. All of these documents get sent together to the LLM.
ClauseBuddy will store the questions asked by the end-user in the LocalStorage of the (embedded) browser in which ClauseBuddy is running. Those questions are not saved on the ClauseBase-server.
Similarly, ClauseBuddy will save the previous chat sessions — i.e., questions & answers, but not the actual documents — in the (embedded) browser's LocalStorage. Taking into account that LocalStorage is limited to 5 or 10 MB in Chrome (Windows) or Safari (Mac), the chat sessions are only saved up to 4 MB in size. Any chat sessions beyond this will get discarded, and upon logout those chat sessions are also removed.

Multi-document Table

The Multi-document Table functionality (accessible through the Doc Chat icon) allows users to ask a series of questions about multiple PDF/DOCX documents at once. The user can then export teh resulting table to DOCX or XLSX.

With respect to data transmissions, Multi-document Table does the following:

Obviously, all documents selected (the currently opened document and uploaded DOCX/PDF files) will get sent to the LLM. In practice, they get sent separately (in separate rounds) to the LLM, in order to reduce the intellectual taxation of the LLM.
Together with each document that gets sent, ClauseBuddy will send the list of questions formulated by the user. The LLM will then respond with:
- Some answer it formulated.
- The reasoning / explanation behind its answer.
- The list of paragraph references on which it based its answers.
The answers only get saved temporarily in the memory of the (embedded) browser in which ClauseBuddy is running. However, do note:
- The user can save the questions, to foster reuse in the future or share interesting question sets with colleagues. Those questions do not contain any answers, however, and it is very atypical for the questions to contain any sensitive data.
- The user can export the answers to either a DOCX or an XLSX file. When doing so, the answers held by the (embedded) browser in which ClauseBuddy is running, will be temporarily sent to the ClauseBase-server. The ClauseBase-server then replies with the downloadable DOCX/XLSX file and immediately forgets the answers sent to it.

Write & Rewrite module

This module consists of several submodules:

Draft new text, on the basis of a prompt
Redraft selection, where the currently selected text in MS Word gets redrafted on the basis of a prompt
AutoCheck selection, where the currently selected text in MS Word gets amended to optimise it in favour of a certain party
Polish selection, where the currently selected text in MS Word gets grammatically reviewed and changed
Open a checklist, where a checklist gets opened through which the end-user can then keep track of human-verified items in the document and (where relevant) redraft selected text parts.

Common processing operations

The different modules operate differently, but from a data perspective, they share the following similarities:

If a prompt can be drafted by the end-user, it will be sent to the LLM, together with the text currently selected in MS Word.
For reasons of layout-optimisation, ClauseBuddy will also select the immediately surrounding paragraphs. Those get sent to the LLM in order to illustrate towards the LLM where the newly drafted (or redrafted) text will get inserted, what the numbering looks like, etc. In the background-prompt, the LLM is instructed to take into account those surrounding paragraphs.
As is always the case, neither the LLM, nor the ClauseBase-servers store any of the information passed to it, except (as further detailed in the next paragraphs) that the prompts get temporarily or permanently stored.

"Find missing topics"

The Find missing topics button of the Draft submodule will send the entire document to the LLM, and asks the LLM to suggest interesting topics that could be added because they're currently missing.

"Automatic suggestions"

The Redraft submodule automatically provides suggestions on how to redraft the current selection. End-users can then click on those suggestions to have them inserted into the prompt.

In order to formulate those suggestions, the LLM gets a copy of the currently selected text in MS Word.

"Include document context"

The Draft and Redraft submodules also contain a checkbox Include document context.

As explained in the manual, this checkbox causes the opened document to get analysed (e.g., to extract the defined terms from the definition list) and summarised upfront by the LLM. The summary will then get passed along to subsequent draft/redraft instructions, in order to let them increase the quality of their ouput.

AutoCheck selection

As explained in the manual, before the optimisations (amendments) get sent to the LLM, the AutoCheck selection will send the entire document to the LLM in a prior phase, in order to extract the names of the relevant parties.

The idea is that the end-user will then choose the party for whom the optimisation must be performed. In the next stage, the selected text is then sent to the LLM in order to have it optimised to the advantage of the selected party.

Storing prompts

The Write & Rewrite submodules store prompts in various ways, at different levels. All of this storage happens within the browser or ClauseBase-servers, but of course those prompts do get sent to the LLMs at various points in time, as explained above.

Administrators can create default prompts (either for the entire organisation, or only for specific groups of users) through the Admin > Write & Rewrite settings panel.
Individual users can explicitly store their prompt explicitly through the "..." upper-right menu. These prompts are saved at the ClauseBase-server for each individual user.

An individual user's recently used prompts are automatically tracked and stored within the user's browser, i.e. they are not saved at the ClauseBase-server and will be lost when the browser-storage would be reset.

Summarise

The Summarise module allows users to summarise selected text from the currently opened text, and/or summarise entire uploaded DOCX/PDF files.

The following data is processed by this module:

Obviously, the text selected for summarisation (either in the currently opened document, or the DOCX/PDF file that gets uploaded) gets sent to the LLM. The LLM will create the summary, send it to the ClauseBase-server, which will on its turn stream the result to the end-user's ClauseBuddy instance.
When the resulting summary gets inserted into the opened DOCX-file, no ClauseBase-server is involved. Instead, when the Export gets used, the summary gets sent to the ClauseBase-server, which pastes the plain text into either a default base DOCX-file, or into the user's customer DOCX-base file for reporting.
Users are able to store the structure of the summary (e.g., which data to extract, in which order, ...) to foster future reuse and/or sharing with colleagues. However, it is very atypical for this structure to contain any sensitive data.

Draft entire document

ClauseBuddy also allows users to draft an entire document on the basis of a prompt.

Initially the LLM is provided with the first prompt of the user (e.g., "Draft me a short consultancy agreement between client X and counterparty Y").
The LLM will subsequently draft a table of contents and this to ClauseBuddy.
The user can then choose to fill individual clauses with either content from his own clause library (for which the LLM doe not get involved), or content drafted by the LLM following a new instruction. In the latter case, the LLM gets sent the new prompt.
Users can also ask the LLM to provide suggestions for redrafting existing clauses within the table of contents, or for adding subclauses. In such case, the LLM gets sent the content of the current clause.

Smart Merge

The Smart Merge operation allows users to intelligently merge two clauses (or selections of text) in an intelligent way, with the help of an LLM. This module appears in various locations throughout ClauseBase, within the Insertion menu (typically a big round plus button).

Essentially, the Smart Merge operation sends two different clauses to the LLM and then asks the LLM to extract relevant legal features from each clause:

The LLM will then respond with those legal features, which get passed on to the ClauseBase-server, which then forwards them to ClauseBuddy.

The user can them "mix" those features and request the LLM to redraft a mixed version. When doing so, ClauseBuddy sends the original texts plus the selected & deselected legal features to the ClauseBase server, which passes this information to the LLM. The LLM then replies with a newly drafted clause, which gets forwarded by the ClauseBase server to ClauseBuddy:

Other than those clauses and some administrative information (timestamp and size of the clauses), no information is retained.

Deep Compare

The Deep Compare module will intelligently line up the active document with some other document that the user uploads. The user can then more easily see the legal differences between various clauses.

In practice, the data flow is as follows:

The first document gets split into clauses and then each of those clauses is submitted to the LLM, together with the request to provide a summary of that clause. The LLM will then pass on the summaries to the ClauseBase-server, which on its turn will pass on the summaries to ClauseBuddy.
Next, the same is done for the second document.
ClauseBuddy will then take the individual summaries and pass them back to the LLM, with the request to perform a lineup in order to find matching summaries. The LLM then sends back the alignment data (e.g., "clause X of document 1 matches with clauses Y and Z of document 2") to the ClauseBase-server, which on its turn passes that data to ClauseBuddy.
Other than some administrative information (timestamp, size of the files), no information is retained. When the user would close ClauseBuddy, the entire operation must be repeated.

Automatic filename for a clause

Users can instruct ClauseBuddy to automatically draft the filename of a clause, on the basis of a summary of the clause's contents, by clicking on the "Summary" button.

Alternatively, users can click on the "Keywords" button to draft a filename as a set of five keywords.

In both cases, no customer data other than the current clause's contents gets sent to the LLM, in the currently selected language.

Anonymisation of a clause's body

Users can instruct ClauseBuddy to automatically anonymise the body of a clause, to remove typical confidential data (e.g., customer names, addresses, etc.).

Only the body of the currently selected clause will be sent to the LLM; no other customer data gets sent.

Please note the irony of this anonymisation feature. Anonymisation is actually a very hard problem, for which a significant level of intelligence is required from AI. Accordingly, only the latest AI-models (such as GPT4) are reasonably capable of this task. At the same time, many legal experts fear exactly those AI-models for confidentiality reasons.

Automatically determine attributes

ClauseBuddy can automatically guess relevant "attributes" (metadata) for each clause.

When the "Automatic" button gets clicked, the currently selected clause body, as well as the list of all potentially relevant but yet unused attributes, gets sent to the LLM. The LLM will then respond with a subset of relevant attributes.

Change clause in the library

Users can ask the LLM to redraft clauses stored within ClauseBuddy, by submitting a prompt.

The current contents of that clause will then get sent to the LLM, along with the prompt.
Optionally, users may also ask the LLM to automatically or semi-automatically adapt the terminology of the clause, so that it gets aligned with the terminology of the currently opened document in MS Word. In such case:
- the currently opened document in MS Word will be sent to the ClauseBase platform, in order to extract the relevant terminology.
- the LLM will only receive a list of the terminology that was compiled by the ClauseBase platform. In other words: the LLM does not receive a copy of the currently opened document.

Document review

ClauseBuddy's full document review feature allows users to request the LLM to review their currently opened document, on the basis of the user's own reviewing rules.

When performing such review, the document's contents will obviously be sent to the LLM, together with the rule set selected by the user.

No involvement of an LLM

For the avoidance of doubt: in the following scenarios, no LLM is involved. Instead, only the ClauseBase server is involved:

PDF conversion

When PDF-files get uploaded to ClauseBuddy, they must be converted into DOCX, because ClauseBuddy and the ClauseBase-servers cannot handle PDF-files directly.

While it is becoming increasingly common to convert PDF-files to text using LLMs, ClauseBase deliberately uses a traditional OCR-package (Abbyy) that does not rely on LLMs. The reasons are a mix of data protection and layout fidelity (the current LLMs tend to only return plain text, without layout).

The PDF-conversions are sent to a dedicated OCR-server hosted by ClauseBase in Finland. This OCR-server will receive the PDF-file, convert it to DOCX and then immediately forget the result. Other than the timestamp, some internal UUID, the amount of pages processed (for licensing reasons) and the success/failure of the operation, this server does not retain any information about the PDF-document submitted to it.

If the conversion was successful, the PDF-server will send the resulting DOCX-file to the ClauseBase-server that instructed the conversion. That ClauseBase-server will then only store some administrative information (UUID, requesting customer, timestamp, number of pages) and forget the rest of the file, and subsequently pass on the DOCX-file to the end-user's ClauseBuddy instance.

Proofreading & definitions

When the currently opened document is being proofread, or its definitions are being analysed, the entire document gets sent to the ClauseBase server. As will be evident on the basis of the speed of the analysis (usually less than a few seconds for even a 50 page document), this does not currently involve the use of any LLM.

Semantic Search

Users can search within their currently opened Word-document for text that is semantically related to a search term.

The contents of the entire document gets sent to the ClauseBase platform for semantic analysis. The ClauseBase platform has its own local semantic vector database, so does not involve any third party LLM in this analysis.

Truffle Hunt text extraction

ClauseBuddy can automatically extract clauses from uploaded documents (DOCX, PDF or scans). Those documents get sent to the ClauseBase platform for clause extraction purposes, but — as will also be evident from the high speed of analysis — no LLM is involved.

AutoSuggest

ClauseBuddy's AutoSuggest feature will present clauses that are semantically related to the currently selected clause in the currently opened MS Word document.

ClauseBuddy will sent the currently selected paragraph to the ClauseBase platform for analysis and semantic search, but no LLM gets involved. (Also here, speed is one of the determining factors: the search results are usually presented in less than 0.3 seconds).

Smart Templates

ClauseBuddy's Smart Templates feature only makes use of LLMs for automatically generating questions & so-called "cards" on the basis of the cyan-highlighted text fragments inside of the DOCX file that got uploaded to ClauseBase's server.

In this situation, paragraphs that contain cyan highlights may get sent to the LLM. (Behind the scenes, ClauseBuddy chooses a set of paragraphs: if several paragraphs contain a certain cyan-highlighted identifier, then maximum two of them will ultimately get sent to the LLM, in order to not overload the LLM.)

Text comparisons

The Text Compare and Bulk Compare options do not make use of any LLM. They rely on traditional text comparison algorithms to compare the uploaded texts, executed on a ClauseBase-server.

The various other text comparison options available throughout ClauseBuddy preferably perform the text comparison within the embedded browser's memory. However, when the comparison is made against text selected in MS Word, that comparison gets sent to the ClauseBase-server, in order to use exactly the same algorithm as the algorithm that would get used when the user executes the Insert with changes command (which inserts new text into the opened MS Word file with "track changes").

Bulk Operations

The Bulk Operations module allows users to perform various operations in bulk, i.e. on many paragraphs and even many documents at once. Examples include extracting text, replacing headers or footers, concatenating DOCX-files into one PDF-file, etc.

Currently, none of the many available processing operations involve the use of an LLM. Instead, they are all executed locally by a ClauseBase-server.

Clause9 templates

The advanced full-document automation features of Clause9 currently only use LLMs for automatically creating the title of a clause on the basis of that clause's contents.

For completing a template, the use of an LLM does not make much sense, as this would be too slow and too unpredictable.

Updates

19 March 2025: added the Microsoft abuse monitoring exception
2 April 2025: significant updates to this page in order to reflect the recent changes made to ClauseBuddy (Doc Chat, Write & Rewrite, Summarise, Bulk Operations, PDF conversion, semantic vectorisation & reranking).

PreviousNo training / finetuning NextSummary of security features

Last updated 2 months ago