
Concept Search in Litigation Support

What is it?
'Concept search' in litigation refers to the search of electronic documents on the basis of ideas they contain, rather than just specific keywords.  Concept searching is usually implemented by broadening a keyword-based search to include synonyms or using a thesaurus to include results related to the ideas in the search keywords, even though not directly derived from the keyword search term. 

Concept searching can be helpful in analyzing documents in a legal proceeding because a search based on 'concepts' may include results from relevant documents that might otherwise be missed in a standard keyword search. Standard keyword searches will return a positive result only if the exact keyword or a close derivative is specified. Search derivatives returned by litigation support search engines commonly include 'stemming' and 'fuzzy searches'. Stemming includes grammatical variations on a word, such that a search for "applied" would also return "applying", "applies", and "apply". Fuzzy searches return results even if the text to be searched is slightly misspelled. Fuzzy searches are helpful in returning a result even if the original text has been corrupted through an optical character recognition (OCR) error, which is common in scanned documents.
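
The difference between stemming and fuzzy matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular litigation support product works: the suffix rules, the function names, the 0.85 fuzziness threshold, and the sample OCR error "accldent" are all hypothetical, and real search engines use far more thorough stemmers.

```python
from difflib import SequenceMatcher

# Hypothetical suffix rules, checked in order; real stemmers are far more thorough.
SUFFIX_RULES = [("ies", "y"), ("ied", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def stem(word: str) -> str:
    """Reduce a word to a crude stem by stripping one common suffix."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches(query: str, token: str, fuzziness: float = 0.85) -> bool:
    """True if the token shares the query's stem or is a near-identical spelling."""
    return stem(query) == stem(token) or similarity(query, token) >= fuzziness


# Compare each token against two queries: "applied" and "accident".
for token in ["applying", "applies", "apply", "accldent", "contract"]:
    print(token, matches("applied", token), matches("accident", token))
```

Stemming handles the grammatical variants of "applied", while the similarity ratio catches the one-character OCR error. Lowering the threshold returns more misspellings but also more unrelated words.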

Why is Concept Searching Important?
Research has shown that simple keyword searches may not be sufficient to return many potentially relevant documents.  This is because lawyers and litigation teams may be unable, despite their efforts and best intentions, to think of all the search keyword terms that might result in relevant documents. 

A study conducted in the 1980s casts doubt on the ability of litigation teams to accurately determine a set of comprehensive search terms that will return all or even most potentially relevant documents. Attorneys and paralegals involved in a subway accident case used a keyword methodology to search a discovery database consisting of 350,000 pages in 40,000 documents. The litigation team believed that they had located at least 75% of the relevant documents. A separate manual review of documents was conducted and found that the litigation team had identified only 20% of potentially discoverable documents through keyword searching alone. Blair & Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System, 28 Com. A.C.M. 289 (1985) (publication of the Association for Computing Machinery). Many of the missed documents included terms that the litigation team had not anticipated. Subway employees referred in internal communications to “the unfortunate incident,” while victims referred to it as the “disaster.” Relevant documents also included oblique references to the “event,” “incident,” “situation,” “problem,” and “difficulty.” Many of the missed documents might have been identified with a thorough manual search or with more inclusive searching approaches, like concept searching.
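
The measure at issue in the study is usually called recall: the share of all relevant documents that a search actually retrieves. The following minimal Python sketch shows how a single anticipated keyword can miss documents that use the other phrasings quoted above. The five documents, their IDs, and the helper names are invented for illustration and are not data from the study.

```python
# Hypothetical relevant documents, written with the phrasings quoted above.
relevant_docs = {
    "doc1": "Report on the accident at the station",
    "doc2": "Memo regarding the unfortunate incident",
    "doc3": "Letter from a victim describing the disaster",
    "doc4": "Internal note about the situation last week",
    "doc5": "Follow-up on the problem with the train",
}


def keyword_search(docs: dict[str, str], keyword: str) -> set[str]:
    """Return IDs of documents whose text contains the keyword (case-insensitive)."""
    return {doc_id for doc_id, text in docs.items() if keyword.lower() in text.lower()}


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that the search actually retrieved."""
    return len(retrieved & relevant) / len(relevant)


hits = keyword_search(relevant_docs, "accident")
print(f"retrieved {sorted(hits)}, recall = {recall(hits, set(relevant_docs)):.0%}")
```

In this toy set only the first document contains the word "accident", so the search retrieves one of five relevant documents even though all five describe the same event.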

Manual Review is No Panacea
The influential Sedona Conference has discussed these issues in its recently released Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (August 2007). The Sedona Conference authors note that while most lawyers consider "manual review [to be] the gold standard by which all searches should be measured," manual search is in fact no panacea, and a manual review methodology inevitably results in its own errors of missed documents. "Human review of documents in discovery is expensive, time consuming, and error-prone. There is growing consensus that the application of linguistic and mathematic-based content analysis, embodied in new forms of search and retrieval technologies, tools, techniques and process in support of the review function can effectively reduce litigation cost, time, and error rates." Keyword searching alone can miss many documents, but so can manual review. A mixed approach is often best: build better search keywords through an iterative methodology of computer searches followed by manual review of the returned documents, which in turn yields an expanded list of keywords.
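
The iterative methodology described above can be sketched as a simple loop. This is a hedged illustration rather than a prescribed workflow: the function names, the `review` callback (a stand-in for a human reviewer who suggests new terms after reading a document), the round limit, and the sample documents are all assumptions made for the example.

```python
from typing import Callable


def search(docs: dict[str, str], keywords: set[str]) -> set[str]:
    """Return IDs of documents containing at least one keyword."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if any(keyword in text.lower() for keyword in keywords)
    }


def refine_keywords(
    docs: dict[str, str],
    seed_keywords: set[str],
    review: Callable[[str], set[str]],
    max_rounds: int = 5,
) -> set[str]:
    """Alternate computer search and manual review until no new terms appear."""
    keywords = {k.lower() for k in seed_keywords}
    reviewed: set[str] = set()
    for _ in range(max_rounds):
        # Only documents not yet seen by the reviewer need manual review.
        new_hits = search(docs, keywords) - reviewed
        reviewed |= new_hits
        suggested: set[str] = set()
        for doc_id in sorted(new_hits):
            suggested |= {term.lower() for term in review(docs[doc_id])}
        if suggested <= keywords:
            break  # review produced nothing new, so stop iterating
        keywords |= suggested
    return keywords


# Hypothetical documents and a stand-in for the human reviewer.
docs = {
    "a": "Report on the accident, also called the incident by staff",
    "b": "The incident was a disaster for commuters",
    "c": "Lunch menu for the cafeteria",
}


def reviewer(text: str) -> set[str]:
    """Pretend reviewer: spots terms from a small vocabulary in the text."""
    vocabulary = {"incident", "disaster"}
    return {word for word in vocabulary if word in text.lower()}


final_keywords = refine_keywords(docs, {"accident"}, reviewer)
print(sorted(final_keywords))
```

Starting from the single seed term "accident", the first round surfaces "incident", which in turn surfaces a second document and the term "disaster". The loop stops once a round of review suggests nothing new.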

The Sedona Conference and Courts Support Concept Searching
The authors of the Sedona commentary also maintain that simple keyword searches can be substantially improved by "using conceptual searching [emphasis added], which makes use of taxonomies and ontologies assembled by linguists; and using other machine learning and text mining tools that employ mathematical probabilities." Concept searching has gained greater importance as the bench and attorneys have begun to recognize that regular keyword or Boolean searches may miss much relevant evidence.

Concept searching is also beginning to get the attention of the bench.  Federal Judge Facciola noted in a recent case that "concept searching, as opposed to keyword searching, is more efficient and more likely to produce the most comprehensive results."  Disability Rights Council of Greater Wash. v. Wash. Metro. Area Transit Auth., 2007 WL 1585452 (D.D.C. June 1, 2007).  

How Concept Searching is Implemented in Lexbe 
Lexbe Online includes concept searching as an option in its web-based litigation support application. If concept searching is not selected, then a search is conducted as a normal keyword search, modified to include results obtained through stemming and with fuzziness included to compensate for possible OCR scan errors.

A user can apply concept searching by checking the 'Concept Search' box on the left side of the search screen. (It is left unchecked by default.) Once checked, the search will be expanded through reference to an extensive lexical and semantic network of the English language. This includes reference to synonyms, search term definitions and links between words other than synonym relationships (e.g., antonyms, hyponyms).

For example, a search for the term 'contractual' without the 'Concept Search' box checked would return documents including the word 'contractual' and derivatives: 'contract', 'contracting', 'contracted', etc. (Minor misspellings would also be returned: e.g. contractua1.) If the 'Concept Search' box were checked, the search would be expanded to include terms like 'agreement', 'arrangement', 'compact', 'covenant', 'obligation', 'pledge', 'promise', 'understanding', as well as derivatives of these words. Concept searching can find documents that might otherwise be missed, but it involves a tradeoff, as it can also return a number of false positives. Knowing the tools and tradeoffs will help lawyers and litigation teams decide when and where concept search can benefit the review process.
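
The tradeoff can be made concrete with a toy expansion step in Python. This sketch does not reproduce the Lexbe or dtSearch implementation: the one-entry thesaurus (built from the related terms listed above), the function names, and the three sample documents are hypothetical, and stemming and fuzziness are left out to keep the example short.

```python
import re

# Hypothetical thesaurus entry built from the related terms listed above.
THESAURUS = {
    "contractual": [
        "agreement", "arrangement", "compact", "covenant",
        "obligation", "pledge", "promise", "understanding",
    ],
}


def expand(term: str, concept_search: bool = False) -> set[str]:
    """Return the search terms for a query, optionally adding related concepts."""
    terms = {term.lower()}
    if concept_search:
        terms.update(THESAURUS.get(term.lower(), []))
    return terms


def search(docs: dict[str, str], term: str, concept_search: bool = False) -> set[str]:
    """Return IDs of documents containing any search term as a whole word."""
    terms = expand(term, concept_search)
    hits = set()
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if terms & words:
            hits.add(doc_id)
    return hits


# Hypothetical documents: two relevant to contracts, one not.
docs = {
    "lease": "The parties signed an agreement covering the premises",
    "email": "Breach of contractual duties was alleged",
    "lunch": "I promise to bring lunch on Friday",
}

print(sorted(search(docs, "contractual")))
print(sorted(search(docs, "contractual", concept_search=True)))
```

With the concept option off, only the "email" document matches. With it on, the search also finds the relevant "lease" document through 'agreement', but it picks up the irrelevant "lunch" document through 'promise', which is the kind of false positive a reviewer then has to screen out.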

The Importance of Transparency and Validation 
When concept search is used as part of e-discovery, it is important for attorneys to understand how concept search is implemented in the tool, so that they can have confidence in the results and be prepared to validate them in court, if needed. Lexbe uses an implementation of concept search provided by dtSearch, a highly regarded litigation search tool. dtSearch relies on the publicly available WordNet lexical database, an open source thesaurus from Princeton University with over 100,000 English words and associations. As an open source resource, the WordNet database is available for download and examination if needed for litigation validation purposes.