Big Data...turning a liability into an asset.

Interrogation: Analysis, Compliance and E-Discovery

The challenge of interrogation is that most of today’s business documents and correspondence (Word documents, emails, PDFs and manuals) are unstructured text. Unlocking the value or information in unstructured text is far more difficult than in the structured data found in databases. Here, conventional search engines are not the answer. Many credible search engines can create a powerful word-based index for you, but this is only a small part of the document interrogation (search) challenge.

Language is incredibly rich and complex. Words can have different meanings in different contexts: think of the word ‘bank’, which may be a financial institution or the edge of a river. The same meaning can be conveyed by different words or groups of words: “downsizing”, “layoffs”, “a reduction in force”, “a restructuring”, “consolidation”, even “right sizing”. Language also changes over time: expressions such as LOL, unfriend and poke are recognised by some dictionaries and some age groups, but not always by educators or even Microsoft Word. Language is constantly morphing, affecting words and content alike.

So if words alone aren’t the answer, how does a search engine solve the issue? The core issue is that a conventional search engine returns results only against the terms supplied by the searcher, and those terms will almost certainly not cover the range of similar but differently worded variations.
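The gap between a literal keyword search and a search that knows about related wording can be illustrated with a minimal Python sketch. It is not DataCube’s method and it is not LSI: the concept group is hand-built for illustration, and the names CONCEPTS, DOCUMENTS, keyword_search and expanded_search are all hypothetical.

```python
# Hypothetical, hand-built concept group (illustration only, not LSI).
CONCEPTS = {
    "workforce_reduction": {
        "downsizing", "layoffs", "reduction in force",
        "restructuring", "consolidation", "right sizing",
    },
}

# Hypothetical sample documents.
DOCUMENTS = {
    "memo-1": "The board approved layoffs at the northern plant.",
    "memo-2": "A reduction in force is planned for the third quarter.",
    "memo-3": "The canteen menu changes next week.",
}


def keyword_search(query, documents):
    """Return ids of documents that contain the literal query text."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in documents.items() if q in text.lower())


def expanded_search(query, documents, concepts):
    """Search for the query plus every phrase sharing a concept group with it."""
    q = query.lower()
    phrases = {q}
    for group in concepts.values():
        if q in group:
            phrases |= group
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(phrase in text.lower() for phrase in phrases)
    )


print(keyword_search("downsizing", DOCUMENTS))             # []
print(expanded_search("downsizing", DOCUMENTS, CONCEPTS))  # ['memo-1', 'memo-2']
```

The literal search finds nothing because no memo uses the word “downsizing”, while the expanded search finds both memos that discuss the same idea in other words. A concept search engine aims to derive such relationships from the documents themselves rather than from a hand-maintained list.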

DataCube transforms large volumes of unstructured data into organised, relevant information and exposes insights hidden in the data. The DataCube platform provides organisation tools for classification and email analysis, concept search based on Latent Semantic Indexing (LSI), and other text analytics capabilities that automate most of the human activity traditionally associated with using unstructured data.

  • Corporate Searches, Documents and Emails
  • Compliance monitoring and discovery
  • Email conversations and threads
  • Freedom of Information Searches
  • Fraud investigation
  • E-discovery of specific data and data cleansing of outdated and orphaned data


Compliance: financial and legal discovery

Today’s world insists on strict separation between people in certain industries. The introduction and maintenance of ‘Chinese Walls’, which separate different sides of the banking and investment business, is seen as crucial to the integrity of that market. Recent scandals on Wall Street and in London’s financial centres have not only cost huge amounts of money in fines and compensation, but have seriously undermined the reputations of the companies involved.

Conceptual searches get at the data you want, not the data surrounding a word or string search

The ability to track individual comments, often those made by financial business analysts or share salesmen, is critical to maintaining integrity. Imagine an investment house has issued ‘Hold’ advice on a share in a company with which it has a solid working relationship, while its internal commentary or sales email traffic treats the share as a ‘Sell’. The discrepancy will not only worry the compliance officers but also undermine the credibility and independence of the house’s advice. By constantly scanning documentation and combining strings, words, entities, aliases and, of course, conceptual search, DataCube can raise both in-flight and retrospective discovery alerts to protect all parties.
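The ‘Hold’ versus ‘Sell’ scenario can be sketched in a few lines of Python. This is a simplified stand-in that uses plain word lists rather than conceptual search, and every name in it (PUBLISHED_RATINGS, RATING_TERMS, the ticker ACME and the sample messages) is a hypothetical assumption, not part of DataCube.

```python
import re

# Hypothetical published advice: ticker -> official rating.
PUBLISHED_RATINGS = {"ACME": "hold"}

# Hypothetical word lists standing in for a real concept model.
RATING_TERMS = {
    "buy": {"buy", "accumulate"},
    "hold": {"hold", "neutral"},
    "sell": {"sell", "dump", "offload"},
}

# Hypothetical internal messages.
MESSAGES = [
    {"id": "m1", "body": "Tell clients to offload ACME before Friday."},
    {"id": "m2", "body": "Our view on ACME is still neutral."},
    {"id": "m3", "body": "Lunch at noon?"},
]


def implied_ratings(text):
    """Return the ratings whose word list overlaps the words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {rating for rating, terms in RATING_TERMS.items() if words & terms}


def compliance_alerts(messages, published):
    """Flag messages that mention a rated share but imply a different rating."""
    alerts = []
    for msg in messages:
        words = set(re.findall(r"[a-z]+", msg["body"].lower()))
        for ticker, rating in published.items():
            if ticker.lower() not in words:
                continue  # message does not mention this share
            conflicting = sorted(implied_ratings(msg["body"]) - {rating})
            if conflicting:
                alerts.append((msg["id"], ticker, rating, conflicting))
    return alerts


for alert in compliance_alerts(MESSAGES, PUBLISHED_RATINGS):
    print(alert)  # ('m1', 'ACME', 'hold', ['sell'])
```

Running the same check on each message as it is sent would give an in-flight alert, while running it over an archive would give a retrospective one. Word lists like these miss paraphrases, which is the gap that entity, alias and concept matching are meant to close.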


Freedom of Information Request Handling

The handling of Freedom of Information queries is becoming an increasingly onerous task for local authorities. The rate of enquiries across the UK is believed to be increasing by between 25% and 35% per year. The DataCube system enables rapid and thorough interrogation of a governmental or local authority’s electronic information base, significantly reducing the complexity, time and effort involved in providing coherent and high-quality responses.

Gathering the documents needed for a Freedom of Information request or corporate query into a single case file speeds up the process: there is no need to distribute the request first and rely on local knowledge to identify the paperwork. Collection and authority levels remain controlled until distribution and issuance, which saves both money and resources.

This service collates the information and documents and presents them to the Data Information officer, who can personalise the response and ensure that any necessary redactions are carried out before dispatch.

Similarly, the DataCube system enables enquiries on specific topics that were previously not possible because of the way the original databases and information architecture were constructed. For example, an organisation, in this case perhaps a council, will be able to identify addresses where a single occupancy discount for Council Tax is being claimed but where benefit claims, pest control requests, school rolls and so forth show multiple people using the address.
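The council example amounts to cross-referencing separate record sets by address. The Python sketch below shows the idea under simplifying assumptions: the record sets, names and addresses are invented, each record is a (name, address) pair, and address matching is reduced to lower-casing and stripping punctuation.

```python
from collections import defaultdict

# Hypothetical addresses claiming the single occupancy discount.
DISCOUNT_ADDRESSES = ["12 High Street, Anytown", "3 Mill Lane, Anytown"]

# Hypothetical record sets held in separate systems: (name, address) pairs.
RECORD_SETS = {
    "benefit_claims": [("Ann Smith", "12 High Street, Anytown")],
    "pest_control": [("Bob Jones", "12 high street anytown")],
    "school_rolls": [("Cara Lee", "3 Mill Lane, Anytown")],
}


def normalise_address(address):
    """Lower-case and drop punctuation so simple variants compare equal."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower()
    )
    return " ".join(cleaned.split())


def possible_multiple_occupancy(discount_addresses, record_sets):
    """Return discounted addresses at which more than one distinct name appears."""
    people_at = defaultdict(set)
    for records in record_sets.values():
        for name, address in records:
            people_at[normalise_address(address)].add(name.strip().lower())
    flagged = {}
    for address in discount_addresses:
        key = normalise_address(address)
        names = people_at.get(key, set())
        if len(names) > 1:
            flagged[key] = sorted(names)
    return flagged


print(possible_multiple_occupancy(DISCOUNT_ADDRESSES, RECORD_SETS))
# {'12 high street anytown': ['ann smith', 'bob jones']}
```

A flagged address is only a lead for a human reviewer, since different names at one address can have innocent explanations such as a change of tenant. Real address data would also need far more robust matching than this normalisation provides.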


Email discovery: threading and content discovery

Unfortunately, each day brings a new case in which emails are central to an internal investigation, legal case or challenge. The instructions, thoughts and motivations behind a party’s actions are increasingly found in the email traffic between those involved, so it matters more and more to know what is in your own emails and to be able to discover what is in a third party’s.

DataCube can analyse multiple email mailboxes, interrogate their content and, moreover, detect deletions and reconstruct email threads.

This enables discovery of email threads across multiple internal and external email systems (Outlook, Gmail, Hotmail et al) by concept, topic and/or conversation author. The system can also display the email threads of online conversations. These chains of emails are displayed visually, with any gaps in the thread shown. The gaps are derived from the trail, having identified the recipient, originator or distribution list of related emails. Gaps can exist because a mailbox data source is not available to be analysed or because an email has been deleted from a mailbox. The visual display can therefore reconstruct the conversations between a number of parties, even where some of the emails are missing or have been deleted.
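One simple way to picture gap detection is to link each email to the message it replies to and to mark any referenced message that is absent from the analysed mailboxes. The Python sketch below assumes each message carries an identifier and a reply-to identifier, similar to the standard Message-ID and In-Reply-To email headers. The text above describes deriving gaps from recipients, originators and distribution lists, so this is a simplified stand-in, and the message ids, senders and function names are hypothetical.

```python
# Hypothetical messages recovered from the analysed mailboxes.
# Message "b2" is referenced by "c3" but was not found anywhere.
MESSAGES = [
    {"id": "a1", "sender": "alice", "in_reply_to": None},
    {"id": "c3", "sender": "carol", "in_reply_to": "b2"},
    {"id": "d4", "sender": "dave", "in_reply_to": "c3"},
]


def build_thread(messages):
    """Return (children, missing): reply links and ids referenced but not found."""
    known = {m["id"] for m in messages}
    children = {}
    missing = set()
    for m in messages:
        parent = m["in_reply_to"]
        if parent is None:
            continue
        children.setdefault(parent, []).append(m["id"])
        if parent not in known:
            missing.add(parent)  # a reply exists, but its parent does not
    return children, missing


def render_thread(messages, children, missing):
    """Return indented text lines for the thread, marking gaps explicitly."""
    by_id = {m["id"]: m for m in messages}
    roots = [m["id"] for m in messages if m["in_reply_to"] is None]
    roots += sorted(missing)
    lines = []

    def walk(msg_id, depth):
        if msg_id in by_id:
            label = f"{msg_id} from {by_id[msg_id]['sender']}"
        else:
            label = f"{msg_id} [GAP: not in any analysed mailbox]"
        lines.append("  " * depth + label)
        for child in children.get(msg_id, []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


children, missing = build_thread(MESSAGES)
print("\n".join(render_thread(MESSAGES, children, missing)))
```

With the sample data the output lists a1 on its own, then the missing b2 as a gap with c3 and d4 nested beneath it. The sketch cannot tell whether b2 was deleted or simply sits in a mailbox that was never analysed, which matches the two causes of gaps described above.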

Reviewers can quickly narrow in on only the most relevant conversations among only the most appropriate recipients and, by grouping similar strings and topics, can find relevant information in a fraction of the time they might spend going through chronological email trails.

© Copyright 2013 Apperception