Big Data...turning a liability into an asset.



DataCube Blog

Help for the Common Man

by host on 09 January 2016 14:26

Don’t you love the stories about monumental misjudgements or fantastical increases in something? There is the line attributed to Bill Gates, ‘640k ought to be enough for anybody’, or Decca rejecting the Beatles because ‘Guitar groups are on the way out.’ What about the moon landing? ‘More memory in a car seat or wing mirror than in the Apollo moon landing.’ And of course the ever-reliable Moore’s Law on price, memory and computing power!

Big Data: again the hyperbole is there. There are the four (or is it six?) ‘V’s, and the data tsunami which is supposed to be hitting us. Apparently we created 5 exabytes (1 EB = 1 million terabytes, or 1 billion gigabytes) from the start of recorded time to 2003. In 2011 it took 2 days to create the same amount of data; today it takes 10 minutes!

Companies like Amazon can talk with authority about data. They refresh their 1.5 billion products in 200 centres every 30 minutes, and their daily communications to 150 million customers certainly put them in the bragging league, not just for the use of and insight from Big Data but for Huge Data.

BUT let’s look at ordinary data use. I mean data use in what the world is actually mostly made up of: small and medium companies, or even local governments. There is no doubt that email and text communication is increasing daily. Social media and web-based communications also ramp up the usage, often using increasingly diverse media: Facebook uses video, audio, photos and so on, replicated many times. BUT most companies are still battling with what they have today, the conventional communications channels, and in reality are just nodding at social media. Most SMEs and local government IT departments are actually more concerned with fixing today’s issues and implementing objectives that are simple to state but difficult to achieve.

  • How do you introduce and establish policies to protect your information assets and their use on a consistent and proactive basis?
  • How do you cope with the rapid increase in data volumes and the rapidly changing structure of your company, knowing that manual monitoring of data is completely unsustainable?
  • How do you create a data governance policy which crosses all the areas of a company, often with different heritages and mostly with different data file plans?
  • How do you automate policy enforcement so that it doesn’t rely on individual applications but allows the data to be safely transferred and appropriately used across the whole organisation?

There are many answers to this conundrum, usually proffered by highly expensive consultants who often, like the implementation of half a traffic management system, create more difficulties than they actually solve.

The approach which Apperception recommends recognises the two major issues which any project in this area will come across. Let us assume that the data governance and protection policies are written, and represent the business needs of the company. Let us assume that there is a comprehensive File Plan in place which states and illustrates the document types and their properties, such as retention dates and minimum document protective levels. Let us also assume that everyone recognises the need to bring the staff along on the journey and is ready to place them at the centre of the solution with a culture change and training programme. Let us also assume that a sensible document labelling system has been installed and that all staff, through their training, are now labelling documents (and by definition adding the necessary metadata to protect the organisation).

So what is left? Well, what about the historic data? It is more than likely in unstructured storage; researchers predict that 80-85% of all data will be unstructured in the future. It probably carries no indication, other than its content, of its confidentiality level or its status against the information retention policy rules and, short of looking at each and every document, there is no easy mechanism to find out.

The second potential issue is the increase of the incoming dataflow. Sorting the electronic wheat from the chaff is difficult enough if the email or message is directly addressed to you. If the organisation has an Info@ or similar contact point then the task of sorting the subject matter into figurative piles for onward transmission to the correct department becomes daunting. If you throw in social media, then just multiply the issues.

Apperception has a suite of solutions to the above points, ranging from policy consultancy, online tailored learning programmes and provision of CESG-level document labelling systems to DLP solutions, interrogation systems and retro-labelling of legacy documents by conceptual content.

The business solution which we believe separates us from the crowd is our ability to apply automated categorisation tools to your data and sort it by its content (conceptual content, not just key words, titles, authors or probability) into your specified File Plan. The way it works is that Apperception has a system called DataCube which, by using LSI (latent semantic indexing) technology, can sort the data set into the File Plan categories by its content. It reads each document and harvests its textual content (leaving the original in situ, unchanged), placing that content in a multidimensional array which mathematically relates each document’s content to all the other documents. This then enables us to run a Schema (made from your File Plan and examples of the documentation which makes up each category within it) against the array, sorting the historic documents into the categories and either placing them in a Document Handling System or protectively labelling them according to the schema definitions.
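To make the idea concrete, here is a minimal sketch of LSI-style sorting in plain Python. It is not DataCube's implementation, which is not described here; the function names, the tiny File Plan and the sample documents are all invented for illustration, and the truncated SVD is approximated with a simple power iteration rather than a production linear algebra library.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw word counts."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[w]) for c in counts] for w in vocab]


def concept_basis(matrix, k, iterations=300):
    """Approximate the top-k left singular vectors of the matrix by power
    iteration on A * A^T, orthogonalising against vectors already found."""
    n_terms, n_docs = len(matrix), len(matrix[0])

    def apply_aat(v):
        atv = [sum(matrix[i][j] * v[i] for i in range(n_terms)) for j in range(n_docs)]
        return [sum(matrix[i][j] * atv[j] for j in range(n_docs)) for i in range(n_terms)]

    basis = []
    for _ in range(k):
        v = [1.0] * n_terms
        for _step in range(iterations):
            v = apply_aat(v)
            for u in basis:  # remove the directions already captured
                overlap = sum(a * b for a, b in zip(u, v))
                v = [b - overlap * a for a, b in zip(u, v)]
            norm = math.sqrt(sum(x * x for x in v))
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def project(text, vocab, basis):
    """Map a text onto the k concept dimensions."""
    counts = Counter(text.lower().split())
    return [sum(u[i] * counts[w] for i, w in enumerate(vocab)) for u in basis]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def sort_into_file_plan(schema, legacy_docs, k=2):
    """Assign each legacy document to the category whose example documents
    lie closest to it in concept space."""
    all_docs = [d for examples in schema.values() for d in examples] + legacy_docs
    vocab, matrix = term_document_matrix(all_docs)
    basis = concept_basis(matrix, k)
    centroids = {}
    for category, examples in schema.items():
        points = [project(e, vocab, basis) for e in examples]
        centroids[category] = [sum(col) / len(points) for col in zip(*points)]
    return {
        doc: max(centroids, key=lambda c: cosine(project(doc, vocab, basis), centroids[c]))
        for doc in legacy_docs
    }


# Hypothetical File Plan categories with example documents, plus legacy documents.
schema = {
    "finance": ["invoice payment supplier ledger", "payment invoice ledger audit"],
    "hr": ["employee leave holiday payroll", "employee holiday contract"],
}
legacy = ["supplier audit ledger", "contract payroll holiday employee"]
result = sort_into_file_plan(schema, legacy)
print(result)
```

The original documents are never modified; only their extracted text goes into the matrix. In practice the raw counts would usually be weighted (TF-IDF, for example), and k would be far larger than 2.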

To address the second point, the increasing flow of incoming data: once set up, DataCube takes the incoming data and ‘sorts’ it into predetermined categories, using the same techniques as are used to categorise legacy data, and then forwards it to the appropriate recipient.
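The sort-then-forward step can be sketched as follows. This is an illustration only: the addresses, the category texts and the confidence threshold are hypothetical, the fallback to a manual triage mailbox is our own assumption, and a plain bag-of-words cosine stands in for the conceptual classifier.

```python
import math
from collections import Counter


def bag(text):
    """Bag-of-words counts; a stand-in for a concept-space representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def route(message, category_examples, routes, fallback, threshold=0.25):
    """Return the mailbox for the best-matching category, or the fallback
    mailbox when no category matches confidently enough."""
    scores = {
        category: cosine(bag(message), bag(text))
        for category, text in category_examples.items()
    }
    best = max(scores, key=scores.get)
    return routes[best] if scores[best] >= threshold else fallback


# Hypothetical categories and recipients for a shared Info@ mailbox.
category_examples = {
    "finance": "invoice payment supplier ledger audit",
    "hr": "employee leave holiday payroll contract",
}
routes = {"finance": "accounts@example.org", "hr": "people@example.org"}
fallback = "info-triage@example.org"

for message in [
    "please confirm payment of the attached invoice",
    "employee holiday request",
    "win a free prize today",
]:
    print(message, "->", route(message, category_examples, routes, fallback))
```

The threshold keeps the electronic chaff out of departmental inboxes: anything that matches no category well enough goes to a person to look at.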

Of course, once the DataCube is set up, you can interrogate it using conceptual questions, other documents, keywords, Bayesian searches or a combination of them all. Case management, fraud prevention and Freedom of Information queries begin to fall off today’s problem list.

If you think we could help, or you would like more information – please click on



© Copyright 2013 Apperception