Big Data...turning a liability into an asset.


DataCube Blog

Don’t Panic - Just Cleanse Your Data!

by host on 09 June 2016 08:00

Do you open your inbox and find yourself assaulted by an avalanche of scaremongering Big Data terms and technology?

Reading through worthy articles about ‘Big Data’, ‘360° information’, ‘Total Touch Points’ and the ‘tsunami of data growth’ frightens a lot of people. It is not just the technologically illiterate, or the old and crusty companies that don’t want to change; often it is the go-ahead companies and staff who cannot see a way through yesterday’s problems into the daylight of tomorrow, short of mortgaging the whole company to a ‘top five consultancy’.

People know that if they were starting from scratch, they could build a system which handled the ‘exploding level of customer touch points’, one that could provide ‘relevance in real time’, one that could handle the Data Gold Rush and beat ‘not the data challenge but the analytic challenge’.

But they are not. People are living in a real world where the data is a mixture: old data which no one has got rid of; data delivered by organisational changes, with different taxonomies and formats (often unstructured); a history of people locally avoiding cumbersome document management systems for ease of working; and new data being created at a rate which almost defies imagination. (You can make up any statistic about data growth and it almost certainly won’t be wrong. IBM: doubling every 18 months. HP: 15x growth by 2020.)

So what we actually have is a real-life problem: data which has been lost from sight. It has probably not been accessed recently, was probably originated by someone who has left the organisation, and is probably mixed in with ‘private’ data such as videos, songs and illegal software. Without doubt, the content and the potential security impact of that data are unknown.

Schemas and retention dates are loosely adhered to. The old data is certainly not under any control, and the new data is probably only controlled in a few departments (legal, finance, HR and Care departments, for example), which have strict access and delivery rules rather than securing the data itself.

Companies have become proficient at locking the data down inside the walls of their organisations, and even within departmental silos, but that solves only the single issue of confidentiality. It doesn’t solve the wider use of the data; it certainly doesn’t solve the access issue when questions are asked (corporate enquiries, Freedom of Information queries); and it doesn’t address the increased use of storage or the increased need to respond to legal and regulatory demands.

So what to do?

First of all, don’t panic. There are many things that can be done which will completely alter the management and availability of the data. To start, let’s look at cleansing the data where it stands today. Find out everywhere there is data on your network and what it is (file type), and clean out the Redundant, Obsolete and Trivial (ROT) data. Using auto categorisation, look at the content of each document: find out why a confidential document is stored on the maintenance servers, look at the duplicates, look at the data which hasn’t been accessed for 36 months, and look at the retention dates which allow deletion of documents. Look at the schemas, and build relevant categories stipulating document protection levels and document retention dates appropriate to the subject matter.
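As a rough illustration of that first pass, here is a minimal Python sketch that walks a folder, records each file's type, size and staleness, and groups exact duplicates by content hash. It is only a sketch under stated assumptions: the function names, the record fields and the 36-month threshold expressed in days are all made up for the example, and a real tool would also have to cope with network shares, permissions and file systems that do not record last-access times.

```python
import hashlib
import os
import time
from collections import defaultdict

# Assumed threshold: roughly 36 months, expressed in days.
STALE_AFTER_DAYS = 36 * 30


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root, now=None):
    """Walk `root` and return one record (a dict) per readable file."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                # Stat before hashing, because reading may update atime.
                st = os.stat(path)
                digest = file_digest(path)
            except OSError:
                continue  # unreadable file: simply skipped in this sketch
            records.append({
                "path": path,
                "type": os.path.splitext(name)[1].lower() or "(none)",
                "size": st.st_size,
                "empty": st.st_size == 0,
                # atime is unreliable on volumes mounted with noatime.
                "stale": (now - st.st_atime) > STALE_AFTER_DAYS * 86400,
                "sha256": digest,
            })
    return records


def duplicates(records):
    """Group paths with identical content, ignoring empty files."""
    groups = defaultdict(list)
    for r in records:
        if not r["empty"]:
            groups[r["sha256"]].append(r["path"])
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `scan("/some/share")` and then `duplicates(...)` on the result gives a first list of candidates for the ROT pile: empty files, stale files and redundant copies. The path is a placeholder.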

How do we do this?

We need to understand (and be able to sort by) the content of the data: not just the metadata, not just the titles, dates and authors, but the actual concepts which the document holds in its content.

Auto categorise the documents by their concepts into your own subject matter categories.

Having stipulated the parameters (protective marking levels, retention dates) for each category, apply the document protective markings and the category retention dates. Analyse the duplicates, identify the Redundant, Obsolete and Trivial files (including empty, corrupt and illegal files, and those that fail the retention date criteria), and look to move them out.
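To make the idea of per-category parameters concrete, here is a small Python sketch. Be warned that it is a toy: plain keyword matching stands in for genuine concept-based auto categorisation, and the category names, keywords, protective markings and retention periods are invented purely for illustration, not recommendations.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Category:
    name: str
    keywords: frozenset
    marking: str          # protective marking applied to the document
    retention_days: int   # how long documents in this category are kept


# Hypothetical schema: every value below is an assumption.
CATEGORIES = [
    Category("HR", frozenset({"salary", "disciplinary", "appraisal"}),
             "CONFIDENTIAL", 6 * 365),
    Category("Finance", frozenset({"invoice", "ledger", "budget"}),
             "RESTRICTED", 7 * 365),
    Category("Maintenance", frozenset({"boiler", "inspection", "repair"}),
             "UNCLASSIFIED", 2 * 365),
]
UNCATEGORISED = Category("Uncategorised", frozenset(), "UNCLASSIFIED", 365)


def categorise(text):
    """Pick the category sharing the most keywords with the text.

    Keyword overlap is only a stand-in for concept-based categorisation.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    best = max(CATEGORIES, key=lambda c: len(c.keywords & words))
    return best if best.keywords & words else UNCATEGORISED


def label(text, last_modified, today):
    """Apply the category's marking and retention date to one document."""
    cat = categorise(text)
    expires = last_modified + timedelta(days=cat.retention_days)
    return {
        "category": cat.name,
        "marking": cat.marking,
        "expires": expires,
        "expired": expires <= today,
    }
```

For example, `label("Salary appraisal notes", date(2010, 1, 1), date(2016, 6, 9))` lands in the HR category, picks up its marking and is flagged as expired, which makes it a candidate for moving out.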

Move out the ‘expired’ and redundant, obsolete and trivial documents (plus, if you wish, anything which hasn’t been accessed for 36 or 48 months) into a lower cost archive, ready for deletion after a ‘cooling off’ period.

Run the retention date analysis monthly, moving the appropriate documents into the low cost archive ready for removal at an agreed timescale after the move.
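The archive-then-delete routine might look something like the Python sketch below, which could be run from a monthly scheduled job. It assumes the list of files to move has already been decided, and it records each move in a small JSON manifest so that deletion happens only after the cooling off period. The 90-day period, the manifest file name and the function names are all assumptions made for the example.

```python
import json
import os
import shutil
import time
import uuid

COOLING_OFF_DAYS = 90            # assumed cooling off period
MANIFEST_NAME = "manifest.json"  # hypothetical manifest file name


def _load(manifest):
    """Read the manifest, or return an empty one if it does not exist."""
    if os.path.exists(manifest):
        with open(manifest) as fh:
            return json.load(fh)
    return {}


def _save(manifest, entries):
    with open(manifest, "w") as fh:
        json.dump(entries, fh, indent=2)


def archive(paths, archive_dir, now=None):
    """Move files into the archive and record when each one was moved."""
    now = time.time() if now is None else now
    os.makedirs(archive_dir, exist_ok=True)
    manifest = os.path.join(archive_dir, MANIFEST_NAME)
    entries = _load(manifest)
    for src in paths:
        # A random prefix avoids clashes between files with the same name.
        dest = os.path.join(
            archive_dir, f"{uuid.uuid4().hex}_{os.path.basename(src)}")
        shutil.move(src, dest)
        entries[dest] = {"origin": src, "archived_at": now}
    _save(manifest, entries)
    return entries


def purge(archive_dir, now=None):
    """Delete archived files whose cooling off period has passed.

    Returns the original paths of the files that were removed.
    """
    now = time.time() if now is None else now
    manifest = os.path.join(archive_dir, MANIFEST_NAME)
    entries = _load(manifest)
    if not entries:
        return []
    cutoff = now - COOLING_OFF_DAYS * 86400
    removed = []
    for dest, info in list(entries.items()):
        if info["archived_at"] <= cutoff:
            if os.path.exists(dest):
                os.remove(dest)
            removed.append(info["origin"])
            del entries[dest]
    _save(manifest, entries)
    return removed
```

Keeping the original path in the manifest means a document can still be traced, and restored by hand, at any point during the cooling off period.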

Thus your data is cleansed, it self-cleanses on a monthly basis, it is categorised and can be secured appropriately, and you can be confident that you know what your information is and where it is.

You won’t have solved all the issues surrounding Big Data, but you will know your own data is clean and organised into a schema which you can manage for everybody’s benefit, and your exponential growth in storage has at least slowed to a manageable level.

Now you can go and talk about Big Content, 360° Intelligence, total touch points, mining the Data Gold Rush and so on, safe in the knowledge that at least your system is built on solid foundations and not today’s marketing hype! Anyone want to talk about Biological Memory?


© Copyright 2013 Apperception