Big Data...turning a liability into an asset.

Data Cleansing

Multiple departmental changes, takeovers, staff moving on or retiring, old projects closing, and discontinued data storage and document management systems have all left many datasets virtually uncontrollable.

DataCube has addressed these issues with its Data Cleansing Service, which locates and cleanses your data.

The DataCube Data Cleansing Service uses its sophisticated Data Inventory system to collate and maintain all the details of each document within a dataset.

For each dataset it works on, DataCube builds a Data Inventory that stores all the pertinent details about the data: file types, locations, file paths, metadata, protection levels, retention details, authors, creation and access dates, and any other required fields. Analysing this inventory reveals many characteristics of the data. The service recognises duplicates, email threads (multiple copies of an original email), document dates, and authors who no longer have any connection with the organisation. It can also recognise subject matter and its dates, so that each department's 'cut-off' points can be identified and the documents falling beyond them deleted or archived. The most immediate gain is the identification of 'date-expired' documents that can legally be removed from storage: this frees up disk space and also reduces their 'discoverability' in response to FoI (Freedom of Information) requests and other enquiries.
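To make the idea of a Data Inventory concrete, here is a minimal Python sketch. It is not DataCube's implementation: the `InventoryRecord` class, the `build_inventory` and `date_expired` functions, and the single flat retention period are all hypothetical. It walks a directory tree, records a few of the fields listed above (path, file type, size, last-modified date), and flags files older than a given retention period as 'date-expired'.

```python
from __future__ import annotations

import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class InventoryRecord:
    """One row of a hypothetical Data Inventory (a small subset of the fields above)."""
    path: str
    file_type: str    # lower-cased extension, e.g. ".docx"
    size_bytes: int
    modified: float   # POSIX timestamp of last modification


def build_inventory(root: str) -> list[InventoryRecord]:
    """Walk a directory tree and record basic details of every file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            records.append(InventoryRecord(
                path=full,
                file_type=Path(name).suffix.lower(),
                size_bytes=st.st_size,
                modified=st.st_mtime,
            ))
    return records


def date_expired(records: list[InventoryRecord], retention_years: float,
                 now: Optional[float] = None) -> list[InventoryRecord]:
    """Return records last modified before the retention cut-off.

    Assumes one flat retention period measured in 365.25-day years; a real
    retention schedule would normally vary by document category.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_years * 365.25 * 24 * 3600
    return [r for r in records if r.modified < cutoff]
```

In practice the retention period would come from the category a document is assigned to rather than a single argument; the flat value here only keeps the sketch short.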

The DataCube Data Cleansing Service analyses all the data and produces a range of reports on the existing data, including:

  • Number of files by type and location
  • Files by size and MIME type group
  • Duplicates: MD5 analysis of each document's content to establish exact duplicates, irrespective of file name or metadata
  • Validity of file extensions
  • Aged file analysis by content subject matter and category, giving a real understanding of a document's provenance as well as the restrictions and rules that should apply to it. (Applying the schema lets clients set rules for documents based on their content rather than their title or a few keywords.)
  • Analysis of each file's metadata and activity history: creation and modification dates, author and originating source
  • Files by category (using the LGCS or any local version of a schema)
  • Outdated and orphaned data
  • Email threading based on content, author and date, including attachments
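Two of the reports above, exact-duplicate detection and file-extension validity, can be illustrated with a short Python sketch. This is an assumption-laden illustration, not DataCube's method: the function names and the deliberately tiny `SIGNATURES` table of file "magic numbers" are hypothetical. Duplicates are found by grouping files on the MD5 hash of their bytes, so file names and file-system dates play no part; note, though, that metadata embedded inside a file (for example document properties stored in a .docx) is part of those bytes.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Hash a file's content in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(paths):
    """Group paths whose content hashes match; return only groups of two or more."""
    groups = defaultdict(list)
    for p in paths:
        groups[md5_of_file(p)].append(p)
    return [sorted(ps) for ps in groups.values() if len(ps) > 1]


# Hypothetical, deliberately small table of leading-byte signatures.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",
    ".docx": b"PK\x03\x04",  # Office Open XML files are ZIP containers
}


def extension_matches_content(path):
    """True/False if the extension is in SIGNATURES, None if it is not covered."""
    ext = os.path.splitext(path)[1].lower()
    sig = SIGNATURES.get(ext)
    if sig is None:
        return None
    with open(path, "rb") as f:
        return f.read(len(sig)) == sig
```

MD5 is adequate for spotting accidentally identical content, but it is not collision-resistant against deliberate tampering; swapping in `hashlib.sha256` is a one-line change if that matters.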
© Copyright 2013 Apperception