Big Data... turning a liability into an asset.

Categorisation Module

The module takes the collected data and indexes it. In parallel, working in conjunction with the Customer, it creates the Customer's personalised schema or LGCS, which defines retention dates, protective levels (labels) and their exceptions. Once the data inventory and staging areas are built, the module can feed the third major workflow process. From the staging areas, data is ingested into the DataCube core to build a centralised index and categorisation system. The initial build takes several weeks, but once it is complete new data is simply absorbed into the index, which is periodically updated. The index is held in memory, making it extremely fast, and it feeds a number of elements of the overall solution.
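
To make the idea of an incrementally updated, in-memory index concrete, here is a minimal sketch in Python. It assumes a simple inverted index built once from the staging area and then absorbing new documents as they arrive; the class and method names are illustrative and are not DataCube APIs.

```python
# Illustrative sketch only: an in-memory inverted index that is built once
# from staged documents and then incrementally absorbs new ones.
# Names (StagingIndex, absorb, search) are hypothetical, not DataCube APIs.
from collections import defaultdict
import re

class StagingIndex:
    def __init__(self):
        # term -> set of document ids containing that term
        self.postings = defaultdict(set)
        self.documents = {}

    def absorb(self, doc_id, text):
        """Add or refresh a single document in the in-memory index."""
        self.documents[doc_id] = text
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def bulk_build(self, staged_docs):
        """Initial build from the staging area (doc_id -> text mapping)."""
        for doc_id, text in staged_docs.items():
            self.absorb(doc_id, text)

    def search(self, *terms):
        """Return ids of documents containing all of the given terms."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()

# Initial build, then periodic incremental updates as new data arrives.
index = StagingIndex()
index.bulk_build({"doc1": "Retention schedule for planning records",
                  "doc2": "Email retention policy exceptions"})
index.absorb("doc3", "New planning application correspondence")
print(index.search("retention"))   # e.g. {'doc1', 'doc2'}
print(index.search("planning"))    # e.g. {'doc1', 'doc3'}
```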

Additional workflow elements depend on how the organisation intends to use the system and on the data management and information security policies it needs to enforce.

Data Extraction into other applications - EDRMS implementation or casework collation: Search the data conceptually, by keywords, by metadata, or by a combination of all three, and filter the results to refine the accuracy of the search. The search results then enable the user to view the data files individually or extract them into other applications such as SQL Server, SharePoint, Dynamics, or a document handling system (a sketch of this search-and-extract flow follows the list below).

  • Most document handling systems fail because the document ingestion tasks are never completed; pressure on users continually diverts them from this work.
  • This module automates that process, enabling timely installation and introduction of a document handling system.
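
As an illustration of the search-and-extract flow described above, the sketch below filters documents by a keyword and by metadata fields, then writes the refined result set to a file that another application could ingest. The field names, the CSV format and the file name are assumptions for illustration only, not DataCube interfaces.

```python
# Illustrative sketch only: combine keyword and metadata filters over indexed
# documents and export the result set for ingestion by another application
# (e.g. a CSV that a document handling system or SQL bulk loader could read).
# Field names and the export format are assumptions, not DataCube interfaces.
import csv

documents = [
    {"id": "doc1", "text": "Planning appeal correspondence", "department": "Planning", "year": 2012},
    {"id": "doc2", "text": "Council tax arrears letter",      "department": "Revenues", "year": 2013},
    {"id": "doc3", "text": "Planning committee minutes",      "department": "Planning", "year": 2013},
]

def search(docs, keyword=None, **metadata):
    """Filter by an optional keyword and any number of metadata fields."""
    results = []
    for doc in docs:
        if keyword and keyword.lower() not in doc["text"].lower():
            continue
        if any(doc.get(k) != v for k, v in metadata.items()):
            continue
        results.append(doc)
    return results

hits = search(documents, keyword="planning", department="Planning", year=2013)

# Export the refined result set so another system can ingest it automatically.
with open("extract_for_edrms.csv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["id", "department", "year", "text"])
    writer.writeheader()
    writer.writerows(hits)
```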

Auto Creation of a Dataset's Taxonomy: Dynamically create taxonomies based on the data itself. Given a specified number of categories, the DataCube automatically clusters data containing similar concepts into that number of categories. The system even suggests names for each category; this is referred to as Dynamic Clustering (a sketch follows the list below).

  • Dynamic Clustering creates communities of subject matter by automatically clustering content and resources into related groups.
  • This can be used either for search purposes or for creating taxonomies or schemas from within the existing dataset. Many committee-created schemas end up unevenly weighted and so narrow the purpose of the document handling methodology.
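
As a sketch of dynamic clustering, the example below groups documents into a specified number of categories and suggests a name for each from its dominant terms. It uses scikit-learn's TF-IDF weighting and k-means as a stand-in; the DataCube's own clustering algorithm is not described here, and the sample documents are invented.

```python
# Illustrative sketch only: cluster documents into a specified number of
# categories and suggest a name for each cluster from its dominant terms.
# TF-IDF plus k-means is a stand-in, not the DataCube's own algorithm.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "Housing benefit claim form and supporting evidence",
    "Housing benefit appeal decision letter",
    "Planning application for a rear extension",
    "Planning committee agenda and minutes",
]
n_categories = 2  # the "specified number of categories"

vectoriser = TfidfVectorizer(stop_words="english")
matrix = vectoriser.fit_transform(documents)
model = KMeans(n_clusters=n_categories, n_init=10, random_state=0).fit(matrix)

# Suggest a name for each category from its three highest-weighted terms.
terms = vectoriser.get_feature_names_out()
for cluster in range(n_categories):
    centre = model.cluster_centers_[cluster]
    top_terms = [terms[i] for i in centre.argsort()[::-1][:3]]
    members = [d for d, label in zip(documents, model.labels_) if label == cluster]
    print(f"Category {cluster}: suggested name {' / '.join(top_terms)!r}")
    for doc in members:
        print("   ", doc)
```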

Supervised Categorisation into a specific industry or personalised Taxonomy: An alternative method is supervised categorisation: creating and imposing an industry-specific (but personalised) taxonomy. In the case of Local Authorities this would be the LGCS, amended to cater for the customer's own requirements. DataCube Services would help the Customer define the categories and the appropriate transactions, and then, by adding exemplars to the anonymised data already gathered in earlier exercises, create a taxonomy (set of categories) specific to the Customer (a sketch of exemplar-based categorisation follows the list below).

  • DataCube already has a sophisticated schema set for Local Government, which provides for the indexing and categorisation of unstructured data. Personalising the schema ensures that it fits your particular business or organisational needs.
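
The sketch below illustrates exemplar-based supervised categorisation: a small classifier is trained on exemplar documents labelled with personalised categories and then assigns new documents to those categories. The category names are invented stand-ins for a customised LGCS schema, and the classifier choice (TF-IDF with naive Bayes via scikit-learn) is an assumption, not the DataCube's own method.

```python
# Illustrative sketch only: supervised categorisation into a personalised
# taxonomy, trained from exemplar documents per category. The category names
# are invented stand-ins for a customised LGCS schema; the classifier choice
# (TF-IDF plus naive Bayes) is an assumption, not the DataCube's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Exemplars gathered with the Customer, one label per personalised category.
exemplars = [
    ("Council tax reminder notice for outstanding balance", "Revenues and Benefits"),
    ("Housing benefit entitlement calculation",             "Revenues and Benefits"),
    ("Planning application for change of use",              "Planning and Building Control"),
    ("Building regulations completion certificate",         "Planning and Building Control"),
]
texts, labels = zip(*exemplars)

classifier = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
classifier.fit(texts, labels)

# New, previously unseen documents are assigned to the personalised categories.
for doc in ["Late payment of council tax instalment",
            "Objection to a neighbour's planning application"]:
    print(classifier.predict([doc])[0], "<-", doc)
```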

We provide a number of tools, including a reporting dashboard, email forensic analysis, interrogation case management and duplicate file analysis, to support an interrogation team.
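
As an example of what duplicate file analysis involves at its simplest, the sketch below groups files whose contents hash identically. The directory path is hypothetical, and the approach is a generic first pass rather than the DataCube's own implementation.

```python
# Illustrative sketch only: duplicate file analysis by hashing file contents,
# the usual first pass for a duplicate report. The directory path is hypothetical.
import hashlib
from collections import defaultdict
from pathlib import Path

def duplicate_groups(root):
    """Group files under 'root' whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]

for group in duplicate_groups("./staging_area"):
    print("Duplicates:", ", ".join(str(p) for p in group))
```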


© Copyright 2013 Apperception