What is a Data Catalogue?
The concept of a data catalogue is a simple one. It refers to a centralised and complete directory of all data sources that are in use within your business. Think of it as a “metadata management system”. A data catalogue is concerned with ensuring data is curated in such a way that, when it needs to be used, it is easy to locate, well documented and in the most suitable format.
A data catalogue can only be successful if it is a single, reliable source of truth which covers all of the data sources and data elements that are in use within an organisation. If data sources and pipelines fall outside of its scope, they will compromise the implementation of good data governance. Having an automated system for recognising and registering new data assets is another important aspect of managing metadata. If this process is performed manually by staff, it not only becomes a bottleneck but also risks certain elements being accidentally overlooked, compromising the integrity of the entire project.
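As a rough illustration of automated registration, the sketch below scans a database for tables that the catalogue does not yet know about and adds them. It is a minimal example only: it assumes a SQLite source, and the `catalogue` dictionary, the `register_new_assets` function and the table names are all hypothetical, not part of any particular product.

```python
import sqlite3


def discover_tables(conn):
    """Return the set of user table names currently present in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return {name for (name,) in rows}


def register_new_assets(conn, catalogue):
    """Add every table missing from the catalogue; return the names added."""
    new_names = sorted(discover_tables(conn) - set(catalogue))
    for name in new_names:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{name}")')]
        # The owner is left blank so that a data steward can assign it later.
        catalogue[name] = {"columns": columns, "owner": None}
    return new_names


# Hypothetical source system: one table is already catalogued, one is new.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
catalogue = {"customers": {"columns": ["id", "name"], "owner": "sales"}}

added = register_new_assets(conn, catalogue)
print(added)
```

Run on a schedule, a scan like this means a new table is picked up without anyone having to remember to record it, although descriptive details such as the owner would still need to be filled in by a person.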
Why is Metadata Management Important?
A data catalogue is an example of a metadata management system. Within organisations, metadata management is often given less focus and attention than master data management (MDM). An MDM system governs the meat-and-potatoes aspects of enterprise data: information about customers, products, business entities and other assets. This data enables the core day-to-day operations, and the MDM system works to ensure that it remains consolidated and accessible, with no duplication, redundancy or inconsistency.
A metadata management system is slightly more abstract and is concerned with governing the data points which ensure that the enterprise data ecosystem itself is running smoothly. Metadata management is an important step towards good ongoing organisational data governance and should be one of the areas of responsibility of a Data Steward. It is concerned with documenting each data asset's source, owner, history, format and entity relationships. An important part of improving all-round data quality is to make the process of accumulating and visualising this metadata a matter of course, which sets up a solid foundation for good data governance.
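To make this more concrete, the sketch below shows one way a single catalogue record might capture source, owner, history, format and entity relationships. The `CatalogueEntry` class, its field names and the example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogueEntry:
    """Illustrative metadata record for one data asset."""
    name: str
    source: str
    owner: str
    data_format: str
    history: list = field(default_factory=list)           # (date, note) pairs
    related_entities: list = field(default_factory=list)  # names of linked assets

    def record_change(self, when, note):
        """Append a dated note describing a change to the asset."""
        self.history.append((when, note))

    def missing_fields(self):
        """Return the descriptive fields that have been left blank."""
        return [f for f in ("source", "owner", "data_format") if not getattr(self, f)]


# Hypothetical entry with the owner not yet assigned.
entry = CatalogueEntry(
    name="warehouse.dim_customer",
    source="crm.customers",
    owner="",
    data_format="SQL table",
    related_entities=["sales_cube"],
)
entry.record_change(date(2024, 1, 15), "Added region column")

print(entry.missing_fields())
```

A check like `missing_fields` gives a Data Steward a simple list of gaps to chase, so that documenting metadata becomes routine and not an afterthought.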
What is Data Lineage?
The importance of metadata management lies in the fact that a single, isolated data point does not automatically explain itself and is at its most useful when presented within a well-elaborated context, such as within data catalogue software. This is especially the case when you are dealing with data that has been integrated from multiple source systems, transformed to conform to a specific schema and changed by the application of other formulas. The documentation and presentation of all these changes and alterations can be as important as the final figures and visualisations that are presented.
Data that is presented without context, in ways that cause it to be misinterpreted, can lead to a lack of trust in data within teams. When this happens, inefficient, ad-hoc parallel processes can emerge which will reduce the value that a business is getting from an elaborate data storage and analytics platform. A data catalogue can help to minimise these issues.
How Does a Business Glossary Aid a Data Catalogue?
Business terms can be notoriously ambiguous or open-ended, often unintentionally and sometimes by design. When it comes to quantification and analytics, a lack of standardised definitions can lead to confusion when inputting data at the coalface and can create unreliable, diluted or omitted data at the visualisation and analytics stage.
Since clarity is so important, an essential but often-neglected step is to define the important terminology, outlining precisely what functions are being referenced by ambiguous terms. A data catalogue is the best place to create and maintain such a business glossary because of its proximity to other important clarifying information about business data.
How Loome Can Help Establish a Data Catalogue
Loome Publish provides an overview of data lineage by displaying the relationship between Entities. These can be reports, data cubes, tables or other objects pulled from data sources. The important bigger picture is displayed in a Network Map, visualising all the connections between the Entities.
This makes it possible to trace back the data source and easily see all the transformations and derivatives which take place. It is simple to see, for example, which reports will be affected if a certain table is altered. Further information can be provided through a dynamic metadata management system of annotation which builds even more important context and allows the incorporation of specific, business-relevant note taking.
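The idea behind this kind of impact analysis can be sketched in general terms as a walk over a graph of lineage links. The example below is a generic illustration and does not represent how Loome Publish is implemented; the `lineage` mapping, the entity names and the `downstream` and `upstream` functions are all hypothetical.

```python
from collections import deque

# Hypothetical lineage links: each key feeds the entities listed against it.
lineage = {
    "crm.customers": ["warehouse.dim_customer"],
    "erp.orders": ["warehouse.fact_sales"],
    "warehouse.dim_customer": ["sales_cube"],
    "warehouse.fact_sales": ["sales_cube", "finance_report"],
    "sales_cube": ["monthly_sales_report"],
}


def downstream(graph, start):
    """Return every entity that directly or indirectly depends on `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def upstream(graph, target):
    """Return every entity that `target` is directly or indirectly derived from."""
    reversed_graph = {}
    for parent, children in graph.items():
        for child in children:
            reversed_graph.setdefault(child, []).append(parent)
    return downstream(reversed_graph, target)


# Which entities are affected if the orders table is altered?
print(downstream(lineage, "erp.orders"))
# Where does the monthly sales report ultimately come from?
print(upstream(lineage, "monthly_sales_report"))
```

The first call answers the "what breaks if this table changes" question, and the second traces a report back to its sources.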
Loome Publish also offers a fully integrated business glossary which takes the form of Business Term Entities. These can be added within the Network Map at any point and connected to any Entities. Once a definition has been written, it is easily accessible to anyone working with the data catalogue, making the rules for standardisation available on demand.