How to create a data catalog, a step by step guide
How to create a data catalog, a step by step guide
2 by Etai | 0 comments on Hacker News.
Simple data cataloging starts with a great organization. A data catalog is a collection of metadata and documentation that helps make sense of the data sprawl that exists in most growing companies. Getting together and starting to use a data catalog is a simple process, but starting to get adoption and having the dictionary exist as part of your workflow is a little bit more difficult. An example of the data catalog problems shared by one of the delivery companies we spoke with. At this company, it was difficult to get aligned on which tables were commonly used, joined, how they were used together and what columns meant. Similarly, it’s difficult to monitor the number of data assets that exist across different departments, especially when the number of resources grows at a faster rate than people. Why is this the case? Data is becoming more decentralized through concepts like the data mesh. As more teams outside of the data function start to use data in their day-to-day, different tables, dashboards and definitions are being created at an almost exponential rate. Data catalogs are important because they help you organize your data whether you are working with structured or unstructured data. Below are the steps that teams need to take when creating a data catalog: 1. Gather sources from across the organization The first step data teams need to take is to collect the different resources that are scattered across different tools in the origination. This may require multiple meetings and stakeholders to come together and figure out which resources need to be in the catalog. Today, this collection could be done in a spreadsheet with an ongoing list of all resources and how they connect. 2. Give each resource an owner After data teams have identified all the resources from across the company that they would like to include in their data catalog, we recommend assigning ownership to each resource. Teams that we’ve worked within the past have assigned ownership based on the source, schema or even domain. Teams that start assigning ownership should look for people who are familiar with the data knowledge they are responsible for managing and are willing to help others who want to learn how to use it. 3. Get support and sign off Once these meetings conclude and owners are on the same page, have the owners sign off on their responsibilities. The owners should be in alignment with the documentation and feel like the data team worked collaboratively with them to come to this ownership structure. If the team leadership team sees the value of a data catalog, this can move at a much faster pace. 4. Integrate the catalog base into your workflow After data teams have received support for their data documentation process, they should look for ways to integrate this tool into their workflow. This step is critical for maintenance and upkeep. Without a tool that allows teammates to receive notifications on Slack, it will likely be forgotten. By creating a process around the data catalog, teams can ensure that it is not left behind as the team grows 5. Upkeep the data catalog Although the documentation should be stable, it may need to change over time. One instance that might require documentation to change is when a new revenue stream is introduced or when the pricing of an existing revenue line changes. These changes traditionally come from the business team and might require the data team to implement the changes into the data catalog. Teams that invest the time to get alignment using a data catalog can see major benefits in the long term as they make faster decisions as a team. Creating a data catalog is not a small undertaking. You can read the full step-by-step guide here if you found this post useful: https://ift.tt/3p0YqiL
2 by Etai | 0 comments on Hacker News.
Simple data cataloging starts with a great organization. A data catalog is a collection of metadata and documentation that helps make sense of the data sprawl that exists in most growing companies. Getting together and starting to use a data catalog is a simple process, but starting to get adoption and having the dictionary exist as part of your workflow is a little bit more difficult. An example of the data catalog problems shared by one of the delivery companies we spoke with. At this company, it was difficult to get aligned on which tables were commonly used, joined, how they were used together and what columns meant. Similarly, it’s difficult to monitor the number of data assets that exist across different departments, especially when the number of resources grows at a faster rate than people. Why is this the case? Data is becoming more decentralized through concepts like the data mesh. As more teams outside of the data function start to use data in their day-to-day, different tables, dashboards and definitions are being created at an almost exponential rate. Data catalogs are important because they help you organize your data whether you are working with structured or unstructured data. Below are the steps that teams need to take when creating a data catalog: 1. Gather sources from across the organization The first step data teams need to take is to collect the different resources that are scattered across different tools in the origination. This may require multiple meetings and stakeholders to come together and figure out which resources need to be in the catalog. Today, this collection could be done in a spreadsheet with an ongoing list of all resources and how they connect. 2. Give each resource an owner After data teams have identified all the resources from across the company that they would like to include in their data catalog, we recommend assigning ownership to each resource. Teams that we’ve worked within the past have assigned ownership based on the source, schema or even domain. Teams that start assigning ownership should look for people who are familiar with the data knowledge they are responsible for managing and are willing to help others who want to learn how to use it. 3. Get support and sign off Once these meetings conclude and owners are on the same page, have the owners sign off on their responsibilities. The owners should be in alignment with the documentation and feel like the data team worked collaboratively with them to come to this ownership structure. If the team leadership team sees the value of a data catalog, this can move at a much faster pace. 4. Integrate the catalog base into your workflow After data teams have received support for their data documentation process, they should look for ways to integrate this tool into their workflow. This step is critical for maintenance and upkeep. Without a tool that allows teammates to receive notifications on Slack, it will likely be forgotten. By creating a process around the data catalog, teams can ensure that it is not left behind as the team grows 5. Upkeep the data catalog Although the documentation should be stable, it may need to change over time. One instance that might require documentation to change is when a new revenue stream is introduced or when the pricing of an existing revenue line changes. These changes traditionally come from the business team and might require the data team to implement the changes into the data catalog. Teams that invest the time to get alignment using a data catalog can see major benefits in the long term as they make faster decisions as a team. Creating a data catalog is not a small undertaking. You can read the full step-by-step guide here if you found this post useful: https://ift.tt/3p0YqiL
Comments
Post a Comment