How to Tame the Data Beast: Building a Data Warehouse

Many enterprise-sized businesses use legacy systems, such as Microsoft Excel, to collect, cleanse, store, and analyze data from separate sources, such as their CRM, Google Analytics, and call centers. Maintaining the legacy processes by which they collect data from these siloed sources takes up time and resources. When data resides in siloed systems, it becomes extremely difficult to understand a customer journey through the entire sales cycle.

Organizations need to make data-driven decisions quickly and efficiently based on timely data. But if accessing, combining, organizing, and cleansing data from multiple sources in a single location takes hours and hours, it will be difficult to obtain key insights and determine opportunities before it’s too late.

With a repository of all data systems in a data warehouse, it will become easier to analyze data from multiple sources. Data warehousing allows you to create a centralized repository of all of your data sources/systems in one place for easy and efficient reporting (with automated updates on a daily basis), analysis, and data visualizations.

To establish a clean and seamless data warehouse you need to take a step back. Building it can be time consuming, but it is worth it. Here are 5 steps that will help you to get started.

STEP 1: DEFINE YOUR BUSINESS GOALS & QUESTIONS

Different organizations have different business goals. Defining what they are, as well as the metrics and KPIs needed to assess your performance against your goals, is the the first step to starting a data warehouse project.

Begin by studying the data sources and legacy reports. Determine the types of data each system brings, and what metrics/KPIs can be tracked through these systems. Are these metrics good enough to measure against your pre-defined business goals?

A very important part of this process is to get buy in from your stakeholders so they have their expectations set from the beginning. This could also be a good time to revisit some of the KPIs that don’t matter to your organization any more and to add new metrics that are key to making strategic business decisions.

STEP 2: DEVELOP A PLAN

Once you have a clear understanding of what has been done in past and your business goals, create a metrics framework which lays out all the important KPIs that you will use to measure your performance.This framework can serve to guide for you as you establish which data sources you’ll need to pull into your data warehouse. An example framework is provided below.

Once the stakeholder approves the metrics framework, it is time to get your hands dirty with different data sources. Start with obtaining access to the data sources if you don’t have it already. From your research in step 1, you already have a list of all the systems you need to obtain to access all the relevant data. Figure out where these data systems are sitting. Are they behind a secure server or in the cloud?

STEP 3: SELECTING AN ETL TOOL
Once the stakeholder approves the metrics framework, it is time to get your hands dirty with different data sources. Start with obtaining access to the data sources if you don’t have it already. From your research in step 1, you already have a list of all the systems you need to obtain to access all the relevant data. Figure out where these data systems are sitting. Are they behind a secure server or in the cloud?

Now that you have you have access to all the data sources, the next step is to look for tools that will help you connect the different systems. Make a list of how you will pull all your data into one system. Does it require writing an SQL query? Is there a direct connection to the data system available? Or do you need to upload csv data files? This information is important when selecting which data warehouse tool you will use.

It is also important to look for tools with a good ETL (Extract, Transform and Load) capability. Your ETL is the foundation for your reporting moving forward. A good ETL tool should be able to connect to various data sources, i.e relational databases, NoSQL or object-oriented databases. Also, make sure your ETL process can perform data validation, cleansing, and transformation processes easily. And lastly, it is very important to have a tool that you can automate and that is scalable for the future. Some of the popular ETL tools out there include Oracle Data Integrator, SAP Business Object, QlikView, Pentaho, and Domo.

STEP 4: REPORTING AND DASHBOARDS

Once you are done building the ETL processes, it is important to first audit your data. This is where the metrics framework from step 2 comes into play. Make sure all the agreed upon metrics are set up in your ETL routines. Thus, the next step is to figure how to translate them into reports or dashboards that stakeholders can read. Most data warehousing tools will allow you to download a csv file with the output data you get from your ETL routine. You can then use the csv data to slice and dice the data as per your need or to create visually appealing dashboards that convey performance statistics to the different stakeholders.

For example, some of the reports you might want to create for different stakeholders are as follows:

  • Marketing dashboard: Performance by different marketing channels
  • Product teams: Conversions on different product pages for different offers
  • Executive team: Daily sales by different lines of business to look at seasonal trends

Some of the best data visualization tools available include Tableau, Domo, Periscope, Looker, Qliksense, and SAS.

STEP 5: RINSE & REPEAT

Building a data warehouse structure shouldn’t be the end of it. This is an opportunity for businesses to dive into more advanced analytics like predictive modeling or website optimization and testing. In future, if you decide to add or remove certain KPIs, you can always go back to the metrics framework and update your ETL process accordingly.

With the data available at your fingertips, you can easily understand pain points and seasonal trends and create a predictive model that can help you create goals for future. If you wish to create such models and do more deep dive analysis, having a robust ETL routine would come in handy at this point as it will make it easier to scale your data in the future. Thus, it is important to select a data warehousing tool that will enable you to expand your data systems. For example, if you decide to optimize your website data via A/B tests in future, it would be helpful if your data warehouse is able to connect to those tools to look at real time data on the performance of the experiments. This will make it easier to connect the dots between the tests performed and the results generated.

CONCLUSION:

Making the decision to build data warehouse will help your business stay on top of your data and make strategic decisions proactively. It will also help to establish data transparency where everyone within your organization relies on the same information. Ultimately, companies that take the leap and build a data warehouse will save money and get more value from their resources.

Leave a Reply