Let me tell you a story...
A few months ago, I walked into a room filled with frustrated faces. The Data team couldn’t trust the reports they were getting. The Network Analytics team was using data from tools that didn’t talk to each other. Operations? They had their own spreadsheets—most of which were outdated.
“Vish, can you fix this mess?” they asked.
I smiled. This wasn’t just a mess; it was an opportunity to create something transformative. I knew exactly what we needed: a centralised data hub. And to get it right, I turned to my trusted playbook—TOGAF—and a powerful tool called Databricks.
Here’s how it all unfolded.
Step 1: Setting the Vision
I started by asking a simple but crucial question: Why do we need this data hub?
The answer was clear:
- To unify data scattered across networks and teams.
- To enable reliable, real-time insights.
- To make better decisions faster.
I gathered stakeholders together and painted a picture of success:
"Imagine one place where all your data lives, ready to answer your questions at the speed of thought."
Eyes lit up around the room. The mission was clear.
Step 2: Understanding the Business
TOGAF taught me to start with the business. I took the time to understand each department’s processes:
- Data team needed better forecasting tools to predict network usage.
- Network Analytics wanted personalised segmentation for grid health and power quality monitoring.
- Operations craved real-time insights into grid loads, outage management, and inventory tracking.
I mapped the data flows, pinpointed the bottlenecks, and identified where we needed improvements. This formed the foundation of our design.
Step 3: Designing the Architecture with DataMesh and Databricks
Here’s where DataMesh and Databricks took centre stage.
Based on my previous experience at a health service in Australia, I introduced the team to DataMesh, an architectural concept that enables decentralised data ownership while maintaining central governance. I also introduced the Lakehouse architecture to manage the mix of raw and structured data on one seamless platform.
- DataMesh allowed us to treat data as a product, with each data domain owned by the team closest to it.
- Lakehouse architecture provided the best of both worlds: a data lake for raw data and a data warehouse for structured datasets that could be queried easily.
The architecture came to life in layers:
- Data Ingestion: Streamed data from multiple sources like iTron (SIQ/UIQ), SAP HANA, ODW, and GIS (SDW).
- Middleware: We used Zepben EnergyWorkBench (EWB) for transforming and syncing data across platforms.
- Connectors: Databricks Jobs and Confluent Kafka enabled seamless data movement across systems and ensured we could handle both batch and real-time data processing.
- Processing: Using Delta Lake for data transformation, we cleaned raw, unstructured data into structured, query-ready tables (a minimal sketch of this flow appears below).
- Analytics: LVA Dashboards, Strategic Network Apps, and Databricks SQL powered the queries, while Posit Shiny visualised the insights.
I mapped all these layers using TOGAF’s Architecture Development Method (ADM) to ensure the design was scalable, flexible, and aligned with Jemena's long-term goals.
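To make the ingestion and processing layers a little more concrete, here is a minimal PySpark sketch of the pattern: land raw events from a Kafka topic in a bronze Delta table, then clean them into a structured silver table. The broker, topic name, schema fields, and paths are placeholders for illustration, not the actual configuration.

```python
# Minimal sketch of the ingestion/processing pattern (illustrative names only).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("hub-ingest-sketch").getOrCreate()

# Hypothetical schema for a power-quality reading arriving on Kafka.
reading_schema = (
    StructType()
    .add("meter_id", StringType())
    .add("voltage", DoubleType())
    .add("measured_at", TimestampType())
)

# Bronze: land the raw Kafka payload as-is in a Delta table.
bronze = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "power-quality-readings")       # placeholder topic
    .load()
)
(
    bronze.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/bronze")
    .start("/tmp/delta/bronze_power_quality")
)

# Silver: parse the JSON payload into typed, query-ready columns.
silver = (
    spark.readStream.format("delta")
    .load("/tmp/delta/bronze_power_quality")
    .select(from_json(col("value").cast("string"), reading_schema).alias("r"))
    .select("r.*")
    .where(col("meter_id").isNotNull())
)
(
    silver.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/silver")
    .start("/tmp/delta/silver_power_quality")
)
```

The same pattern scales from this toy example to many topics and tables, because each layer is just another Delta table with its own checkpoint.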
Step 4: Building the Technology Stack
Our stack was cutting-edge:
- Compute: Databricks clusters auto-scaled to handle massive workloads with ease.
- Storage: AWS S3 served as our central, resilient data repository.
- Security: We used AWS Identity and Access Management (IAM) and Databricks Unity Catalog for access control, ensuring the right stakeholders had the right data at the right time.
Gone were the days of mystery spreadsheets—Unity Catalog tracked every piece of data, ensuring compliance with Australian data protection laws and giving full transparency into data lineage.
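As a flavour of how that access control looks in practice, the snippet below issues Unity Catalog style grants from a notebook. The catalog, schema, table, and group names are placeholders, not our real objects.

```python
# Illustrative Unity Catalog grants (placeholder names throughout).
# Assumes a Databricks workspace with Unity Catalog enabled.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

statements = [
    # Let the analytics group discover and read the curated power-quality table.
    "GRANT USE CATALOG ON CATALOG data_hub TO `network-analytics`",
    "GRANT USE SCHEMA ON SCHEMA data_hub.gold TO `network-analytics`",
    "GRANT SELECT ON TABLE data_hub.gold.power_quality_daily TO `network-analytics`",
    # Operations gets read access to the outage summary only.
    "GRANT SELECT ON TABLE data_hub.gold.outage_summary TO `operations`",
]

for stmt in statements:
    spark.sql(stmt)

# Lineage and audit history for these objects are then visible in Unity Catalog,
# which is what replaced the old mystery spreadsheets.
```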
Step 5: Execution
With the architecture in place, I crafted a detailed phased roadmap to bring the vision to life. The Work Packages (WP) were structured to deliver value quickly and iteratively, ensuring continuous progress:
WP1: Discovery Phase
In this foundational phase, we focused on understanding the current data landscape. We assessed the existing systems, identified the gaps, and planned how to integrate disparate data sources. This gave us the basis for a single source of truth and laid the groundwork for the integration process.
WP2: Key Components Delivered
This work package saw the integration of SAP HANA and the creation of data products via Confluent Platform. Key components included:
- SAP HANA Integration: Daily extracts and uploads of Meter-CT-ratio and Usage Point NMI data to an S3 bucket (see the sketch after this list).
- Confluent Platform Data Products: Stream processing capabilities were added to enrich meter-power-quality data with usage-point-info, which was then ingested into R-Server’s SQL Server DB for deeper analysis.
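To picture the SAP HANA side, a daily extract job along these lines can read the relevant tables over JDBC and land them in S3 as Parquet. The JDBC URL, table names, credentials, and bucket path below are assumptions for illustration only.

```python
# Sketch of a daily SAP HANA extract to S3 (all connection details are placeholders).
from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hana-extract-sketch").getOrCreate()

jdbc_url = "jdbc:sap://hana-host:30015"               # placeholder host and port
run_date = date.today().isoformat()

for table in ["METER_CT_RATIO", "USAGE_POINT_NMI"]:   # hypothetical table names
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("driver", "com.sap.db.jdbc.Driver")
        .option("dbtable", table)
        .option("user", "extract_user")               # use a secret scope in practice
        .option("password", "***")
        .load()
    )
    # Partition each daily extract by run date so downstream loads stay idempotent.
    df.write.mode("overwrite").parquet(
        f"s3://example-data-hub-landing/sap_hana/{table.lower()}/run_date={run_date}"
    )
```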
Key Outcomes:
- Prepared R-Server for high-volume data through the VVC rollout.
- Retired inefficient file-based integration for power quality data from SIQ.
WP3: Zepben and ODW Integration
Key integrations were set up in this phase:
- ODW Integration: Captured and streamed switch state changes to a Kafka topic (a minimal producer sketch follows this list).
- Zepben Network Model: We managed dynamic network models and integrated GIS data for the static model.
- Switch State Micro Service: This service ingested switch state events into the Energy Workbench Network Model.
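For a feel of the switch-state stream itself, here is a minimal producer sketch using the confluent-kafka Python client. The broker address, topic name, and event fields are hypothetical; the real events originate in ODW and are consumed by the Energy Workbench microservice.

```python
# Minimal sketch of publishing a switch state change event (placeholder broker, topic, fields).
import json
from datetime import datetime, timezone

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "broker:9092"})  # placeholder broker

event = {
    "switch_id": "SW-1234",                     # hypothetical identifiers
    "state": "OPEN",
    "changed_at": datetime.now(timezone.utc).isoformat(),
    "source": "ODW",
}

def on_delivery(err, msg):
    # Surface delivery failures so missed state changes are visible.
    if err is not None:
        print(f"delivery failed: {err}")

producer.produce(
    "switch-state-changes",                     # placeholder topic name
    key=event["switch_id"],
    value=json.dumps(event),
    on_delivery=on_delivery,
)
producer.flush()
```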
Key Outcomes:
- A CIM-compliant, reusable Electricity Network Model was validated and made ready for the Grid Stability Program and DERMS use cases.
WP4: Databricks and Network Model Enhancements
In this phase, the Databricks Platform was operationalised, and significant integrations took place:
- Zepben Network Model was enhanced with circuit information.
- Time-Series database was introduced to handle Power Quality Data.
- Confluent Platform was integrated with Databricks to enrich Usage Point Info (illustrated in the sketch below).
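The Usage Point Info enrichment can be pictured as a stream-static join, sketched below in PySpark. The table paths and join key are illustrative assumptions rather than the actual product definitions.

```python
# Illustrative stream-static enrichment join (placeholder tables, paths, and keys).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("enrichment-sketch").getOrCreate()

# Static reference data: usage point info kept as a Delta table.
usage_points = spark.read.format("delta").load("/tmp/delta/silver_usage_point_info")

# Streaming power-quality readings arriving from the silver layer.
readings = spark.readStream.format("delta").load("/tmp/delta/silver_power_quality")

# Enrich each reading with its usage point attributes (e.g. NMI, feeder).
enriched = readings.join(usage_points, on="meter_id", how="left")

(
    enriched.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/enriched_power_quality")
    .start("/tmp/delta/gold_power_quality_enriched")
)
```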
Key Outcomes:
- Bronze, Silver, and Gold layer data products were defined and implemented.
- SAP/HANA batch data was successfully ingested into Databricks.
- Operationalised the Databricks platform for data engineering, data science, and analytics.
WP5: LVA Dashboard MVP
In this work package, we delivered initial MVP use cases:
- LVA Dashboard MVP: A new dashboard leveraging Databricks visualisation tools (a sample backing query follows this list).
- Additional Data Products: Enabling further use cases in the following work packages.
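To give a flavour of the queries behind the dashboard, the sketch below aggregates voltage excursions per zone substation from a hypothetical gold table. The table name, columns, and voltage thresholds are illustrative only.

```python
# Hypothetical query behind an LVA dashboard tile (placeholder table and columns).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

lva_summary = spark.sql("""
    SELECT
        zone_substation,
        DATE(measured_at)                                  AS reading_date,
        COUNT(*)                                           AS readings,
        SUM(CASE WHEN voltage > 253 THEN 1 ELSE 0 END)     AS over_voltage_events,
        SUM(CASE WHEN voltage < 216 THEN 1 ELSE 0 END)     AS under_voltage_events
    FROM data_hub.gold.power_quality_enriched
    GROUP BY zone_substation, DATE(measured_at)
    ORDER BY reading_date DESC, zone_substation
""")

# In a Databricks notebook, display(lva_summary) renders the chart behind the tile.
lva_summary.show(20, truncate=False)
```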
Key Outcomes:
- MVPs for key use cases like Dynamic Network Model and Power Quality Data.
- All platform components (Zepben EWB, Databricks, Time-Series DB) were built, tested, and deployed.
WP6: Expansion of LVA Dashboard and Strategic Analytics
This phase focused on expanding the LVA Dashboard to support additional use cases and replacing legacy R-Server + Shiny solutions with Databricks-based alternatives.
Key Outcomes:
- Migration of the R-Server algorithms to the Databricks platform was completed.
- LVA Dashboard was expanded to support new insights, enabling operational decision-making.
WP7: Full Integration and Grid Stability
The final work package focused on integrating additional data sources and creating advanced data products for Grid Stability and related analytics.
Key Outcomes:
- Integrated external data sources like Weatherzone, BOM, and Solcast.
- Built Grid Stability Solution using the network model and Databricks platform.
- Created a LIDAR Image Processor using Databricks’ image processing capabilities (a minimal file-ingestion sketch follows this list).
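For the LIDAR work, Spark’s binary file reader offers a simple way to land raw capture files on the platform before any image processing runs over them. The bucket path and file extension below are assumptions for illustration.

```python
# Sketch of landing raw LIDAR captures as binary files in Delta (placeholder path/extension).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lidar-ingest-sketch").getOrCreate()

lidar_files = (
    spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.laz")          # hypothetical LIDAR file extension
    .option("recursiveFileLookup", "true")
    .load("s3://example-data-hub-landing/lidar/")
)

# Keep the file metadata and raw bytes; downstream jobs do the actual processing.
(
    lidar_files.select("path", "modificationTime", "length", "content")
    .write.mode("append")
    .format("delta")
    .save("/tmp/delta/bronze_lidar_files")
)
```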
Step 6: Launch Day
The launch day was unforgettable.
I watched as the Network Analytics team pulled up their first real-time LVA (Low Voltage Analytics) Dashboard. “This is magic,” one of the analysts whispered, seeing the live data on grid performance and power quality. The Operations team was already diving into the new insights, using the Dynamic Network Model and power quality data from the Time-Series database to optimise network management.
In Operations, we saw the immediate impact: outage management was now more efficient, and the team could finally retire their outdated spreadsheets, knowing they had real-time access to critical data. The engineering teams also embraced the Databricks-powered insights, allowing them to perform deeper analysis on grid health and even plan better for future energy demand.
Behind it all was the seamless integration of TOGAF’s structured governance and Databricks’ powerful tools, turning chaos into clarity.
Step 7: Future-Proofing with TOGAF
Architecture isn’t a one-time job. TOGAF’s Architecture Change Management reminded me to always plan for the future.
We expanded the hub with:
- New data sources like power quality data, IoT sensors, and energy monitoring tools.
- Machine learning models with Databricks MLflow for predictive analytics (a minimal training sketch follows this list).
- Sustainability metrics to track and optimise operations, in line with Jemena’s goals for carbon reduction.
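On the predictive analytics side, the working pattern is simple: train a forecasting model and track it with MLflow so runs can be compared and promoted later. The sketch below uses synthetic load data and a scikit-learn model purely as an illustration.

```python
# Minimal MLflow tracking sketch with synthetic data (illustrative only).
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic "network load vs. hour of day and temperature" data standing in for real features.
rng = np.random.default_rng(42)
X = rng.uniform([0, 10], [23, 40], size=(1_000, 2))
y = 50 + 2.5 * X[:, 1] + 5 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 3, 1_000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="load-forecast-sketch"):
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    mae = mean_absolute_error(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(model, "load_forecast_model")
```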
This wasn’t just a data hub—it became a platform for endless possibilities.
Looking Back
Creating a centralised data hub wasn’t just about the technology; it was about solving real problems for real people. TOGAF gave me the structure, and Databricks provided the tools.
And the best part? Watching those frustrated faces light up when they realised what was possible.
That’s why I do what I do.