Saturday, October 3, 2015

Conceptual, Logical, Physical: HELP!

Standard modeling languages give you too much flexibility, so you can all too easily start developing your own proprietary extensions. This should be avoided at all costs. Using the standard language features as intended is more productive, and makes things easier to learn and share. Introducing extensions in a language to support conventions commonly practiced in another language only causes confusion and wasted effort. Try to understand where languages fit and use them as intended by the language author. And understand that languages can be used to express different concepts to different audiences.
To illustrate these points I will talk about some time I spent a few months back at the DMZ (Data Modeling Zone) conference in Baltimore, MD. This is a conference targeted at analysts, specifically data analysts, and is very relevant to many in the Need Powered Change community. Much of the material below is directed towards more experienced analysts, but feel free to push on even if you are a beginner!
I attended a session led by the excellent Michael Blaha, someone whose published material I highly recommend (to a data modeling audience – see bottom of article). Michael was describing the benefits of using UML as a rich, business friendly notation to help elicit business data requirements, and explained UML to “diehard” data modelers very well indeed. One of the reasons I respect Michael’s work, especially over that of many (if not most) of his peers, is that he does not misapply the UML constructs when data modeling. For example, others advocate using role names to describe associations. <rant>This drives me crazy and confuses everyone. No “homemade” stereotypes either, except color (maybe).<end of rant>
Anyway, we were shown how to use UML for conceptual and logical data models, with sound advice to consider moving to a dedicated database modeling tool when producing physical data models. This, it was said, means re-keying the data model, and “hard work” to keep the physical data model consistent with the upstream data models.
This started me thinking. If you need models, exactly what type of conceptual or logical data model would you want to model with UML? A data warehouse data model perhaps? Yes, I think so. High level enterprise data models? Again yes. But what about data models for transactional systems (the systems that do the work of the business, such as accounting, order management, and shipping)? Would we want to model their data with UML, especially as there are probably already UML models (i.e. class models) that show behavior and data for the developers? And when do we start physical data modeling – what is the transition point from conceptual to logical to physical?
To answer this we need to identify what conceptual, logical and physical data models are. Then their purpose becomes clear as well as when they are created. Here are my current thoughts:
Conceptual models of any kind are used when you are eliciting business strategy. They show the key subject areas in which an organization operates, and are likely to be static while an organization does not change its product and service offerings.
Logical models are used to show a relevant representation of the real world for a particular solution.
They may be described as “implementation neutral”, in that they contain no “implementation details”. The idea is that the logical model represents needs, not how the needs will be met by technology. In fact a logical model can ideally be implemented on any applicable technology, and applies to any type of need (transactional, reporting, analysis and so on).
Physical models show how technology will be used to implement the logical models. There should be bidirectional traceability between logical and physical models if both are to be maintained. However, a logical model may naturally evolve into a physical model, and the logical model may be “lost” in the process. Therefore, all models must be versioned, with each version annotated with descriptions.
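To make the traceability idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a feature of any modeling tool: the class names, the `traces_to` field, the `trace_report` function, and the sample "Order" entity and `ord_hdr` table are all hypothetical. The logical side holds business names only, the physical side holds technical names and SQL types, and each physical column optionally points back at the logical attribute it implements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogicalEntity:
    """Implementation neutral: business names only (hypothetical structure)."""
    name: str
    attributes: list  # business attribute names, e.g. "Order Date"


@dataclass
class PhysicalColumn:
    """Technical detail plus a trace back to the logical model."""
    name: str
    sql_type: str
    traces_to: Optional[str]  # "Entity.Attribute", or None for purely technical columns


@dataclass
class PhysicalTable:
    name: str
    columns: list  # list of PhysicalColumn


def trace_report(entities, tables):
    """Check traceability in both directions between the two models."""
    logical = {f"{e.name}.{a}" for e in entities for a in e.attributes}
    traced = {c.traces_to for t in tables for c in t.columns if c.traces_to}
    return {
        # logical attributes that no physical column implements
        "unimplemented": sorted(logical - traced),
        # physical columns that point at a logical attribute that does not exist
        "dangling": sorted(traced - logical),
    }


# Hypothetical sample data for illustration only.
order = LogicalEntity("Order", ["Order Number", "Order Date", "Total Amount"])
orders_table = PhysicalTable("ord_hdr", [
    PhysicalColumn("ord_id", "BIGINT", None),  # surrogate key: technical only
    PhysicalColumn("ord_no", "VARCHAR(20)", "Order.Order Number"),
    PhysicalColumn("ord_dt", "DATE", "Order.Order Date"),
    PhysicalColumn("ship_cd", "CHAR(3)", "Order.Shipping Code"),
])

report = trace_report([order], [orders_table])
print(report)
```

In this sample, "Total Amount" has no physical column yet and `ship_cd` points at a logical attribute that was never modeled, so both directions of the trail have a gap. Running a check like this against each saved version of the models is one plausible way to notice when the logical model is starting to get "lost".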
A really simple way to see the difference between conceptual, logical and physical models is that you need business knowledge (only) to produce conceptual and logical models. Physical models require technical knowledge.
Conceptual, logical and physical models can be produced for data and functional (class) models. Modeling artifacts must be traceable between logical and physical data models and also between logical (interface or API) and physical (implementation) functional models. Ideally (although it’s probably not possible) traceability should come from forward and reverse engineering between these models, but practically it’s probably through evolution, versioning, and hard work that this trail is maintained – again, if you really need to. Physical models and their associated implementations can be / should be forward and reverse engineered too.
All models of every type (where type is conceptual, logical or physical) must be consistent with each other (so that a logical functional model maps to a logical data model, and so on).
So, any of the models can be produced in UML (well, physical models only at a pinch, and only in some tools). An essential point is that a UML logical data model is NOT the same as a functional UML (class) model used by developers. Also, not all of the models are required on all projects – it just depends on what you need.
That represents my current thinking. What do you think? Create a comment!