Whether urban planning or Information Technology Systems, an outside pair of eyes might alert you to something you have gradually become blind to as it accrued. Consider these pictures: Some landscape architect planned ahead for the growth of these trees. Those curbed and slotted sections spread the load to prevent soil compaction, prevent wheeled machinery from coming too close to the tree trunk, and are easy to remove and replace. Unfortunately some time after the design and implementation, the planned replacement as the tree grew was forgotten. An at least annual review of your system implementation documents or just a look by someone from outside your shop might help you from going from the first picture to the second. We (Rightsizing, Inc.) do this sort of thing for Oracle Technology and Business Processes.
With some friends from the Netherlands and Estonia at the Hard Eight BBQ at 688 Freeport Parkway in Coppell, Texas (+1) for food and another (+1) for the Texas experience. What a fun way to decompress after a brain stuffing week at Hotsos!
An easy confusion of logic was succinctly cleared up by Toon Koppelaars on oracle-l today. It *may* be helpful to read that thread before you read this post, but I hope it stands alone.
The original poster wanted to exclude only month number 5 from year 2012, but was perplexed that
"where (month != 5 and year != 2012)"
also excluded the 5th month of 2011 and all of the data from 2012.
Toon referenced and explained from De Morgan’s Laws that “not X and not Y” is equivalent to “not (X or Y)”, which would be “not (month=5 or year=2012) in your first query.
For those who are truth table challenged, it may be helpful to consider this (regardless of the actual plan Oracle might choose) as a filter operation. So you get a row, and if it is either that nasty 5th month you apparently did not want or that nasty year 2012, throw it away.
What the poster wanted was only to omit the 5th month of 2012. That is “not (month=5 and year=2012)” as Toon so succinctly wrote.
Now I found it interesting that the 5th month of 2012 was the last month for which there were rows in the table at all. As an information processing issue, this looks to me as if you want all the data except the last month (which is perhaps still in process or as yet not audited). If that is indeed accurate, then
where year < 2012 or month < 5
would fulfill the logical requirement and this would helpfully lead to re-usable code
where year < &incomplete_year or month < &incomplete_month
which is a little more clear (if in fact my surmise of the purpose is correct which we’ll consider true for the rest of this posting.) But this still requires logic on two columns when in fact what is probably wanted is simply the data before 2012-05. (yyyy-mm format).
Now this can be rendered as
where to_date(to_char(year,'FM0000')||to_char(month,'FM00'),'yyyymm') <
which makes it clear that this is really a range query, if you can discern that fact through all the formatting used to shove the data cleanly through the to_char and to_date functions.
Now this points out one of the benefits of using time value columns in the first place, so if that is an operationally practical solution you can fix the data model. If it is not operationally practical to retool the data model, Oracle has provided us with a virtual column capability. You won’t have to see the sausage getting made in your routine reading of queries, and you can put a helpful comment on the table in case someone might want to read in text what your intention was for the meaning of the virtual column.
In this case,
alter table sometable add year_month as (to_date(to_char(year,'FM0000')||to_char(month,'FM00'),'yyyymm'));
comment on column sometable.year_month is 'year and month columns combined to produce a valid date';
does the trick the where clause becomes really simple:
where year_month < to_date('&incomplete_month','yyyymm')
This makes the meaning of the query trivial to understand, and as the volume of data grows and perhaps older data is no longer of interest predicates of the form
where year_month > to_date('&older_than_relevant','yyyymm')
and year_month < to_date('&incomplete_month','yyyymm')
where year_month between to_date('&older_than_relevant','yyyymm') and to_date('&incomplete_month','yyyymm')
if you prefer between syntax neatly does the trick.
Since you can index a virtual column and partition on it as well, this then becomes a real opportunity for reducing the amount of work the computer will need to do to answer your query. With an index and no partitioning, there is a good chance it will be cheaper to access the data you need via the index. That will depend on the details. With partitioning you may get partition pruning and the nature of the date partitioning likely means you can set it up to scan whole partitions which should dovetail nicely with parallel query and exadata optimizations. (That bit about exadata is speculation. Oracle still hasn’t sent me one to play with.)
So I’ve gone well beyond the subject of the oracle-l question which was already answered. That’s what a few friends keep telling me I should use a blog for and it is slowly sinking in.
Here is a snippet I wrote on the oracle-l list today before I remembered that I’m a blogger now. To see the whole thread in context (potentially including someone telling me I’ve got it all wrong later today or in the future) the subject was OT: sheltered little world i live in -> NODB?
Someone opined about where a DBA would start designing. That person might be right about some DBAs. Here is approximately what I wrote (only fixing some typos and grammar, I think.):
gee whiz. I think people think of me as a DBA. re: “That’s true that DBA would start designing application from database.” (sic.)
Larry Constantine has written some excellent books about information systems design. Many of my ideas are well explained in those books.
First, you figure out what the application needs to do. Folks have made the phrase “use cases” popular in recent years. I always check whether someone is wearing clothes when I hear that phrase. For some it is an extremely useful focusing phrase and for some it is an excuse to not know what you’re doing. So check. By the way, figuring out what the application needs to do, right away, includes getting a ballpark idea of how much total data is going to be handled and where that fits in the context of the current capabilities of hardware. “Is it bigger than a breadbox?”
Second, you need to figure out what the inputs to the systems are, what the outputs of the system are, where the transform centers from one to the other are, what instrumentation is appropriate to determine whether the inputs are correctly becoming outputs (where appropriate includes performance), and what the required information life cycle is for all the data and metadata about the system. Lingo: Inputs are called afferent legs, outputs are called efferent legs. A point many folks (not including Tim Gorman, who nailed this in one as well as making several other key points with many fewer words than I need) is that efferent legs include reporting requirements. That’s analytics, mining, performance metadata so you know whether your system is heading for the crater: ALL THAT.
Third, you need to figure out the data model requirements for the system.
(That could indeed be a flat file [and please note that a single flat file can be a representation of a relation] – and awk might be just the right tool [or even just grep], but you don’t know that before steps 1 and 2.)
Fourth, you need to figure out how much of the system you can build in the first chunk without making it impossible or difficult to build the other parts of the system.
Fifth, you need to assess a useful point on the scale between complete waterfall design and the lightest weight agility that makes sense for the project.
Sixth, you need to choose tools and technologies and start construction if the expected value of the application exceeds the expected cost and is within budget.
Seventh, you need to get a pizza and a beer. And I hope it is a really good beer. I think the article author made a lot of good points about beer. Now for some really trivial applications, you might go through the first six steps and build the thing in under an hour. If that’s the case, you better do a few of them before you proceed to a bundled step seven. But do not forget to budget for the pizza and the beer.
PS: Step seven is also the seventh step of debugging, but that is a different story
RSIZ (Rightsizing, Inc.) has begun exploring wordpress.
In 1990, when Ken Jacobs hosted the RDBMS campground talks at the Anaheim International Oracle User Week appreciation event, one of the topic areas was whether we (some users representing the Very Large DataBases VLDB of the Oracle world which meant anything north of about 7 GB back then) thought that the rule based optimizer (RBO) was good enough, or whether we needed a cost based optimizer (CBO) for the real applications we were running at enterprise scale to work well. “Oracle’s optimizer is like Mary Poppins. It’s practically perfect in every way. But we do have some cases where it would be helpful for the optimizer to consider the relative sizes of tables and whether a table was local or remote when the plan for joining and filtering is constructed. And some of the tie breakers for which table should go first and which index should be used when two indexes with similar characteristics are present can change, so we would like to be able to specify them for important plans.” Fast forward 22 years. Now there is a bumpersticker slogan contest for Oracle’s CBO. To enter you fire off a tweet with the hashtag #optimizerbumperstickers. Now I’ve always thought that the goal of the CBO is good plans, not perfect stats (or even perfect plans). So in bumper sticker size, my entry is “The goal is good plans!”