Clock trees play a critical role in high-performance chips, and designing and optimizing them is essential in today's designs. Achieving an optimal clock tree that also consumes less power is a major challenge. Here we show how the systematic nature of data flow can be used to build a clock tree that is not completely balanced at the chip level, while still ensuring that timing is met without much difficulty. Compared to a balanced-tree approach, this significantly reduces the number of clock buffers used and hence the total clock power.