
When I decided to write this blog post, I thought it would be a good idea to learn a bit about the history of Business Intelligence. I searched the internet and found this page on Wikipedia. The term Business Intelligence as we know it today was coined by an IBM computer science researcher, Hans Peter Luhn, in 1958, who wrote a paper in the IBM Systems journal titled A Business Intelligence System, describing it as a specific process in data science. In the Objectives and principles section of his paper, Luhn defines business as “a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera” and an intelligence system as “the communication facility serving the conduct of a business (in the broad sense)”. He then refers to Webster’s dictionary definition of the word intelligence as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal”.
It’s fascinating to see how a fantastic idea from the past sets out a concrete future that helps us have a better life. Isn’t that precisely what we do in our daily BI processes, just as Luhn described a Business Intelligence System for the first time? How cool is that?
When we talk about BI today, we refer to a specific and scientific set of processes that transform raw data into valuable and understandable information for various business sectors (such as sales, inventory, law, etc.). These processes help businesses make data-driven decisions based on the hidden facts that exist in the data.
Like everything else, BI processes have improved a lot over their lifetime. In this post, I’ll try to make some sensible links between today’s BI components and Power BI.
Generic Components of Business Intelligence Solutions
Generally speaking, a BI solution contains various components and tools that may vary between solutions depending on the business requirements, the data culture and the organisation’s maturity in analytics. But the processes are very similar to the following:
- We usually have multiple source systems with different technologies containing the raw data, such as SQL Server, Excel, JSON, Parquet files, etc.
- We integrate the raw data into a central repository to reduce the risk of interrupting the source systems by constantly connecting to them. We usually load the data from the data sources into the central repository.
- We transform the data to optimise it for reporting and analytical purposes, and we load it into another storage. We aim to keep the historical data in this storage.
- We pre-aggregate the data to certain levels based on the business requirements and load it into another storage. We usually don’t keep the whole historical data in this storage; instead, we only keep the data that needs to be analysed or reported.
- We create reports and dashboards to turn the data into useful information.
With the above processes in mind, a BI solution consists of the following components:
- Data Sources
- Staging
- Data Warehouse/Data Mart(s)
- Extract, Transform and Load (ETL)
- Semantic Layer
- Data Visualisation
Data Sources
One of the main goals of running a BI project is to enable organisations to make data-driven decisions. An organisation might have multiple departments, such as sales, inventory, marketing, finance, and health and safety, each using various tools to collect the relevant data every day.
The data generated by the business tools is stored somewhere using different technologies. A sales system might store its data in an Oracle database, while the finance system stores its data in a SQL Server database in the cloud. The finance team also generates some data stored in Excel files.
The data generated by these different systems is the source for a BI solution.
Staging
In real-world scenarios, we usually have multiple data sources contributing to the data analysis. To be able to analyse all the data sources, we need a mechanism to load the data into a central repository. The main reason is that the business tools need to constantly store data in their underlying storage. Therefore, frequent connections to the source systems can put our production systems at risk of becoming unresponsive or performing poorly. The central repository where we store the data from the various data sources is called Staging. We usually store the data in staging with no or only minor changes compared to the data in the data sources. Therefore, the quality of the data stored in staging is usually low and requires cleansing in the subsequent phases of the data journey. In many BI solutions, we use Staging as a temporary environment, so we delete the Staging data regularly after it is successfully transferred to the next stage, the data warehouse or data marts.
If we want to indicate the data quality with colours, it is fair to say the data quality in staging is Bronze.
Data Warehouse/Data Mart(s)
As mentioned before, the data in staging is not in its best shape and format. Multiple data sources generate the data independently of each other. So, analysing the data and creating reports on top of the data in staging would be challenging, time-consuming and expensive. We therefore need to find the links between the data sources, then cleanse, reshape and transform the data to make it more optimised for data analysis and reporting activities. We store the current and historical data in a data warehouse, so it is quite normal to have hundreds of millions or even billions of rows of data accumulated over a long period. Depending on the overall architecture, the data warehouse might contain encapsulated business-specific data in a data mart or a collection of data marts. In data warehousing, we use different modelling approaches such as Star Schema. As mentioned earlier, one of the primary purposes of having a data warehouse is to keep the history of the data. This is a massive benefit of having a data warehouse, but this strength comes with a cost. As the volume of the data in the data warehouse grows, it becomes more expensive to analyse the data. The data quality in the data warehouse or data marts is Silver.
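To make the Star Schema idea a little more tangible, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (dim_store, dim_date, fact_sales and so on) are made up for illustration and are not taken from any particular data warehouse. The point is only the shape: one fact table in the middle that references the dimension tables around it.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
    CREATE TABLE fact_sales (
        store_key    INTEGER REFERENCES dim_store(store_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        sales_amount REAL
    );
""")

# Illustrative sample rows only.
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20230101, 2023), (20240101, 2024)])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 20230101, 100.0), (1, 20240101, 150.0), (2, 20240101, 200.0)],
)

# A typical analytical query: join the fact table to a dimension and aggregate.
sales_by_year = conn.execute("""
    SELECT d.calendar_year, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.calendar_year
    ORDER BY d.calendar_year
""").fetchall()

print(sales_by_year)
```

A real data warehouse would of course have many more dimensions and far more rows, but the join-then-aggregate pattern stays the same.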
Extract, Transform and Load (ETL)
In the previous sections, we mentioned that we integrate the data from the data sources in the staging area, then we cleanse, reshape and transform the data and load it into a data warehouse. To do so, we follow a process called Extract, Transform and Load or, in short, ETL. As you can imagine, ETL processes are usually quite complex and expensive, but they are an essential part of every BI solution.
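As a rough illustration of the three ETL steps, here is a small Python sketch that uses only the standard library. Everything in it is an assumption made for the example: the CSV content, the column names and the extract, transform and load helper functions are hypothetical, and a plain Python list stands in for the data warehouse table.

```python
import csv
import io

# Hypothetical raw extract from a source system, landed in staging as-is.
raw_csv = """order_id,store,amount
1, North ,100.50
2,south,
3,North,49.50
"""

def extract(text):
    """Extract: read the raw rows without changing them (staging)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, tidy store names, convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # cleansing: skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "store": row["store"].strip().title(),
            "amount": float(row["amount"]),
        })
    return cleaned

def load(rows, warehouse):
    """Load: append the cleaned rows to the target table (a list here)."""
    warehouse.extend(rows)
    return warehouse

staging = extract(raw_csv)
warehouse = load(transform(staging), [])
print(len(staging), "rows in staging,", len(warehouse), "rows in the warehouse")
```

Real ETL pipelines deal with many more concerns (incremental loads, error handling, scheduling), which is where the complexity and cost mentioned above come from.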
Semantic Layer
As we now know, one of the strengths of having a data warehouse is keeping the history of the data. But over time, keeping massive amounts of history can make data analysis more expensive. For instance, we may run into performance problems if we want to get the sum of sales over 500 million rows of data. So, we pre-aggregate the data to certain levels, based on the business requirements, into a Semantic layer to have an even more optimised and performant environment for data analysis and reporting purposes. Data aggregation dramatically reduces the data volume and improves the performance of the analytical solution.
Let’s continue with a simple example to better understand how aggregating the data can help with the data volume and data processing performance. Imagine a scenario where we stored 20 years of data of a retail chain with 200 stores across the country, which are open 24 hours a day, 7 days a week. We stored the data at the hour level in the data warehouse. Each store usually serves 500 customers per hour, and each customer buys 5 items on average. So, here are some simple calculations to understand the amount of data we are dealing with:
- Average hourly records per store: 5 (items) x 500 (customers served per hour) = 2,500
- Daily records per store: 2,500 x 24 (hours a day) = 60,000
- Yearly records per store: 60,000 x 365 (days a year) = 21,900,000
- Yearly records for all stores: 21,900,000 x 200 = 4,380,000,000
- Twenty years of data: 4,380,000,000 x 20 = 87,600,000,000
A simple summation over more than 80 billion rows of data would take a long time to calculate. Now, imagine that the business needs to analyse the data at the day level. So in the semantic layer, we aggregate those 87.6 billion rows to the day level. In other words, 87,600,000,000 ÷ 24 = 3,650,000,000, which is a much smaller number of rows to deal with.
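If you would like to double-check the numbers, the same back-of-the-envelope arithmetic can be written as a few lines of Python. The constant names are mine; the values are the ones from the scenario above, and the division by 24 is the same simplification used in the text (collapsing the 24 hourly slots of each day into one).

```python
# Scenario assumptions, taken from the example above.
ITEMS_PER_CUSTOMER = 5
CUSTOMERS_PER_HOUR = 500
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
STORES = 200
YEARS = 20

hourly_rows_per_store = ITEMS_PER_CUSTOMER * CUSTOMERS_PER_HOUR
daily_rows_per_store = hourly_rows_per_store * HOURS_PER_DAY
yearly_rows_per_store = daily_rows_per_store * DAYS_PER_YEAR
yearly_rows_all_stores = yearly_rows_per_store * STORES
total_rows = yearly_rows_all_stores * YEARS

# Simplified day-level aggregation: 24 hourly slots collapse into one day.
day_level_rows = total_rows // HOURS_PER_DAY

print(f"Rows at the hour level: {total_rows:,}")
print(f"Rows at the day level:  {day_level_rows:,}")
```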
The other benefit of having a semantic layer is that we usually don’t need to load the whole history of the data from the data warehouse into it. While we might keep 20 years of data in the data warehouse, the business might not need to analyse 20 years of data. Therefore, we only load the data for the period required by the business into the semantic layer, which enhances the overall performance of the analytical system.
Let’s continue with our previous example. Let’s say the business needs to analyse the past 5 years of data, which is a quarter of the 20 years we keep. Here is a simplistic calculation of the number of rows after aggregating the data for the past 5 years at the day level: 3,650,000,000 ÷ 4 = 912,500,000.
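The two ideas above, aggregating to the day level and loading only the required period, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sample rows, the store names and the build_semantic_layer function are hypothetical, and for brevity the sketch sums sales per store per day rather than keeping item-level detail.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly sales rows as they might sit in the data warehouse:
# (timestamp, store, sales amount)
hourly_sales = [
    ("2015-03-01 09:00", "Store A", 120.0),
    ("2015-03-01 10:00", "Store A", 80.0),
    ("2023-06-15 09:00", "Store A", 200.0),
    ("2023-06-15 17:00", "Store A", 50.0),
    ("2023-06-15 09:00", "Store B", 75.0),
]

def build_semantic_layer(rows, first_year):
    """Sum hourly amounts per (day, store), keeping only years >= first_year."""
    daily = defaultdict(float)
    for timestamp, store, amount in rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date()
        if day.year >= first_year:  # load only the period the business needs
            daily[(day.isoformat(), store)] += amount
    return dict(daily)

daily_sales = build_semantic_layer(hourly_sales, first_year=2019)
print(daily_sales)
```

Five hourly rows go in and two daily rows come out: the 2015 rows fall outside the required period, and the 2023 rows for Store A are summed into a single day.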
The data quality of the semantic layer is Gold.
Data Visualisation
Data visualisation refers to representing the data from the semantic layer with graphical diagrams and charts using various reporting or data visualisation tools. We may create analytical and interactive reports, dashboards, or low-level operational reports. Either way, the reports run on top of the semantic layer, which gives us high-quality data with exceptional performance.
How Different BI Components Relate
The following diagram shows how the different Business Intelligence components are related to each other:
In the above diagram:
- The blue arrows show the more traditional processes and steps of a BI solution
- The dotted grey(ish) arrows show more modern approaches where we don’t need to create any data warehouses or data marts. Instead, we load the data directly into a Semantic layer, then visualise the data.
- Depending on the business, we might have to go through the orange dotted arrow when creating reports on top of the data warehouse. Indeed, this approach is legitimate and still used by many organisations.
- While visualising the data on top of the Staging environment (the dotted red arrow) is not ideal, it is not uncommon that we need to create some operational reports on top of the data in staging. A good example is creating ad-hoc reports on top of the current data loaded into the staging environment.
How Business Intelligence Components Relate to Power BI
To understand how the BI components relate to Power BI, we have to have a good understanding of Power BI itself. I already explained what Power BI is in a previous post, so I suggest you check it out if you are new to Power BI. As a BI platform, we expect Power BI to cover all or most of the BI components shown in the previous diagram, which it does indeed. This section looks at the different components of Power BI and how they map to the generic BI components.
Power BI as a BI platform contains the following components:
- Power Query
- Data Model
- Data Visualisation
Now let’s see how the BI components relate to the Power BI components.
ETL: Power Query
Power Query is the ETL engine available in the Power BI platform. It is available both in the desktop applications and in the cloud. With Power Query, we can connect to more than 250 different data sources, cleanse the data, transform the data and load the data. Depending on our architecture, Power Query can load the data into:
- The Power BI data model, when used within Power BI Desktop
- The Power BI Service internal storage, when used in Dataflows
With the integration of Dataflows and Azure Data Lake Gen 2, we can now store the Dataflows’ data in a Data Lake Store Gen 2.
Staging: Dataflows
The Staging component is available only when using Dataflows with the Power BI Service. Dataflows use the Power Query Online engine. We can use Dataflows to integrate the data coming from different data sources and load it into the internal Power BI Service storage or an Azure Data Lake Gen 2. As mentioned before, the data in the Staging environment is used by the data warehouse or data marts in BI solutions, which in Power BI translates to referencing the Dataflows from other Dataflows downstream. Keep in mind that this capability is a Premium feature; therefore, we must have one of the following Premium licenses:
Data Marts: Dataflows
As mentioned earlier, Dataflows use the Power Query Online engine, which means we can connect to the data sources, cleanse and transform the data, and load the results into either the Power BI Service storage or an Azure Data Lake Store Gen 2. So, we can create data marts using Dataflows. You may ask why data marts and not data warehouses. The fundamental reason lies in the differences between data marts and data warehouses, which is a broader topic and is out of the scope of this blog post. But in short, Dataflows do not currently support some fundamental data warehousing capabilities such as Slowly Changing Dimensions (SCDs). The other point is that data warehouses usually handle massive volumes of data, much more than the volume of data handled by data marts. Remember, data marts contain business-specific data and do not necessarily contain a lot of historical data. So, let’s face it; Dataflows are not designed to handle the billions or hundreds of millions of rows of data that a data warehouse can handle. So we currently accept the fact that we can design data marts in the Power BI Service using Dataflows without spending hundreds of thousands of dollars.
Semantic Layer: Data Model or Dataset
In Power BI, depending on where we develop the solution, we load the data from the data sources into the data model or a dataset.
Using Power BI Desktop (desktop application)
It is recommended that we use Power BI Desktop to develop a Power BI solution. When using Power BI Desktop, we directly use Power Query to connect to the data sources and cleanse and transform the data. We then load the data into the data model. We can also implement aggregations within the data model to improve the performance.
Using Power BI Service (cloud)
Developing a report directly in the Power BI Service is possible, but it is not the recommended method. When we create a report in the Power BI Service, we connect to the data source and create a report. The Power BI Service does not currently support data modelling; therefore, we cannot create measures, relationships, etc. When we save the report, all the data and the connection to the data source are stored in a dataset, which is the semantic layer. Because data modelling is not currently available in the Power BI Service, the data in the dataset would not be in its cleanest state. That is an excellent reason to avoid using this method to create reports. But it is possible, and the decision is yours after all.
Data Visualisation: Reports
Now that we have the prepared data, we visualise it using either the default visuals or some custom visuals within Power BI Desktop (or in the service). The next step after finishing the development is publishing the report to the Power BI Service.
Data Model vs. Dataset
At this point, you may ask about the differences between a data model and a dataset. The short answer is that the data model is the modelling layer existing in Power BI Desktop, while the dataset is an object in the Power BI Service. Let us continue the conversation with a simple scenario to understand the differences better. I develop a Power BI report in Power BI Desktop, and then I publish the report to the Power BI Service. During my development, the following steps happen:
- From the moment I connect to the data sources, I am using Power Query. I cleanse and transform the data in the Power Query Editor window. So far, I am in the data preparation layer. In other words, I have only prepared the data, but no data has been loaded yet.
- I close the Power Query Editor window and apply the changes. This is where the data starts being loaded into the data model. Then I create the relationships, some measures, etc. So, the data model layer contains the data and the model itself.
- I create some reports in Power BI Desktop
- I publish the report to the Power BI Service
Here is the point where the magic happens. While publishing the report to the Power BI Service, the following changes apply to my report file:
- The Power BI Service encapsulates the data preparation (Power Query) and the data model layers into a single object called a dataset. The dataset can be used in other reports as a shared dataset or in other datasets with a composite model architecture.
- The report is stored as a separate object connected to the dataset. We can pin the reports or their visuals to dashboards later.
There it is. You have it. I hope this blog post helps you better understand some fundamental concepts of Business Intelligence, its components and how they relate to Power BI. I would love to have your feedback or answer your questions in the comments section below.