Actionable insights to
drive growth for business
data science & analytics

Ad-Hoc Analysis

Analysis or report created on the spot to analyze the specific business questions that are not clearly answered in a pre-built report.
Ad Hoc Analysis is performed on an existing data warehouse or transactional database and results can be in form of direct output of query or data visualization or static report or a combination of all of them based on business need.

Alternative Data

Alternative data refers to information that is obtained from nontraditional sources, which are used by relevant stack holders(e.g investors, hedge funds) to gain an advantage or sentiment of the market.

Amazon Aurora

Is an RDBMS built for the cloud(Amazon Web Services).

  • It provides full compatibility with open-sourced databases e.g. MySql & PostgreSQL.
  • It is high in performance and cost-effective at the same time.
  • It delivers high performance and availability.
Amazon Redshift

Amazon Redshift is a fully managed, highly scalable(up to petabytes) enterprise data warehouse service provided by Amazon Web Services (AWS). Users can easily set up, operate, and scale a data warehouse in the cloud. Columnar storage & advanced compression techniques provide high performance and scalability for data warehousing and business intelligence workloads.

Azure Data Factory

Azure Data Factory (ADF) stands as a fully managed and server-less data integration solution provided by Microsoft’s Azure platform. ADF effectively manages large-scale data ingestion, preparation, and transformation tasks. It caters to a broad spectrum of applications, including data engineering, migrating data from on-premises SSIS packages, operational data integration, analytics, and data warehousing. ADF seamlessly links to various data origins, both on-premises and in the cloud, enabling smooth data replication across distinct repositories. It empowers code-free data transformation through data flows, and ensures secure integration through private endpoints. Furthermore, ADF supports continuous integration and delivery (CI/CD) workflows via its integration with Azure DevOps and GitHub.

Azure Synapse

Azure Synapse Analytics is an integrated platform service on Microsoft Azure that combines data warehousing, ETL pipelines, analytics tools, and big-data capabilities. It empowers businesses with descriptive, diagnostic, predictive, and prescriptive analytics. Synapse SQL offers dedicated SQL servers for data warehousing, while the serverless model allows ad-hoc workloads. Synapse Pipelines handle ETL and data integration, including big-data services like HDInsight and DataBricks. Common use cases are data warehousing, analytics, real-time insights, advanced analytics with Databricks, and visualization with PowerBI.

Bar Chart

A bar chart or a bar graph is useful to compare the data across different categories to identify patterns or trends.
The bars can be plotted vertically or horizontally; along with clusters or stacked pattern(each bar represent one category(the total value of that category) and the sections within that bar represent subcategories).

Business Intelligence

Business intelligence is an umbrella term of a set of processes and best practices for data gathering, data analysis, data visualization using best-fit architectures and technologies. BI practices creates a direct impact on data-driven decision-making processes. 

CI / CD Integration

CI/CD, or Continuous Integration and Continuous Delivery, is a software development practice that enhances efficiency and reliability in the software development and release process. Continuous Integration involves frequent code merging into a central repository, triggering automated build and testing procedures within minutes of code changes. Continuous Delivery extends this by automating the entire release process, including infrastructure provisioning and deployment.

Key elements of a CI/CD pipeline include stages like source code integration, building runnable software instances, running automated tests, and deploying to various environments. Failures at any stage trigger notifications to the responsible developers. CI/CD pipelines ensure fast, reliable, and accurate software delivery, with a focus on speed, reproducibility, and automation. They also promote a culture of collaboration, enabling developers to focus on coding, ensuring code quality, and facilitating easy access to the latest versions. Overall, CI/CD streamlines development enhances product quality, and accelerates software delivery.

Cloud Analysis

In cloud Analysis; business intelligence or data analytics operations(e.g. Collect, Store & Analyze data) are performed over the cloud against the private, or on premises environments, that enable them to access and analyze data from anywhere & at any time. Cloud computing gives higher performance and ROI for large volumes of data and users.

Cognitive Search

Cognitive Search revolutionizes enterprise search by using machine learning and natural language processing to deliver highly personalized and accurate results. Unlike traditional keyword-based searches, it understands user intent and context, enhancing information discovery.

Businesses benefit from Cognitive Search in several ways. It empowers self-service solutions, making it easier for customers and employees to find information across multiple sources. Intelligent chat-bots improve user interactions, while in customer service, it enhances contact center quality through knowledge consolidation and automation.

The future of Cognitive Search includes voice and visual search integration. Choosing the right Cognitive Search engine involves considering factors like architecture, scalability, connectivity, intelligence, security, content processing, customization, and insights to align with specific needs and technology stacks.

In summary, Cognitive Search is an AI-driven search solution that elevates information retrieval, making it more efficient and tailored for businesses.


A BI dashboard is a user interface / canvas to represent KPIs(Key Performance Indicators) and other relevant data in a clear and comprehensive way; graphically. It enables users to easily evaluate the current state of their business performance and make informed decisions.
Dashboard acts as a hub for monitoring and analyzing organizational performance and can be accessed through web browsers or mobile devices.

Data Governance

Transactions generate data; those data need to be accurate, consistent, and secure throughout its lifecycle; to achieve that there should be framework, policies and procedures and responsibilities for data. This management of data is known as Data Governance.

Data Lake

Data Lake is a centralized data storage repository that stores any data e.g. structure, semi-structured, or unstructured data in as-is format at any scale and can be organized using metadata tags. Data is transformed at a time of need that can be reporting, visualization, machine learning, and advanced analytics.

Data Mesh

Data Mesh is a relatively new architectural approach for data management and analysis where in data ownership is distributed across different business units or domains. Business units or domain are responsible for managing their own data.

Data Strategy

A data strategy is a plan that outlines in details; how an organization will use data as a core asset to achieve its business objectives.
It encompasses the policies, procedures, and technologies that are needed to manage and leverage data effectively. A well-defined data strategy is essential for organizations to make informed decisions, drive innovation, and enhance their competitive edge.

Data Warehouse

DW or DWH or EDW (Enterprise Data Warehouse) is central repository of integrated data from one or more disparate sources that can be analyzed to make more informed decisions. 

Data integrated into a data warehouse from transactional systems (e.g. ERP, CRM)  relational databases, and other data sources, typically on a regular cadence.

Data is stored in hierarchical dimensions and table format to Data Warehouse.

Data Warehouse Automation

Data Warehouse Automation (DWA) is a modern approach to streamline and accelerate data warehouse development & management by automating various tasks in the data warehousing process, such as designing the data warehouse, generating code for data extraction, transformation, and loading (ETL), deploying code to servers, executing ETL processes, and monitoring and reporting on batch executions. Data Warehouse Automation results in faster development, adaptability to changing business needs, focus on reporting and analytics rather than ETL code, ensuring data quality, and maintaining consistency in code and naming standards. DWA tools are primarily used by professional Data Warehouse developers and can empower technical data analysts to maintain their own Data Warehouses with proper training and support. DWA tools can be deployed on-premise servers, or in the cloud.


DAX Data Analysis Expressions (DAX) is a formula programming language used in Microsoft Power BI(Business Intelligence platform from Microsoft) to create measures, calculated columns, aggregations, and custom tables. It also has the capability to transform data.


Dimensions in the context of data warehousing and business intelligence are collections of reference information that provide essential context and categorization for measurable events or facts. These dimensions enable a comprehensive understanding and analysis of these facts by offering background details and structure, such as product information, customer profiles, geographic territories, and temporal data. Dimensions serve as foundational elements in data models, facilitating efficient historical data analysis and enabling meaningful answers to business queries. They are fundamental for organizing and making sense of a set of related facts within a data warehouse or BI system.

Dimensions Tables

A dimension table is a database table of a data warehouse, referring to information about measurable events known as facts. Facts(context of the data) are stored in the Fact Table. Facts stored in a dimension table are organized into categories or attributes.
In other words; it is a table of star or snowflake schema of a data warehouse.

Doughnut Chart

A doughnut chart is an extended variant of a PI chart that features a circular shape with a hole in the center. Similar to a PI chart it is used to visualize data in a way that highlights the proportion of different categories. The outer ring represents the total value, while the segments within the ring represent individual categories and their respective values.


Extract, Load and Transform refer to ELT. ETL is process of extracting and loading data from multiple sources to destination as it is. Transformation operation are carried out at the after loading data not before loading data. Later on this data can be used for analysis & reporting.


Extract, Transform and Load refer to ETL. The ETL process is to extract data from multiple sources, transform it as per business requirements, and load it to the central repository known as a data warehouse that can be used for analysis & reporting.

Fact Table

Fact Table is at the center of a data warehouse and stores measurements, metrics, or facts(context of the data) of business transactions. Fact Tables make it easy for analysis and aggregation of data stored from multiple sources.

Factless Fact Tables

A fact table which doesn’t connotation measures, and act as an interaction of dimensions. Factless Fact Tables are bifurcated; one is to capture an event and second is to describe conditions.

Hierarchical Dimensions
Dimensions in Data warehouse modeling having parent/child(many to one) relationships are known as Hierarchical Dimensions.
Simple example of possible hierarchy widely adopted by by all dimensional data warehouse is date dimension : Year > Quarter > Month > Week > Day
Incremental Load

Incremental load is ETL process to load delta(difference) from source to destination. Based on the requirement it can be streaming incremental load or batch incremental load. Incremental load is faster in comparison to full load as it touches less data.

Key Performance Indicators

Key performance Indicator is value measured by organizations to evaluate the effectiveness of key business objectives.


In the realm of Business Intelligence, a measure refers to a fundamental data point that encapsulates a specific aspect of a business process or entity. These measures can take the form of qualitative information, like a customer’s name, or quantitative data, such as monthly revenue figures. Measures are pivotal in data modeling, enabling calculations and summarizations for analytical purposes. Qualitative measures are typically associated with dimensions, which provide context and additional details to enrich the interpretation of the measured data. By leveraging measures and dimensions, businesses can make more informed decisions and gain valuable insights in their BI endeavors.

The measures in a fact table can be; – Additive, Semi-additive, Non-additive


Metadata narrates other data. Detail data of Data e.g. description, structure, references etc.


Multidimensional Online Analytical Processing (MOLAP) is a data analysis approach used in business intelligence to efficiently analyze and explore data from multiple perspectives. It involves creating multidimensional data cubes optimized for faster querying and exploration. MOLAP systems store aggregated and summarized data in these cubes, enabling users to drill down, roll up, slice, dice, and pivot data to gain insights into trends and make informed decisions. MOLAP contrasts with relational OLAP (ROLAP), offering quicker performance by pre-calculating and storing data in memory. It’s a vital tool for organizations to uncover insights, improve decision-making, and understand customer behavior.


Online Analytical Processing is compounding methods of data to perform multidimensional analysis of transactional data.

Power BI

Power BI is Business Intelligence and Data visualization tool from Microsoft Corporation.


Pentaho is Business Intelligence, Data Integration and OLAP and reporting service tool from Hitachi and now known as Hitachi Ventara

Qlik View

Qlik View and Qlik Sence are Business Intelligence and Data Visualization Tools.

Real-Time Data Integration

Real-time data integration involves the seamless and continuous flow of data from multiple sources into a central system in actual time or nearly so. Specialized tools & technologies are used to capture data from databases, IoT devices, social media platforms, and more. This data is then swiftly transformed into a compatible format and transmitted to the intended system. This dynamic process empowers decision-makers with up-to-the-minute insights, fostering smarter choices, improved customer interactions, streamlined operations, a competitive edge, and better risk management.

Real-Time Data Integration

Real-time data integration involves the seamless and continuous flow of data from multiple sources into a central system in actual time or nearly so. Specialized tools & technologies are used to capture data from databases, IoT devices, social media platforms, and more. This data is then swiftly transformed into a compatible format and transmitted to the intended system. This dynamic process empowers decision-makers with up-to-the-minute insights, fostering smarter choices, improved customer interactions, streamlined operations, a competitive edge, and better risk management.


The database schema is logical view(skeleton) of all or part of entier relational database.

Semantic Layer

A semantic layer is a data abstraction that translates complex data structures into user-friendly terms. It is created by individuals who understand both data storage and business reporting needs. They rename raw data fields into intuitive business terms, hide unnecessary fields, and provide pre-defined filters for common queries. This layer allows users to access data through a simplified interface, organize it into folders, and run reports without needing extensive knowledge of data storage or query languages. Users can also customize the presentation of data for their reporting requirements. The semantic layer streamlines access to information, reducing dependence on developers and fostering a common language for report creation. Benefits include collaborative prototyping, query uniformity, report history independent of data store changes, and enhanced cross-departmental collaboration. In summary, a semantic layer simplifies data access and reporting, making it more user-friendly and efficient for businesses.

Server-less Integration

Server-less Data Integration is a cloud computing model that frees developers from managing server infrastructure. It allows them to focus solely on creating applications, enhancing productivity. This approach streamlines app development by eliminating the need to provision and maintain infrastructure manually. Server-less Data Integration is vital for resolving data quality issues caused by data silos in organizations. It replaces traditional ETL processes, offering agility and cost savings. Key capabilities include automatic infrastructure setup and event-driven execution. In essence, it leverages server-less architecture, triggered by events like HTTP requests, enabling efficient data integration without the need for dedicated servers.

Snowflake Schema

The snowflake schema is an extended model of star schema where dimension tables are normalized that reduce data redundancy. This schema is commonly used for OLAP data warehouses. Snowflake schema saves storage space but impacts performance while executing the query results.

SQL Server Analysis Services - SSAS

SQL Server Analysis Services (SSAS) is a component of Microsoft SQL Server that enables organizations to analyze large datasets by organizing them into easily searchable cubes. It offers multidimensional and tabular capabilities for optimized querying and data mining. Multidimensional OLAP (MOLAP) uses optimized storage for fast queries, while tabular mode compresses data in-memory for even faster performance. SSAS allows users to analyze data from various angles, facilitating historical and trend analysis. OLAP cubes are essential components of data warehouses, providing quick insights and the ability to slice, dice, and solve problems.

SQL Server Integration Services - SSIS

SQL Server Integration Services (SSIS) is a Microsoft tool within SQL Server that handles enterprise data integration, transformation, and migration. It supports diverse tasks such as data analysis, cleansing, and ETL processes to update data warehouses. SSIS extracts, transforms, and loads data from various sources, like relational databases and files, into data warehouses or other systems. It includes development tools, like SQL Server Data Tools, and a server component for deploying and executing integration tasks. SSIS history dates to SQL Server 2005, replacing Data Transformation Services (DTS), offering features like graphical development, workflow capabilities, data transformations, and APIs for object models.

Star Schema

Star schema is a mature data modeling approach widely adopted by relational data warehouses and optimized for querying and analysis. It requires modelers to classify their model tables as either dimension or fact. Star Schema consists of one or more fact tables indexing any number of dimensional tables. The diagram of relationships between tables resembles a star shape, wherein the fact table is a central table and multiple dimension tables are surrounding it.


Tableau is a leading cloud base Business Intelligence and Data Visualization tool.


Tableau is leading cloud base Business Intelligence and Data Visualization tool.

Unified Data Management (UDM)

Unified Data Management (UDM) is a comprehensive strategy that involves consolidating diverse data sources into a single repository within a data warehouse. This approach encompasses people, processes, and technology to address data fragmentation, fatigue, and inefficiencies. UDM aims to optimize data utilization by integrating, cleansing, and transforming data to ensure consistency and compliance, enabling better insights and decision-making across an organization. It serves as a centralized hub for data storage and distribution, comparable to a heart pumping oxygenated blood throughout the body.