Wen-Chiao Su
2023/08/28
In the analytics industry, the terms “data”, “datasets”, and “data series” are used interchangeably. So are “models” and “transformations”.
The distinctions between these are both subtle and not terribly important. But we should understand why.
Data are a collection of discrete pieces of information. This information may be correct, or it may not be. Data are used as input for reasoning. Good (that is, correct) data are likely to yield accurate conclusions; incorrect data are not. Of course, we all know that the same data can be used to reach several, sometimes contradictory, conclusions. Confusing.
Datasets are collections of data that are likely to be related. For example, all of the final grades of the students in classroom B.
Data series are datasets that represent a sequential pattern. For example, the price of a stock over a period of time is a data series.
From an analytics perspective, input data are input data. How related they are is just one of the data’s characteristics.
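To make the distinction concrete, here is a minimal Python sketch. The names and values (`classroom_b_grades`, `stock_prices`, and the `mean` helper) are hypothetical and chosen only for illustration. It shows a dataset and a data series being fed to the same function, with "sequential" treated as just another property that can be checked.

```python
from datetime import date

# A dataset: related values with no inherent order
# (hypothetical final grades for classroom B).
classroom_b_grades = {"Ana": 88, "Ben": 92, "Chloe": 75}

# A data series: a dataset whose points follow a sequence
# (hypothetical daily closing prices of a stock).
stock_prices = [
    (date(2023, 8, 23), 101.5),
    (date(2023, 8, 24), 102.1),
    (date(2023, 8, 25), 100.9),
]


def mean(values):
    """Average any collection of numbers, however the inputs are related."""
    values = list(values)
    return sum(values) / len(values)


# The same function accepts both kinds of input: input data are input data.
grade_mean = mean(classroom_b_grades.values())
price_mean = mean(price for _, price in stock_prices)

# Being sequential is one characteristic of the data, checked like any other.
is_sequential = all(a[0] < b[0] for a, b in zip(stock_prices, stock_prices[1:]))
```

Nothing in `mean` depends on whether its input is a dataset or a data series; only code that relies on ordering needs to care about the difference.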
Models are encoded actions that transform data into conclusions, which, as stated earlier, may or may not be accurate. A conclusion may be a new dataset.
Transformations are encoded actions that transform data into a new dataset. The new dataset may be a conclusion.
The terms “model” and “transformation” are completely interchangeable. Data engineers tend to use “model” because their thinking is data-based, whereas data analysts and data scientists tend to use “transformation” because their thinking is process-based.
Since Credmark has built a framework for managing processes, we’ll use the term “transformation” for clarity.
Any kind of data manipulation is a transformation. There are four fundamental types:
Create: While trying to build up a consistent time series, we might use interpolation to create missing data points.
Delete: While preparing a dataset for analysis, we may delete data that are outliers.
Divide: When processing a set of US addresses, we may divide street addresses and zip codes into separate datasets for better analysis.
Combine: To produce a dataset of unnatural deaths, we may combine suicide and accident data.
These are, of course, primitive operations. Advanced transformations are likely to combine two or more of these basic transformation types.
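The four examples above can be sketched as small Python functions. This is an illustration under stated assumptions, not Credmark's implementation: the function names, the linear interpolation method, the fixed outlier bounds, and the "street, zip" address format are all hypothetical choices.

```python
def interpolate_missing(series):
    """Create: fill interior None gaps by linear interpolation between known neighbors."""
    filled = list(series)
    known = [i for i, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def drop_outliers(values, low, high):
    """Delete: keep only values inside the closed interval [low, high]."""
    return [value for value in values if low <= value <= high]


def split_addresses(addresses):
    """Divide: separate 'street, zip' strings into two parallel datasets."""
    streets, zip_codes = [], []
    for address in addresses:
        street, zip_code = address.rsplit(",", 1)
        streets.append(street.strip())
        zip_codes.append(zip_code.strip())
    return streets, zip_codes


def combine(*datasets):
    """Combine: concatenate several datasets into one."""
    return [record for dataset in datasets for record in dataset]


# A more advanced transformation composes primitives:
# first create the missing point, then delete the outlier.
raw_series = [10.0, None, 14.0, 500.0, 18.0]
clean_series = drop_outliers(interpolate_missing(raw_series), low=0.0, high=100.0)
```

Because each primitive takes data in and returns new data, they chain naturally; `clean_series` is the result of applying two of them in sequence.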
All of the following are transformations:
There are of course many more.
Analysis is the application of transformations over datasets with the goal of reaching conclusions.
It uses data to find patterns or answer questions, which may allow for better decision-making. For instance, by analyzing sales data, we might discover that sales increase during certain seasons, leading to better stock management.
Transformations sit at the heart of the analysis process. Transformations allow data to be prepared and interpreted.
Put simply: analysis = data + transformations + intelligence.
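The seasonal sales example can be sketched in a few lines of Python. The sales figures are made up, and the mapping of months to seasons (Northern Hemisphere, by calendar month) is an assumption; the point is only to show data, a transformation, and the interpretation step side by side.

```python
from collections import defaultdict

# Data: hypothetical monthly unit sales (month number -> units sold).
monthly_sales = {
    1: 120, 2: 110, 3: 130, 4: 150, 5: 160, 6: 210,
    7: 240, 8: 230, 9: 170, 10: 150, 11: 140, 12: 200,
}

# Assumption: Northern Hemisphere seasons, grouped by calendar month.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def total_by_season(sales):
    """Transformation: combine monthly figures into one total per season."""
    totals = defaultdict(int)
    for month, units in sales.items():
        totals[SEASON_BY_MONTH[month]] += units
    return dict(totals)


season_totals = total_by_season(monthly_sales)

# The code can surface the pattern; deciding what to do about it
# (for example, stocking up before the peak season) is the intelligence part.
peak_season = max(season_totals, key=season_totals.get)
```

With these made-up numbers, the transformation produces a new dataset (`season_totals`), and the conclusion drawn from it is which season to plan stock around.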
DeFi data is complicated, which makes analysis difficult. Credmark builds powerful DeFi-aware transformation tools to support analysis.
Credmark runs a financial modeling platform powered by reliable on-chain data. We curate and manage DeFi data, making it available via API and the Snowflake Marketplace around the globe and across industries.
Our community of quants, developers, and modelers actively builds models for the DeFi community by leveraging our data API and tools. Join the growing community and together we will advance the next-generation financial system.