What is Data Transformation?

Data, datasets, data series, models, and transformations

Wen-Chiao Su

2023/08/28

Introduction

In the analytics industry, the terms “data”, “datasets”, and “data series” are often used interchangeably. So are “models” and “transformations”.

The distinctions between these are both subtle and not terribly important. But we should understand why.

Data, Datasets, and Data Series

Data are a collection of discrete pieces of information. This information may be correct, or it may not be. Data are used as input for reasoning. Correct data are likely to yield accurate conclusions; incorrect data are not. Of course, the same data can be used to reach several, sometimes contradictory, conclusions, which can be confusing.

Datasets are collections of data that are likely to be related. For example, the final grades of all the students in classroom B form a dataset.

Data series are datasets that represent a sequential pattern. For example, the price of a stock over a period of time is a data series.

From an analytics perspective, input data are input data. How related they are is just one of the data’s characteristics.


Models and Transformations

Models are encoded actions that transform data into conclusions which, as stated earlier, may or may not be accurate. A conclusion may be a new dataset.

Transformations are encoded actions that transform data into a new dataset. The new dataset may be a conclusion.

The terms “model” and “transformation” are completely interchangeable. Data engineers tend to use “model” because their thinking is data-based whereas data analysts and data scientists tend to use “transformation” because their thinking is process-based.

Since Credmark has built a framework for managing processes, we’ll use the term “transformation” for clarity.

Example Transformations

Any kind of data manipulation is a transformation. There are four fundamental types:

  • Data creation
  • Data deletion
  • Data division
  • Data combination

While trying to build up a consistent time series, we might use interpolation to create missing data points.
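
As a rough illustration of data creation, the sketch below fills interior gaps in a series by linear interpolation. The function name `interpolate_missing` and the sample prices are hypothetical, and linear interpolation is only one of several ways to create missing points.

```python
from typing import Optional


def interpolate_missing(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None because they have
    only one known neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / gap
            filled[i] = values[left] + fraction * (values[right] - values[left])
    return filled


# Hypothetical price series with missing observations.
prices = [10.0, None, None, 16.0, None, 20.0]
filled_prices = interpolate_missing(prices)
print(filled_prices)
```

The created points are estimates, not observations, so it is usually worth keeping track of which values were interpolated.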

While preparing a dataset for analysis, we may delete data that are outliers.
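
One possible sketch of data deletion uses the common interquartile-range rule to drop outliers. The function name `drop_outliers`, the multiplier `k`, and the sample readings are assumptions made for illustration; the right outlier rule depends on the data.

```python
from statistics import quantiles


def drop_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Keep only values within k interquartile ranges of the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second and third quartile
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Hypothetical readings with one obviously extreme value.
readings = [10, 12, 11, 13, 12, 95, 11, 12]
cleaned = drop_outliers(readings)
print(cleaned)
```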

When processing a set of US addresses, we may divide street addresses and zip codes into separate datasets for better analysis.
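
A minimal sketch of data division, assuming each address ends with a five-digit ZIP code (optionally in ZIP+4 form). The pattern, the function name `split_address`, and the sample addresses are hypothetical; real address parsing is considerably messier.

```python
import re

# Everything before the trailing ZIP code, then the ZIP code itself.
ZIP_PATTERN = re.compile(r"^(?P<street>.*?)[,\s]+(?P<zip>\d{5}(?:-\d{4})?)$")


def split_address(address: str) -> tuple[str, str]:
    """Split an address into (text before the ZIP code, ZIP code)."""
    match = ZIP_PATTERN.match(address.strip())
    if match is None:
        raise ValueError(f"no trailing ZIP code found in: {address!r}")
    return match.group("street"), match.group("zip")


# Hypothetical sample addresses.
addresses = [
    "12 Main St, Springfield, IL 62704",
    "500 Oak Ave, Austin, TX 78701-1234",
]
pairs = [split_address(a) for a in addresses]
streets = [street for street, _ in pairs]
zip_codes = [zip_code for _, zip_code in pairs]
print(streets)
print(zip_codes)
```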

To produce a dataset of unnatural deaths, we may combine suicide and accident data.
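
Data combination can be sketched as merging per-year counts from several source datasets. The function name `combine_counts` is hypothetical, and the numbers are placeholders for illustration only, not real statistics.

```python
def combine_counts(*datasets: dict[int, int]) -> dict[int, int]:
    """Sum the counts of several {year: count} datasets into one."""
    combined: dict[int, int] = {}
    for dataset in datasets:
        for year, count in dataset.items():
            combined[year] = combined.get(year, 0) + count
    return combined


# Placeholder values for illustration only; these are not real statistics.
suicide_deaths_by_year = {2020: 3, 2021: 5}
accident_deaths_by_year = {2020: 7, 2021: 4, 2022: 6}

unnatural_deaths_by_year = combine_counts(
    suicide_deaths_by_year, accident_deaths_by_year
)
print(unnatural_deaths_by_year)
```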

These are, of course, primitive operations. Advanced transformations are likely to combine two or more of these basic transformation types.
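
To show how primitive operations might be chained into a more advanced transformation, here is a small sketch that composes a deletion step with a creation step. The names `compose`, `drop_negatives`, and `append_total` are hypothetical and are not part of any Credmark API.

```python
from functools import reduce
from typing import Callable

Dataset = list[float]
Transformation = Callable[[Dataset], Dataset]


def compose(*steps: Transformation) -> Transformation:
    """Build one transformation that applies the given steps left to right."""
    def pipeline(data: Dataset) -> Dataset:
        return reduce(lambda current, step: step(current), steps, data)
    return pipeline


def drop_negatives(data: Dataset) -> Dataset:
    """Deletion: remove values below zero."""
    return [x for x in data if x >= 0]


def append_total(data: Dataset) -> Dataset:
    """Creation: add a new data point holding the sum of the others."""
    return data + [sum(data)]


clean_and_total = compose(drop_negatives, append_total)
result = clean_and_total([4.0, -1.0, 6.0])
print(result)
```

Because the composed pipeline has the same shape as each step (a dataset in, a dataset out), it can itself be used as a step in a larger pipeline.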

All of the following are transformations:

  • cleaning,
  • normalization,
  • encoding,
  • decoding,
  • aggregation, and
  • feature extraction.

There are of course many more.
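
Three of the transformations listed above (normalization, encoding, and decoding) can be sketched in a few lines. Min-max scaling and integer label encoding are assumed here as representative choices, and all function names are hypothetical.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Normalization: rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def encode_labels(labels: list[str]) -> tuple[list[int], list[str]]:
    """Encoding: map each label to an integer code and return the vocabulary."""
    vocabulary = sorted(set(labels))
    index = {label: code for code, label in enumerate(vocabulary)}
    return [index[label] for label in labels], vocabulary


def decode_labels(codes: list[int], vocabulary: list[str]) -> list[str]:
    """Decoding: map integer codes back to their labels."""
    return [vocabulary[code] for code in codes]


scaled = min_max_normalize([2.0, 4.0, 6.0])
tokens = ["ETH", "DAI", "ETH", "USDC"]  # sample labels
codes, vocabulary = encode_labels(tokens)
print(scaled, codes, vocabulary)
```

Decoding with the same vocabulary recovers the original labels, which is what makes the encoding safe to use as an intermediate representation.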

Analysis

Analysis is the application of transformations over datasets with the goal of reaching conclusions.

It uses data to find patterns or answer questions, which may allow for better decision-making. For instance, by analyzing sales data, we might discover that sales increase during certain seasons, leading to better stock management.
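
The seasonal sales example could look roughly like the sketch below, which aggregates monthly sales by season and picks the strongest one. The sales figures are invented for illustration, and the month-to-season mapping assumes Northern Hemisphere seasons.

```python
from statistics import mean

# Assumption: Northern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Hypothetical units sold per month (month number -> units).
monthly_sales = {
    1: 120, 2: 110, 3: 90, 4: 95, 5: 100, 6: 140,
    7: 160, 8: 150, 9: 105, 10: 100, 11: 115, 12: 130,
}


def average_sales_by_season(sales: dict[int, int]) -> dict[str, float]:
    """Aggregation: group monthly sales by season and average each group."""
    by_season: dict[str, list[int]] = {}
    for month, units in sales.items():
        by_season.setdefault(SEASON_BY_MONTH[month], []).append(units)
    return {season: mean(units) for season, units in by_season.items()}


season_averages = average_sales_by_season(monthly_sales)
peak_season = max(season_averages, key=season_averages.get)
print(peak_season, season_averages)
```

The transformation produces the seasonal averages; deciding to stock more ahead of the peak season is the conclusion drawn from them.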

Conclusion

Transformations sit at the heart of the analysis process: they are how data are prepared and interpreted.

Put simply: analysis = data + transformations + intelligence.

DeFi data is complicated, which makes analysis difficult. Credmark builds powerful DeFi-aware transformation tools to support analysis.

About Credmark

Credmark runs a financial modeling platform powered by reliable on-chain data. We curate and manage DeFi data, making it available via API and the Snowflake Marketplace around the globe and across industries.

Our community of quants, developers, and modelers actively build models for the DeFi community by leveraging our data API and tools. Join the growing community and together we will advance the next-generation financial system.
