ThoughtWorks

Uncovering Your Data's Dark Matter

ThoughtWorks

Published: Sep 23, 2013

We write applications that report on the data in our systems every day. That data is a rich source of information for understanding your business and your customers. What is less obvious is that the same data can tell you more about your customers if you look at it in a different way.

But getting a fresh perspective on your existing data in its existing structure can be harder than it looks. Your lines of investigation are constrained by the medium in which you store the data. You have to put it into a fresh context to really see the patterns that tell a story. By exploring your data freed from its normal environment, you may discover new insights and ideas. Let's have a look at some of the challenges here.

“Lateral thinking is… concerned with breaking out of the concept prisons of old ideas. This leads to changes in attitude and approach; to looking in a different way at things which have always been looked at in the same way.” – Edward De Bono

I was working with a client recently who wanted to use their data to examine how their users interacted with the site and with each other. Their database contained several years' worth of transactional data: they knew the answer would be in there. But their existing database made it hard to find the information they wanted. Trying to query interactions and relationships between users in the existing relational database quickly descended into complicated join statements and temporary tables. Aside from performance problems, the complexity of the SQL statements made the whole venture extremely frustrating.

This is a common problem. When choosing a database technology and structure we – quite reasonably – make our decision based on the best fit for the application. Unfortunately the choices we make can later constrain our ability to analyse and explore the data. So we decided that we had to break the data out of its existing structure and model it in a new way.

As a proof-of-concept, we loaded two years of data into a graph database. We didn’t spend long developing a mature Extract-Transform-Load (ETL) process. It was more important to us to get feedback quickly on whether our approach was working.

Graph databases store data as graphs of nodes and relationships. For example, if you were analysing data from an online record store, you might model an album purchase as follows:

As it stands, this resembles the original, relational database structure. Users, albums and artists, originally stored as tables, are now represented as nodes. Likewise, purchase information and links between albums and artists are now named, directed relationships between these nodes.
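To make the shape of this model concrete, here is a minimal sketch in plain Python rather than in any particular graph database. The `Node` and `Graph` classes, the relationship names `PURCHASED` and `RECORDED_BY`, and the sample data are all assumptions made for illustration; a real graph database would have its own API and query language.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A labelled node, e.g. a User, an Album or an Artist."""
    label: str
    name: str


@dataclass
class Graph:
    """Stores named, directed relationships as (start, type, end) triples."""
    edges: set = field(default_factory=set)

    def relate(self, start, rel_type, end):
        self.edges.add((start, rel_type, end))

    def outgoing(self, start, rel_type):
        """Return the nodes reached from `start` by relationships of `rel_type`."""
        return {e for (s, r, e) in self.edges if s == start and r == rel_type}


# Illustrative sample data (hypothetical relationship names).
graph = Graph()
me = Node("User", "Me")
album = Node("Album", "Classics")
artist = Node("Artist", "Aphex Twin")

graph.relate(me, "PURCHASED", album)        # a row in a purchases table
graph.relate(album, "RECORDED_BY", artist)  # a foreign key from album to artist

# Traverse from a user to the artists whose albums they have bought.
bought_artists = {
    a
    for alb in graph.outgoing(me, "PURCHASED")
    for a in graph.outgoing(alb, "RECORDED_BY")
}
```

The traversal at the end follows two relationships in sequence, which is the kind of query that needed joins in the relational version.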

Things start to get interesting when you start to infer new relationships from the existing data and overlay them on the graph. For example, you could infer that I like Aphex Twin if I have bought two of his albums:
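The inference rule described here can be sketched in a few lines of plain Python. The function name `infer_likes`, the `threshold` parameter and the sample purchases are assumptions for illustration, not the client's actual rules or data.

```python
from collections import Counter

# Illustrative purchase records: (user, album).
purchases = [
    ("Me", "Classics"),
    ("Me", "Drukqs"),
    ("Me", "Hard Normal Daddy"),
    ("Bob", "Classics"),
]

# Which artist recorded each album.
album_artist = {
    "Classics": "Aphex Twin",
    "Drukqs": "Aphex Twin",
    "Hard Normal Daddy": "Squarepusher",
}


def infer_likes(purchases, album_artist, threshold=2):
    """Infer (user, artist) LIKES pairs from album purchases.

    A user is assumed to like an artist once they have bought at least
    `threshold` distinct albums by that artist.
    """
    counts = Counter()
    for user, album in set(purchases):  # ignore repeat purchases of one album
        counts[(user, album_artist[album])] += 1
    return {pair for pair, n in counts.items() if n >= threshold}


likes = infer_likes(purchases, album_artist)
```

Each pair in `likes` would then be written back to the graph as a new relationship between a user node and an artist node.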

Now, from a simple database of music purchases, you can build up a graph of users and their preferred artists. You can use this information to start recommending music that I might enjoy:

Perhaps I would enjoy Squarepusher too as my friend Bob – a fellow Aphex Twin enthusiast – is a fan.
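A recommendation of this kind amounts to a two-step traversal: from me to the artists I like, on to other users who like the same artists, and then to the artists they like that I do not. The sketch below shows one simple way to do that in plain Python. The `recommend` function, the scoring by number of like-minded fans and the sample data are assumptions for illustration.

```python
from collections import Counter

# Inferred preferences: user -> set of liked artists (illustrative data).
likes = {
    "Me": {"Aphex Twin"},
    "Bob": {"Aphex Twin", "Squarepusher"},
    "Carol": {"Hypothetical Artist"},
}


def recommend(user, likes):
    """Suggest artists liked by users who share at least one liked artist.

    Artists are ranked by how many like-minded users are fans, with ties
    broken alphabetically so the result is deterministic.
    """
    mine = likes.get(user, set())
    scores = Counter()
    for other, theirs in likes.items():
        if other == user or not (mine & theirs):
            continue  # skip myself and users with no taste in common
        for artist in theirs - mine:
            scores[artist] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, _ in ranked]


suggestions = recommend("Me", likes)
```

Carol contributes nothing here because she shares no liked artist with me, so only Bob's tastes feed into the suggestions.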

Graph databases show their strength when you introduce graph theory. The shortest path between Aphex Twin and Squarepusher can indicate their musical similarity. You can cluster artists by their fans' tastes, or calculate the clustering coefficient to measure whether users purchase a broad range of genres or are all really into Drill n’ Bass.

The graph database allowed us to traverse relationships in our data that were hard to access in the original structure. This solved our initial problem but, more excitingly, simply changing the way we stored our data opened up a wealth of new ideas and possibilities that we had not thought of before.
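Both measures are standard graph algorithms and are small enough to sketch in plain Python. The example assumes an undirected graph in which two artists are linked when they share at least one fan. That projection, the placeholder artists "Artist X" and "Artist Y" and the function names are assumptions for illustration. Many graph databases and graph libraries provide these algorithms built in.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns a shortest list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = adj.get(node, set())
    if len(neighbours) < 2:
        return 0.0
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for a, b in pairs if b in adj.get(a, ()))
    return linked / len(pairs)


# Artists linked when they share a fan (illustrative data).
adj = build_adjacency([
    ("Aphex Twin", "Squarepusher"),
    ("Aphex Twin", "Artist X"),
    ("Squarepusher", "Artist X"),
    ("Artist X", "Artist Y"),
])

path = shortest_path(adj, "Aphex Twin", "Artist Y")
tightness = clustering_coefficient(adj, "Aphex Twin")
```

A short path suggests two artists appeal to overlapping audiences, and a coefficient close to 1 suggests a tightly knit group of artists whose fans overlap heavily.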

Graphs are a powerful way of modelling your data to discover new insights, but other database technologies can open up different areas of exploration. For example, Datomic organises data as a series of time-based facts, such as “2013-02-04 Jen bought ‘Classics’”, allowing you to explore user behaviour over time. Document databases are great for unstructured data, and relational databases are still great for aggregating and slicing up data. When was the last time you de-normalised a NoSQL store into DB2 to glean insight?
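The idea of time-based facts can be illustrated without Datomic itself. The sketch below is plain Python and does not use or imitate Datomic's API. The tuple layout, the `as_of` and `history` helpers and every fact other than Jen's purchase of 'Classics' are assumptions for illustration.

```python
from datetime import date

# Each fact is (when, entity, attribute, value). Facts are only ever
# appended, never updated in place.
facts = [
    (date(2013, 2, 4), "Jen", "bought", "Classics"),
    (date(2013, 3, 1), "Jen", "bought", "Hypothetical Album"),
    (date(2013, 3, 9), "Bob", "bought", "Classics"),
]


def as_of(facts, when):
    """Return the facts known on or before the given date."""
    return [f for f in facts if f[0] <= when]


def history(facts, entity):
    """Return one entity's facts in chronological order."""
    return sorted(f for f in facts if f[1] == entity)


february_view = as_of(facts, date(2013, 2, 28))
jen_timeline = history(facts, "Jen")
```

Because nothing is overwritten, the same data can answer both "what did we know at the end of February?" and "how did Jen's purchases develop over time?".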

“A moment’s insight is sometimes worth a life’s experience.” – Oliver Wendell Holmes, Jr.

This article first appeared in the June 2013 edition of P2 magazine, a ThoughtWorks publication.
