Are you at your most vibrant when you’ve successfully distilled data into its simplest, most meaningful form?
Thoughtworks is a global software consultancy with an aim to create a positive impact on the world through technology. Our community of technologists thinks disruptively to deliver pragmatic solutions for our clients' most complex challenges. We are curious minds who come together as collaborative and inclusive teams to push boundaries, free to be ourselves and make our mark in tech.
Our developers have been contributing code to major organizations and open source projects for over 25 years. They’ve also been writing books, speaking at conferences and helping push software development forward, changing companies and even industries along the way. We passionately believe that software quality is driven by open communication, review and collaboration. That’s why we’re such vehement supporters of open source and have made significant contributions to open source tools for testing, continuous delivery (GoCD), continuous integration (CruiseControl), machine learning and healthcare.
As consultants, we work with our clients to ensure we’re evolving their technology and empowering adaptive mindsets to meet their business goals. You could influence the digital strategy of a retail giant, build a bold new mobile application for a bank or redesign platforms using event sourcing and intelligent data pipelines. You will learn to use the latest Lean and Agile thinking, create pragmatic solutions to solve mission-critical problems and challenge yourself every day.
Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
You’ll spend time on the following:
Mass data collection and storage using Big Data components
Data cleaning and conversion using Big Data components
Data visualization, multi-dimensional drill-down and analysis of large data sets using Big Data components
Integration of Big Data components to build a sandbox supporting multi-tenant data analytics
Performance tuning of Big Data components
Here’s what we’re looking for:
At least three years of experience and proficiency in at least one programming language, such as Java, Scala or Python.
Experience with Spark, Flink, MapReduce or other distributed computing frameworks.
Experience with Storm, Spark Streaming or other stream processing frameworks.
Experience with MPP databases and search engines, such as Impala, Presto or Elasticsearch.
Experience with distributed storage systems, such as HDFS or Ceph.
Experience with NoSQL databases, such as HBase, MongoDB or CouchDB.
Hands-on experience with Hadoop distributions such as MapR, Cloudera and Hortonworks, and/or cloud-based offerings (AWS EMR, Azure HDInsight, Qubole, etc.)
Knowledge of Power BI, Tableau or other data visualization tools is preferred.
Big Data component tuning experience and machine learning experience would be a great plus.