AI, Applied Data Science, Uncategorized

Char2vec – Character embeddings for word similarity

Most of my applied data science work is in text-heavy domains where the objects are small, there isn’t a clear “vocabulary”, and most of the tasks focus on similarity. My go-to tool is almost always cosine similarity, although other metrics such as Levenshtein or character n-grams also feature heavily. These tools are great because they work well when you have a small corpus and there might be an imbalance between the source corpus and the destination corpus, for example:

Trying to find the most similar product that matches the offer – 10% off Skechers

Again, these methods are tried and true, but also largely from the machine learning era of AI, as the godfather of ML, Andrew Ng, would say. But with deep learning becoming incredibly practical, and techniques such as word embedding becoming increasingly popular, new ways of looking at text are starting to take hold.

I use embeddings almost exclusively these days, with libraries like spaCy making it impossibly easy to create powerful deep learning models for text, but I sometimes find myself having to perform a basic similarity task, and over the weekend, after being stumped on what I thought was a basic task, decided to have a go at using embeddings on single words.

The problem

Take two words, bag and gab. Now, as an applied data scientist, I view all methods and techniques like tools, not an incantation from mtg. If I view cosine similarity as some magical amulet, then I just shove these two words in, take the output and declare profit. Problem is, you have to decide what the output means, and optimize your tool for that goal. If you run these two words through cosine similarity, you get the value 0.6. This is totally fine, if your goal was to see how far apart these words are, but if your goal is to ensure your system is capable of seeing bag and gab as completely different words, but bag and bags as similar, then cosine similarity may not be the right tool.

For my purposes, I needed something:

  1. That didn’t just look at the character counts, but understood their context
  2. That was capable of being tuned so I could adjust its thresholds depending on the data set

Essentially, I needed something very similar to Word2vec, but for single words, so with an afternoon free, I thought, hey, let’s see what a Char2vec might look like.

The experiment

The goal of my experiment was to explore how a character-to-vector model would perform compared to something like tf-idf based character n-grams. I used the same approach as word embeddings: I pulled apart the two words, created a sliding window to build up the embedding matrix, then computed cosine similarity on the resulting vectors.
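To make the n-gram baseline concrete, here is a minimal sketch of character n-gram cosine similarity. It is not the code from my notebook: it assumes plain n-gram counts rather than tf-idf weights, a (2,3) window, and the helper names (char_ngrams, cosine_similarity, similarity) are made up for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n_min=2, n_max=3):
    """Count every character n-gram of length n_min..n_max in `word`."""
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(word) - n + 1):
            grams[word[i:i + n]] += 1
    return grams


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a word too short to produce any n-gram
    return dot / (norm_a * norm_b)


def similarity(w1, w2, n_min=2, n_max=3):
    """Compare two words by their character n-gram counts."""
    return cosine_similarity(
        char_ngrams(w1, n_min, n_max), char_ngrams(w2, n_min, n_max)
    )


print(similarity("bag", "gab"))   # no shared 2- or 3-grams
print(similarity("bag", "bags"))  # shares "ba", "ag" and "bag"
```

With this count-based version, bag and gab share no n-grams and score 0, while bag and bags share three of the five n-grams in bags and score roughly 0.77. A tf-idf weighted variant would shift the second number depending on the corpus.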

The results


It was definitely interesting. A couple of observations:

  • For the pair bag, gab, when the sliding window is set to (2,3) we get a solid 0 with both models! However, for the pair bag, bags, the embedding worked better than tf-idf
  • Overall, for cases where the words are from the same distribution, embedding yields a higher similarity score


Honestly, I was hoping Char2vec would redefine my career and put me on the front page of AI Today. The reality is that it definitely does well, and in some cases better than traditional methods like tf-idf based character n-grams, but not as well as I’d hoped.

Firstly, if we were to stem the words going in, I’m almost sure tf-idf based character n-grams would perform better.

Secondly, I didn’t build up weights for my Char2vec model, so there isn’t an estimator or optimization capability, which severely hampers the performance of the model. And as it was the day after Christmas and I was struggling with the effects of food coma, I didn’t have the intellectual horsepower to try and implement sequences to improve the performance of Char2vec.

It wasn’t a complete waste of time though, as it definitely was the right solution to the problem I was trying to solve, so I’m happy about that, and I managed to capture the crude implementation in a notebook, so there is that too.

Happy holidays!

AI, Applied Data Science, Engineering

Why Turing would have hated SQL

Nobody likes SQL, in fact, I’m almost certain Turing would have hated it the most.

Before I back up my reasonably wild and sensational statement with what I hope is a somewhat cogent argument, let me just list some stats as background:

SQL Sucks! And so do RDBMS!

I’ve never liked SQL, and I’ve never liked relational data stores either. Why? Because I’ve spent most of my career working in industries that were very dynamic, both from a business and technology perspective. The data landscape changed rapidly, and the folks in the front office needed insights quickly, so they leaned heavily on the IT folks in the back, the keepers of the data, to get those insights. Only problem was, the folks in the back often didn’t understand the intricacies of the business domain, but were great at plugging systems together and getting the data to flow. There was a gap between the people who needed answers and the ones tasked with getting it to them.

40 or so years ago, those systems were big monolithic mainframes, where reasonably sized data stores lived. IBM needed a way to pull basic representations from those data stores, and they invented SQL. SQL is by its own definition, structured, and hence, needed a data source that was as well. It’s no surprise then that one of the fathers of SQL was also the father of one of the most painful and onerous data normalization strategies on earth. And so RDBMS became a thing, and the wretched union of SQL and RDBMS came to be.

But it was created at a time when technology, whether it was a printer or “advanced” system, was kept far away from anyone who wasn’t a technologist, which back then, meant you had never loaded a tape reel or compiled a kernel.

Times have changed, BI hasn’t!

Back when SQL was growing facial hair, the weapon of choice for business was this:

[Image: a vintage IBM computer from the 1970s]

Today, businesses change at the speed of sound, and so have the tools of the trade:

[Image: Siri on an iPhone X]

And even though most aspects of corporate IT have changed with the times, the divide between the business and the data they need to make decisions hasn’t. We may have made it more efficient for the IT folks to build reports etc., but truly letting business folks get access to the right data is still no different than it was 40 years ago; you cut a ticket, try your best to describe your need, and someone in IT will deliver it a few days later.

What has this got to do with Turing?

Turing believed that the most powerful way to deal with computers should be to simply have a conversation with them. If the computer was built right, it would be able to respond intelligently to the human, and together they could get the job done.

SQL is the absolute antithesis of this in the data context. SQL requires the human to express their needs in a language understandable by the computer. It requires that the human be able to express their needs in a highly reduced language form. SQL lives in a one-way world: you submit your request, it runs it, and it returns data.

In a Turing world however, the machine sits between the human and the data. It understands the data to a degree, and machine learning can take this understanding past basic non-linearity. As the human makes requests, the machine tries to understand the intent and respond. The human-machine interaction resembles human-human interactions, such as requesting a new BI report and going back and forth until the right information is presented: the human helps the machine learn the right way to satisfy the request, until it gets it right. But here is where the world changes forever.

See, that human-human iteration process that we go through every day when trying to get data out of a system is, by the very nature of our natural environment, not unbounded. It may sometimes feel that way, but largely it’s not, especially in business, where the bounds get narrower because of vertical specialization. As humans, we suck at remembering and recalling vast frames of data. So when we receive requests for data in the form of reports, we kinda start from scratch each time, and introduce a certain bias based on what we know and don’t know.

Machines don’t suffer the same issues, in fact, “learning” about the business intelligence domain is a very tractable problem, and over time, and privy to all requests for information within an organization, a machine could become far more powerful in assisting humans with their data needs.

What does this mean for the future of SQL?

Hopefully it means SQL can one day disappear from the data landscape, much like CSV is fading now that more advanced formats like Parquet have emerged. There’s no need to get nostalgic or dogmatic about these things. The data world is changing as fast as the businesses that use it, and it’s time to get behind more advanced ways of thinking that embrace intelligent machine-based methods.

For example, something I’ve been working on is a more natural method for interacting with data, aimed at empowering end-users and truly democratizing the vast stores of data within an organization.


For me, being able to explore data as a conversation is the right direction for advanced analytics to take, and with the right use of deep learning, natural language understanding, and semantic data modelling, the possibilities are endless. It also requires us to start embracing non-linear and probabilistic methods, which is frankly, a good thing.

AI, Applied Data Science, Engineering

Running a Data Science Workshop

In my most recent role at Microsoft, one of the aspects I loved most was consulting with clients on their toughest data science problems. Initially I was skeptical about the impact one could make on a challenging data science problem in 5+ days without any prior knowledge in the domain, the data, or the problem to be solved. It wasn’t until I did my first few projects that I realized how powerful a solid workshop playbook is to adding value to a customer’s data science problems.

It starts with a solid question

It seems intuitive, that every good data science outcome probably started with a good data science question. Truth is, AI, ML, data science, are practiced more by the marketing team than the engineering team in industry, which leads to a lot of ill-defined data science projects and in turn, ill-fated data science outcomes.

For me, a good question is a simple statement derived from a formal process that includes:

  • Who is asking this question and what is the impact of answering it? Think of this as question/answer fit; if you don’t know who needs the answer or why it matters, it’s most likely not worth answering.
  • Will the answer add insight? Rather than refine existing understanding? Why does this matter? A meaningful data science project should create new insights, if the insights are already available via existing means, the overall ROI on the project will be diminished. Seek to discover new meaning rather than incrementally add to existing knowledge.
  • Is it meaningful? Just because you have a great question, and the answer doesn’t already exist, doesn’t mean it’s a good candidate for a data science expedition. Make sure you can tie the result to business outcomes.
  • Is it tractable? Intuitively, before you start exploring the model, does it seem there is an answer somewhere in the exercise? If you’re chasing a wild goose, it will be hard to bound the problem and establish a measure of completeness.
  • Is it bounded? This is one of the most important aspects, if the question is too vague or open-ended, it will be difficult to select and tune a model that can achieve your goals. The best way to think about this is in terms of inputs and outputs, if you can’t specify the output clearly, working backwards through the model and the inputs will be difficult.

What does a good question look like?

What measurements can the maintenance team use to predict a possible motor failure on a shop floor machine with greater than 80% confidence using existing telemetry data before an outage occurs?

This question has:

  • An owner: the maintenance team
  • An insight: Predict possible motor failure
  • Meaning: Before an outage occurs
  • Tractability: Intuitively telemetry data on prior machine operation and failures should give us the answer we seek
  • Bounds: An existing data source and a confidence rating helps us bound the problem and know when we’ve finished

Wallow in the data

I love this quote by Sir Arthur Conan Doyle from Sherlock Holmes:

‘Data! Data! Data!’ he cried impatiently. ‘I can’t make bricks without clay.’

Too many data science projects rush head first into model selection and don’t spend enough time truly understanding the relationship between the underlying model and the data. You must determine:

  • What data will I need to answer the question?
  • Where will this data come from? Does it already exist or will I have to create new data through feature engineering? Is data missing, will it need to be imputed? Will we need to create new categories from continuous values?
  • Is the data “clean”? Data preparation can take up to 50% of the project time. De-duplication, removal of extraneous data, and reformatting are all cleaning tasks that may be required to prepare the data for modelling.
  • Column analysis. What is the distribution of the data? What kind of values form the population? Will new incoming data distort the distribution/population in the future?
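As a rough illustration of the column analysis step, the sketch below profiles a single column using only the standard library. The function name profile_column and the sample telemetry values are hypothetical; in a real workshop you would more likely reach for a data frame library.

```python
from collections import Counter
from statistics import mean, median, pstdev


def profile_column(values):
    """Summarize one column: missing count plus stats or top categories."""
    present = [v for v in values if v is not None]
    summary = {"rows": len(values), "missing": len(values) - len(present)}
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric column: describe the shape of the distribution.
        summary.update(
            min=min(present),
            max=max(present),
            mean=mean(present),
            median=median(present),
            stdev=pstdev(present),
        )
    else:
        # Categorical column: list the most frequent values.
        summary["top_values"] = Counter(present).most_common(3)
    return summary


# Hypothetical insulation resistance readings with one missing value.
readings = [510.0, 495.5, None, 120.0, 505.0]
print(profile_column(readings))
```

A large gap between the mean and the median, as the 120.0 reading produces here, is the kind of signal that should send you back to the question: is that reading an error to clean up, or the failure you are trying to predict?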

Once you have a candidate data set, you need to revisit your question and ask, “Given the data we have, are we still able to answer the question?” The relationship between the question and the data is iterative, and before exiting this loop you should feel confident the question and the data set are aligned.

Model Selection

In data science it is common for teams to miss this step and go straight to the algorithms. Model selection is crucial to the success of your data science project. So what is a model? Put simply, a model helps explain the relationship between your question and your data. Let’s go back to our original question, where we’re trying to predict an outcome based on an existing data set. A model will help us make assumptions about the telemetry data and its ability to predict failure of a machine. There are many models, which fall into groups. For example, one potential model we could use is Linear Regression, where we might assume that there is a relationship between a measure of a motor’s insulation resistance and potential failure.

Algorithm Selection

Once we have a set of candidate models that help us create a relationship between our question and answer, we can move towards fitting the data to the model.

Let’s revisit our example. We think Linear Regression is a good model to describe our question and answer. The base Linear Regression model has two parameters, an intercept and a slope. What we want though is a function that we can use to predict future events, so we need to create a new function from the existing data that helps us answer our question. At a naive level, we could simply work out a basic straight-line function that, given the value of the motor’s insulation resistance, outputs a value “fail/no-fail”. But there are many ways to create a linear model, and this is where frameworks like scikit-learn come into play, as they can help us fit a model to our data, with enough control over the parameters to ensure we can meet our goal, in our case, 80% confidence. scikit-learn has an awesome chart that helps visually explain this.
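The naive straight-line version can be sketched in a few lines. This is only an illustration under stated assumptions: the telemetry numbers are invented, fit_line and predict_failure are hypothetical helpers, and the 0.5 cut-off is an arbitrary choice rather than anything tied to the 80% goal. In practice a framework like scikit-learn would do the fitting.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_failure(resistance, intercept, slope, threshold=0.5):
    """Turn the line's output into a fail / no-fail label."""
    score = intercept + slope * resistance
    return "fail" if score >= threshold else "no-fail"


# Invented telemetry: insulation resistance (megohms) and whether the
# motor failed soon afterwards (1 = failed, 0 = kept running).
resistance = [5.0, 10.0, 20.0, 50.0, 100.0, 200.0]
failed = [1, 1, 1, 0, 0, 0]

intercept, slope = fit_line(resistance, failed)
print(predict_failure(5.0, intercept, slope))
print(predict_failure(200.0, intercept, slope))
```

On this toy data the slope comes out negative, so low resistance pushes the score towards “fail”. A straight line fitted to a yes/no outcome is crude, which is exactly why the choice of algorithm and its parameters matters in the next step.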


At this point in the process, we have a data set we are using for training, we’ve arrived at a model that we think represents our question and answer relationship well, and we’re using algorithms to help us fit our data to our model. At this point, we want to be highly empirical about our process, and this is where evaluation is crucial. It is also crucial to be open to iteration, data science is very iterative, and learnings from the data -> algo -> model cycle can help us refine the process.

Key measures you should consider are:

  • Performance: As you work through different models and different parameters/hyper-parameters, you must always measure the performance of each iteration. Once you reach your desired threshold, it is good discipline to baseline that experiment and move forward. Likewise, candidates that reduce your performance should be rejected and documented.
  • Explainability: Black-box models are problematic, as they can deliver superior results, but if you cannot describe why, or reason as to the relationship between the question and the answer, then you should treat these models with suspicion, and continue searching for a model with equal performance that is easy to reason about.
  • KISS: As you explore multiple models and parameters/hyper-parameters, always favor simpler candidates. This speaks to the first two points; a model that is easy to reason about and performs to the standard we desire is better than a model that is hard to explain but performs beyond our requirements.

Ship It!

Ah, shipping. One of the biggest challenges to any data science project. What does shipping even mean? Have we shipped once we have a winning candidate model and parameters/hyper-parameters? Well, this is somewhat subjective, and it really belongs as a criterion in its own right.

At the start of every data science project, clearly define what it means to ship. If the model is going to be used in an offline business process, then shipping might mean wrapping it in a lightweight web page and exposing that to the business. If the model is going to be used in your product, in a production environment, then you’re going to have to think about operational concerns, monitoring, life-cycle, collecting model telemetry, retraining and re-deploying new versions, etc.

In general, shipping a model is measured by use of the model by the owners of the question. You know you’ve shipped when those owners begin to receive answers, and even more important, you should be able to measure the ROI on those answers. Like every good feature, if the model provides no value, maintaining it over time does not make sense.

And so…

Running a data science workshop before each project in my mind is a must. It helps identify all the relevant stakeholders, forces everyone through a methodical process, and ensures we’re using objective measures to define success or failure. The most important aspect of a workshop is to determine whether a data science project is worthwhile, before setting off into the great unknown.

Need help running a workshop? Drop me a line

AI, Engineering

Using AI to “cognify” your Apps

Today terms like Artificial Intelligence and Machine Learning are used interchangeably to describe systems that possess capabilities not easily implemented as heuristics; applications such as speech recognition, prediction or computer vision fit into this space. From an industry perspective however, we rarely discuss feature level strategies and instead try to design into a system a set of cognitive capabilities which leverage AI, ML, and a range of associated patterns and practices.

So how do you “cognify” your apps? There are two major areas to consider.

What is the cognitive scenario you’re building for?

While cognition is more of a marketing term these days, when product teams begin to think about using it, it becomes more of a solution looking for a problem. A better approach is to consider your key user scenarios and ask yourself a few questions:

  1. Can we leverage behaviours from other users to help a new customer use the system? For example, using collaborative filtering to suggest a set of starting options for a user at the beginning of a “new” interaction flow.
  2. Are there tasks we can perform on behalf of the user using their prior interactions that eliminate the need for certain UI components? A classic example of this is intensive data entry applications. Initially the user is presented with data entry screens, but over time you could train a network to fill in the forms on behalf of the user. It’s true you could also build a heuristical approach to this using regex statements to map values to field identifiers like css paths or xpaths, but as most people find, the edge cases with these approaches can become difficult to maintain and manage over time.
  3. Can we learn workflows within the application by observing the user’s interaction flows? Routing and approval paths are a perfect scenario for this, rather than users having to create static rulesets for where data flows and the actions/people associated with these flows, the system can track the actual workflows where people submit emails to one another or include users in collaborative efforts. This is more dynamic and requires less static configuration which means it will scale better and require less administration.

The above exercises are just examples of how to think about cognition in your product. Ultimately you are trying to discover user scenarios where the system is supporting decision making or automating repetitive, time consuming tasks that cannot be achieved through heuristical methods.

What does your cognitive engineering lifecycle look like?

The software development lifecycle (SDLC) and application lifecycle management (ALM) are terms used to describe the development and deployment of engineering based features. With the exception of data stores, your applications should be stateless, so the process in which they get designed, developed, deployed and maintained is based on this premise.

Cognitive systems are very different in that the data processing and statefulness are inextricably linked to the “code”. The code in this case is a pipeline of feature engineering, model training, model performance evaluation, and runtime processing capabilities. Then there are the model artifacts, which are separate to the raw data they are built from.

As such, you need to think about your cognitive engineering lifecycle, both in terms of development and deployment. This gives rise to the concept of “cogops”, the method by which business analysts, data scientists, software engineers, and operations teams manage a cognitive feature throughout its lifecycle.

At a high level you need to be thinking about:

  1. How do you integrate business domain experts, data scientists, and engineers using tools and automation to ship end to end features?
  2. How do you test locally and deploy into upstream environments with only env configuration changes?
  3. How will you assess the performance of new model versions and automate the process for upgrading these models in upstream environments?
  4. How will new data enter the lifecycle and be used to retrain or extend existing models, and how will these new signals be consumed?

While many aspects of engineering devops, such as monitoring and alerts, apply to cognitive features, processes such as training and performance evaluation are new.


It’s important to start thinking about your cognitive investments as highly integrated pieces of your architecture and not as offline silos. It’s one thing to run an R report and sneakernet that to the appropriate stakeholders for evaluation, and another thing to mainline these capabilities into your development processes and your apps.

AI, Engineering, Management

Building World Class Data Science Teams

When it comes to making long term decisions, I like to collect a variety of data across a meaningful period of time. Why? Between two entities in a relationship, you are almost always going to see variables change over time. So when it comes to recruiting, retaining and developing engineering talent, observing someone over the course of an 8 hour interview loop gives you a contracted period of time with limited variables from which to draw conclusions and make predictions about the long term arc of you and them. Add to this the unique nature of AI engineers, and you’re going to need a new playbook to guide you in building a world class data science team.

Before I dig into the playbook, let’s talk about the archetypes at play. There are three distinct types of data science engineer:

The Academic

This fresh grad has more course credits in AI/ML than Prof. Ng could teach and has just put the finishing touches on their thesis solving Falconer’s conjecture written in pure R. They are light on software carpentry skills and the only tests they have ever written were as a TA in college.

The Mustang

This up and comer has been on the line for years, mastered no less than 3 languages including one named after facial hair and uses painfully esoteric IDEs that border on ASCII art. They’re “self-taught”, meaning they’ve read every README on GitHub that mentioned AI/ML and can wax lyrical with utmost conviction on why Deep Learning beats the stuffing out of SVMs without personally having ever used either.

The Sceptic

Wha? Machine learning? Artificial Intelligence? Who? Speak into the good ear!!


Who’s the best type to build a team with? Well, you kind of need elements of all three. I’ve built AI/ML teams for the past 5 years and while the methods for recruitment and selection have changed, the personalities are almost always the same. You need someone who has a reasonable grasp of what’s happening under the covers, otherwise they’re going to shy away from the complexity and end up practicing coincidental coding rather than making rational informed design decisions. They also need to have a hacker’s spirit. Engineering AI features is heavily iterative and takes the patience of a saint, so if they’re not capable of working in ambiguous environments they will burn out. And lastly, they need to have a healthy level of scepticism regarding the overall field. Machine Learning was born out of a schism between AI purists and those who had to work for a living, and as AI has taken the brand forefront once again, there is a daily deluge of information on AI and ML. If you don’t have good bullshit filters, you’re going to spend a lot of time moving your eyes left to right instead of your fingers up and down.

The Playbook

The JD

Make it interesting, make it relevant, make it honest. Omit words like “rockstar”, “ninja”, “guru”, and replace these with actual frameworks and platforms you are using or intend to use. Discuss the expectations of the role within the framework of:

  • Feature engineering – Where does the data come from, how much work is required to get it into a quality standard for training models?
  • Model selection and training – How are you expected to develop hunches? What are you expected to deliver and in what time frame? When is done, done?
  • Model maintenance and troubleshooting – What does care and feeding look like? Is this mission critical or best efforts?

The JD is crucial to inviting the right candidate into the funnel while ensuring the looky-loos and tire kickers don’t clog your pipes.

The Coding Exercise

If you’re like me, when you interview someone for a role on your team, you’re thinking long term, or at least you should be. So back to the 8 hour interview. I worked with my last AI/ML hire for around 6 months, but my longest relationship has been over 5 years. So let’s say for me x is somewhere around 960 < x < 9600. So at a min of 960 hours of engineering time spent, I have 8 hours of data to set up my vector. At 9600 it gets worse.

Also, the one thing I’ve always hated about the 8 hour interview is that, as engineers, we spend so much time in code, alone, together, in the SO hive mind. So pulling someone in for a series of cold start coding sessions is downright bizarre, and frankly, feels like the kind of interview process a non-engineer would come up with (stares directly at IBM middle management). So I like to put together a comprehensive coding exercise which touches the major elements of the job description including any engineering and infrastructure pieces required to be successful in the job. I like to give that to a candidate on a Friday evening and expect it back before work starts Monday morning. The process is very straightforward:

  1. Clone an assignment repo – This contains a seed project in a monolithic form with all the major project folders in place, with a detailed README regarding what is expected and any data required to perform the task. For example, it might be some product data in a json file where the task is to create an API to expose an endpoint where incoming product descriptions are matched with similar product descriptions from the json file and delivered through the API.
  2. Encourage frequent and small commits – As the candidate is making progress, encourage them to make commits with decent commit messages. This is great to provide insight into their reasoning and thought process, and also gives the interviewing team concrete places to discuss code with the candidate should they make it to the next stage.
  3. Use the SCCS to collaborate and discuss – GitHub and Bitbucket both have great PR and comment capabilities. Leave comments, ask questions of the candidate, leave notes to your teammates. All of this provides a great place to have a code conversation with your candidate within a meaningful context.

Come Monday morning, you’re going to have a great insight into your candidate, and they’re going to have a realistic sense for the job they’re applying for.

The Follow Up

Don’t tell me you thought there wasn’t an 8 hour interview? Of course there is, it just isn’t a random expression of technical strength designed to make your candidates feel like dousing themselves with petrol. Instead, it’s a joyous confluence of engineering expression facilitated through code. Code that the candidate wrote, that your team is familiar with, and actually relates to the job they’re applying for. During this time you can dig into their reasoning, design decisions, but also provide positive feedback, let them know if they did something exceptional or delightfully unexpected.

A Data Scientist is not just for Christmas

So you’ve vetted 90+ candidates over the last 3 months and are ready to offer 1 a job. They accept and you’re in AI/ML engineering nirvana. Now what?

Don’t drop the ball. It’s really that simple. The best AI/ML teams recognize that it’s not an exact science or an engineering discipline, it’s a bit of both with magic in the middle. So always make sure you’re checkpointing your processes, reviewing your practices, and tracking against your high standard. By keeping the tempo similar to the interview process, learning new methods, exploring innovative technologies, solving unique problems, recognizing great work, you’ll not only build a great team, you’ll keep it.

And if you have any questions on how all this works in the wild, drop me a line.

AI, Engineering

AI Feature Engineering for Pros

2010 was my first experience with AI in the form of an NLP project at Microsoft. The toolchain, framework and overall process were rudimentary and did not lend themselves to rapid iteration or follow any particular engineering workflow. Like most AI projects, the goal is to get to a minimum viable model (MVM), so it’s understandable that automation and tools are deferred until the basics of feature and model selection are complete.

Like most feature engineering scenarios however, deferring these concerns actually retards the iterative process, with engineers spending more time performing scaffolding tasks than data science tasks. This also speaks to the difference in building AI features versus non-AI features. With non-AI features, it’s more crucial to “true up” your stack during development to accelerate iteration, however AI features require a more exploratory approach, where feature engineering and model selection are more valuable than truing up the application during each change.

So how do you set yourself up to engineer AI features like a pro? Easy!


To build AI features quickly with confidence, especially in a team setting, you need to have a deterministic environment that must span development and production. Whether you’re trying to freeze framework versions or easily share experiments, reducing the barriers and friction for other engineers and environments to spin up an AI feature is crucial.

Docker has a number of benefits for AI engineering teams:

  • Sharing environments between engineers
  • Rapid framework evaluation
  • Deterministic deployments through environments


The feature engineering lifecycle for AI features is bifurcated between design and maintenance. During the design phase, the ability to tinker, hack and visualize is critical, and there is no better environment than Jupyter for that.

The best way to think of Jupyter is an online document editor where paragraphs can be code that is executed by an interpreter. Out of the box, Jupyter supports Python, which is perfect for AI engineering given the extensive support provided by libraries like scikit-learn and TensorFlow.

For example, before creating my Git repos, scaffolding an app, building unit tests, etc., it’s much easier and useful to simply start exploring the AI problem. Let’s take the very useful task of calculating the similarity of text, say for example, if you’re doing a deduplication activity. With Jupyter, you create a new notebook, start hacking and iterating, without the need to actually spin up a program.
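The kind of thing you might hack out in a first notebook cell looks like the sketch below. It assumes the standard library’s difflib as the similarity measure and an arbitrary 0.9 threshold; the product strings and the helper names (text_similarity, find_duplicates) are made up for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Ratio in [0, 1]; 1.0 means identical after lowercasing and trimming."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def find_duplicates(items, threshold=0.9):
    """Return index pairs whose similarity meets the threshold."""
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if text_similarity(items[i], items[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Made-up product names; the first two differ only in case and spacing.
products = [
    "Acme Wireless Mouse - Black",
    "acme wireless mouse - black ",
    "Acme Mechanical Keyboard",
]
print(find_duplicates(products))  # → [(0, 1)]
```

Comparing every pair is fine for a notebook experiment on a few hundred rows. Once the idea holds up, that is the point to move it into a proper project and think about scale.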


The great thing about Jupyter is not only does it support Python, it also supports other languages like Golang and Bash! This means you can iterate independently of the dev process until you’ve fleshed out your concept and then simply migrate the working code to an IDE via ctrl-c-v or using Jupyter’s export capabilities.

Oh, and Docker + Jupyter means you can get started with the leading data science stacks like scikit-learn, TensorFlow, Spark and R. This is a huge boost as it means you can start exploring and vetting these AI frameworks and platforms without having to waste time setting them up.


Yes! Jenkins! Now, I’m always in danger of being called out for using Jenkins for just about everything, but let’s face it, when it comes to doing stuff when stuff changes, Jenkins is awesome.

So how do you leverage Jenkins as part of your AI feature workflow? After you’ve moved your feature from iteration to mainstream development, you must monitor the performance of your feature. Now, like the rest of your stack, AI features have specific attributes that must be measured before you deploy them. For example, you might have a classifier that is being trained nightly by a content team. Jenkins is great because you can have it run a job which prepares a confusion matrix against the latest model checked into source control. If the accuracy is less than that of the currently deployed version, simply withhold it from deployment; conversely, if the accuracy is better, deploy it. This is a great example of using CI as part of your AI feature pipeline.
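The check such a job runs can be very small. The sketch below assumes the confusion matrices have already been computed as nested lists (rows are actual classes, columns are predicted classes); the function names and the numbers are hypothetical, and treating a tie as “do not deploy” is my assumption.

```python
def accuracy(confusion_matrix):
    """Share of predictions on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion_matrix)
    if total == 0:
        return 0.0
    correct = sum(confusion_matrix[i][i] for i in range(len(confusion_matrix)))
    return correct / total


def should_deploy(candidate_matrix, deployed_matrix):
    """Deploy only when the candidate is strictly more accurate."""
    return accuracy(candidate_matrix) > accuracy(deployed_matrix)


# Hypothetical results on the same held-out set of 100 examples.
deployed = [[40, 10], [5, 45]]   # 85 of 100 correct
candidate = [[44, 6], [4, 46]]   # 90 of 100 correct

print(should_deploy(candidate, deployed))  # → True
```

In a CI job, the script would typically exit with a non-zero status when should_deploy returns False, so the pipeline stops before the deployment stage.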


The key takeaway here is to stay lightweight and iterative during the design phase, then use proven automation platforms and techniques to ensure you manage the lifecycle of your AI features. Time spent on learning technologies like Docker and Jupyter will not only accelerate your AI feature development but make it easier to move them from development to production.