Key Lessons In Lean Analytics With Alistair Croll

Alistair Croll is an entrepreneur with a background in web performance, analytics, cloud computing, and business strategy.

In 2001, he co-founded Coradiant (acquired by BMC in 2011) and has since helped launch Rednod, CloudOps, Bitcurrent, Year One Labs, and several other early-stage companies. He works with startups on business acceleration and advises a number of larger companies on innovation and technology.

A sought-after public speaker on data-driven innovation and the impact of technology on society, Alistair has founded and run a variety of conferences, including Cloud Connect, Bitnorth, and the International Startup Festival. He’s the chair of O’Reilly’s Strata + Hadoop World conference. He has written several books on technology and business, including the best-selling Lean Analytics.

With Alistair, we went through a set of key questions to understand how start-ups can use data to understand the impact their actions have on the growth of the organization.

What brought you to the study and research that led to writing Lean Analytics?

Alistair Croll: I was running a startup accelerator called Year One Labs in Montreal, and it was based on the lean start-up principle, so we did a few things very differently from how a normal accelerator would work.

For example, we made the accelerator a whole year instead of 90 days because we kind of believed that in 90 days all you can learn how to do is pitch your product.

We told our companies they weren’t allowed to code for the first month of their participation. As you can imagine, that’s a very different thing to tell people when they’re used to writing software.

We were starting an accelerator based on lean principles, and we realized that there was no good set of ground rules about which metrics to watch for your business model and where they should be, and that led us to go out and start doing research.

What’s the difference between being data-driven and data-informed and why is that important?

Alistair Croll: Most algorithms, most A.I. is what we call normative, that is it’s based on past, existing data. People talk about A.I. being biased. What they really mean is we’re biased, and an algorithm trained on what we do exhibits biased behavior.

So, when you’re data-driven, you’re using the past. You’re trying to find trends and patterns in the data you have, and sometimes that’s useful. But it’s very hard to be truly disruptive or innovative if you’re in a world where everything you do is based on the data you have because you’ll only ever do the norm. You won’t really sort of break the rules.

If you’re data-driven, because most of the data that you have is data from your existing business or your existing industry, it’s very hard to innovate and disrupt. Whereas if you’re data-informed, what you’re doing is starting out with a hypothesis and then using data to validate or invalidate that hypothesis.

In that case, your human creativity and your intuition and your domain expertise is the start, and then you use data to very quickly reduce the risk of that assumption.

What are some of the common pitfalls that entrepreneurs should avoid when collecting data or actually trying to understand what kind of data they need?

Alistair Croll: There are lots of things you need to think about. I think the overarching rule is you need to think critically about data, and what I mean by that is ask yourself why was this data collected? Is it accurate? Is it precise? Can I use it? Should I use it? Am I going to use it?

There are obviously legal and ethical constraints around data use like GDPR, and that in fact affects some of whether or not you can collect data.

But there’s also the question of maybe you have access to a bunch of data to start your business, but you may not have ongoing access to that data, so is the data source sustainable? Is the data source repeatable? Is it trustworthy, if your business model is based on that?

If you’re just doing research, primary research, then data is fairly straightforward, and then you get back into ethical issues. I’ll give you a good example of how complicated data can be.

In Boston, the City of Boston wanted to know where all the potholes were, so they built an app called Street Bump, and Street Bump looked for potholes on the road by using the accelerometer on your smartphone as you drove to and from work. It turns out the app was incredibly biased.

Why? The data came from people who had a late-model smartphone with an unlimited data plan and a passenger seat to put the phone in. In Boston, that tends to be a rich, white person.

So, even though the app worked really well, and everything about the data science was good, the fact that they used smartphones as a measurement process biased their data, so they were only analyzing potholes for rich, white people’s neighborhoods.

That’s a really bad thing. That’s a real problem. So, what they did is they went back and fitted it to buses and garbage trucks, and they got a much more accurate picture of the world. That’s the real problem here, is that if you’re trying to collect data, oftentimes that data is wrong or biased or inaccessible for reasons you don’t really know until it’s too late.

Think really critically about the data and ask yourself: “Can I get the data? Am I allowed to have the data? Am I allowed to act on the data? Should I act on the data? If I do so, can I build something sustainable and repeatable atop that data?”

So, thinking about data critically is really hard, and this is at the core of analytics. You have to ask very good questions about what metrics you care about, what stage of growth you’re at, what kind of data you have, and that informs how analytical you can be because it’s important not to suppress your gut instinct.

Take the Chrysler minivan and the iPad: none of the data would have told people directly that you needed them, but a lot of the data adjacent to them pointed the way. For example, nobody said they wanted a minivan, but everybody knew that parents were picking up their kids, taking them from activity to activity and so on.

And so, you could predict that a minivan would be a useful addition. A lot of times a founder isn’t going to find someone to say, “Build me this product.” They’re going to find trends, and they’re going to extrapolate from those trends to their new product or service.

How important is data cleaning?

Alistair Croll: The dirty secret is that 80% of the money being spent on what we call big data actually goes to extract, transform, and load (ETL): data cleaning, done at a large scale.

That’s still true with all the modern advances.

As a start-up, when you’re thinking about what data you collect and how you’re going to use it, it comes down to dumb stuff like normalizing the data: making sure that you’re always entering peoples’ Twitter handles the same way instead of entering “acroll” (my Twitter handle) at one point and @acroll at another.
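As a minimal sketch of that kind of normalization (the helper name and the accepted input forms are my own illustration, not from the interview), a start-up might canonicalize handles at the point of entry so later analysis sees one consistent value:

```python
import re

def normalize_handle(raw: str) -> str:
    """Normalize a Twitter handle to a bare lowercase name.

    Accepts "acroll", "@acroll", or a full profile URL.
    """
    handle = raw.strip().lower()
    # Strip a full profile URL down to the bare handle.
    handle = re.sub(r"^https?://(www\.)?twitter\.com/", "", handle)
    # Drop any leading "@" and a trailing slash.
    return handle.lstrip("@").rstrip("/")

# All of these variants collapse to the same canonical form, "acroll".
for raw in ["acroll", "@acroll", "https://twitter.com/acroll", "  @ACroll "]:
    print(normalize_handle(raw))
```

The point is less the regex than the habit: pick one canonical form early, because cleaning the data retroactively is where the 80% gets spent.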

You have to assume that everything you do, you’re going to wind up using that data somewhere else, and that’s a good argument for having an analyst on the team.

There was a post a while ago from Dave McClure who came up with the concept of Pirate Metrics where he said, “Every good founding team is a hacker, a hustler, and a designer,” and I argued that you need to add an analyst to that to keep the other three honest.

What’s the key difference between qualitative and quantitative data?

Alistair Croll: Qualitative data is anecdotal. There’s an old saying that the plural of anecdote is not data. So qualitative data tends to be kind of messy. If you have a form and you ask people an open-ended question, “What’s your favorite food?”, it’s going to be very hard to analyze that because people will spell foods differently and they won’t structure their answers in a way that makes it easy to crunch, but you may find an insight. You may go through that and find, hey, a lot of people like sushi. I didn’t know that.

Whereas if you’re quantitative, you may give people seven things that you can choose from. You’ll never see the eighth answer of sushi, and you may miss a business opportunity, but you can very easily analyze that data.

What we found is that qualitative data is where you get insights and form hypotheses that you can then research with more precision, but you can’t really analyze it at scale or automate it, and that’s somewhat true.

Machine learning is getting better at natural language processing and extracting meaning and so on. Sometimes the machine learning algorithm can help you with that, but the reality is that you’re going to want to start with qualitative stuff to find insights and hypotheses, and then you’re going to want to start with quantitative stuff to validate it at scale.

And then finally when you automate a system, you’re going to want to take that quantitative algorithm and you’re going to want to automate the process so that you’re managing by exception.

I’ll give you a good example. If you had an e-commerce site, you might ask people in a survey form, “What were you unable to buy here?” Then a human goes through that and finds out that everybody really wanted bags to go with whatever product you were selling.

So now you add bags as an option when someone is checking out, and now you have quantitative information. You can crunch that and find out which things people are looking for at checkout.

You can do the math on that and actually analyze how it works. The third step is, once you know everybody wanted those things, you can go ahead and just make that a normal part of your checkout process, and only alert me when sales go down.

So, the third step is you start with qualitative, then you go to quantitative, and then I would say you go to automated, managing by exception because by then you’re on to the next metric. You’ve maximized shopping cart checkout, and now you’re onto the metric for improving the way in which retention happens or the frequency of returns.

Why are vanity metrics so dangerous, and what are some examples?

Alistair Croll: I don’t think you should make a list of vanity metrics. The reality with vanity metrics is:

If you can’t tell me how this metric will change your behavior, it’s a bad metric.

So if you say number of followers, that’s a vanity metric. But if you tell me that 10% of your followers will subscribe to your product, that’s not a vanity metric because I have a meaningful correlation with something that changes my business model.

So, vanity metrics are those where you don’t have good confidence that they will affect your underlying business model. If the number doesn’t change something that you report to your accountant, it’s probably a vanity metric. The real challenge is finding metrics that are correlated with an outcome you care about.

What about cohort analysis, and why does it matter so much for start-ups?

Alistair Croll: A cohort is any group that has something in common. The reason cohorts matter so much is because as a start-up, if you’re doing things right, you’re launching a new version of your product every week.

The version that you downloaded last week is probably not as good as the version I downloaded this week which is not as good as the version my friend downloads next week.

So, say I take your experience with a very early beta of my product, where it’s not so full-featured and you’re kind of dissatisfied, and I calculate the math for your retention, how much you paid, and your satisfaction. If I do that alongside the experience of someone who used a product that was much more mature and much more advanced, three versions later, I don’t want to pollute their wonderful experience with your horrible experience.

What you’ve got to do is you have to analyze each cohort, meaning each group, separately because you should be seeing improvements across those cohorts. Maybe in the first version of your product, you had 20% churn and the second you had 10% churn and in the final one, you only have 2% churn which is wonderful.

It would be horrible to say, “My product on average has an 8% churn.” What you really want to do is say, “I brought churn from 20% to 2% in three releases.” If you don’t separate the individual releases into distinct cohorts, then you have a real problem because you’re ignoring improvement.
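A minimal sketch of the arithmetic, with made-up cohort sizes: computing churn per release cohort shows the improvement that a single blended number hides.

```python
# Hypothetical per-release cohorts: (users at start, users who churned).
cohorts = {
    "release 1": (100, 20),  # 20% churn
    "release 2": (100, 10),  # 10% churn
    "release 3": (100, 2),   # 2% churn
}

# Per-cohort churn makes the release-over-release improvement visible.
for name, (start, lost) in cohorts.items():
    print(f"{name}: {lost / start:.0%} churn")

# One blended number across all users hides the 20% -> 2% improvement.
total_start = sum(start for start, _ in cohorts.values())
total_lost = sum(lost for _, lost in cohorts.values())
print(f"blended: {total_lost / total_start:.1%} churn")  # 10.7%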

What about multivariate analysis, and why is that so important for start-ups?

Alistair Croll: If you are Google, you have so much traffic that you can do a test on any one thing and very quickly get statistically significant results.

You change a search, you’re going to get thousands of results very quickly. If you are a mere mortal with a new start-up, you probably need to change many things at once just because you don’t have enough traffic to properly test every individual tweak and change you’ve made.

Instead of doing split testing or AB testing, what you’re going to do is change a bunch of things, and then you’re going to try and find out which package or which bundle of features delivers the outcome you want.

Again, you need to know what outcome you want. That’s why you need one metric that matters more than anything. If your metric is conversion, you need to know which bundle of things, the social network that you approached them on, the offer that you made, whether you include shipping, whether they saw the green or the blue button, all those things.

With multivariate analysis, you’re going to try and use statistics to predict which bundle of things delivers the result you want. So when you don’t have a ton of traffic, you need to use clever stats to deal with some of those issues.

And the reality is you don’t need to be really good at stats because a lot of the analytical tools out there do that stuff, but you need to be aware of stats.

A lot of people will draw a conclusion after talking to 20 customers. Well, basic stats tells you that even for something with a normal distribution, you need 30 or more data points to have a good confidence interval. If that sounded confusing, you need to go find a friend who took statistics and buy them dinner.
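As one illustration of the statistical idea (a sketch, not a prescription for any particular tool; the feature names and effect sizes below are invented), you can simulate modest traffic where three changes were bundled together, then use ordinary least squares to estimate which change actually moves conversion:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated visitors; each column is one change bundled into the test:
# free shipping, a green button, and a different social-network source.
n = 500
X = rng.integers(0, 2, size=(n, 3)).astype(float)

# Hidden ground truth for the simulation: shipping helps a lot,
# the button a little, the traffic source not at all.
true_effect = np.array([0.30, 0.10, 0.0])
converted = (rng.random(n) < 0.10 + X @ true_effect).astype(float)

# Ordinary least squares attributes the outcome across the bundled changes.
design = np.column_stack([np.ones(n), X])  # intercept + the three features
coef, *_ = np.linalg.lstsq(design, converted, rcond=None)

names = ["baseline", "free_shipping", "green_button", "social_source"]
for name, c in zip(names, coef):
    print(f"{name:>14}: {c:+.2f}")
```

With only a few hundred visitors, the estimates are noisy, which is exactly the point about needing 30-plus data points and a friend who took statistics.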

What’s the one metric that matters (OMTM)?

Alistair Croll: When we first put the book out, we actually got a lot of grief from people. They said, “There’s no way you could have one metric. There are 10 things.” The analogy I always use is when you’re trying to park a car. Let’s say you’re backing up your car into a parking spot.

Your car has dozens of metrics that are always talking to you. There’s the position of your hands on the wheel which is called proprioception. There is the sound and there is the vibration.

There’s pull on your body from g-forces. There are so many different dials and dashboards and numbers, but when you’re parking your car, the only thing that matters is the distance from the thing behind you.

That’s your one metric, right? Once you’ve parked the car, maybe your metric is how close am I to the curb? Or how far is it for me to walk to the store I want to go to?

But your metric when you park the car is the distance from the thing behind you. If your car runs out of gas, there’s a light that’s going to come on, going, “Hey, you’re out of gas,” while you’re parking, and that’s great because you’ve automated uninteresting metrics. You need to be able to make decisions about them, but you’ve told the system, “Only bug me about this when it changes.”

Early on, you might have said, “Gee, how much fuel is in my car is an important metric,” but as I mentioned earlier, it’s very easy to quickly automate that.

One of the jobs of modern data managers, in addition to thinking critically about the data, is to be able to know which metrics to automate and which metrics to test and explore and then what metric is the most important metric for the organization right now.

That’s very much a factor of a start-up culture.

What’s the key definition of a start-up?

A start-up is an organization designed to search for a sustainable, repeatable business model.

Once you’ve found a sustainable, repeatable business model, your job is to execute on it. The act of searching is basically a constant distraction, so having a focus is a very good countermeasure or counterweight to that constant distraction which is a necessary part of any start-up.

How can we classify the various business models?

Alistair Croll: In order to figure out which metric you should care about, there are two things you need to worry about.

The first is the stage you’re at:

  • Early on, companies are at what we call the Empathy Stage, where your job is to figure out what your market wants.
  • Then as time goes by, you need to get to what we call the Stickiness Stage, which is simply: can you get people to keep doing the thing you want? Either to stay in the shopping cart or to subscribe and come back.
  • Once you’re past stickiness, you get to the Virality Stage, which is: will those people tell their friends? There’s no point doing virality until you have a sticky product. If I tell all my friends about something, and they all go there and don’t use it, what was the point of me telling them?
  • Once you’ve passed virality, then you get to the Revenue Stage, which is where you say, “Can I make enough money to either buy ads or hire a sales force?”
  • After the Revenue Stage, you ask, “Can I scale? Can I grow the business without growing the costs? Can I maintain margins while I grow?”

That’s the first thing you need to know. We offer some rules for knowing which one of those you’re at.

The second thing is we have to figure out what business model you’re in.

The business model is really a question of how do you acquire your customer, extract money from your customer and deliver value to your customer?

We came up with six business model archetypes.

An obvious one is a transactional business model. E-commerce is a good example of that.

But another one might be a two-sided marketplace, where the buyer and the seller come together on your platform. There’s still a transaction, but what matters is things like inventory, how many searches resulted in something someone wants to buy, and stuff like that.

There are these five stages, and then there are six business models we talked about.



1. The two transactional business models:

  • e-commerce
  • two-sided marketplace

2. There are two media business models: 

  • One is a traditional media site like CNN or BBC
  • The other is a user-generated content site like a Reddit or a Facebook. They both make money from ads

3. The third category is really where you’re getting into a subscription business, so either consumption-based stuff like cloud computing or tenancy-based stuff like SaaS.

We put those six business models in, but there’s a bunch we left out.

We didn’t put in, for example, donation-based businesses like a Patreon or a Kickstarter, or even Wikipedia, which makes money from donations. Because we didn’t go into all those things, we had a very hard time figuring out how many or how few models to include.

The underlying idea here is that if you know which stage you’re at for a company and which business model you’re in, you should very quickly be able to figure out what metrics matter to you.

What is the number one metric that is really critical for subscription business models?

Alistair Croll: I can tell you which metric matters at different stages, but I can’t tell you which metric is critical.

I would say for SaaS, customer retention or what’s called residual lifetime value is the best metric in the long term.

Again, early on, are people telling their friends? Do they go from trying the thing to subscribing? But there’s a talk by Roberto Medri of Etsy where he talked about residual lifetime value as the measure of choice for any customer-facing product in a mature organization.

I’ll explain how residual lifetime value works. Imagine that I have two visitors to my website. The first visitor has come five times in the last month and spends $200 in total.

The second visitor has come ten times in the last month and also spent $200. Which of those two customers is more valuable to you, the one that came five times or the one that came ten times, assuming they cost the same to acquire and they spent the same amount of money?

My initial answer to Roberto’s question was the one that’s been ten times, because I get to run more experiments on them and they’re more engaged, right? Even though their shopping cart total is small.

Now let’s assume that a week goes by, and you haven’t heard from either of those customers. Normally, they come once every week or two.

One guy comes ten times in a month and one guy comes five times in a month, and you haven’t heard from either of them for a week.

The guy who comes ten times a month, if he doesn’t show up for a week, you may be dead to him. He may have found another vendor, he may have died, he may have changed his mind. But the guy who comes five times a month, he just may be a couple of days late, right?

It’s more likely you haven’t lost the person who comes less frequently, and that means that the residual lifetime value, the remaining value of the customer is actually much higher for the second guy who’s only been five times.

That’s a weird concept, but this idea of residual lifetime value, of measuring the remaining value of all your customers and whether that’s increasing or decreasing is probably the master metric to SaaS.
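To make the intuition concrete, here is a deliberately simple toy model of my own, not the method from Medri’s talk (rigorous versions use purchase-history models such as Pareto/NBD): if a customer normally buys every few days, treat the chance they are still active as decaying exponentially with the silence since their last visit.

```python
import math

def prob_still_active(avg_interval_days: float, days_since_last: float) -> float:
    """Toy exponential model: chance the customer hasn't churned yet."""
    return math.exp(-days_since_last / avg_interval_days)

# Two customers, both silent for 7 days.
frequent = prob_still_active(avg_interval_days=3, days_since_last=7)    # ~10 visits/month
infrequent = prob_still_active(avg_interval_days=6, days_since_last=7)  # ~5 visits/month

print(f"frequent buyer still active:   {frequent:.0%}")    # 10%
print(f"infrequent buyer still active: {infrequent:.0%}")  # 31%
```

A week of silence is ordinary for the infrequent buyer but alarming for the frequent one, which is why the infrequent buyer carries the higher residual lifetime value.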

I’m actually running a workshop at Start-Up Fest which is a big start-up conference in Montreal. It happens in about two weeks. I’m teaching an opening class at something we call a Founding Workshop.

The class is all about finding your business model, because a founding-stage entrepreneur is someone who has not yet found product-market fit. In start-ups, the mythical thing you’re looking for is this idea called product-market fit.

What’s the key difference between the problem/solution fit to having a business model?

The reality is that if I’m running a hair salon, for example, what is the problem I’m addressing?

If I run a hair salon, the problem I’m addressing is peoples’ hair grows. What’s my solution to that problem?

The problem is peoples’ hair grows, and the solution is I cut peoples’ hair. What’s the business model?

The business model is how many seats can you fill. That’s the actual business, right? If I’m investing in a hair salon, I don’t care whether they can cut hair or whether peoples’ hair grows.

I care how many of the seats you can fill throughout the working day, because if I can fill all the seats, hairstylists will beat a path to my door because they want customers, right? A lot of founders start a company with a problem and a solution but no business model. For example, the problem is hair grows.

The solution is I cut the hair. The business model is how to fill seats. The biggest mistake that early stage founders make is they don’t know how to separate the problem and solution they’re trying to offer to the market from the business model.

So, they go to a start-up accelerator pitching them a problem and a solution. People get cuts, and we make Band-Aids. Everybody needs that. Sure, I totally agree. What’s your business model? I don’t know.

That’s the biggest danger for any of these companies: they mistake having found a problem and a solution, which is often a problem for themselves, for having found a business. I used to have a t-shirt that said “your mom is not a valid test market” because a lot of people are like, “Well, I talked to my family.” The reality is that you have to find a way to create a business model, and if your business model is a new, sustainable, repeatable business model, then you’re a start-up.

A lot of entrepreneurs who want to take an existing business model call themselves start-ups. They are not. And a lot of founders who found a problem and a solution but no business model yet call themselves successful start-ups, and they’re not. Once you have a business model hypothesis, that’s when data can help you understand whether it will scale properly and so on.

Key takeaways

  • Beware of vanity metrics that don’t add any value to the business
  • Algorithms are not inherently biased; rather, trained on what we do, they exhibit our biased behavior
  • Thinking about data critically means asking very good questions about what metrics you care about and that have an impact on the business
  • Qualitative data is where you get insights and form hypotheses that you can then research with more precision
  • Instead of doing split testing or AB testing, start-ups can use multivariate analysis, which enables them to understand which bundle of features delivers the outcome they want

Three key highlights

I want also to point out and emphasize three key things that emerged from the discussion:

  1. A good metric has to change behavior: this is extremely important! Metrics that do not change behaviors do not count and cannot be considered as good metrics.
  2. A start-up is an organization designed to search for a sustainable, repeatable business model: this point also needs to be stressed. A business model isn’t just about creating a company. It is about creating a business which is sustainable in the long term (thus all the building blocks have to come together) and repeatable, and which can therefore become scalable over time.
  3. The business model is really a question of how do you acquire your customer, extract money from your customer and deliver value to your customer? As it came up from the discussion, a business model can be reduced to key things such as how you acquire customers, extract money from them while delivering value!
