PUG: A Framework For Measuring Product Performance


Everyone loves claiming to be a “data-centric” product manager. But as any PM knows, despite having all the data in the world, measuring the true performance of a product is extremely difficult.

Take this hypothetical situation: a PM launches a new feature in the hope that it will eventually increase revenue for the company. That sounds straightforward enough. But let’s say the PM launches the feature and revenue doesn’t increase significantly. Does that immediately imply the feature was bad, or that the PM did a bad job? Of course not; many things could have dampened revenue growth despite a fantastically conceived and executed feature.

So, how do you isolate the impact of that specific feature from all the other moving parts within the product — not to mention the impact of things like account management, sales, marketing, and customer support?

There’s no silver bullet, of course (damn you, hypothetical silver bullet, show your face!), but that doesn’t mean you should just give up on it. You must have some way of measuring the performance of your PMs and their product. It can’t be a total laissez-faire utopia. Here on Kinnek’s Product team, we feel we’ve developed a framework that can help you navigate these tricky waters. We call it the PUG (Perfect User Group) framework and we use it to help us come up with more accurate success metrics to measure the true success (or failure) of a product or feature.

The main idea behind the PUG framework is segmentation of your data in order to hold as many factors constant as possible, isolating the impact of the new feature and reducing noise. It’s built on one main truism:

If you build a feature aimed at a specific target audience, and even they don’t enjoy it, then you can safely say it isn’t working.

Let’s say you are a PM and you’re launching a new feature. You want to come up with some success metrics to track for this feature. Here’s how the PUG framework tells you to go about it:

1.) Think about the “perfect” set of users and context for your feature. Consider which characteristics would make a user love your new feature: behavioral tendencies, personality traits, job roles, or any other type of segmentation. If this “perfect” set of users doesn’t love your new feature, well, it’s hard to imagine who else will. Now create criteria that define this set of users and contexts. Don’t be afraid to include criteria that control for the influence of sales, account management, marketing, and other functions at your company. For example, let’s say that you want to build a feature whose usage is highly sensitive to how well your customer support team treats users when they first sign up. It’s great that you recognize this. Maybe that means you should define your perfect user group to include only users who were called by a customer support representative within 12 hours of signing up, where that call lasted longer than 2 minutes, or maybe only users who have had at least 1 customer support ticket cleared with a certain minimum satisfaction rating. You want to control, as much as possible, for factors that are out of your product’s hands.

Now that you have defined this “perfect user group”, it will help you determine whether your feature is succeeding among its target audience. One caveat: it’s important to balance user specificity against sample size. If your perfect user group definition is so restrictive that only three users fit into that bucket, well, that may be a problem. The goal of the perfect user group is to let PMs see whether the product does what it’s intended to do in an ideal scenario, and to quickly identify critical issues that must be addressed if it can’t even perform under ideal conditions. PMs can and should also calculate success metrics over increasingly expansive user groups to see whether the product is doing its job across a wider set of users.

Good example: Engineers from the Western U.S. who signed up since July 7, specified during the signup process that they enjoy eating mangoes, and spoke to one of your customer support representatives within 24 hours of signing up.
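To make that concrete, here is a minimal sketch of how the example group above might be expressed as a filter, along with a check that the group isn't too small to be useful. Everything in it is an assumption for illustration: the `User` fields, the region and role labels, the `MIN_SAMPLE_SIZE` threshold, and the function names are hypothetical, not a description of how Kinnek stores or queries its data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical minimum group size; pick a threshold that suits your own data.
MIN_SAMPLE_SIZE = 30


@dataclass
class User:
    # All field names here are illustrative assumptions.
    user_id: int
    region: str                             # e.g. "western_us"
    job_role: str                           # e.g. "engineer"
    signed_up: datetime
    likes_mangoes: bool                     # answer given during signup
    first_support_call: Optional[datetime]  # None if no call ever happened


def in_perfect_user_group(user: User, signup_cutoff: datetime) -> bool:
    """Return True only if the user meets every criterion of the example group."""
    if user.first_support_call is None:
        return False
    delay = user.first_support_call - user.signed_up
    spoke_within_24h = timedelta(0) <= delay <= timedelta(hours=24)
    return (
        user.region == "western_us"
        and user.job_role == "engineer"
        and user.signed_up >= signup_cutoff
        and user.likes_mangoes
        and spoke_within_24h
    )


def select_perfect_user_group(
    users: List[User],
    signup_cutoff: datetime,
    min_size: int = MIN_SAMPLE_SIZE,
) -> Tuple[List[User], bool]:
    """Return the matching users and whether the group meets the minimum size."""
    group = [u for u in users if in_perfect_user_group(u, signup_cutoff)]
    return group, len(group) >= min_size
```

If the second return value comes back `False`, that is a hint to loosen one or two criteria before trusting any metric computed over the group.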

2.) Choose at least one direct impact area and one indirect impact area that the feature will affect. The direct impact area is meant to be very close to the feature itself, something that is directly influenced by the feature. The indirect impact area, on the other hand, is meant to be slightly further removed from the feature, but still presumably could experience a lift due to the feature. The idea here is to force yourself to think intelligently and realistically about the scope of the feature and how it will affect your users.

Take Airbnb, for example: not every feature their PMs ship can be judged only by its impact on the company’s total revenue, because some of those features are just too minor to have a noticeable impact on revenue. Say you work at Airbnb and are about to launch a new widget on the property profile page that enables users to visualize ratings information in cooler ways. You could certainly use revenue as an indirect impact area for this, since it’s possible that the feature could lead to more time spent on a property profile page, which could in turn lead to a higher chance of a renter actually choosing that property. However, you’d also want to look at direct impact areas that are more low-level and closer to the feature itself, such as average time spent on the property profile page. It’s important not to get caught in the top-of-the-funnel fallacy here, whereby you choose an impact area that is too far removed from the feature’s immediate point of impact and is separated from it by too many potential roadblocks. If you don’t see improvement in that area after launching your feature, you won’t know whether it’s because your feature is performing poorly or because there are simply roadblocks that need to be addressed.

Good example: Let’s say you work at an e-commerce company, and you want to remove unnecessary steps from the checkout flow. Direct impact areas would be related to the completion of the checkout flow, maybe completion rates or even average time taken to complete the flow. An indirect impact area would be profits generated.

All else equal, you’d expect that reducing the steps in the checkout flow would directly increase completion rates for the flow. You’d also expect that higher completion rates would ultimately lead to more sales and hence more profits, but other factors could come into play, such as the types of users who are now being driven to complete the checkout flow, the profit margins on products that are now being purchased more easily, and maybe even the perverse impact of a shorter checkout time on the stickiness of users (unlikely, but possible).
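As a rough sketch of the checkout example, the direct metrics (completion rate, average time to complete) and the indirect one (profit) could be computed side by side for sessions before and after the change. The `CheckoutSession` record and the function names are hypothetical, and the sessions passed in are assumed to be already restricted to your perfect user group.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class CheckoutSession:
    # Illustrative fields; your own event data will look different.
    user_id: int
    seconds_to_complete: Optional[float]  # None if the checkout was abandoned
    profit: float                         # 0.0 if the checkout was abandoned


def summarize(sessions: List[CheckoutSession]) -> Dict[str, Optional[float]]:
    """Compute two direct metrics and one indirect metric for a set of sessions."""
    times = [s.seconds_to_complete for s in sessions
             if s.seconds_to_complete is not None]
    return {
        # Direct: share of started checkouts that were completed.
        "completion_rate": len(times) / len(sessions) if sessions else None,
        # Direct: average time taken by the checkouts that completed.
        "avg_seconds_to_complete": mean(times) if times else None,
        # Indirect: total profit, which many non-product factors also influence.
        "total_profit": sum(s.profit for s in sessions),
    }
```

Comparing `summarize(before)` with `summarize(after)` keeps the two kinds of metric apart: if the completion rate rises while profit stays flat, the feature is probably doing its job and the roadblock sits somewhere further down the line.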

3.) Within those direct and indirect areas of impact, define exact success metrics across your perfect set of users. The exact metrics you define here are extremely case-specific, but try to avoid conflating awareness of a feature with depth of engagement with the feature. Many users may not notice your new feature, but those who do could use it heavily and love it. So, you may want to consider defining at least one awareness-type metric and one depth-type metric. This will enable you to understand whether you have more of an issue with users hating your feature, or with users simply not noticing it. Those are very different problems. Also, don’t be afraid to use qualitative evidence as the basis of your success metrics in this step. If your definition of “success” for the feature is to call ten users of your “perfect” user set, collect their qualitative feedback, and hear at least six of them say “I loved it”, that’s completely fine. Just define the questions ahead of time and be systematic about how you ask them and record the feedback.

Good examples:

  • Awareness- Percentage of profile page views on mobile devices that resulted in clicks on the review modal (across my perfect set of users).
  • Depth- Given the review modal on the profile page was clicked, the average number of seconds spent on the review modal before closing (across my perfect set of users).
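The two metrics above could be computed along these lines. This is only a sketch under assumed names: the `ProfileView` record, its fields, and the `pug_ids` set of perfect-user-group IDs are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Set


@dataclass
class ProfileView:
    # Illustrative fields for one profile page view.
    user_id: int
    device: str                     # "mobile" or "desktop"
    modal_clicked: bool             # did the user open the review modal?
    modal_seconds: Optional[float]  # time in the modal; None if never opened


def awareness_pct(views: List[ProfileView], pug_ids: Set[int]) -> Optional[float]:
    """Percentage of mobile profile views by the group that led to a modal click."""
    mobile = [v for v in views if v.device == "mobile" and v.user_id in pug_ids]
    if not mobile:
        return None
    return 100.0 * sum(v.modal_clicked for v in mobile) / len(mobile)


def depth_seconds(views: List[ProfileView], pug_ids: Set[int]) -> Optional[float]:
    """Average seconds spent in the modal, given that the group opened it."""
    times = [
        v.modal_seconds
        for v in views
        if v.user_id in pug_ids and v.modal_clicked and v.modal_seconds is not None
    ]
    return mean(times) if times else None
```

Reading the two numbers together tells you which problem you have: low awareness with high depth points to discoverability, while high awareness with low depth suggests that users found the feature and didn't care for it.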

So there you have it, the PUG framework. Some people say the greatest framework ever. And by “some”, I mean mostly just me. In all seriousness, this has really helped us think more clearly about measuring our product performance here at Kinnek. Given the immense complexity of our product, the two-sided nature of our marketplace, and the plethora of non-product factors that could influence user experience, it’s become more important than ever to have our PMs isolate and understand the true impact of what they’re building.


Karthik Sridharan is co-founder & CEO of Kinnek.

A graduate of the University of Pennsylvania, he was formerly a Researcher at AQR Capital Management.