Matrix Partners | Bringing AI to Legal Services with Cognition IP

Cognition IP is an AI-powered patent law firm that aims to make legal services more accessible by making it possible to provide higher quality work at a lower cost. I sat down with Bryant Lee and Andrew Tjang to chat about how they’re building their unique business and technology.

Andrew and Bryant after graduating from YC’s W18 batch

Can you introduce yourselves and tell us a bit about Cognition IP?

Bryant: Our company is a patent law firm that helps companies obtain patents faster and less expensively by using technology, especially AI. My background is that I was a computer science major and was in a PhD program for computer science at Carnegie Mellon where I worked on developing large AI data sets. After that, I went to Harvard Law School and then was a practicing patent lawyer at Covington & Burling doing patent litigation for large companies.

The impetus for starting this company came out of my experience working in big law and feeling that the largest law firms just had the wrong incentives: they don’t use technology that exists and works well, because they work based on billable hours. This creates perverse incentives and the solution to every problem becomes to put as many people as possible on it. So a new business model in this field is to create your own law firm and create tools for your own use to make the business more efficient, and what you sell is just the legal services to the end user at a fixed price.

Andrew: I was most recently a teaching professor at Rutgers University in the computer science department. My graduate work was focused on distributed system reliability and used machine learning to surface relevant features in operator errors. I met Bryant at Harvard Law School through my wife who is a practicing corporate attorney. When Bryant approached me about working on this problem together, I was intrigued. It was very clear to me from watching my wife work that the legal industry as a whole needed something to attack some of its gross inefficiencies and to make legal services more accessible.

Why is AI uniquely useful for the use cases that you support in the legal world?

Bryant: A lot of the legal work that lawyers do really is fact intensive. The law doesn’t change often but in every case the facts are different. The application of the law to the facts takes a lot of time. So it’s really a search problem: identifying the right information, surfacing it to the right people and then organizing it effectively (lawyers typically juggle several cases simultaneously). Right now there aren’t really good tools to do this efficiently and a lot of time is wasted as a result.

In patent law, in particular, one of the areas that’s really important is patent search. You are searching over a very large space. There are 10 million US patents and worldwide about 1 to 2 million patents are issued each year across jurisdictions. You’re not only looking for patents, but also scientific literature and information on other products in the market. The current way this is done is for a highly trained lawyer or engineer to spend hundreds of hours searching the space and still not know if they’ve found everything or not. But a machine can search that space much more effectively, especially using some new techniques that have come out in the last few years to help identify topics and concepts that are similar even if the wording is different.

Andrew: I think the other neat thing about what we’re doing, and why this space in particular can lend itself to these techniques, is the sheer volume of data. And not just any data, but structured data that is generally pretty clean.

For a lot of use cases where people want to apply AI, one of the biggest problems is gathering enough data to be able to train your models. In our case, we have a large amount of labeled training data and also do augmentation ourselves. There’s also what we generate internally through the law firm, which is providing services. We’re building a tool that tracks the lawyers’ search work through their web browser so that you could see what queries they use and how much time goes into reviewing results recommended by the algorithms. We look at which ones they open up and which ones are ultimately included in the report to the client. So that’s all collected automatically without our lawyers doing any incremental work on it.
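One way such browsing signals could become training labels is sketched below in Python. This is a minimal illustration and not Cognition IP's actual tool: the `ReviewEvent` record, its field names, and the 0/1/2 grading scheme are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReviewEvent:
    """One logged interaction (hypothetical schema)."""
    query: str       # search query the lawyer typed
    doc_id: str      # result that was shown
    opened: bool     # True if the lawyer opened the result
    in_report: bool  # True if the result was cited in the client report


def relevance_label(event: ReviewEvent) -> int:
    """Assumed grading: 2 = cited in report, 1 = opened only, 0 = ignored."""
    if event.in_report:
        return 2
    if event.opened:
        return 1
    return 0


# Made-up events for illustration only.
events = [
    ReviewEvent("two mode vehicle", "DOC-1", opened=True, in_report=True),
    ReviewEvent("two mode vehicle", "DOC-2", opened=True, in_report=False),
    ReviewEvent("two mode vehicle", "DOC-3", opened=False, in_report=False),
]

# Each row is (query, document, graded label), ready to feed a ranking model.
training_rows = [(e.query, e.doc_id, relevance_label(e)) for e in events]
print(training_rows)
```

Because the labels fall out of work the lawyers are already doing, no separate annotation step is needed.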

It’s interesting that you mention structured data too because I tend to think of documents as unstructured data.

Andrew: Let me clarify that a little bit. Some of the data is structured and helps us in understanding the relationships between documents. Fields are generally labeled and documents are categorized. For every single patent, someone at the US patent office has listed out what categories they believe this text belongs to and this is over the course of tens of millions of documents. And as a result there’s a certain degree of reliability there — it’s not just a Mechanical Turk doing it. It’s someone who’s trained in the art of reading patent specifications and understanding what they’re about.

And yeah, a lot of it is unstructured. There are definitely opportunities to apply some of the more complex models of natural language processing for the unstructured text. The general models we’re using are for deep learning. Right now we’re doing a lot with CNNs and LSTMs but we’re also actively investigating new techniques to apply.

Having this type of expert-labeled ground truth strikes me as a big advantage.

Bryant: It would be really hard to do this if the USPTO hadn’t already done the labeling — it would be hard to get the same quality output through another mechanism.

How do you guys think about the quality of the output that you get from your models?

Andrew: Ultimately on the business side it has to be something that’s related to time or cost, for evaluation of the general technique, right? Our clients know we focus on providing the same level of quality as larger firms. So it’s not that we’re cutting corners, we’re just making things more efficient and better through this business model and technology. Our lawyers come from big law backgrounds and have experience working at the largest firms.

But for the models themselves, there are more standard ways of understanding the accuracy, like having training and test sets and having some human interaction to evaluate the results. So our initial analyses are basically the traditional precision and accuracy and recall of things we know to be similar through various sources. But I think that’s going to have to evolve because ultimately we’re not just trying to describe the known universe of patents right now. We’re trying to predict the unknown, because all patents are supposed to be new. So the idea is that the thing that we would be searching for, if it is a good patent, should not exist out there. And so I mean, that’s a whole other problem space.
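For readers less familiar with these metrics, here is a minimal Python sketch of precision and recall for a single search. The document IDs are made up, and the function is a textbook definition, not the firm's evaluation code. Precision is the share of returned documents that are truly similar; recall is the share of truly similar documents that were returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved IDs against known-relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both returned and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the model returns four documents,
# and three documents are known to be similar.
retrieved = ["DOC-A", "DOC-B", "DOC-C", "DOC-D"]
relevant = ["DOC-B", "DOC-D", "DOC-E"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Both numbers depend on knowing the relevant set in advance, which is exactly what is missing when the invention being searched for is supposed to be new.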

I think that’s an important observation that gets lost in a lot of the dialogue about AI. There are very rigorous techniques to evaluate model performance. But those don’t necessarily translate into things you have the ability to optimize in the real world. So when you’re building the type of business that you’re building you have to find some other sort of proxy variables to track over time.

Andrew: Right. Lawyers are all about time. I think all lawyers are just conditioned to track down to the minute what they’re working on. So there’s a good data set there that we’re collecting. We know how long it takes our attorneys to do certain tasks and can track how our technology improves that over time.

We’ve talked before about how you guys do some work for people that are looking to patent AI technologies or algorithms. Can you say a little bit more about that?

Bryant: A lot of our clients are companies doing AI work and looking to file patents, which fits in well with our expertise. For clients that are looking to file patents, it makes a lot of sense to work with a lawyer who knows the technology well. It’s very helpful because the lawyer can help expand what the client is inventing and also make sure that the invention is well covered by the patent. So when you work with a lawyer who knows the technology the whole process is more efficient. We’re also a startup who works with AI so we understand both the technology and how IP figures into the funding landscape and is relevant to discussion with VCs and that sort of thing.

Andrew: There’s also a whole other discussion, that might be for a future time, but there’s a lot of controversy in the software world around what patents mean and what should or should not be patentable. But without getting into that argument, I think it’s interesting what we’re seeing now. If you look at the patent landscape since computers first started being commonly used, people would come up with patents that were like “this known process that we’ve done for years, I’m just going to add a computer and that’s all of a sudden the new thing!”

And right now there are some parallels. But as simple as “known process plus AI” sounds, it’s actually quite complicated. It’s not as easy as just tacking AI on to something that already exists, but it’s just funny how the progression seems to work that way. It makes it so interesting to work in this space and see how people actually are applying AI.

Without opening the whole software patent debate, are there particular trends or broad strokes that you’ve observed that you could share?

Bryant: One thing that is nice for AI companies is that the grant rate of the patent office for AI patents is very high. Some patents on core AI technologies have a very high allowance rate; some stats that have been reported are as high as 80%. And a lot of these things could be very fundamental and ubiquitous in 5–10 years. So there are unique chances to patent some pretty fundamental things.

Is that at the algorithmic level or what level of abstraction is that at?

Bryant: Some are at the algorithmic level and some are at the application level — that is, applying AI in a certain domain or for a particular type of thing. Sometimes it’s for things that you couldn’t do before with or without AI.

In terms of general technology, a lot has been going on in autonomous driving, computer vision, audio processing in various ways, drones and many other areas.

Can we walk through one as an example?

Bryant: One field that has exploded is autonomous vehicles. If you look at graphs of patents for autonomous vehicles, they show almost exponential growth. It started out very low and now there’s hundreds if not thousands that are being granted per year just on AV.

If you look at the patents themselves, Google started filing a ton about 5–10 years ago. They started patenting things back in like 2010 or 2011, back when nobody else thought that that was a reality. So as a result they were able to get some really broad patents that seem in hindsight like maybe this is something that was obvious. But at that time, I guess people didn’t think so.

Things like, a two mode vehicle — where in one mode it is autonomous and in another it is driven by a human. So the car is driving in autonomous mode but when it gets to roadwork the human could take over because that’s more difficult.

Andrew: Yeah it’s funny because now most people would say that would be a no-brainer.

Can we walk through your AI-assisted workflow and how that allows you to quote pricing upfront?

Bryant: The interface for the client is meant to be similar to a traditional law firm: they start by telling us what the invention is. Then we do a search based on our best practices from top law firms, but we also have a tool where you can highlight some of the text from the clients. For example, you can highlight the part that defines the invention and then the tool will automatically detect which classes to search. It will also give you a list of, say, 20 patents that are the most similar so then the lawyer can review them. So the lawyer looks at the top results, and if there is something there that collides with an existing patent you can potentially end the search. If not, you continue searching.
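The "list of the most similar patents" step can be pictured with a short Python sketch. It assumes each document has already been turned into a numeric vector by some model and ranks by cosine similarity. The vectors, the IDs, and the choice of cosine similarity are illustrative assumptions; the interview does not say how the firm's tool scores similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query_vec, corpus, k=20):
    """corpus maps patent_id -> vector; returns the k best (patent_id, score) pairs."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Toy three-dimensional vectors (hypothetical; real embeddings are far larger).
corpus = {
    "P1": [0.9, 0.1, 0.0],
    "P2": [0.1, 0.9, 0.2],
    "P3": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # vector for the highlighted invention text

for pid, score in top_k_similar(query, corpus, k=2):
    print(pid, round(score, 3))
```

The lawyer would then review this short ranked list first, and either stop if a colliding patent shows up or keep searching.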

What was the most unexpected part of this journey so far in terms of building on these technologies?

Bryant: What was surprising to me is how quickly things are progressing in different fields and how much the AI techniques have evolved even since I was in graduate school 10 years ago. I completely expect powerful new models to come online in the next 3–5 years.

Andrew: Talking to friends in this space, they said something that was interesting to me, which was that, let’s say you want to find an expert in deep learning. You could become the expert simply because it’s such a new field that if you read the seminal papers you can catch up. You don’t have 20 or 30 years of stuff to read about.

It’s interesting to see how there are different viewpoints of how accepted AI is for certain tasks. I would have imagined a lot more pushback from people. So if you just hear our one tagline of “AI for patents”, it conjures up images of machines replacing lawyers and that’s not really what we’re doing. We do have lawyers in the pipeline to ensure that things work out well and clients actually do get a quality patent.

But before working on this, the reaction I probably would have had, if someone had said “this is an AI robot that’s going to do heart surgery on you,” is: do I trust it enough to put my life in the hands of this AI? Do I trust it enough to put the intellectual property of my company on this AI? And it’s refreshing to see that there are people who are like, I understand what AI is and what it can do better than a human can. I understand that a human may miss this prior art thing but an AI will do a lot better at finding it. So we’re getting to a point where you’re seeing acceptance in pockets of some industries where you may not have seen it before.

A lot of people are concerned that increasing adoption of AI is going to negatively impact employment. How do you see that playing out in the legal field? Do you think it will ultimately reduce the number of practicing lawyers?

Andrew: To me, because of all the inefficiencies that exist, I don’t think the current way things are done in law firms is sustainable. So I think when it’s shown the technology works it will definitely continue to be adopted.

Bryant: I think lawyers will become dramatically more efficient because you can probably wipe out 80–90% of what they do, which is just really fact intensive and not using your legal knowledge. Things that anybody could do but you can just do a lot faster by having an AI help you with it.

In terms of whether it reduces the number of lawyers, right now there are a relatively small number of lawyers who work in big law and get a lot of work, but there are also a lot of lawyers who are underemployed and don’t make that much money. They’re underemployed and yet their services are still too expensive for most people to access. So I think maybe we have the right number of lawyers now, but if you can use technology to reduce the cost of legal services you unlock a lot of demand because it is less expensive. A lot of people who need legal services now simply don’t use them because they can’t pay for it.

That’s very interesting and counter to the mainstream narrative, but seems plausible.

Bryant: There’s only about 1 million lawyers in the US as it is, so roughly one for every 300 people, not a huge number.

As founders, are there things you’d want to consume powered by AI? Do you have a call for startups?

Bryant: I wish someone would write a program to write emails for me! I spend so much time on emails. That would be great, because I think a lot of people spend tons of time on professional emails.

Andrew: I think the computer scientist in me is infatuated with the idea that I want to work on this myself — our data set and how we treat it and how we analyze it, that’s fascinating. As a business person, obviously, we spend a whole lot of resources doing things like that. It would be nice to have a service that was like, here’s my data set, I want you to go find me a model. I don’t know how viable that business would be though. As useful as it is, what gives your company value is that model and I don’t know we’d want to give up control of that. But there’s a lot of redundant work.

So a lot of companies that are in the AI space expend a lot of resources doing the same thing over and over again because they want the control. I’m wondering if there’s a way to commoditize that part and increase innovation all across the board, but not have to cede control over the creation of those models. I think that would be interesting. Being able to somehow anonymize and privatize the analysis of that data would be compelling.

Anything else we should know about Cognition IP?

Bryant: We focus a lot on AI but also do all areas of patents — around software, hardware, medical devices, and biotech.

Andrew: When you talk to people about working with lawyers, it’s about that relationship that you have with your lawyer. And we think you shouldn’t undervalue that, and that shows in the way we’ve interacted with clients from the beginning. Because we have these tools that save time we can devote time to things that will make that experience better. You go to a big firm and you get the name sure, but who do you really get working on the things, especially if you’re a small company?

We talk to people and describe what we do and it conjures up images of some interface on the website and you feed in your invention and two seconds later you get a patent out. I mean, maybe, like in 20 years. But even then, that’s no way to treat your IP — the thing that’s going to make your company billions of dollars, right? So if that misconception was out there we want to clear it up.

You can learn more about Cognition IP at https://www.cognitionip.com/ or by reaching out to Bryant or Andrew at blee@cognitionip.com and atjang@cognitionip.com!
