The more you understand AI software, the better you can appraise its benefits.
Rees Morrison, Altman Weil, Legaltech News
The first
article of this series explained why law firms and law departments should
familiarize themselves with machine learning algorithms. While discussing data
sets collectible by firms or departments and software resources for the
computations on those sets, that article left for later the topic of how
machine learning software actually "learns." Many people assume machine
learning works by some kind of legerdemain, but under the hood it is not
magic; it is math.
"Wait," you
might be protesting, "I'm a lawyer, and math is a foreign language! Real
lawyers manage concepts, clients, legal problems and other lawyers, and let
techie geeks crunch the numbers." Those who stop reading here say, "Let's
preserve my comfort zone of the supremacy of text and intuition over numbers
and probabilities."
Understanding enough
about the burgeoning world of artificial intelligence and what computer
programs can deliver, that is to say how the software tools actually function,
is not too detailed, too geeky, or too incomprehensible. Lawyers will think
more strategically and manage others more effectively when they have a
grounding in today's technology. Innumeracy and tech naiveté are
unworthy of management lawyers who aspire to lead; the march of machine
learning is afoot.
In the innards of their
computer code, algorithms—a fancy term for the steps the computer has been told
to take when given data—burrow through and emerge with clusters or
classifications or estimated predictions. Sophisticated mathematical
calculations take place, largely out of sight of the user, and they
deserve to be understood at a basic level. The prospects for how artificial
intelligence will affect the practice of law, not to mention the alarming
predictions—"50,000 lawyers to lose jobs to Watson by 2020!!"—can be
evaluated better by those who have a sense of what the software can and can't
do.
Machine learning
algorithms deliver the most when the data they parse is numeric. As we
mentioned in the previous article, if we want to predict the billable hours
likely next year from individual associates (or whether they will decamp for
another job), we would assemble HR and financial data on their years out of law
school, their years with the law firm, their billable hours in each of the
previous two or three years, the number of lawyers in their primary practice
group, and other numeric facts about them.
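To make that concrete, here is a minimal sketch in Python (using the pandas library) of what such a table of numeric facts might look like; every column name and figure is invented for illustration, not drawn from any real firm.

```python
# Hypothetical associate data of the kind described above; all values invented.
import pandas as pd

associates = pd.DataFrame({
    "years_out_of_law_school":  [3, 6, 2, 8, 5],
    "years_with_firm":          [2, 6, 1, 4, 5],
    "billable_hours_last_yr":   [1850, 1720, 1990, 1600, 1805],
    "billable_hours_2_yrs_ago": [1800, 1750, 1900, 1650, 1780],
    "practice_group_size":      [25, 40, 25, 12, 40],
    "left_firm":                [0, 0, 0, 1, 0],   # 1 = departed for another job
})
print(associates.describe())   # quick numeric summary of the data set
```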
Enter the math embedded
in machine learning software! Let's touch on the range of mathematical tools on
which algorithms such as naïve Bayes, neural nets, multiple regression and
Support Vector Machines rely.
Statisticians have known for years how to take a data set and fit a "best
fit" line to it. When legal managers choose multiple regression,
they draw upon a powerful suite of tools that can estimate what the output will
be based on new information. With the best fit line or related output
calculated from a training set of our associate data, we gain a formula. Into
that formula we can plug information about an associate who was not in the
training set and learn an estimate of the new associate's likely billable hours
(or probability of staying with the firm). The range of regression applications
in law firms and law departments is enormous.
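As a sketch of that workflow, the Python snippet below fits a best-fit formula to a handful of invented associate records and then plugs in an associate who was not in the training set; the scikit-learn library and all of the numbers are assumptions chosen only to show the mechanics.

```python
# A minimal multiple-regression sketch; the data is invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: years out of law school, years with firm, prior-year hours, group size
X = np.array([[3, 2, 1800, 25],
              [6, 6, 1750, 40],
              [2, 1, 1900, 25],
              [8, 4, 1650, 12],
              [5, 5, 1780, 40]])
y = np.array([1850, 1720, 1990, 1600, 1805])   # last year's billable hours

model = LinearRegression().fit(X, y)           # computes the "best fit" formula

new_associate = np.array([[4, 3, 1820, 30]])   # someone not in the training set
print(model.predict(new_associate))            # estimated billable hours
```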
Matrix algebra carries
out operations on numeric matrices, rectangular sets of numbers such as in a
spreadsheet. More significantly for machine learning algorithms, this field of
mathematics can convert complex, hyper-dimensional sets of data (each dimension
is a different bit of information about the client or matter or associate or
practice group) into simpler sets that are more tractable for software. This is
at the root of what is known as "principal components analysis" and
other tools in the machine learning workshop. Decomposing large data sets into
their key variables makes sense, and only computers can whisk through the
complicated mathematical transformations involved.
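A minimal sketch of that decomposition, assuming scikit-learn's implementation of principal components analysis and random stand-in data, might look like this:

```python
# Principal components analysis: collapse many correlated columns into a few
# summary dimensions. The data here is random stand-in data, not firm data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 10))      # 100 rows, each with 10 "dimensions"

pca = PCA(n_components=2)              # keep only the two key components
reduced = pca.fit_transform(data)      # now 100 rows with just 2 columns
print(reduced.shape, pca.explained_variance_ratio_)
```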
Machine learning software computes similarity measures constantly. If you think about points
on a Cartesian plot with an x-axis and a y-axis, it helps you visualize how
software can measure the distance between any two of those points. Now, extend
your thought experiment to three dimensions, four dimensions and more. Humans
fail early, but it is easy for computers to calculate the Euclidean distance or
the Manhattan distance or other measures of distance between any of the points.
Or they can find clusters of points and calculate the center of the cluster—the
centroid—and instantly figure out the distance between centroids.
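The arithmetic behind those distances is short; here is a sketch in Python, using made-up points and SciPy's distance functions:

```python
# Distance and centroid calculations on invented points.
import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a point in five dimensions
b = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

print(distance.euclidean(a, b))           # straight-line distance
print(distance.cityblock(a, b))           # "Manhattan" (city-block) distance

cluster_1 = np.array([[1, 2], [2, 3], [3, 2]])
cluster_2 = np.array([[8, 9], [9, 8], [10, 10]])
centroid_1 = cluster_1.mean(axis=0)       # center of the first cluster
centroid_2 = cluster_2.mean(axis=0)
print(distance.euclidean(centroid_1, centroid_2))   # distance between centroids
```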
Another mathematical tool is probability. Naïve Bayes algorithms, for
example, use probability to estimate the likelihood of something happening
given various conditions. More fundamentally, as new information comes in,
Bayesian analysis updates that probability. With probability calculations, as with
any inference learning, more data is always better.
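To see the updating in miniature, here is a sketch of Bayes' rule applied to an invented question (will an associate leave?); every probability in it is an assumption chosen only to show the arithmetic.

```python
# Bayes' rule: revise a prior probability in light of new evidence.
prior = 0.10                    # assumed baseline chance an associate departs
p_evidence_if_leave = 0.60      # assumed chance of a sharp drop in hours if leaving
p_evidence_if_stay = 0.15       # assumed chance of the same drop if staying

posterior = (p_evidence_if_leave * prior) / (
    p_evidence_if_leave * prior + p_evidence_if_stay * (1 - prior)
)
print(round(posterior, 3))      # about 0.31 -- the updated probability of leaving
```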
As for calculus, machine
learning methods "learn" by measuring how close they come to a
known answer (the training set) and by working out lines on curves. When the
software is training itself on part of the data, it knows the answers and
keeps tweaking the mathematical dials until it figures out how close it can get
to those answers. One common step involves tangents, a term for a straight line
that touches a curve at a single point. The slope of this tangent line (how
much a movement in one direction changes the position in another direction) is
instrumental for machine learning calculations. A key concept here goes by the
forbidding name of "gradient descent."
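For the curious, here is a bare-bones sketch of gradient descent fitting a one-variable line to invented data; it is a toy version of the idea, not how any particular product implements it.

```python
# Gradient descent: repeatedly nudge the parameter w in the downhill direction
# of the error curve (the slope of the tangent) until predictions fit the data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])     # the "known answers" (training set)

w = 0.0                                # start with a deliberately bad guess
learning_rate = 0.01
for _ in range(200):
    error = w * x - y                  # how far off the current predictions are
    gradient = 2 * np.mean(error * x)  # slope of the squared-error curve at w
    w -= learning_rate * gradient      # step downhill along the tangent
print(w)                               # settles near 2.0, the best-fit slope
```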
When you have a sense of
the calculations that underlie machine learning algorithms, you also have a
better grasp on why it is important to scrub the data you feed in. For example,
missing data can confound a machine learning algorithm, although there are many
ways to identify holes in the data and even to impute values into them or
otherwise handle them. And, we should emphasize, the data does not have to be
numeric. Machine learning algorithms can handle what are called factors, such
as practice group names, because essentially the algorithms turn those factors
into numbers.
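Two of those scrubbing steps, filling a hole and turning a factor into numbers, look like this in a Python sketch; the table, its column names, and the choice of the median are all illustrative assumptions.

```python
# Impute a missing number and one-hot encode a categorical "factor".
import pandas as pd

df = pd.DataFrame({
    "billable_hours": [1850, None, 1990],        # one value is missing
    "practice_group": ["Litigation", "Tax", "M&A"],
})

# Fill the hole with the column's median (one of many possible choices).
df["billable_hours"] = df["billable_hours"].fillna(df["billable_hours"].median())

# Each practice group becomes its own 0/1 column of numbers.
df = pd.get_dummies(df, columns=["practice_group"])
print(df)
```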
Machine learning
algorithms do not develop concepts and abstract ideas or create anything new,
but they can discern patterns in data with a speed and thoroughness that humans
simply cannot match. Text mining, by the way, manipulates words as
numbers and relies on statistics, probabilities, and similarity measures to
deliver insights into text documents.
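A sketch of that words-as-numbers idea, assuming scikit-learn's TF-IDF vectorizer and a few invented snippets of legal text, might read:

```python
# Turn documents into vectors of word weights, then compare them numerically.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "indemnification clause survives termination of the agreement",
    "the indemnification obligations survive termination",
    "billable hours reported by the litigation practice group",
]

vectors = TfidfVectorizer().fit_transform(docs)   # words become numeric weights
print(cosine_similarity(vectors))                 # pairwise similarity scores
```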
Rather than shying away from
the seeming complexity of mathematics that may be unfamiliar to you, think
of it as an entrée into an exciting field of leading-edge technology, a
different way of thinking, and a set of powerful software tools. The old canard
about people going to law school because they were not good at mathematics
should not deter open-minded, strategically thinking lawyers from
recognizing that what computers do is in its essence manipulate numbers, and
that the future of machine learning for the legal industry depends on clever
applications of mathematics.