My name is Maizorig Janchivdorj, and in 2018 I started a legaltech company named Codelex Legaltech. Our aim is to streamline, augment and enhance legal practice. Like all young companies, we have learned a lot in a short space of time, and we have developed and brought to market a number of products in quick succession:
- eLawyer - a practice management system for SME law firms in Mongolia;
- Mongolia’s first robot lawyer, ‘iGeree’, including an NLP trained chatbot and contract automation service in the employment/labor law space, in conjunction with law firm MDS KhanLex; and
A new project
We are now at the start of another Codelex project which has been gradually germinating, named simply ‘Codelex’, and aimed at augmenting legal practice via application of AI and advanced technology.
We envision that a proof of concept will emerge later this year, followed by concept validation throughout 2021. Members of the Codelex team will document aspects of the journey, and updates will be posted to social media channels along the way, so please do follow and join us!
This first stage, simply dubbed ‘Phase 1’, of the Codelex project will be followed by several other sub projects, based on learnings and insight from Phase 1.
Before explaining what this is all about and what we are aiming to achieve at this initial stage, let me give you a glance into where the concept came from.
Apologies at the outset: this blog post has been written with a mixture of audiences in mind (familiar and unfamiliar, technical and non-technical), and as such it will jump between basic discussion, simplification of more advanced items, and external reference points. Please bear with me!
I moved into the private practice of law in 2011, following 10 years in Mongolia’s public sector with the Ministry of Justice and the state oversight department.
Since entering private practice, I’ve served many clients: collaborating on and solving their legal issues and needs - mostly concerned with project finance, various corporate matters, and local and international disputes (both in court and through arbitration).
The longest-running case I have been involved in has stretched over more than 8 years. It was initiated on 12 April 2012, with urgent client requests leading to a series of demands to the Defendant; attempts at conciliation; numerous court proceedings in Mongolia; then local arbitration and, later, international arbitration proceedings under the ICC Rules of Arbitration. Even now, in 2020, I am still waiting for the final award to be rendered.
In this sort of situation, you know the case, you know the facts and you know the applicable legal rules. So naturally you expect a certain award (outcome) is likely to be rendered.
However, when you get deeply involved with a single case for such a prolonged amount of time, you become closely tied to it and one question will always weigh on your mind: “What will be the final decision?”
If you chance to let your mind wander, you can speculate:
“What if lawyers had an oracle that could tell them what might or would happen in the future with their cases?”
And what if that oracle could also tell you ‘what to do’ and ‘what not to do’ in order to win the case? Wouldn’t it be fantastic to have one of these? And does any such thing exist?
The experience oracle
Over time, I pondered further on this theme, and came to the conclusion that the answer is ‘Yes’. Conceptually, there is just such an oracle; it exists within each lawyer, and it’s generally known as “experience”. Experience is the first-person effect or influence of an event or subject, gained through involvement in or exposure to it. This oracle does not appear instantly; rather, it emerges gradually as you gear up with academic knowledge, and steadily grows as you encounter various types of legal problems (tasks) and learn to solve them (solutions).
This sort of problem-solving experience refers not only to book knowledge, but also to knowledge gained through hands-on practice, known in philosophy as ‘empirical’ or ‘a posteriori’ knowledge.
As this oracle develops over time, it becomes more accurate (as the probability of a missed prediction decreases), and it eventually reaches a stage where it is actively recognized by others, which we commonly call “expertise”. People refer to lawyers equipped with such a matured oracle as ‘an expert’, an ‘experienced lawyer’ or simply a ‘good lawyer’. However, it does not come in one size or shape, and of course each individual lawyer’s oracle is weighted towards their own background and experience.
The next question follows:
“Is it possible to find a ‘one size fits all’ oracle that would give the same or similar power to each lawyer, and if so, how?”
Our individual and shared experience with legaltech projects starts to feed in as we think about potential solutions.
The sharp end of current technology is so advanced that it is beginning to replicate aspects of human intelligence. ‘Artificial Intelligence’ or ‘AI’ is now a familiar term to most, regardless of the accuracy of popular catchphrase usage.
With the availability of vast amounts of historical ‘big data’, Moore’s Law (the observation that transistor counts, and with them computational power, double roughly every two years), and rising GPU power, an increasing number of scientists, analysts, data scientists and others are working in this field. In effect, they are seeking to create an artificial brain that can mimic the function of the biological brain by applying information theory, statistical theory and decision theory in one bundle. Early approaches fall under Machine Learning; this in turn is evolving into the more complex approach of deep structured neural networks, or ‘Deep Learning’, whose artificial neurons are loosely modelled on human neurons.
Figure 1.1 Human neuron Figure 1.2 Machine neuron (Perceptron)
Deep Learning pipeline: Building a Deep Learning Model with TensorFlow, Hisham El-Amir, Mahmoud Hamdy, 2020
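For readers who have not met a perceptron before, here is a minimal sketch of the ‘machine neuron’ idea: inputs are multiplied by weights, summed with a bias, and passed through a simple threshold. The weights and bias below are hypothetical values chosen by hand so that the unit behaves like a logical AND; in practice they would be learned from data.

```python
from typing import Sequence


def perceptron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Fire (return 1) when the weighted sum of the inputs plus the bias is positive."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0


# Hypothetical, hand-picked parameters: the unit only fires when both inputs are 1.
and_weights = [1.0, 1.0]
and_bias = -1.5

# Evaluate the unit on all four combinations of two binary inputs.
outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

Deep learning models stack very large numbers of units like this one (with smoother activation functions) into layers, and adjust the weights automatically during training.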
If current technology can mimic (to some extent) one of the most complex areas of study known to humans, neuroscience, then would it be able to mimic a process that gathers legal “experience” through the human learning process, the oracle that we are in search of?
How to build ‘experience’?
To address this question, let’s start by looking, at a high level, at the process of gaining ‘experience’: re-predicting court decisions in real life.
Chart 1. Simple process of predicting court decision outcome on the basis of experience. MDS&KhanLex LLP, 2020
If we further examine a process of reviewing a court decision that might be the same or similar to a case presented, then the <review and analyze> process flow could be demonstrated as follows:
Chart 2. Flowchart for Searching court decision, MDSKhanlex 2020
On the basis of this process flow, we can map out a simple algorithm along the lines of the below. For clarity, an ‘algorithm’ is a “finite sequence of well-defined, computer-implementable instructions, typically to solve a class of problem or to perform a computation”.
Algorithm for searching relevant court decision. CodeLex, 2020
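To give a flavour of what such a search algorithm might look like in code, here is a minimal sketch. It is not the Codelex algorithm itself: it simply assumes that relevance can be approximated by word overlap, and ranks decisions by the cosine similarity between word counts in a case summary and in each decision. The decision IDs, texts, the `search_decisions` name and the `min_score` threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # \w+ matches Unicode word characters, so Cyrillic text is handled as well.
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_decisions(case_summary: str, decisions: dict[str, str],
                     min_score: float = 0.1) -> list[tuple[str, float]]:
    """Return (decision_id, score) pairs at or above min_score, most similar first."""
    query = Counter(tokenize(case_summary))
    scored = [(decision_id, cosine_similarity(query, Counter(tokenize(text))))
              for decision_id, text in decisions.items()]
    relevant = [pair for pair in scored if pair[1] >= min_score]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy database of three decisions.
decisions = {
    "D-001": "loan agreement declared invalid because the signatory lacked authority",
    "D-002": "employment contract terminated without notice compensation awarded",
    "D-003": "sale agreement valid claim to declare agreement invalid dismissed",
}
results = search_decisions("claim to declare a loan agreement invalid", decisions)
for decision_id, score in results:
    print(decision_id, round(score, 3))
```

Word overlap is a crude stand-in for the judgement a lawyer applies when deciding whether a decision is ‘the same or similar’; a real system would need a much richer representation of the text.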
If a search of a court database returns more than one court decision that could be relevant to a case, then the process of analyzing a single court decision is applied to all of the decisions found. The decisions are then clustered by sentiment and analyzed for what they have in common with other decisions of the same sentiment.
Figure 3. Clustering of numerous court decision by relevance and sentiment, Codelex 2020
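As a rough illustration of this clustering step, the sketch below groups decisions by an outcome label and then looks for the terms that every decision in a group shares. It assumes the outcome label (standing in for ‘sentiment’) and the set of key terms have already been extracted for each decision; the labels, terms and function names are hypothetical.

```python
from collections import defaultdict


def cluster_by_outcome(decisions):
    """Group (decision_id, outcome, terms) records by their outcome label."""
    clusters = defaultdict(list)
    for decision_id, outcome, terms in decisions:
        clusters[outcome].append((decision_id, terms))
    return dict(clusters)


def common_terms(cluster):
    """Return the terms that appear in every decision of a cluster."""
    term_sets = [terms for _, terms in cluster]
    return set.intersection(*term_sets) if term_sets else set()


# Hypothetical labelled decisions: (id, outcome label, key terms found in the text).
labelled = [
    ("D-001", "claim_upheld", {"agreement", "invalid", "no_authority"}),
    ("D-002", "claim_upheld", {"agreement", "invalid", "no_authority", "notarised"}),
    ("D-003", "claim_dismissed", {"agreement", "valid", "ratified"}),
]

clusters = cluster_by_outcome(labelled)
shared = {outcome: common_terms(members) for outcome, members in clusters.items()}
for outcome, terms in shared.items():
    print(outcome, sorted(terms))
```

Requiring a term to appear in every decision is deliberately strict; with real data one would more likely count how often each term appears per cluster.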
So, real-life processes of predicting future court decisions can be modelled in a number of ways, and as such can also be translated into machine-understandable processes and algorithms. Theoretically, we could build an ‘Oracle’ for lawyers.
In order to develop this oracle, there are obviously many technical challenges, such as making text data readable and understandable by a machine in the way it would be understood by an ordinary trained lawyer. This in turn involves the computer science fields of NLP (Natural Language Processing) and NLU (Natural Language Understanding). For those with an interest, this blog post from 2018 (for example) takes a brief look at the definitions, interrelation and differences between NLP and NLU.
Developing a reliable and accurate NLP engine is both challenging and time-consuming, but the advancement of Deep Learning presents us with an opportunity to overcome a number of the associated burdens.
Figure 4.1. Classic NLP Figure 4.2. Deep Learning based NLP
Figure 4. The 7 NLP Techniques That Will Change How You Communicate in the Future, James Le
Bidirectional, self-supervised, transformer-based deep learning NLP models, which have already shown promising results in more than 100 different natural languages, could be used to help develop our legal oracle. The performance of such models does not depend on the complexity of any particular natural language; it depends primarily on the training data.
There are a number of initiatives underway around the world in pursuit of dispute and court decision prediction; the area we are most interested in, and a key task in developing our ‘oracle’ is that of prediction based on identification and analysis of sentiment.
“How might we identify and analyze court decision sentiment, in order to help predict future court decisions?”
To do so, we need to look not only at the interplay of text and ideas within a decision, but also at aspects of sentiment analysis. Sentiment analysis is part of what is known as affective computing, an area of study that broadly combines computer science, psychology and cognitive science in pursuit of machine-driven recognition of, and adaptive reaction to, emotive-linked data. Affective computing and sentiment analysis projects and applications are highly visible in the marketing and customer services fields in relation to, for example, customer experience research; advertising; media monitoring; customer chatbots, and so forth. Currently, deep learning appears to be the most suitable analysis approach in this area.
Figure 5. Kim’s model architecture with two channels for an example sentence. Deep Dive into Sentiment Analysis - AI Zone
In the case of court decisions, we know about the affective stage of a final judgement. However, if we can further identify and harness ‘subjective information’ from court findings (which is mostly contained, for example, in a “reasoning” section of any given court decision of Mongolian courts) and identify correlation between such subjective information and affective stage of the decision, then we can start to predict what “subjective information” needs to be contained in a statement of claim in order to increase success in court and arbitration hearings.
So, what has happened so far, up to April 2020?
A few things! One item of interest relates to the preparation of legal training data:
We have initially sampled approximately 800 court decisions of Mongolian courts pertaining to the validity of an agreement, out of an available 219,000 civil court decisions. These decisions were obtained from www.shuukh.mn, a public Mongolian court database. (Because of uncertainty over the crawling policy, we elected to collect those decisions manually using the existing filters and search tools offered in the database.)
We wrote a script to convert all PDF files to TXT format and have in turn started to normalize the text data. Normalization lets us strip out redundant information that would otherwise lead to poor accuracy, and the cleaned text can then be converted into a numeric representation.
Figure 6. Data preparation - Converting pdf files to text files, Codelex 2020
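To make ‘normalize’ and ‘numeric representation’ a little more concrete, here is a minimal sketch of what such a step could look like once the text has been extracted from the PDF. It is an illustration under our own assumptions rather than the script we actually use: the function names are hypothetical, and rejoining words split by a hyphen at a line break is a guess at one common type of PDF extraction debris.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Clean text extracted from a PDF into a single lowercase line."""
    text = unicodedata.normalize("NFKC", raw)  # unify equivalent Unicode forms
    text = re.sub(r"-\n(?=\w)", "", text)      # assumption: rejoin words hyphenated across lines
    text = re.sub(r"\s+", " ", text)           # collapse tabs, newlines and repeated spaces
    return text.strip().lower()


def build_vocabulary(texts: list[str]) -> dict[str, int]:
    """Assign each distinct whitespace-separated token an integer ID, in order of first appearance."""
    vocabulary: dict[str, int] = {}
    for text in texts:
        for token in text.split():
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def encode(text: str, vocabulary: dict[str, int]) -> list[int]:
    """Convert text into a list of token IDs, skipping tokens not in the vocabulary."""
    return [vocabulary[token] for token in text.split() if token in vocabulary]


raw_page = "The  Agree-\nment is\tVALID.\n"
clean = normalize_text(raw_page)
vocabulary = build_vocabulary([clean])
print(clean)
print(encode(clean, vocabulary))
```

Splitting on whitespace leaves punctuation attached to words; a proper tokenizer for Mongolian legal text would need to handle that, along with morphology, far more carefully.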
Based on the information extraction model we developed, we are classifying text of each court decision accordingly.
Figure 7. Proposed model for Information extraction from court decision
This process has been undertaken manually by lawyers and students, and its outputs will be applied at a later stage to supervise training on the legal data.
Meanwhile, we have also started developing a deep learning based NLP model, which we will be applying to the project.
Additions!
I should also mention that we are very pleased to welcome additions to the CodeLex team: NLP expert Enkhbayar Sanduijav (AND Systems, Hitachi Solutions), who received his master’s degree from Kyoto University with research on Mongolian phrase generation and morphological analysis based on phonological and morphological constraints (a paper of his on this same topic in the Journal of Natural Language Processing can be found here); and AI/Deep Learning specialist Dr Enkhtogtokh Togootogtokh, whose extensive experience with AI and deep learning ranges from AI unmanned drone projects for the Singapore and Italian governments to AI-driven racism recognition for German football (soccer) clubs. (For those with a technical bent and broad interest, a few published papers can be found here, here and here.)
Figure 8. Data extraction for Deep NLP, Codelex 2020
And so, what next?
Well, we have a number of tasks ahead for Phase 1 of our project to more fully take shape; most of the following are either underway as mentioned, in preparatory stages, or in planning:
- Preparation of legal training data
- Development of Mongolian deep NLP for legal data
- Training on legal data
- Processing, including:
* NER (Named Entity Recognition);
* Semantic analysis;
* Sentiment analysis.
- Other data science related tasks.
Thank you for taking the time to read through this rather long initial post, and for making it to the end!
Members of the Codelex team will be posting updates along the way as we progress, and I look forward to having you join us on the journey!
Maizorig Janchivdorj, founder of "Codelex Legaltech"