Part 3 (Final) — Auto-Generate UML Actors & Use Cases Models from Business Requirements, Vision & Scope, Business Case, Project Charters and more, using Python & Natural Language Processing

With Networkx, Pandas, NLTK and RegEx

7 min read · Feb 10, 2023

Whether your role is client-facing or internal to the enterprise, as a project manager, a business analyst, an architect or a test manager you deal with documentation all the time. Reading these documents is critical. Comprehending them fast is the key.

In the last decade, Agile practices have taken center stage. This is evident in their adoption and in the demand for professionals competent in this area. A key part of their role is writing Use Cases, user stories or an equivalent. Documents are written by different people and in varying styles, which frequently creates confusion. You would struggle to remember a day without at least one long meeting in which the full project team reviewed the intent and content of the documents.

A bigger issue is rapidly converting client inputs and material into software project artifacts to expedite the project or product development life cycle. Clients want their product in their hands, fast!

All projects generate requirements of some kind. It is immaterial whether these requirements are explicit or implicit, stated or not. Projects always generate documents that contain requirements. Some typical industry artifacts where requirements sit:

Different documents that contain requirements or user needs

You could argue that not all of the above documents are equal. They are not.

You create some of these well before you initiate a project. For example, a Project Charter or a Business Case is amongst the first to be submitted in order to get sponsorship, while the Business Requirements Document (BRD) comes in after your project is in, or has passed, the Discovery phase.

Regardless of the documents you write or get, contemporary development methodologies require these to be written as Use Case models or user stories. In the last decade, these two have become popular. Chances are you are already using them in your projects!

Yet, clients are used to seeing, writing and reviewing requirements in free-flowing, simple-text ‘traditional’ documents. This is especially true of business users who don’t necessarily come from the IT world.

Unfortunately, projects today hinge on Use Case models or user stories. These bring the value that software teams are looking for when conceiving a Minimum Viable Product, a Prototype or an Increment. Use Cases surface the important interests and expectations and express them as a dialog that reflects and resembles the ways in which your clients interact with the future system or state.

This is flummoxing. It creates several complications:

1. Clients can only write “simple text” to express their needs, whereas software teams need Use Cases or user stories.

2. Clients assume that they have expressed themselves clearly and unambiguously. They are surprised when a certain section of the requirements or documents does not make sense to the readers!

3. Relationships and associations of concepts are embedded in the document, emanating from the minds of the clients, yet they may not be apparent to the readers.

4. The client has presumably given all the inputs, yet the project teams spend an enormous amount of time reading and converting these into Use Cases or user stories.

So!

I present an implementation that takes any document converted to text format as input and generates the following for you:

· Actors

· Use Cases

· Renders Actor diagram

· Renders Use Case diagram

Why Are Actors & Use Cases Important?

· The human Actors do the UAT for you! So, missing these Actors is never a good idea. Many of these are your direct stakeholders and approvers. Real clients, real users!

· Non-human Actors are even more important. These range from hardware interfaces to components to nodes and more! They become part of manual and automation test cases and also sit in the non-functional requirement space.

· Use Cases are what clients “really” want, for these are direct dialogs between them and the application they are conceiving.

Why Generate Models?

For requirement documents confined to 30–60 odd pages, the process of reading and marking Actors or Use Cases may not be much of a problem. It is when you do a large project, where the average document has 100 pages, or where there are hundreds of components each described in as many pages, that it becomes daunting.

Quickly identifying the Actors and Use Cases is crucial. This is what I focus on in this implementation.

This implementation is adequate to extract a first cut of Actors and Use Cases. Once rendering is done, you can export the images for any of your modelling or diagramming tools.

About Packages and their Use

I used the following Python packages:

re

The famous regular expression package, used to search free-flowing text. Nothing comes close to re. Try any other approach and I can tell you it will not work. You will end up reinventing the wheel.

The re package searches the document for strings written in a particular industry-standard nomenclature. Since this is the industry-standard way of writing requirements, there is no reason for us to look in any other direction. For example, software requirements are written as “The system shall show an OK button on the login window that will trigger authentication validation.” Note the use of “shall”, which is the convention followed by most requirements writers.
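To make the “shall” convention concrete, here is a minimal sketch of how such clauses could be pulled out with re. The pattern and the function name find_shall_clauses are my own illustrative assumptions, not the exact expressions used in the implementation; the text before “shall” is treated as a candidate Actor and the text after it as a candidate Use Case.

```python
import re

# Hypothetical pattern (an assumption for illustration):
#   "<subject> shall <action>."
# The subject starts at a capital letter and may not cross a sentence end.
SHALL_PATTERN = re.compile(
    r"(?P<subject>\b[A-Z][^.!?]*?)\s+shall\s+(?P<action>[^.!?]+)[.!?]"
)


def find_shall_clauses(text):
    """Return (subject, action) pairs for every "shall" sentence in text."""
    return [
        (match.group("subject").strip(), match.group("action").strip())
        for match in SHALL_PATTERN.finditer(text)
    ]


sample = (
    "The system shall show an OK button on the login window. "
    "This sentence is background only. "
    "The administrator shall approve new user accounts."
)

clauses = find_shall_clauses(sample)
for subject, action in clauses:
    print(f"{subject} -> {action}")
```

Sentences without “shall” are simply skipped. A real document will need a more forgiving pattern (for example “must” or “will”), which is why you should adapt it to your own nomenclature.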

matplotlib

To plot, of course. This is the core package that sits behind the plotting functionality of many other Python plotting packages.

tkinter

Yes, I still use tkinter 😊.

I generate a simple window with buttons and a list box. The best part is that it is lightweight (and reminds me of old school days). One button shows the network of Actors and the other one shows the Use Cases.

nltk

The all-too-powerful package to crunch and munch anything to do with text. There are hundreds of options in the market: many cloud providers and dictionary makers now offer endpoints that you can connect to your app to parse and dissect text. No need. NLTK rocks!

I used nltk for tokenization, lemmatization and Part-of-Speech tagging.

pandas

Can we ever do super speedy data-processing operations with anything but Pandas? No way.

I used Pandas to create data structures, schema and then generate the CSV files.
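As a rough sketch of that step, the snippet below puts extracted pairs into a DataFrame and serializes them as CSV. The column names actor and use_case and the sample rows are assumptions made for illustration; the actual schema in the implementation may differ.

```python
import pandas as pd

# Hypothetical rows: one (actor, use case) pair per extracted requirement.
rows = [
    {"actor": "system", "use_case": "show an OK button on the login window"},
    {"actor": "administrator", "use_case": "approve new user accounts"},
    {"actor": "administrator", "use_case": "reset user passwords"},
]

# Fix the column order explicitly so the CSV layout is predictable.
df = pd.DataFrame(rows, columns=["actor", "use_case"])

# Unique actors, in order of first appearance.
actors = df["actor"].drop_duplicates().tolist()

# With no path given, to_csv() returns the CSV as a string.
# Pass a file name instead to write the file to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Keeping Actors and Use Cases in one table makes it easy to filter, de-duplicate and hand the data to the diagramming step.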

networkx

This package helps generate, alter and study the structural and dynamic elements of arbitrary networks. One lifetime is inadequate to use this package fully or to comprehend the full algebra of networks. Salute to the creators! (You can depict networks such as Facebook's, for example.) Since we are used to seeing software artifacts as diagrams, I have implemented the diagrams using the networkx library. This library is actually meant for graph analysis and depiction. I give you a baseline from which you can render diagrams.
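Here is a minimal sketch of how Actors and Use Cases could be held in a networkx graph before rendering. The sample pairs and the kind node attribute are assumptions for illustration, not the structure used in the implementation.

```python
import networkx as nx

# Hypothetical (actor, use case) pairs, e.g. taken from "shall" clauses.
pairs = [
    ("system", "show login window"),
    ("administrator", "approve new user accounts"),
    ("administrator", "reset user passwords"),
]

graph = nx.Graph()
for actor, use_case in pairs:
    # Tag each node so Actors and Use Cases can be styled differently later.
    graph.add_node(actor, kind="actor")
    graph.add_node(use_case, kind="use_case")
    graph.add_edge(actor, use_case)

actors = [
    node for node, data in graph.nodes(data=True) if data["kind"] == "actor"
]
print(actors)
print(graph.degree("administrator"))
```

From here, a call such as nx.draw(graph, with_labels=True) followed by matplotlib's plt.show() should give a first rough picture. The node degree tells you at a glance which Actors participate in the most Use Cases.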

About Code

Please read the inline comments. They reveal the full workings and help you make sense of the implementation. Some parts are advanced, but most are not. I have stayed away from too much brevity, so readability is not compromised. Finally, I did not implement exception handlers. I leave that to your good judgement. Drop me a note if you need more description or discover inadvertent errors.

Before you start, don’t forget to install all the packages using pip or Anaconda.

High Level Flow of Code

Note

Most project documents are saved in .docx or HTML formats. Please take the time to save these in text format before you use the code. I excluded the conversion, given the format variations.

Input

· A document in text format written in the industry convention of “shall” clauses. You can modify the code to reflect your practices and nomenclatures.

· One file at a time.