Auto-Generate UML Actors & Use Cases Models from Business Requirements, Vision & Scope, Business Case, Project Charters and more, using Python & Natural Language Processing: Part 1.1

Jan 12, 2023

Further to Part 1 (Auto-Generate UML Actors & Use Cases Models from Business Requirements, Vision & Scope, Business Case, Project Charters and more, using Python & Natural Language Processing: Part 1, by Shalabh Bhatnagar, Dec 2022, Medium), here is a small piece of implementation that some readers have asked for my help with. They want to extract all the requirements, which are usually numbered or sub-numbered.

I think it is a useful addition to this series so I thank you one and all for making my lazy bones work on it.

As I said in Part 1, the documents listed below contain a lot of information about end-user needs or requirements. Frequently, requirements appear as text prefixed by numbers, and they nest into sub-sections that are themselves numbered (a section or sub-section wouldn’t be numbered if it were not a key concept or topic in the domain space). It makes sense to pull out such requirements.

1. Vision & Scope

2. Project Charter

3. Business Case

4. User Requirements

5. Business Requirements

6. Product Requirements

7. Software Requirements Specification

From Numbered Requirements

· We can find Actors and Use Cases faster.

· Code becomes efficient.

Applies To

· Any text document. And what makes it great is the power RegEx has. Just 4 characters is all it takes to find heaps of numbered text. RegEx, as we know, offers billions of possibilities.

Input

· Text versions of any of the above documents that contain numbered text lines. I have used the text in this paper as input.

· Text lines that start with a number. For example, “9. Quick brown fox jumped over the silly lazy dog.” is prefixed with 9 and is a valid input for the search expression. My feeling is that it should work in any combination. Please holler if it doesn’t.
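To make the input rule concrete, here is a minimal sketch of what the 4-character expression picks up. The sample text is made up for illustration (the real input comes from a text file), and the variable names are my own.

```python
import re

# Hypothetical sample text; the article's code reads its input from some.txt
sample_text = (
    "Introduction to the system\n"
    "9. Quick brown fox jumped over the silly lazy dog.\n"
    "9.1 The dog shall be notified.\n"
    "No numbering on this line\n"
)

# Same 4-character expression as the article: one digit, then the rest of the line
pattern = re.compile(r"\d.+")
numbered_lines = [match.group() for match in pattern.finditer(sample_text)]

for line in numbered_lines:
    print(line)
```

Only the two numbered lines are returned, because "." does not cross a line break by default, so each match stops at the end of its line.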

Output

· Numbered lines of text with any of the suffixes: new line, space.

Benefits

· Send them all to the greedy testers that are itching to write their test cases (and harass you).

· You can quickly create a Traceability Matrix and leave some fluff behind.

· You can summarize a document.

· You can make a file faster for a tool such as JIRA or Confluence or any other tools you use.

· Unstructured text becomes structured.
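As a rough sketch of how the extracted lines could become structured rows (for a Traceability Matrix or a tool import, say), each line can be split into its number and its text. The sample lines, the column names req_id and requirement, and the splitting expression are all assumptions of mine, not part of the original code.

```python
import re

# Hypothetical numbered lines, as the extraction step might return them
numbered_lines = [
    "1. the user logs in to the portal.",
    "1.1 the system validates the password.",
    "2. the admin approves the request.",
]

# Leading number (with optional sub-numbers such as 1.1), an optional trailing dot, then the text
split_pattern = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)$")

rows = []
for line in numbered_lines:
    match = split_pattern.match(line)
    if match:
        # req_id and requirement are illustrative column names
        rows.append({"req_id": match.group(1), "requirement": match.group(2)})

for row in rows:
    print(row["req_id"], "|", row["requirement"])
```

A list of dictionaries like this can be passed straight to pd.DataFrame(rows) if you want the CSV route used in the code below.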

When to Use

· Your call!

Code

import os
import re
import pandas as pd

# Work from the current directory; os.path.join keeps the paths portable across operating systems
cwd = os.getcwd()
# Name the output CSV after this script (file name without its extension)
filename = os.path.splitext(os.path.basename(__file__))[0]
# Text converted to lower case for standardization
with open(os.path.join(cwd, "some.txt"), "r") as text_file:
    raw_text = text_file.read().lower()
# List to help store the outcome in a CSV file
overall_matches = []
# Regex criterion: a digit followed by the rest of the line (raw string so \d reaches the regex engine intact)
find_criteria = [r'\d.+']
for to_find in find_criteria:
    print(f"\nFinding '{to_find}'...")
    pattern = re.compile(to_find)
    matches = pattern.finditer(raw_text)
    for match in matches:
        print(match.group())
        overall_matches.append(match.group())
schema = {"overall_matches": overall_matches}
df = pd.DataFrame(schema)
# One row per match, with a running "serial" index column
df.to_csv(os.path.join(cwd, filename + ".csv"), index_label="serial")
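
One thing to watch: the expression in the code above matches from the first digit on a line, wherever that digit sits, so a sentence with a number in the middle is partly captured too. If you only want lines that actually start with a number, a stricter expression may help. The one below is my own assumption, not something the code above uses, and the sample text is made up.

```python
import re

# Hypothetical sample text for illustration
sample_text = (
    "The system supports 3 user roles.\n"
    "1. The user logs in.\n"
    "  1.1 The system checks the password.\n"
)

# The article's expression: matches from the first digit found on a line
loose = re.compile(r"\d.+")
# Assumed stricter variant: optional indentation, a number (with sub-numbers) at the line start, then text
strict = re.compile(r"^[ \t]*\d+(?:\.\d+)*\.?[ \t].+", re.MULTILINE)

loose_matches = [m.group() for m in loose.finditer(sample_text)]
strict_matches = [m.group().strip() for m in strict.finditer(sample_text)]

print("loose :", loose_matches)
print("strict:", strict_matches)
```

The loose expression also returns "3 user roles." from the first sentence, while the strict one keeps only the two numbered lines. Which one suits you depends on how clean your document is.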