Corporate Bullshit Generator for Java

There used to be the Corporate Bullshit Generator, and after seeing how good it was and even Dilbert’s boss was using it, so reincarnate it in Java.

I’ve been flirting with the idea of taking a look at this for a long time. Markov chains are an interesting topic anyway, and if someone has worked for any of multinational companies, then you know exactly what the person who thinks of the concept of “corporate bullshit” is talking about.

ChatGPT takes our work ***, especially with generators like this, as we can even ask styles from the AI what we want the result to look like. Better results can also be achieved with other machine learning solutions, so it is not worth investing too many resources in the optimization written for the sake of it. So I also looked for an existing implementation, transpiled it and then refactored it.

Dilbert's pointy-haired boss

*** This article has not yet been generated with ChatGPT assistance.

CBSG for Java on github

The whole thing is essentially a fun practice, I didn’t follow the basic software development rules too strictly. Nevertheless, I set some expectations for myself:

  • zero dependency - do not use any 3rd party libraries
  • it should be old standard - Java8 so that anyone can include it in any project
  • dictionary organized into separate files
  • the dictionary should be interchangeable
  • it should be easy to use, either from the console or as a library

Let’s imagine the case when the customer sends an Excel file to write down when it will be ready. In such cases, it may be worthwhile to work with a similar method, but in our case it only serves hobby purposes.

Developer notes

1. Transpile

I chose this particular implementation because Python code is easy-to-read, and its code is quite simple. There are more transpilers, but most of them are not free. It worked more or less, but it messed up the indexing, and in some cases “replace all” led to faster results.

The result was a nice long generator that operated with static methods throughout, but after a lot of hammering it started to work.

2. Outsourcing of dictionaries

In the Markov chains article already referred to above, it is explained in detail why and how it works. But in the transpiled code, there were a lot of Arrays and Maps, just where they were needed, embedded in the middle of the codebase. This should be made a little more clean-coded, but still in such a way that the meaningfully named methods remain.

The greatest value of such a generator is its dictionary. In many cases, we just choose random words. Does it matter if “manager”, “evangelist”, “leader” or “mastermind” was used in the sentence? Of course not, just use any of them!

Thus, the file loading utility from the classpath was organized into the CbsgResourceUtil class. The whole thing has already become much shorter and more understandable. By convention, filenames starting with “word_” refer to the plain dictionary form, and I defined the filename-keys as constants.

The next step was to outsource the weighted expressions. We want to determine how often certain terms occur. Here, the convention is Map-type information stored by filenames starting with “senw_”. Not a real CSV format, just something like that: a positive integer representing the weight, followed by a comma, then everything else will be the body expression.

If there is nothing after the comma, it represents the default branch. In many cases, this could mean a special follow-up call.

The code thus started to be quite manageable.

3. Templating

Until now, simple refactoring that could be solved trivially have taken place, so I was able to avoid dealing with the more difficult cases. But here I ran into several cross-references, which, although it is possible to solve, cause quite a lot of confusion.

In addition, I wanted to preserve the relatively transparent methods of the original source, which is relatively easy to follow. For example, person() returns the result of the boss() method in defined probability.

Patrick Bateman can tell you how important the title is in the serious corporate hierarchy. Accordingly, the “boss” generator may have been chiseled too much.

This is where various template engine solutions, such as Jamal or Apache FreeMarker would come into the picture. But we made it clear at the beginning that we don’t want any dependencies. So a very primitive word processor was created. Here, we essentially substitute variables defined in the templateFunction() method according to the call, which we identify using a simple regular expression.

Itt jönnének a képbe a különféle template engine megoldások, például a Jamal vagy Apache FreeMarker. De az elején leszögeztük, hogy nem akarunk függőséget. Így aztán egy nagyon primitív szövegfeldolgozó készült. Itt lényegében templateFunction() metódusban definiált változókat helyettesítjük be a meghívásnak megfelelően, amit sima regexp segítségével azonosítunk.

In this way, it is possible to define sentence fragments like this:

1,there can be no $growthAtom until we can achieve $addIndefiniteArticleGrowth

This can be solved much more elegantly with dynamic languages, but this is Java, with all its advantages and disadvantages.

There is still an existing problem: template can use a key which refers itself. That is of course circular reference, i.e. an infinite loop, which will result a beautiful StackOverflowError. This is a known issue, and other template engines can produce the same behaviour.

4. Usability changes

The original, static implementation has been changed. Resource file management part has been moved to the CbsgResourceUtil class. Instead of the previous constant access, dictionary definitions started to be stored dynamically in Properties, for which the keys are defined by CbsgDictionaryKey.

The CbsgGenerator class includes the logic for handling incoming arguments and dynamic dictionary loading. There are widely-used argument handling libraries, but we wamt zero-dependencies, so just Java basic features are used. Default values are located in ‘dict/en/dictionary.properties’.

The CbsgCore class contains the essential implementation, which I explained in more detail above.

During development I used var, Map.of() and other newer features, but all of them were removed, because our pre-defined requirements.

Conclusion

The set goal was achieved, a small, simple application was created with zero dependencies and maintaining Java8 compatibility. Dictionaries could be expanded, or even to add a completely different context. Enjoy it, corporate people!