Natural language text and image classifiers: Train your feeds with plain English

Lots of possibilities! I guess my question is similar to, or a restatement of, the one above: could one set up just 1-2 very detailed (long) classifiers? Is there a limit on how long that chunk of text (e.g. “screenshots of UIs”) can be?