Bonfire of the (ethical) vanities and the ‘AI tool explosion’: A few things you need to know (but perhaps were afraid to ask)
Warning, Warning, Warning! Everybody to get from street….
AI tools for creating content as well as writing and editing articles are all over the place. AI is constantly in the news: OpenAI is now working on a model so powerful it alarms staff, one able even to solve basic math problems it has never seen before! You’d need to have been hibernating, or in a coma, for the last 12 months to have missed all the hype surrounding ChatGPT and how it can help us poor struggling academics. If we don’t use it, we’ll lose time and fall far behind the career curve.
And there are many more than just ChatGPT
A veritable cornucopia of other tools now promises to help us researchers with our English writing, especially those of us who are non-native speakers. A number of these tools were developed, and are now sold back to researchers, by the large ‘Author Services’ (hereafter ‘Editing’) companies: Cactus (Editage) with PaperPal, Enago with Trinka AI, WordVice with WordVice AI, and AJE (Springer Nature) with Curie. This is a very exciting time for researchers, especially if English is not your native language: AI tools can now be used to reduce - even completely remove - the English language barrier to publication success.
‘Editing’ companies know this well. In recent years we’ve seen a race to create, own, and operate AI tools specifically targeted at academics, hand-in-hand with a ‘race to the bottom’ to offer ‘human’ editing services at the cheapest possible prices. Companies that used to employ ‘native-speaking PhD-level editors’ are increasingly outsourcing this work to regions of the world where workers are available much more cheaply (e.g., India, the Philippines; nothing against these countries, not at all, but this fact is never disclosed when you order an editing job). Outsourcing (or ‘off-shoring’, as it’s known in publishing) is understandable from a business perspective: Margins on editing, especially direct to researchers (B2C), have always been extremely tight (< 10%). The ‘AI tools explosion’ has therefore created huge opportunities for our friends at the ‘Editing’ companies.
Indeed, the marketing messaging is clear: ‘Use OUR TOOL because it’s been TRAINED specifically on academic research papers and is therefore better than generic English-correcting software FOR YOUR ARTICLE’S SUCCESS’, and similar. Here’s an example, a message received yesterday from PaperPal (Editage), which said: “don’t use Grammarly! Use our tool to help with your next paper”.
This ‘unique selling point’ (USP) is real, by the way: The differentiator is that the AI tools operated by ‘Editing’ companies really were trained on real research papers. But whose? Answer: Articles submitted to them as editing jobs. Your research, potentially. Did you realize that? Well no: Me neither. It’s never been made particularly clear, as we’ll see in a moment.
Fun Fact: ‘Editing’ companies have been using AI tools in their workflows for years. In fact, in most cases, the driver for their initial creation was not to help researchers per se, but rather to speed up and streamline in-house editing processes, so that they could charge you the same to work on a paper while their editors spent less time on it and were therefore paid less, increasing those small margins. Genius.
In the same way, publishers have also been using AI tools for years as part of their workflows: proofreading, typesetting, peer-reviewer selection, and general production, for example. Not to help researchers but to help themselves.
In other words, if you’ve submitted one of your research articles to an ‘Editing’ company over the last few years, then it’s almost certain that your work will have been ‘edited’ (either at the start or the end of the ‘human process’) by an AI tool. You would have been told this fact, yet probably remained blissfully unaware of it: Notices about your IP and the presence of an automated step would have been buried in the fine print of those ‘terms and conditions’ that nobody reads.
Take Home Message: You most likely thought you were paying for ‘an expert edit’, but at least part of the process was in fact carried out by an AI tool, built to increase editing workflow efficiency while learning from your article. Because that’s how these AI models work: They learn and improve based on real written language (turned into a USP, as mentioned earlier), in this case your research.
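For the technically curious, here’s a minimal sketch of that ‘your work in, your work out’ mechanism: fine-tuning a small open-source language model on text that users have submitted. To be clear, this is our own illustration using the Hugging Face libraries, not any ‘Editing’ company’s actual pipeline; the model choice and corpus filename are assumptions made purely for the example.

```python
# A minimal sketch (illustrative only, not any company's real pipeline):
# fine-tune a small language model on user-submitted manuscripts, so that
# every submitted paper nudges the model's weights.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical corpus: one submitted manuscript per line of a text file.
dataset = load_dataset("text", data_files={"train": "submitted_manuscripts.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="editing-model", num_train_epochs=1),
    train_dataset=tokenized["train"],
    # mlm=False gives plain next-word (causal) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point isn’t the code itself, but what it makes concrete: whatever lands in that training file, your manuscript included, becomes part of the model’s ‘knowledge’.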
Is this fair? Is this ethical? Tools built and trained on researcher writing are now being sold back to those same researchers. Whatever, it’s a genius business model.
AI Tools Terms & Conditions
Let's take a moment to look at the ChatGPT (first) and PaperPal (second) ‘terms and conditions’ to illustrate this point:
(ChatGPT) “Your Content. You may provide input to the Services (‘Input’), and receive output generated and returned by the Services based on the Input (‘Output’). Input and Output are collectively “Content.” As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. OpenAI may use Content as necessary to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.”
If you examine this definition carefully, you’ll notice that OpenAI declares it can use your content as it deems necessary to maintain its services, including complying with applicable laws and enforcing its policies. This is a handy catch-all for them. Indeed, in a later portion of the terms, labeled section (c), they state: “One of the main benefits of machine learning models is that they can be improved over time. To help OpenAI provide and maintain the Services, you agree and instruct that we may use Content to develop and improve the Services.” This is akin to the one-line caution, discussed earlier, that appears when you log into ChatGPT.
(And similarly, from PaperPal) “The Content presented on this Site (including but not limited to text, design, software, graphics, audio, video, HTML code, and data) is protected by copyright law, patent law, trademark law, and other applicable intellectual property laws and is the exclusive property of us. You agree to follow all instructions on this Site limiting the way you may use the Content. You affirm that you own the copyrights of the source files you provide us for availing the services or you affirm that you have the necessary authority to provide the source files to us for availing our services and in either case you authorize us to use the source files for rendering the services envisaged by you from us.”
It’s worth repeating that last part: “… you authorize us to use the source files for rendering the services envisaged by you from us.” What’s ‘the service’ you ‘envisage’ from this ‘Editing’ company? An edited paper, of course. Note that ChatGPT does offer users the option to ‘opt out’ of their work being added to the training data. We’ll return to the key legal concept of ‘Fair Use’ in a later blog in this series - a number of landmark legal cases are ongoing, and their outcomes will likely exert a huge impact on this ‘your work in, your work out’ AI training model. In short, for the moment, the concept of ‘Fair Use’ allows these companies to get away with almost anything, as long as the ‘derivative output’ isn’t too similar to an actual article used in the training process.
The fact that most researchers (and others) remain unaware of how AI tools train and learn is illustrated by a recent cautionary tale from our own experience. We were chatting about the issue of IP and training with a marketing leader from a major publisher, who expressed surprise that this was actually the case: (paraphrasing) ‘Shucks, we recently ran a marketing workshop involving the use of AI tools’. We then asked the well-known and very widely used tool in question to give us some notes on that particular publisher’s early-career-researcher marketing personas: What it spat out was almost word-for-word what the marketing team was using (and had developed based on lots of research). This information had presumably been fed into the tool during the recent workshop. It’s noteworthy that, for exactly this reason, a growing array of pharmaceutical and other IP-dependent companies now forbid their employees from using certain ‘your work in, your work out’ learning-based AI tools.
Conclusions
We are living in interesting AI times, for sure. OpenAI is launching more and more GPTs, trainable assistants that can be embedded inside your accounts. Microsoft’s Copilot (built on OpenAI’s GPT models) is already integrated across the entire Office suite and even within Windows 11, Anthropic’s Claude 2 has become much smarter on the back of a massive investment from the mega-powerful Amazon, Elon Musk’s xAI is launching Grok, and dozens and dozens of new AI solutions are popping up all the time across various fields. Almost every day we come across cool online solutions for making videos, upscaling images, creating tools from data in minutes, and writing music with only our voices. The dawning AI era is all pretty cool: If only people would learn to get along with each other and treat each other ‘fairly’, these times would be so great. Come on, ‘Editing Companies’!
(Full disclosure: No AI Tools were used - or harmed - during the writing of this article)