Hello and welcome to the debut issue of TechScape, the Guardian’s newsletter on all things tech, and sometimes things not-tech if they’re interesting enough. I can’t tell you how excited I am to have you here with me, and I hope between us we can build not just a newsletter, but a news community.
Sometimes there’s a story that just sums up all the hopes and fears of its entire field. Here’s one.
GitHub is a platform that lets developers collaborate on coding with colleagues, friends and strangers around the world, and host the results. Owned by Microsoft since 2018, the site is the largest host of source code in the world, and a crucial part of many companies’ digital infrastructure.
Late last month, GitHub launched a new AI tool, called Copilot. Here’s how chief executive Nat Friedman described it:
A new AI pair programmer that helps you write better code. It helps you quickly discover alternative ways to solve problems, write tests, and explore new APIs without having to tediously tailor a search for answers on the internet. As you type, it adapts to the way you write code – to help you complete your work faster.
In other words, Copilot will sit on your computer and do a chunk of your coding work for you. There’s a long-running joke in the coding community that a substantial portion of the actual work of programming is searching online for people who’ve solved the same problems as you, and copying their code into your program. Well, now there’s an AI that will do that part for you.
And the stunning thing about Copilot is that, for a whole host of common problems … it works. Programmers I have spoken to say it is as stunning as the first time text from GPT-3 began popping up on the web. You may remember GPT-3: it’s the superpowerful text-generation AI that writes paragraphs like:
The mission for this op-ed is perfectly clear. I am to convince as many human beings as possible not to be afraid of me. Stephen Hawking has warned that AI could “spell the end of the human race”. I am here to convince you not to worry. Artificial intelligence will not destroy humans. Believe me.
It’s tempting, when imagining how tech will change the world, to think of the future as one where humans are basically unnecessary. As AI systems manage to tackle increasingly complex domains, with increasing competence, it’s easy enough to think of them as being able to achieve everything a person can, leaving the human that used to be employed doing the same thing with idle hands.
Whether that is a nightmare or a utopia, of course, depends on how you think society would adapt to such a change. Would huge numbers of people be freed to live a life of leisure, supported by the AIs that do their jobs in their stead? Or would they instead find themselves unemployed and unemployable, with their former managers reaping the rewards of the increased productivity per hour worked?
But it’s not always the case that AI is here to replace us. Instead, more and more fields are exploring the possibility of using the technology to work alongside people, extending their abilities, and taking the drudge work from their jobs while leaving them to handle the things that a human does best.
The concept’s come to be called a “centaur” – because it leads to a hybrid worker who has an AI back half and human front. It’s not as futuristic as it sounds: anyone who’s used autocorrect on an iPhone has, in effect, teamed up with an AI to offload the laborious task of typing correctly.
Often, centaurs can come close to the dystopian vision. Amazon’s warehouse employees, for instance, have been gradually pushed along a very similar path as the company seeks to eke out every efficiency improvement possible. The humans are guided, tracked and assessed throughout the working day, ensuring that they always take the optimal route through the warehouse, pick exactly the right items, and do so at a consistent rate high enough to let the company turn a healthy profit. They’re still employed to do things that only humans can offer – but in this case, that’s “working hands and a low maintenance bill”.
But in other fields, centaurs are already proving their worth. The world of competitive chess has, for years, had a special format for such hybrid players: humans working with the assistance of a chess computer. And, generally, the pairs play better than either would on their own: the computer avoids stupid errors, plays without getting tired, and presents a list of high-value options to the human player, who’s able to inject a dose of unpredictability and lateral thinking into the game.
That’s the future GitHub hopes Copilot will be able to introduce. Programmers who use it can stop worrying about simple, well-documented tasks, like how to send a valid request to Twitter’s API, or how to pull the time in hours and minutes from a system clock, and start focusing their effort on the work that no one else has done.
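To give a sense of how small these tasks are, here is a minimal sketch of the system-clock example in Python. It is an illustration only, not output from Copilot, and the function name `hours_and_minutes` is made up for this example.

```python
from datetime import datetime


def hours_and_minutes(now=None):
    """Return a time as an 'HH:MM' string on the 24-hour clock."""
    # Hypothetical helper for illustration: fall back to the local
    # system clock when the caller does not supply a time.
    if now is None:
        now = datetime.now()
    return now.strftime("%H:%M")


# Prints the current local time, for example 14:32.
print(hours_and_minutes())
```

It is a handful of lines that almost every programmer has looked up at some point, which is exactly the kind of work an autocomplete tool is pitched at.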
The reason why Copilot is fascinating to me isn’t just the positive potential, though. It’s also that, in one release, the company seems to have fallen into every single trap plaguing the broader AI sector.
Copilot was trained on public data from GitHub’s own platform. That means all of that source code, from tens of millions of developers around the world, was used to teach it how to write code based on user prompts.
That’s great if the problem is a simple programming task. It’s less good if the prompt for autocomplete is, say, secret credentials that you use to sign in to a user account. And yet:
GitHubCopilot gave me a [Airbnb] link with a key that still works (and stops working when changing it).
The AI is leaking [sendgrid] API keys that are valid and still functional.
The vast majority of what we call AI today isn’t coded but trained: you give it a great pile of stuff, and tell it to work out for itself the relationships between that stuff. With the vast sum of code available in GitHub’s repositories, there are plenty of examples for Copilot to learn what code that checks the time looks like. But there are also plenty of examples for Copilot to learn what an API key accidentally uploaded in public looks like – and to then share it onwards.
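For readers who have not seen one, an accidentally published key usually looks something like the first assignment in the sketch below. Everything in it is hypothetical: the key is a placeholder, and `EXAMPLE_API_KEY` and `load_api_key` are names invented for this illustration. The second half shows one common way to keep a credential out of the source file, by reading it from the environment instead.

```python
import os

# The risky pattern: a credential typed straight into the source file.
# Anyone who can read the file (or a model trained on it) can read the key.
# Placeholder value only, not a real key.
HARDCODED_KEY = "EXAMPLE-KEY-NOT-REAL"


def load_api_key(env=None):
    """Read the key from the environment rather than from source code."""
    # EXAMPLE_API_KEY is a hypothetical variable name for this sketch.
    env = os.environ if env is None else env
    key = env.get("EXAMPLE_API_KEY")
    if key is None:
        raise RuntimeError("EXAMPLE_API_KEY is not set")
    return key
```

The second pattern keeps the secret itself out of anything uploaded in public, so there is nothing for a model trained on that code to memorise.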
Passwords and keys are obviously the worst examples of this sort of leakage, but they point to the underlying concern about a lot of AI technology: is it actually creating things, or is it simply remixing work already done by other humans? And if the latter, should those humans get a say in how their work is used?
On that latter question, GitHub’s answer is a forceful no. “Training machine learning models on publicly available data is considered fair use across the machine learning community,” the company says in an FAQ.
Originally, the company made the much softer claim that doing so was merely “common practice”. But the page was updated after coders around the world complained that GitHub was violating their copyright. Intriguingly, the biggest opposition came not from private companies concerned that their work may have been reused, but from developers in the open-source community, who deliberately build in public to let their work be built upon in turn. Those developers often rely on copyright to ensure that people who use open-source code have to publish what they create – something GitHub didn’t do.
GitHub is probably right on the law, according to law professor James Grimmelmann. But the company isn’t going to be the last to reveal a groundbreaking new AI tool and then face awkward questions over whether it actually has the rights to the data used to train it.