Amazon built an ecommerce empire by automating much of the work needed to move goods and pack orders in its warehouses. There is still plenty of work for humans in those vast facilities because some tasks are too complex for robots to do reliably—but a new robot called Sparrow could shift the balance that Amazon strikes between people and machines.
Sparrow is designed to pick out items piled in shelves or bins so they can be packed into orders for shipping to customers. That’s one of the most difficult tasks in warehouse robotics because there are so many different objects, each with different shapes, textures, and malleability, that can be piled up haphazardly. Sparrow takes on that challenge by using machine learning and cameras to identify objects piled in a bin and plan how to grab one using a custom gripper with several suction tubes. Amazon demonstrated Sparrow for the first time today at the company’s robotics manufacturing facility in Massachusetts.
Amazon is currently testing Sparrow at a facility in Texas where the robot is already sorting products for customer orders. The company says Sparrow can handle 65 percent of the more than 100 million items in its inventory. Tye Brady, chief technologist at Amazon Robotics, says that range is the most impressive thing about the robot. “No one has the inventory that Amazon has,” he says. Sparrow can grasp DVDs, socks, and stuffies, but still struggles with loose or complex packaging.
Making machines capable of picking a wide range of individual objects with close to the accuracy and speed of humans could transform the economics of ecommerce. A number of robotics companies, including Berkshire Grey, RightHand Robotics, and Locus Robotics, already sell systems capable of picking objects in warehouses. Startup Covariant specializes in having robots learn how to handle items they haven’t seen before on the job. But matching the ability of humans to handle any object reliably, and at high speed, remains out of reach for robots. A human can typically pick about 100 items per hour in a warehouse. Brady declined to say how quickly Sparrow can pick items, saying that the robot is “learning all the time.”
Automating more work inside warehouses naturally raises the specter of robots displacing humans. So far, the relationship between robotics and human workers in workplaces has been more complex. For instance, Amazon has increased its workforce even as it has rolled out more automation, as its business has continued to grow. The company appears sensitive to the perception that robots can disadvantage humans. At the event today the company spotlighted employees who had gone from low-level jobs to more advanced ones. However, internal data obtained by Reveal has suggested Amazon workers at more automated facilities suffer more injuries because the pace of work is faster. The company has claimed that robotics and other technology make its facilities safer.
When asked about worker replacement, Brady said the role of robots is misunderstood. “I don’t view it as replacing people,” he said. “It’s humans and machines working together—not humans versus machines—and if I can allow people to focus on higher level tasks, that’s the win.”
Robots have become notably more capable in recent years, although it can be difficult to distinguish hype from reality. While Elon Musk and others show off futuristic humanoid robots that are many years from being useful, Amazon has quietly gone about automating a large proportion of its operations. The ecommerce company says it now manufactures more industrial robots per year than any company in the world.
Use of industrial robots is growing steadily. In October, the International Federation of Robotics reported that companies around the world installed 517,385 new robots during 2021, a 31 percent increase year-on-year, and a new record for the industry. Many of those new machines are either mobile robots that wheel around factories and warehouses carrying goods or examples of the relatively new concept of “collaborative” robots that are designed to be safe to work alongside humans. Amazon this year introduced a collaborative robot of its own called Proteus, which ferries shelves stacked with products around a warehouse, avoiding human workers as it goes.
At its event today, Amazon also demonstrated a new delivery drone, called MK30, that is capable of carrying loads of up to 5 pounds. Amazon has been testing drone delivery in Lockeford, California, and College Station, Texas, and says the new, more efficient drone will go into service in 2024. The company also showcased a new electric delivery vehicle made by Rivian that includes custom safety systems for collision warning and automatic braking, as well as a system called Fleet Edge that gathers street-view footage and GPS data to improve delivery routing.
As more and more problems with AI have surfaced, including biases around race, gender, and age, many tech companies have installed “ethical AI” teams ostensibly dedicated to identifying and mitigating such issues.
Twitter’s META unit was more progressive than most in publishing details of problems with the company’s AI systems, and in allowing outside researchers to probe its algorithms for new issues.
Last year, after Twitter users noticed that a photo-cropping algorithm seemed to favor white faces when choosing how to trim images, Twitter took the unusual decision to let its META unit publish details of the bias it uncovered. The group also launched one of the first ever “bias bounty” contests, which let outside researchers test the algorithm for other problems. Last October, Chowdhury’s team also published details of unintentional political bias on Twitter, showing how right-leaning news sources were, in fact, promoted more than left-leaning ones.
Many outside researchers saw the layoffs as a blow, not just for Twitter but for efforts to improve AI. “What a tragedy,” Kate Starbird, an associate professor at the University of Washington who studies online disinformation, wrote on Twitter.
“The META team was one of the only good case studies of a tech company running an AI ethics group that interacts with the public and academia with substantial credibility,” says Ali Alkhatib, director of the Center for Applied Data Ethics at the University of San Francisco.
Alkhatib says Chowdhury is incredibly well thought of within the AI ethics community and her team did genuinely valuable work holding Big Tech to account. “There aren’t many corporate ethics teams worth taking seriously,” he says. “This was one of the ones whose work I taught in classes.”
Mark Riedl, a professor studying AI at Georgia Tech, says the algorithms that Twitter and other social media giants use have a huge impact on people’s lives, and need to be studied. “Whether META had any impact inside Twitter is hard to discern from the outside, but the promise was there,” he says.
Riedl adds that letting outsiders probe Twitter’s algorithms was an important step toward more transparency and understanding of issues around AI. “They were becoming a watchdog that could help the rest of us understand how AI was affecting us,” he says. “The researchers at META had outstanding credentials with long histories of studying AI for social good.”
As for Musk’s idea of open-sourcing the Twitter algorithm, the reality would be far more complicated. There are many different algorithms that affect the way information is surfaced, and it’s challenging to understand them without the real time data they are being fed in terms of tweets, views, and likes.
The idea that there is one algorithm with explicit political leaning might oversimplify a system that can harbor more insidious biases and problems. Uncovering these is precisely the kind of work that Twitter’s META group was doing. “There aren’t many groups that rigorously study their own algorithms’ biases and errors,” says Alkhatib at the University of San Francisco. “META did that.” And now, it doesn’t.
Some robot experts watching saw a project that appeared to be quickly getting up to speed. “There’s nothing fundamentally groundbreaking, but they are doing cool stuff,” says Stefanie Tellex, an assistant professor at Brown University.
Henrik Christensen, who researches robotics and AI at UC Davis, calls Tesla’s homegrown humanoid “a good initial design,” but adds that the company hasn’t shown evidence it can perform basic navigation, grasping, or manipulation. Jessy Grizzle, a professor at the University of Michigan’s robotics lab who works on legged robots, said that although still early, Tesla’s project appeared to be progressing well. “To go from a man in a suit to real hardware in 13 months is pretty incredible,” he says.
Grizzle says Tesla’s car-making experience and expertise in areas such as batteries and electric motors may help it advance robotic hardware. Musk claimed during the event that the robot would eventually cost around $20,000—an astonishing figure given the project’s ambition and significantly cheaper than any Tesla vehicle—but offered no timeframe for its launch.
Musk was also vague about who his customers would be, or which uses Tesla might find for a humanoid in its own operations. A robot capable of advanced manipulation could perhaps be important for manufacturing, taking on parts of car-making that have not been automated, such as feeding wires through a dashboard or carefully working with flexible plastic parts.
In an industry where profits are razor-thin and other companies are offering electric vehicles that compete with Tesla’s, any edge in manufacturing could prove crucial. But companies have been trying to automate these tasks for many years without much success. And a four-limbed design may not make much sense for such applications. Alexander Kernbaum, interim director of SRI Robotics, a research institute that has previously developed a humanoid robot, says it only really makes sense for robots to walk on legs in very complex environments. “A focus on legs is more of an indication that they are looking to capture people’s imaginations rather than solve real-world problems,” he says.
Grizzle and Christensen both say they will be watching future Tesla demonstrations for signs of progress, especially for evidence of the robot’s manipulation skills. Staying balanced on two legs while lifting and moving an object is natural for humans but challenging to engineer in machines. “When you don’t know the mass of an object, you have to stabilize your body plus whatever you’re holding as you carry it and move it,” Grizzle says.
Wise will be watching, too, and despite being underwhelmed so far, he hopes the project doesn’t flounder like Google’s ill-fated robotics acquisition spree back in 2013, which sucked many researchers into projects that never saw the light of day. The search giant’s splurge included two companies working on humanoids: Boston Dynamics, which it sold off in 2017, and Schaft, which it shut down in 2018. “These projects keep getting killed because, lo and behold, they wake up one day and they realize robotics is hard,” Wise says.
Now head of the nonprofit Distributed AI Research, Gebru hopes that going forward people focus on human welfare, not robot rights. Other AI ethicists have said that they’ll no longer discuss conscious or superintelligent AI at all.
“Quite a large gap exists between the current narrative of AI and what it can actually do,” says Giada Pistilli, an ethicist at Hugging Face, a startup focused on language models. “This narrative provokes fear, amazement, and excitement simultaneously, but it is mainly based on lies to sell products and take advantage of the hype.”
The consequence of speculation about sentient AI, she says, is an increased willingness to make claims based on subjective impression instead of scientific rigor and proof. It distracts from “countless ethical and social justice questions” that AI systems pose. While every researcher has the freedom to research what they want, she says, “I just fear that focusing on this subject makes us forget what is happening while looking at the moon.”
What Lemoine experienced is an example of what author and futurist David Brin has called the “robot empathy crisis.” At an AI conference in San Francisco in 2017, Brin predicted that in three to five years, people would claim AI systems were sentient and insist that they had rights. Back then, he thought those appeals would come from a virtual agent that took the appearance of a woman or child to maximize human empathic response, not “some guy at Google,” he says.
The LaMDA incident is part of a transition period, Brin says, where “we’re going to be more and more confused over the boundary between reality and science fiction.”
Brin based his 2017 prediction on advances in language models. He expects that the trend will lead to scams. If people were suckers for a chatbot as simple as ELIZA decades ago, he says, how hard will it be to persuade millions that an emulated person deserves protection or money?
“There’s a lot of snake oil out there, and mixed in with all the hype are genuine advancements,” Brin says. “Parsing our way through that stew is one of the challenges that we face.”
And as empathetic as LaMDA seemed, people who are amazed by large language models should consider the case of the cheeseburger stabbing, says Yejin Choi, a computer scientist at the University of Washington. A local news broadcast in the United States reported on a teenager in Toledo, Ohio, who stabbed his mother in the arm in a dispute over a cheeseburger. But the headline “Cheeseburger Stabbing” is vague. Knowing what occurred requires some common sense. Attempts to get OpenAI’s GPT-3 model to generate text using “Breaking news: Cheeseburger stabbing” produce words about a man getting stabbed with a cheeseburger in an altercation over ketchup, and a man being arrested after stabbing a cheeseburger.
Language models sometimes make mistakes because deciphering human language can require multiple forms of common-sense understanding. To document what large language models are capable of doing and where they can fall short, last month more than 400 researchers from 130 institutions contributed to a collection of more than 200 tasks known as BIG-Bench, or Beyond the Imitation Game. BIG-Bench includes some traditional language-model tests like reading comprehension, but also logical reasoning and common sense.
Researchers at the Allen Institute for AI’s MOSAIC project, which documents the common-sense reasoning abilities of AI models, contributed a task called Social-IQa. They asked language models—not including LaMDA—to answer questions that require social intelligence, like “Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy. Why did Jordan do this?” The team found large language models achieved performance 20 to 30 percent less accurate than people.
For years, tech companies have relied on something called the Fitzpatrick scale to classify skin tones for their computer vision algorithms. Originally designed for dermatologists in the 1970s, the system comprises only six skin tones, a possible contributor to AI’s well-documented failures in identifying people of color. Now Google is beginning to incorporate a 10-skin-tone standard, called the Monk Skin Tone (MST) scale, across its products, from Google Search Images to Google Photos and beyond. The development has the potential to reduce bias in data sets used to train AI in everything from health care to content moderation.
Google first signaled plans to go beyond the Fitzpatrick scale last year. Internally, the project dates back to a summer 2020 effort by four Black women at Google to make AI “work better for people of color,” according to a Twitter thread from Xango Eyeé, a responsible AI product manager at the company. At today’s Google I/O conference, the company detailed how wide an impact the new system could have across its many products. Google will also open source the MST, meaning it could replace Fitzpatrick as the industry standard for evaluating the fairness of cameras and computer vision systems.
“Think anywhere there are images of people’s faces being used where we need to test the algorithm for fairness,” says Eyeé.
The Monk Skin Tone scale is named after Ellis Monk, a Harvard University sociologist who has spent decades researching colorism’s impact on the lives of Black people in the United States. Monk created the scale in 2019 and worked with Google engineers and researchers to incorporate it into the company’s product development.
“The reality is that life chances, opportunities, all these things are very much tied to your phenotypical makeup,” Monk said in prepared remarks in a video shown at I/O. “We can weed out these biases in our technology from a really early stage and make sure the technology we have works equally well across all skin tones. I think this is a huge step forward.”
An initial analysis by Monk and Google research scientists last year found that participants felt better represented by the MST than by the Fitzpatrick scale. In an FAQ published Wednesday, Google says that having more than 10 skin tones can add complexity without extra value, unlike industries like makeup, where companies like Rihanna’s Fenty Beauty offer more than 40 shades. Google is continuing work to validate the Monk Skin Tone scale in places like Brazil, India, Mexico, and Nigeria, according to a source familiar with the matter. Further details are expected soon in an academic research article.
The company will now expand its use of the MST. Google Images will offer an option to sort makeup-related search results by skin tone based on the scale, and filters for people with more melanin are coming to Google Photos later this month. Should Google adopt the 10-skin-tone scale across its product lines, it could have implications for fairly evaluating algorithms used in Google search results, Pixel smartphones, YouTube classification algorithms, Waymo self-driving cars, and more.