This week Shreyans Jain, Senior Member of Technical Staff at Salesforce, joins me to discuss his passion for technological equality. That passion drives him to keep working and innovating on the Salesforce Platform and other technologies to improve accessibility. Together we talk about the use of various AI technologies and APIs to achieve the goal of broader equality across the internet and its platforms.

Show Highlights:

  • Shreyans’ experience and passion for making information accessible through a variety of modes (other than visual) and some of his research on spatial audio versus spatial language
  • Using the Einstein Vision AI as inspiration to create a Chrome extension which converts images into text, even when alternative text for an image does not exist
  • Shreyans’ Quip voice assistant, its ability to interact with your documents, and some ways you can extend the assistant by using the Quip Platform API or the Quip Live Apps platform

Resources:

Shout Outs:

Episode Transcript

00:04

Shreyans: And I was really impressed that equality is one of them, and that it really encourages employees to donate 1% of their time to the communities that they live in. So this particular aspect really interested me, and it aligned with my own personal values.

00:27

Josh: That is Shreyans Jain, a Senior Member of Technical Staff here at Salesforce, who helps us visualize the data coming from our data centers. I’m Josh Birk, a developer evangelist at Salesforce. And here on the Salesforce Developer Podcast, you’ll hear stories and insights from developers, for developers. Today, we aren’t going to focus on Shrey’s day job, even though he has done some neat WebGL 3D representations of our data centers around the globe. Instead, we’re going to touch on what Shrey was just mentioning: a passion for equality that drives him to innovate with Salesforce accessibility-related projects. And this isn’t new to him. He did this kind of research back when he was getting his master’s.

01:02

Shreyans: Yeah, so I worked in a lab when I was in grad school. And the focus of this lab was to kind of explore other modes for how you can convey information. So we’re talking about modes other than vision. For example, we had projects that focused on how you can provide information to someone using just audio. And that could be spatial audio or 3D audio, or spatial language, which is, you know, describing objects in terms of, like, “three o’clock, six feet, there’s a chair.” The other aspect was the tactile mode: how you can learn about new environments using just touch. So the focus, obviously, was to make the world more accessible to blind and low-vision users…

02:01

Josh: Now, Shrey describes the lab that he worked at as something that had a lot of really neat toys and some advanced hardware. But harkening back to the episode we had with Qingqing: sometimes, if you’re trying to get a high level of adoption for your technology, well, you go for the technology people already have in their hand, and I mean that literally, in the form of a smartphone.

02:20

Shreyans: So I was actually involved in a number of different projects. One of the main projects that I think I contributed to was the spatial audio versus spatial language research. So I was looking at whether audio can be used as an alternative interface to convey information. Now, this research is useful not only for blind users, but also for sighted users. Imagine you’re at a busy airport, you’re wearing your headphones, and you really want to get to a terminal. You can open up your phone and, with the indoor navigation technology that we have available in airports, it would actually guide you, similar to how a GPS guides you in your car. However, imagine having to hold the phone while keeping pace with your surroundings and making sure that you don’t bump into someone. In those situations, spatial audio can be really useful, because you can just put on your headphones and let the voice guide you.

03:33

Josh: Okay, just to make a little medium-sized story slightly shorter: Shrey gets his master’s degree, he goes and works for Gallup, the polling company, and he’s doing data visualization for them. He hears about Salesforce, really likes our messaging when it comes to things like equality and volunteering time, applies, and gets a job here. But our narrative actually picks up with his relationship with Adam Rodenback from our previous episode, who, if you heard that episode, you know is extremely passionate about accessibility technology himself and is an accessibility specialist here at Salesforce.

04:10

Shreyans: So Adam was actually working in a different building, but I came to know about him through my manager, because my manager knew about my side projects and my interests. And he told me, “You should probably meet Adam; I think you would have a lot to talk about.” So I reached out to him, and we went out for lunch, and I told him about my research. Adam is a really, really interesting and well-read person, so I think we connected on multiple levels. And through one of those conversations, we actually got the idea of developing something that could at least try to fix that.

04:52

Josh: Now obviously, for an episode that’s going to be less than 20 minutes, Shrey and Adam here are not talking about trying to fix all the things wrong with the world. We’re talking very specifically about accessibility issues. And if you go back to Adam’s episode, you’ll know that if you do things in HTML that aren’t friendly to a screen reader, it can really disrupt the experience for a blind person. One of the easiest things that people can do is add alt tags to images. And if they don’t do that, then the screen reader is running without the information it needs to operate properly. And so Adam and Shrey had a fairly straightforward idea: why not use a little artificial intelligence…

05:31

Shreyans: Einstein Vision came out, and again, Einstein Vision is the artificial intelligence technology that Salesforce provides which can take an image as an input and give some label for that image as an output. So for example, if you give it a picture of a cake as input, it would convert that image into some text. So that is the technology. And I thought, well, you know, this is doing exactly what I want the web to be doing, or the platform to be doing. Since we have made such a huge improvement in artificial intelligence technology, we should be able to do it for all the images on the web.
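
To make that image-in, label-out flow concrete, here is a minimal sketch of what a call to the Einstein Vision prediction endpoint looked like. The endpoint, form fields, model ID, and response shape are based on the Einstein Vision documentation of the era, and the access token is a placeholder, so treat the details as assumptions rather than Shrey’s actual code.

```typescript
// Minimal sketch of an Einstein Vision prediction request (Node 18+, built-in fetch).
// Endpoint, form fields, and model ID follow the Einstein Vision docs of the era;
// treat them, and the placeholder access token, as assumptions.
interface EinsteinLabel {
  label: string;
  probability: number;
}

async function classifyImage(imageUrl: string, accessToken: string): Promise<EinsteinLabel[]> {
  const form = new FormData();
  form.append("sampleLocation", imageUrl);          // publicly reachable image URL
  form.append("modelId", "GeneralImageClassifier"); // one of the pre-built classifiers

  const response = await fetch("https://api.einstein.ai/v2/vision/predict", {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}` },
    body: form,
  });

  const result = (await response.json()) as { probabilities: EinsteinLabel[] };
  return result.probabilities; // e.g. [{ label: "cake", probability: 0.97 }, ...]
}

// Usage: log the labels Einstein suggests for a picture of a cake.
classifyImage("https://example.com/cake.jpg", "<EINSTEIN_ACCESS_TOKEN>")
  .then((labels) => console.log(labels));
```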

06:21

Josh: It’s an interesting idea, which is both fanciful and also incredibly realistic: the idea that artificial intelligence could look at a page, realize there’s something lacking in that page that a screen reader requires to have all the information it needs, run a little bit of AI, and then fix the image before there’s any kind of problem. Unfortunately, we’re not at that future yet, and so Shrey had to come up with a different solution.

06:49

Shreyans: And this is, again, going back to my conversation with Adam. The only way we could fix that is through an extension. If the browser makers are not going to do it, then we have to fix it for them. So the way we thought about doing it was to create an extension, in this case a Chrome extension. The way the extension works is that you install the extension, just like you install any other extension. Then, let’s say I’m a user of an assistive technology, like a screen reader, and I go to a web page and come across an image while going through the flow of the page. The current state is that if the image does not have alternative text on it, most assistive technologies will try to read off the whole file name, just in the hope that maybe the file name will tell the user what the image is about. But more often than not, these images are hosted on a CDN and have really convoluted file names, and it becomes really annoying for the user, because the screen reader will read out the whole file path.

So the way the extension works is: you load a web page, and you know there are going to be some image tags. You can do it both ways; you can either set it up so that it will do it for all images, or you can do it on a page-by-page basis. Let’s say I visit a web page, and I know that the developer was really sloppy and did not add any alternative text to the images, but there is real value in those images that I need in order to get the whole context of the web page. In that case, I can turn on my extension, and what it would do is take the URL of the image and send that URL to Einstein. Now with Einstein, you have options: you can use the pre-built models that Salesforce provides, which are the food classifier, the general image classifier, and the scene classifier, so it really depends on what you’re trying to look for. Einstein would then send a response, which is basically the labels associated with the image. You pick the label with the highest probability and insert alternative text onto the image tag, which says something like “the image may contain mountains” or “the image may contain mountains with people.” So it really adds a lot of context to the web page which would otherwise be completely missing.
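
As a rough illustration of that flow, here is a hypothetical content-script sketch: find images missing alternative text, ask the classifier for a label, and write the top label back as alt text. It is not the code from Shrey’s repo; the Einstein call is the same one sketched earlier, trimmed down, and in a real extension it would sit behind the extension’s own permissions and settings.

```typescript
// Hypothetical content-script pass: label images that arrived without alt text.
// Not the extension's actual code; endpoint usage and names are illustrative.
async function topLabel(imageUrl: string, token: string): Promise<string | null> {
  const form = new FormData();
  form.append("sampleLocation", imageUrl);
  form.append("modelId", "GeneralImageClassifier");
  const res = await fetch("https://api.einstein.ai/v2/vision/predict", {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: form,
  });
  const { probabilities } = (await res.json()) as {
    probabilities: { label: string; probability: number }[];
  };
  if (probabilities.length === 0) {
    return null;
  }
  // Pick the label Einstein is most confident about.
  return probabilities.reduce((a, b) => (b.probability > a.probability ? b : a)).label;
}

async function fillMissingAltText(token: string): Promise<void> {
  // Only touch images a screen reader would otherwise announce by file path.
  const images = document.querySelectorAll<HTMLImageElement>('img:not([alt]), img[alt=""]');
  for (const img of Array.from(images)) {
    const label = await topLabel(img.src, token);
    if (label) {
      img.alt = `Image may contain: ${label}`;
    }
  }
}

fillMissingAltText("<EINSTEIN_ACCESS_TOKEN>");
```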

09:50

Josh: This is a great example of how a relatively complicated problem can be solved fairly simply, provided you have some enterprise-class artificial intelligence on hand, of course. And it should be noted that the way Einstein Vision works is that you have to have your own keys to talk to the APIs. When Shrey was first setting it up, he was using his own keys, and so he needed to come up with a solution that could work for everybody.

10:14

Shreyans: So the way the Chrome extension is written right now, you are not necessarily tied to Einstein; you could potentially hook into some other image AI if you want to. However, if you do want to use Einstein, there is a very easy process where you can set up an app and we can provision Einstein credentials for you. That way, you can still use the Chrome extension with very minimal changes, and you will essentially have your own account, and all of the predictions would count against that.
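
One hypothetical way an extension could stay classifier-agnostic while letting each user bring their own provisioned credentials is to keep the endpoint, token, and model in extension storage. This is a sketch of that idea, not Shrey’s implementation; the setting names are made up, and it assumes a Manifest V3 extension with the "storage" permission (and @types/chrome for the chrome global).

```typescript
// Hypothetical settings layer: each user supplies their own provisioned credentials,
// and the endpoint/model can point at Einstein Vision or any other image AI.
// Setting names are illustrative; assumes a Manifest V3 extension with "storage" permission.
interface ClassifierSettings {
  endpoint: string;    // prediction endpoint, e.g. Einstein Vision's predict URL
  accessToken: string; // the user's own credential, never baked into the extension
  modelId: string;     // which pre-built (or custom) model to query
}

const DEFAULT_SETTINGS: ClassifierSettings = {
  endpoint: "https://api.einstein.ai/v2/vision/predict",
  accessToken: "",
  modelId: "GeneralImageClassifier",
};

async function loadSettings(): Promise<ClassifierSettings> {
  // chrome.storage.sync.get merges any stored values over the defaults we pass in.
  return (await chrome.storage.sync.get(DEFAULT_SETTINGS)) as ClassifierSettings;
}

async function saveSettings(settings: ClassifierSettings): Promise<void> {
  await chrome.storage.sync.set(settings);
}
```

Swapping in a different image AI would then only mean changing the stored endpoint and whatever request shape that service expects.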

10:57

Josh: Now, Shrey’s code is going to be up on his GitHub repo if you would like to download, tinker with, or use that Chrome extension; we will put a link to that repo in the description for this episode. Moving on to another of his projects, which combines a couple of interesting things. One of them uses a technology that doesn’t get quite as much press as some of the other things we have at Salesforce, like Lightning Web Components or APIs: it’s Quip and the ability to interact with your documents.

11:26

Shreyans: So I was really interested in being able to read some of the documents that I have in Quip, or even potentially write to them. What I did is I created an application that can hook into basically any voice platform, so for example it could be Alexa, or it could be Actions on Google, which is the OK Google platform, and really interact with my Quip account in a way that I can ask, “OK Google, can you read my to-do list from Quip?” and it will read it out to me. Then I can actually tell it, “Can you mark ‘buy milk’ as complete?” and it would actually do that, or I can say, “Can you add ‘buy bread’ to my to-do list?” and it will do that. The really nice thing about it is that the one thing we always have with us is our smartphone. So you never have to miss out on an idea because you couldn’t take out your phone and write it down; even while driving, you can just say, “Hey, I think I have this idea. Can you add this to my document and add my manager on it?” And so that was kind of the goal.

Quip is super extensible, so there are two different ways you can extend Quip. The first one is the Quip Platform API itself, and the Quip Platform API would let you do anything that you can do with your browser or the Quip app: you know, create documents, read documents, add comments, things like that. The other platform that Quip provides is the Quip Live Apps platform, and Live Apps is where you can extend the functionality of Quip for your own use cases. And I think this is really the distinguishing feature between Quip and any other collaboration software out there.
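
For a sense of what the voice assistant’s back end might call, here is a small sketch of two Quip Automation API requests: reading a document and appending content to it. The endpoints and parameter names are based on the Quip Automation API documentation; the thread ID and access token are placeholders, and details such as the default append behavior should be treated as assumptions rather than the project’s actual code.

```typescript
// Sketch of the two Quip Automation API calls a voice skill like this would lean on.
// Endpoints/parameters follow the Quip Automation API docs; treat details as assumptions.
const QUIP_BASE = "https://platform.quip.com/1";

// Read a document (a "thread" in Quip terms) and return its HTML content.
async function readDocumentHtml(threadId: string, token: string): Promise<string> {
  const res = await fetch(`${QUIP_BASE}/threads/${threadId}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const body = (await res.json()) as { html: string };
  return body.html; // e.g. the to-do list the assistant reads back to the user
}

// Append HTML to the end of a document, e.g. a new to-do item dictated by voice.
async function appendToDocument(threadId: string, html: string, token: string): Promise<void> {
  const params = new URLSearchParams({
    thread_id: threadId,
    content: html,
    format: "html",
    // Appending is the documented default location for edit-document (assumption).
  });
  await fetch(`${QUIP_BASE}/threads/edit-document`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: params,
  });
}

// Usage: handle an "add buy bread to my to-do list" intent, then read the list back.
appendToDocument("<THREAD_ID>", "<ul><li>Buy bread</li></ul>", "<QUIP_ACCESS_TOKEN>")
  .then(() => readDocumentHtml("<THREAD_ID>", "<QUIP_ACCESS_TOKEN>"))
  .then((html) => console.log(html));
```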

13:38

Josh: Shrey’s project is a great example of using voice assistants with that document API. Again, his code is going to be up in his GitHub repo; we will put a link in the episode description. We’re also likely to cover more about Quip Live Apps in a future episode of the pod.

And that’s our show. I really want to thank Shrey for his time and his energy in innovating on the platform and showing us some great examples of things we can do with accessibility and with the Salesforce Platform. Before we go, though, Shrey really wanted to give a shout-out to a project that helps bring this episode full circle, back to what we were talking about at the beginning: the volunteer time that Salesforce employees are allowed to donate. And he has found a great platform for finding projects that need that time.

14:25

Shreyans: So that’s when I found out about Catchafire. The way it works is that nonprofits can go and create projects, and the projects can range from one hour to several months, so they have different complexities and different timelines, and they put them out really like job openings. Now the volunteers who are interested go on to the platform and select projects based on a number of different criteria. It could be based on the social cause they really care about, or it could be the length of the project, or it could be the type of their expertise. For example, I’m a developer, so I usually tend to look for developer-based projects, but there are also opportunities for marketers and content writers and digital artists and designers.

15:19

Josh: Thanks for listening, everybody. If you want to learn more about this podcast, head on over to developer.salesforce.com/podcast, where you can hear old episodes or find links to your favorite services where you can subscribe and like. Thanks again, and I’ll talk to you next week.
