Smart Remote

A smart remote control with button, gesture, keyboard and voice input modes.

The remote control that is included with the cloud connector has buttons for arrows, OK, back, home, etc. You can navigate around a traditional TV interface using these buttons, but what if your use case calls for richer interactions?

In this tutorial, we'll build an app that lets you scan a QR code with your phone to use it as a smart remote control. It includes:

  1. The same buttons as the physical remote.
  2. A trackpad that captures your gestures.
  3. A keyboard so you can type.
  4. A microphone so you can speak voice commands.

It's like getting four remotes in one!

Architecture

The physical remote control accepts button presses and sends them over Bluetooth to the cloud connector, which forwards them over the internet to the Senza cloud, which in turn delivers them to your web app as keydown events. It effectively emulates an ordinary keyboard: when the user presses OK on the remote, your app receives the event as if they had pressed the Enter key.

Our app will have three parts: a web app that runs on Senza, a web app that runs on a mobile device, and a Socket.io server that is used for forwarding events. When the mobile app receives an input event, it will send a message to the Socket.io server which will simply forward it to the Senza app. In addition to keydown events, we will also handle mousemove and other input events.

The Socket.io server will serve the two web apps from its public folder. The default app that runs on the TV will be called index and the one that shows the remote control on the mobile device will be called remote. Each web app will have an HTML, JavaScript and CSS file.

Security

For security purposes, we want to make sure that only the user who can see the TV can use the smart remote to control it. The TV app generates a UUID that is included in the QR code, so when the user scans the QR code with their phone, the phone receives the UUID in the URL. When the TV app first connects to the Socket.io server, it subscribes to notifications under that UUID. When the mobile app sends events to the server with the UUID in the message, they are forwarded to the TV app that has subscribed to receive them.

This communication happens outside of the Senza infrastructure, and does not rely on any Senza-specific technology or use the client library. For development purposes, that means you can even run the TV app in a regular web browser on your computer, since it doesn't depend on the device ID.

Security and convenience often come at the expense of one another, and for this demo we have made some deliberate choices to improve convenience. For example, the UUID is cached in the TV app's local storage so that it is the same across launches of the app. That means that the user can pair their phone once and bookmark the app, as opposed to scanning the QR code every time they want to use the remote.

The principle is that if someone has access to the QR code, they can use the remote app to send input events to the TV app. Depending upon your use case, you may wish to consider implementing additional security measures, such as requiring a login to your system, obtaining an access token from your server, expiring the access token after a period of time, allowing the user to unpair connected remotes, and so on.
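
As one illustration of the expiry idea, here is a minimal sketch that assumes a pairing is represented by a small record with an expiresAt timestamp. The names createPairing and isPairingValid and the ttlMs parameter are hypothetical and are not part of this tutorial's source code; a real system would create and check such records on the server.

```javascript
// Hypothetical helpers, not part of the tutorial's source code.

// Create a pairing record for a device ID that expires after ttlMs milliseconds.
function createPairing(device, ttlMs, now = Date.now()) {
  return { device, expiresAt: now + ttlMs };
}

// A pairing is valid only for the matching device and only before it expires.
function isPairingValid(pairing, device, now = Date.now()) {
  return pairing.device === device && now < pairing.expiresAt;
}
```

The server could run a check like this before forwarding each message, and "unpairing" would then amount to deleting the record.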

TV App

To demonstrate how we can receive gesture input, we'll create an app that shows a football emoji that the user can move around the screen.

Here's the HTML for our app. It consists of a white rectangle that we'll call the 'field', and inside it an image element for the QR code, an image element for the football, and a div that we'll use for displaying a text message.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Football</title>
    <link rel="stylesheet" href="index.css">
</head>
<body>
  <div id="container">
    <div id="field">
      <img id="qrcode" src="">
      <img id="football" draggable="false" src="images/football.png" alt="Football">
      <div id="text"></div>
    </div>
  </div>
  <script src="js/socket.io-1.0.0.js"></script>
  <script src="index.js"></script>
</body>
</html>

Because the app also accepts direct mouse input, you'll be able to drag the ball around when you run it on your computer during development. The football has a draggable="false" attribute, which turns off the browser's built-in image drag (the one that lets you drag the image file to your desktop) so that our own mouse handlers can move the element around the screen instead.

Pairing

We'll add code to a file called index.js for the TV app. As mentioned above in the Security section, first we'll generate a UUID to uniquely identify the app on this device, and we'll cache it in local storage so that it's always the same. We'll informally call it the device ID even though we're not getting it from the client library.

function generateUUID() {
  let cachedUUID = localStorage.getItem('uuid');
  if (cachedUUID) return cachedUUID;
  let uuid = crypto.randomUUID();
  localStorage.setItem('uuid', uuid);
  return uuid;
}
let device = generateUUID();

function generateCode(text, size) {
  let data = encodeURIComponent(text);
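  // Use https so the image is not blocked as mixed content when the page itself is served over https
  let src = `https://api.qrserver.com/v1/create-qr-code/?data=${data}&size=${size}x${size}`;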
  qrcode.src = src;
}
let page = window.location.href;
if (page.endsWith("html") || page.endsWith("/")) {
  page = page.substring(0, page.lastIndexOf('/'));
}
let remoteLink = page + "/remote.html?device=" + device;
generateCode(remoteLink, 200);
console.log("Remote link", remoteLink);

This code will create a link to the remote page by modifying the link of the current page, replacing the filename with remote.html, and adding the UUID as a query parameter. It will display a QR code that points to the URL, and also log it to the console.
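
The same link can also be built with the standard URL API instead of string manipulation. This is only a sketch of an alternative; buildRemoteLink is a hypothetical name and the host in the example is made up.

```javascript
// Hypothetical alternative to the string manipulation above, using the standard URL API.
function buildRemoteLink(pageHref, device) {
  // Resolve remote.html relative to the current page. This replaces the
  // current filename and drops any query string or fragment.
  const url = new URL("remote.html", pageHref);
  url.searchParams.set("device", device);
  return url.href;
}

// Example with a made-up host name:
// buildRemoteLink("https://example.test/index.html", "1234")
// returns "https://example.test/remote.html?device=1234"
```

Resolving against the page address handles URLs that end in a filename, a slash, or a query string without special cases.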

For development you'll want to use ngrok to create a tunnel from the public internet to your computer so that you can load the app from both Senza and your mobile phone. For example, the link to the remote page might look something like this:

https://f007ba110123.ngrok.app/remote.html?device=cff35951-d431-4076-96c5-6078f007ba11

When you scan the code on your phone, it will be paired!

Remote App

The remote app will feature several user interface elements:

  • a text field for accepting keyboard input.
  • a div for accepting mouse/touch input that we'll use as the trackpad.
  • an image of the remote control buttons with a clickable image map.

Here's the HTML:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=500">
  <title>Trackpad</title>
  <link rel="stylesheet" href="remote.css">
</head>
<body>
  <input type="text" id="textfield" placeholder=" search"><br><br>
  <div id="rectangle"></div><br>
  <img width="360" src="images/buttons.png" usemap="#image-map">
  <script src="js/socket.io-1.0.0.js"></script>
  <script src="remote.js"></script>
</body>
</html>

We'll add the code to a file called remote.js for the remote app. When the app starts up, we'll get the device ID from the query parameters:

let queryString = window.location.search;
let urlParams = new URLSearchParams(queryString);
let device = urlParams.has('device') ? urlParams.get('device') : "";
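
Since the device ID arrives in a URL that anyone could edit, it may be worth checking its shape before using it. The sketch below assumes the ID is a UUID like the one generated by the TV app; parseDevice and UUID_PATTERN are hypothetical names that do not appear in the tutorial's source.

```javascript
// Hypothetical helper: extract the device ID from a query string and accept
// it only if it has the 8-4-4-4-12 hexadecimal shape of a UUID.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseDevice(search) {
  const value = new URLSearchParams(search).get("device");
  return value !== null && UUID_PATTERN.test(value) ? value : null;
}
```

In the remote app you could call parseDevice(window.location.search) and show a "please scan the QR code again" message when it returns null.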

Socket.IO

When the TV app starts up, we'll connect to the Socket.io server and send a join message with the device ID. That way when the remote app sends events for this device, we'll receive them.

let socket = io.connect(location.hostname);
socket.emit("join", {device});

The code for the server is very short:

const express = require("express");
const app = express();
const errorHandler = require('errorhandler');
const hostname = process.env.HOSTNAME || 'localhost';
const port = parseInt(process.env.PORT, 10) || 8080;
const publicDir = process.argv[2] || __dirname + '/public';
const io = require('socket.io').listen(app.listen(port));

app.use(express.static(publicDir));
app.use(errorHandler({ dumpExceptions: true, showStack: true}));

console.log("Remote server running at " + hostname + ":" + port);

let events = ["mousedown", "mousemove", "mouseup", "keydown", "command"];

io.sockets.on('connection', (socket) => {
  socket.on("join", (message) => socket.join(message.device));
  
  events.forEach(event => {
    socket.on(event, async (message) => {
      console.log(event, message);
      socket.to(message.device).emit(event, message);
    });
  });
});

When the server receives a join message from the TV app, it calls the socket.join() function to join a virtual chat room. Then, for each type of event that we want to handle, when we receive an event from the remote app, we'll print it to the console and then emit the message to the device. That's literally the entire server!
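
To make the room idea concrete, here is a simplified in-memory model of the forwarding logic. It is not the Socket.IO API and it does no networking; Relay, join and forward are hypothetical names used only for illustration.

```javascript
// A simplified, in-memory model of room-based forwarding. This is NOT the
// Socket.IO API; Relay, join, and forward are hypothetical names.
class Relay {
  constructor() {
    this.rooms = new Map(); // room name -> Set of handler functions
  }

  // Subscribe a handler to a room, as the TV app does with its device ID.
  join(room, handler) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(handler);
  }

  // Deliver an event to every handler in the room named by message.device.
  // Returns the number of handlers that received it.
  forward(event, message) {
    const handlers = this.rooms.get(message.device) ?? new Set();
    handlers.forEach((handler) => handler(event, message));
    return handlers.size;
  }
}
```

A message whose device value matches no room is simply dropped, which is why a remote that doesn't know the UUID can't reach the TV app.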

In the remote app, we'll connect to the server and define a generic function for sending messages. Whenever this function is called we will inject the device ID into the message, log it to the console, and send it to the server.

let socket = io.connect(location.hostname);

function sendMessage(name, message) {
  message.device = device;
  console.log(name, message);
  socket.emit(name, message);
}

That takes care of all the infrastructure. Now we'll explore each of the four input modes: buttons, gestures, text and voice.

Buttons

Maybe you've lost the remote control behind the couch cushions, and want to use your phone instead? This app will include most of the functionality of the physical remote control.

We'll use a picture of the buttons, stored as images/buttons.png.

Here's an image map that we can use so that you can click or tap on the image to press the buttons:

<map name="image-map">
  <area target="" alt="Up" title="Up" onclick="up()" coords="126,3,236,102" shape="rect">
  <area target="" alt="Down" title="Down" onclick="down()" coords="232,313,129,210" shape="rect">
  <area target="" alt="Right" title="Right" onclick="right()" coords="235,101,339,208" shape="rect">
  <area target="" alt="Left" title="Left" onclick="left()" coords="20,104,125,208" shape="rect">
  <area target="" alt="OK" title="OK" onclick="enter()" coords="126,102,234,210" shape="rect">
  <area target="" alt="Back" title="Back" onclick="escape()" coords="22,344,124,446" shape="rect">
  <area target="" alt="Home" title="Home" onclick="home()" coords="126,340,239,445" shape="rect">
  <area target="" alt="Command" title="Command" onclick="command()" coords="341,442,242,344" shape="rect">
</map>

In the script, we'll have some short functions for handling each of the buttons. They will all chain to the sendKey() function that sends a keydown event with the key as a message.

let sendKey = (key) => sendMessage("keydown", {key});

let left = () => sendKey("ArrowLeft");
let right = () => sendKey("ArrowRight");
let up = () => sendKey("ArrowUp");
let down = () => sendKey("ArrowDown");
let enter = () => sendKey("Enter");
let escape = () => sendKey("Escape");

On the TV app, we will use a keydown handler to accept the remote input, just like any other Senza app. In fact, we'll use the same function to handle key input from any source: a physical keyboard, the Bluetooth remote, or messages from Socket.io:

socket.on('keydown', keydown);
document.addEventListener("keydown", keydown);

async function keydown(event) {
  console.log("keydown", event);
  switch (event.key) {
    case "ArrowUp": up(10); break;
    case "ArrowDown": down(10); break;
    case "ArrowLeft": left(10); break;
    case "ArrowRight": right(10); break;
    case "Enter": center(); break;
  }
  if (event.preventDefault) event.preventDefault();
}

let up = (n) => mousemove({movementX: 0, movementY: -1 * n, remote: true});
let down = (n) => mousemove({movementX: 0, movementY: n, remote: true});
let left = (n) => mousemove({movementX: -1 * n, movementY: 0, remote: true});
let right = (n) => mousemove({movementX: n, movementY: 0, remote: true});

function center() {
  football.style.left = '950px';
  football.style.top = '550px';
}

The arrow keys will move the ball 10 pixels in the corresponding direction, and the OK button will move it back to the center of the screen. The back button will clear the text message if there is one; we'll add that case to the handler in the Text section.
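
The mapping from keys to movements can also be written as a single pure function, which is easy to test on its own. This is a sketch; keyToDelta is a hypothetical helper rather than part of the app above.

```javascript
// Hypothetical helper: translate an arrow key into a movement of n pixels.
// Returns null for keys that don't move the ball.
function keyToDelta(key, n) {
  switch (key) {
    case "ArrowUp": return { movementX: 0, movementY: -n };
    case "ArrowDown": return { movementX: 0, movementY: n };
    case "ArrowLeft": return { movementX: -n, movementY: 0 };
    case "ArrowRight": return { movementX: n, movementY: 0 };
    default: return null;
  }
}
```

Screen coordinates grow downward, which is why ArrowUp produces a negative movementY.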

Mouse

That's an elaborate way of doing something you could already do, but let's try something new: sending gesture input using a virtual trackpad. If you run the remote app on your computer you'll be able to test it by dragging with your mouse.

When you click and drag the mouse on a computer, you can receive these events:

  • mousedown: when the user clicks the button, includes location of the click
  • mousemove: fires repeatedly as the user drags, includes the distance moved
  • mouseup: when the user releases the button, includes location again

We'll look up the trackpad div, which has the ID rectangle in the HTML above, and register a handler for each event:

const trackpad = document.getElementById('rectangle');
trackpad.addEventListener('mousedown', mousedown);
trackpad.addEventListener('mousemove', mousemove);
trackpad.addEventListener('mouseup', mouseup);

Here's the code for handling the mouse events:

let movementX = 0;
let movementY = 0;
let debounceTimeout;
let isMouseDown = false;

function mousedown(event) {
  isMouseDown = true;
  sendMessage("mousedown", {clientX: event.clientX, clientY: event.clientY});
}

function mousemove(event) {
  if ((isMouseDown || event.touches) && (event.movementX !== 0 || event.movementY !== 0)) {
    movementX += event.movementX;
    movementY += event.movementY;

    if (!debounceTimeout) {
      debounceTimeout = setTimeout(() => {
        sendMessage("mousemove", {movementX, movementY, remote: true});
        movementX = 0;
        movementY = 0;
        debounceTimeout = null;
      }, 20);
    }
  }
}

function mouseup(event) {
  if (isMouseDown) {
    isMouseDown = false;
    sendMessage("mouseup", {clientX: event.clientX, clientY: event.clientY});
  }
}

Normally you will receive mousemove events when the user hovers over an element, even if they are not holding the mouse button down. We use an isMouseDown flag so that we only process these events after mousedown and stop when we receive mouseup.

The mousemove events fire very rapidly, and some of them report no movement at all. Here we ignore events that have no movement and batch the rest: we add up the movements and combine them into a single message at most once every 20 milliseconds. (The code calls its timer debounceTimeout, although strictly speaking this is throttling rather than debouncing.) That limits the number of messages sent to the server while keeping the movement smooth.
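
The accumulate-and-flush approach can be isolated into a small reusable function. The sketch below is an illustration under assumptions: createMovementBatcher is a hypothetical name, and the schedule parameter (which defaults to setTimeout) exists only so that the timing can be swapped out in a test.

```javascript
// Hypothetical helper that accumulates movements and flushes them at most
// once per interval. The schedule parameter defaults to setTimeout and
// exists so that the timing can be replaced in tests.
function createMovementBatcher(send, intervalMs = 20, schedule = setTimeout) {
  let movementX = 0;
  let movementY = 0;
  let pending = false;

  return function add(dx, dy) {
    if (dx === 0 && dy === 0) return; // ignore events with no movement
    movementX += dx;
    movementY += dy;
    if (pending) return; // a flush is already scheduled
    pending = true;
    schedule(() => {
      send({ movementX, movementY });
      movementX = 0;
      movementY = 0;
      pending = false;
    }, intervalMs);
  };
}
```

Because the totals are summed rather than discarded, no movement is lost: the receiver ends up at the same position it would have reached with one message per event.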

In the TV app, we'll handle mouse input by using the same handler function to receive local events (if you run the app on your computer) as well as remote events from Socket.io.

// Declared up front so that remote events can read it before any local mousedown
let isDragging = false;

document.addEventListener('mousedown', (event) => {
  isDragging = true;
  football.style.cursor = 'grabbing';
});

document.addEventListener('mousemove', mousemove);
socket.on('mousemove', mousemove);

function mousemove(event) {
  if (isDragging || event.remote) {
    const rect = field.getBoundingClientRect();
    // Remote movements are scaled up because the trackpad is smaller than the screen
    let scale = event.remote ? 4.0 : 1.0;
    let newX = parseInt(football.style.left, 10) + event.movementX * scale;
    let newY = parseInt(football.style.top, 10) + event.movementY * scale;

    // Constrain within the rectangle
    newX = Math.max(150, Math.min(newX, rect.width - football.clientWidth + 150));
    newY = Math.max(150, Math.min(newY, rect.height - football.clientHeight + 150));

    football.style.left = `${newX}px`;
    football.style.top = `${newY}px`;
  }
}

document.addEventListener('mouseup', () => {
  isDragging = false;
  football.style.cursor = 'grab';
});

When we get a mousemove event, if we're in a drag or receiving a remote event, we will move the ball around on the screen. For local events we'll move it so that it stays under the pointer, while for remote events we'll multiply the movements by 4, since the trackpad is smaller than the screen. If we're near the edge of the screen, we'll constrain the movement of the ball within the bounds.
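
The scaling and clamping steps boil down to a little arithmetic per axis. Here is a sketch with hypothetical helper names (clamp and nextPosition); the numbers in the example are made up and are not the real dimensions of the field.

```javascript
// Hypothetical helpers illustrating the scaling and clamping steps.
function clamp(value, min, max) {
  return Math.max(min, Math.min(value, max));
}

// Compute the next coordinate along one axis: scale the movement, add it to
// the current position, and keep the result within [min, max].
function nextPosition(current, movement, scale, min, max) {
  return clamp(current + movement * scale, min, max);
}

// Example with made-up bounds: a remote movement of 10 scaled by 4
// nextPosition(950, 10, 4, 150, 1770) returns 990
```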

Touch

Now let's adapt the app to work with touch input, which allows you to swipe with your finger instead of dragging with the mouse. Touch input is a little different from mouse input, and it has a different set of events. Rather than a single pointer location, multi-touch input has an array of touches.

trackpad.addEventListener("touchstart", touchstart, false);
trackpad.addEventListener("touchmove", touchmove, false);
trackpad.addEventListener("touchend", touchend, false);

  • touchstart: when the user starts a gesture by touching the screen with their finger(s)
  • touchmove: fires repeatedly as the user moves their finger(s)
  • touchend: when the user finishes a gesture and lifts their finger(s)

To keep things simple, we're only going to look at one finger, and we're going to adapt the touch input events to emulate a mouse drag. That way the TV app only needs to respond to the mouse events it already handles.

let previousTouch = null;

function touchstart(event) {
  previousTouch = null;
  const touch = event.touches[0];
  sendMessage("mousedown", {clientX: touch.pageX, clientY: touch.pageY});
}

function touchmove(event) {
  const touch = event.touches[0];
  if (previousTouch) {
    event.movementX = touch.pageX - previousTouch.pageX;
    event.movementY = touch.pageY - previousTouch.pageY;
    mousemove(event);
  }
  previousTouch = touch;
}

function touchend(event) {
  // The lifted finger is no longer in event.touches, so read it from changedTouches
  const touch = event.changedTouches[0];
  sendMessage("mouseup", {clientX: touch.pageX, clientY: touch.pageY});
}

The touchmove event doesn't include the distance moved but just the coordinates. So we'll keep track of the previous touch and calculate the distance moved ourselves. Once we do that we'll chain to the mousemove() function defined in the previous section. Don't forget to reset the previous touch in the touchstart handler, otherwise you'll get a big jump from the previous gesture!
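
The bookkeeping for the previous touch can be sketched independently of the DOM. In this illustration createTouchTracker is a hypothetical helper; it stores plain numbers rather than the Touch object itself.

```javascript
// Hypothetical helper: turn a stream of touch coordinates into movement deltas.
function createTouchTracker() {
  let previous = null;

  return {
    // Call at touchstart so the first move of a new gesture yields no jump.
    start() {
      previous = null;
    },
    // Call at touchmove; returns the delta since the last point, or null
    // for the first point of a gesture.
    move(x, y) {
      const delta = previous
        ? { movementX: x - previous.x, movementY: y - previous.y }
        : null;
      previous = { x, y }; // copy the numbers rather than keeping the Touch object
      return delta;
    },
  };
}
```

In a touchmove handler you would call tracker.move(touch.pageX, touch.pageY) and pass any non-null result on to the code that sends mousemove messages.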

We can also send mousedown and mouseup events through the server, even though our use case doesn't require them.

Now we can drag the ball around using either mouse or touch input.

Text

What if you want to send text input to the TV app? Rather than using a cumbersome onscreen keyboard, why not use the keyboard that's already in your hand where you can type very fast?

To support text input, we'll display a text field where the user can type. On a physical phone this will bring up the onscreen keyboard. In the iOS Simulator, you can type using your computer keyboard or hit ⌘K to show the onscreen keyboard.

By this point you're already very familiar with the keydown event handler. We only need to add one more line of code to the remote app to support sending keyboard input to the TV app:

textfield.addEventListener('keydown', (event) => sendKey(event.key));

In the TV app, we'll add some cases to the switch statement in the existing keydown handler to handle text input. The default case will handle letters and such, and will simply append them to the text message. The backspace key will remove the last character as you would expect, and the back button on the remote (which maps to the escape key) will clear the text.

async function keydown(event) {
  console.log("keydown", event);
  switch (event.key) {
    ...
    case "Escape": text.textContent = ""; break;
    case "Backspace": text.textContent = text.textContent.slice(0, -1); break;
    case "Shift": break;
    case "Meta": break;
    // textContent treats typed characters such as < as plain text, not markup
    default: text.textContent += event.key;
  }
  if (event.preventDefault) event.preventDefault();
}

So now when you type into the field you'll see the same text on the TV. The experience is very intuitive.
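
The text-editing rules can be expressed as a pure function from the current text and a key to the new text. This is a sketch with a hypothetical name, applyKey, and it makes one simplification that the handler above does not: it ignores every key whose name is longer than one character, which covers Shift and Meta but also drops things like emoji.

```javascript
// Hypothetical pure function mirroring the text cases of the keydown handler.
function applyKey(text, key) {
  if (key === "Escape") return "";
  if (key === "Backspace") return text.slice(0, -1);
  // Named keys such as "Shift" or "Meta" are longer than one character; ignore them.
  if (key.length !== 1) return text;
  return text + key;
}
```

Keeping this logic separate from the DOM makes it straightforward to check the edge cases, such as pressing backspace on an empty string.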

Voice

To capture voice commands, we'll take advantage of the browser's built-in speech recognition, which WebKit and Chromium browsers expose as webkitSpeechRecognition. We'll wire up the content hub button on the image map (the area labeled Command) to call a function called command() that turns voice recognition on and off. When you tap the button, it will call recognition.start() and start listening to what you are saying. As soon as the recognizer produces a final result, the app sends a command message to the server with the recognized text and stops listening. You can also tap the button again to call recognition.stop() yourself.

let recognition = new webkitSpeechRecognition();
let recognizing = false;

recognition.lang = 'en-US';
recognition.continuous = true;
recognition.interimResults = true;
recognition.onstart = () => recognizing = true;
recognition.onend = () => recognizing = false;
recognition.onresult = (event) => {
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    let command = event.results[i][0].transcript;
    if (event.results[i].isFinal) {
      sendMessage("command", {command});

      if (recognizing) {
        recognition.stop();
      }
    }
  }
};

function command(event) {
  if (recognizing) {
    recognition.stop();
  } else {
    recognition.start();
  }
}
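
Not every browser exposes webkitSpeechRecognition, so it can be helpful to check for it before creating the recognizer. The sketch below uses a hypothetical helper, getSpeechRecognition, which takes the global object as a parameter so that it can be exercised outside a browser.

```javascript
// Hypothetical helper: look up the speech recognition constructor on a
// global object, preferring the unprefixed name when it exists.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null;
}

// In a browser you would pass window:
//   const Recognition = getSpeechRecognition(window);
//   if (Recognition) { /* create the recognizer */ } else { /* hide the voice button */ }
```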

The server will forward command messages just like any other message. When the TV app receives a command message, it will handle it like this:

socket.on('command', (message) => {
  // Trim any stray whitespace from the recognizer before matching
  let command = message.command.trim().toLowerCase();
  console.log("command", command);

  switch (command) {
    case "up": up(40); break;
    case "down": down(40); break;
    case "left": left(40); break;
    case "right": right(40); break;
    case "center": center(); break;
    case "clear": text.textContent = ""; break;
    // textContent shows the recognized words as plain text
    default: text.textContent = message.command;
  }
});

We'll respond to the up, down, left and right commands by moving the ball. You can say center to move the ball to the center of the screen, and the clear command will clear any text that has been displayed. If you say anything else, it will simply show the text on the screen just as if you had typed it.
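
Recognized text doesn't always arrive in a tidy form, so it may help to normalize it before matching it against the command words. This is a sketch under the assumption that a transcript can carry extra whitespace or trailing punctuation; normalizeCommand is a hypothetical helper.

```javascript
// Hypothetical helper: normalize a transcript before matching it to a command.
function normalizeCommand(transcript) {
  return transcript
    .trim()                   // remove leading and trailing whitespace
    .toLowerCase()
    .replace(/[.,!?]+$/, ""); // drop trailing punctuation such as "Up."
}
```

With this in place, a transcript such as " Up." would match the same case as "up".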

Testing

You can test the mobile app in a browser on your computer if you want. Since you can't scan a QR code on your computer, you can look in the console output of the TV app to get the link for the remote app. You can also scan the QR code with your phone and then copy the address from your phone to your computer.

If you would like to test gesture input on a Mac, you can install the Xcode developer tools and open the page in Safari in the iOS Simulator. You can get the link to the page by scanning the QR code using a physical iPhone, copying the page address on the physical device, and pasting it into the browser in the Simulator. Alternatively, the link is logged to the console when the TV page loads, so you can copy it from the debugger.

You can test the voice input features by running the client on a mobile device or a desktop computer. In either case, your device will prompt you to grant access to the microphone to allow the app to listen to you.

Conclusion

In this tutorial you've learned how to securely pair a mobile app with your TV app, using an extremely simple Socket.IO server to send commands from one to the other. Whether you want to use buttons, gestures, the keyboard or voice input, you can now use any of these input modes in your TV app. Whatever your app does, you're welcome to copy and paste the source code for the remote control directly into your app.

