Abstract
This paper proposes a way to control home appliances through a multimodal interaction system that combines speech, gestures, and a smartphone application. A Kinect v2 sensor captures the users' speech, in the Indonesian language, and their gestures. Speech recognition is performed with Google Cloud Speech, gesture recognition with K-Means clustering, and dialogue management with a finite state machine. Users can also control home appliances remotely through a smartphone application running on mobile devices, such as tablets or smartphones, that are connected directly to a real-time database. The system produces two outputs: an audio response generator that gives feedback to the user through the computer speaker, and an action that controls the home appliances through an ESP8266. In testing, the average accuracy of interaction was 92.5% with the dialogue system and 79.25% with gestures, so dialogue-based interaction was more accurate than gesture-based interaction. The smartphone application controlled the home appliances correctly.
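To illustrate how a dialogue system based on a finite state machine can work, the following is a minimal sketch and not the implementation used in this paper. The state names, the wake phrase, the Indonesian command phrases, the replies, and the `step` function are all hypothetical; the input is assumed to be text already produced by a speech recognizer.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()           # waiting for the wake phrase
    AWAIT_COMMAND = auto()  # wake phrase heard, waiting for a command


# Hypothetical vocabulary (assumption, not taken from the paper).
WAKE_PHRASE = "halo rumah"
COMMANDS = {
    "nyalakan lampu": ("lamp", True),   # "turn on the lamp"
    "matikan lampu": ("lamp", False),   # "turn off the lamp"
}


def step(state, utterance):
    """Advance the dialogue by one recognized utterance.

    Returns (next_state, action, reply). `action` is an
    (appliance, on_off) tuple or None; `reply` is the text an
    audio response generator could speak, or None.
    """
    text = utterance.strip().lower()
    if state is State.IDLE:
        if text == WAKE_PHRASE:
            return State.AWAIT_COMMAND, None, "Ya, ada yang bisa dibantu?"
        return State.IDLE, None, None
    # state is AWAIT_COMMAND
    if text in COMMANDS:
        return State.IDLE, COMMANDS[text], "Baik."
    return State.AWAIT_COMMAND, None, "Maaf, perintah tidak dikenali."


if __name__ == "__main__":
    state = State.IDLE
    for heard in ["halo rumah", "nyalakan lampu"]:
        state, action, reply = step(state, heard)
        print(state.name, action, reply)
```

In such a design, the returned action would be forwarded to the appliance controller and the reply to the speaker, which keeps the dialogue logic separate from both the recognizer and the hardware.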