Trasier's mission is to bring Distributed Tracing to the next level and to help you gain previously unknown insights into your processes through Business Tracing.

Introduction

Trasier is a Business-Tracing-as-a-Service system built on top of the OpenTracing standard.

Distributed Tracing

Distributed tracing is a method used to monitor and profile applications in a distributed environment. It gives insight into the real architecture of a microservices system and helps to track down technical issues.

Benefits of distributed tracing

  • Latency visualisation

  • Reveal the real architecture (shows dependencies between services)

  • Error analysis

  • Infrastructure check

High level architecture

The diagram below depicts the components needed to implement distributed tracing concepts.

[Figure: high-level architecture of the distributed tracing components]

Trace-Context (Specification)

Services need to understand the data they exchange with each other. To achieve this, we need some kind of specification or API. In distributed tracing, this specification is the Trace-Context.

Trace-Context consists of Trace-ID and Span-ID which are propagated between services with every request.

Context Propagation

There must be a mechanism to transfer the Trace-Context between services and within one service itself from the endpoint to the client side.

There are two ways to achieve this:

  • Explicit - as part of the service / method API

  • Implicit - using service instrumentation

There are a number of instrumentation libraries, such as OpenTracing or OpenCensus, that achieve this. The context information within the service is usually propagated with the help of a ThreadLocal, while context between services is usually propagated in headers (for example HTTP headers).
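
The two propagation paths can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the OpenTracing or Trasier API: the classes `TraceContext` and `TraceContextHolder` and the header names `X-Trace-Id` and `X-Span-Id` are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value object: the IDs that travel with every request.
final class TraceContext {
    final String traceId;
    final String spanId;

    TraceContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

// Hypothetical helper showing both propagation paths.
final class TraceContextHolder {
    // Header names are assumptions made for this sketch.
    static final String TRACE_HEADER = "X-Trace-Id";
    static final String SPAN_HEADER = "X-Span-Id";

    // Within a service: the context is bound to the current thread.
    private static final ThreadLocal<TraceContext> CURRENT = new ThreadLocal<>();

    static void set(TraceContext context) { CURRENT.set(context); }
    static TraceContext get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }

    // Between services (client side): copy the context into outgoing headers.
    static Map<String, String> inject(TraceContext context) {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_HEADER, context.traceId);
        headers.put(SPAN_HEADER, context.spanId);
        return headers;
    }

    // Between services (endpoint side): rebuild the context from incoming headers.
    // Returns null when the request carries no trace context.
    static TraceContext extract(Map<String, String> headers) {
        String traceId = headers.get(TRACE_HEADER);
        String spanId = headers.get(SPAN_HEADER);
        if (traceId == null || spanId == null) {
            return null;
        }
        return new TraceContext(traceId, spanId);
    }
}
```

A ThreadLocal is only visible to the thread that set it, so work handed over to another thread (for example a thread pool) needs the context to be passed along explicitly.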

Decoupled collector - Tracer

The collector (Tracer or a Tracer Client) is usually an agent or instrumentation library that collects the messages and sends them asynchronously to the tracing backend.
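
A minimal sketch of such a decoupled collector, assuming a bounded in-memory queue that is drained by a background thread. The class `AsyncSpanReporter` is hypothetical, and the `Consumer<String>` merely stands in for whatever transport sends the data to the tracing backend; it is not how the Trasier client is implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical reporter: application threads enqueue, a worker thread sends.
final class AsyncSpanReporter {
    private final BlockingQueue<String> queue;
    private final Thread worker;
    private volatile boolean running = true;

    AsyncSpanReporter(int capacity, Consumer<String> sender) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                // Keep draining until shutdown was requested and the queue is empty.
                while (running || !queue.isEmpty()) {
                    String span = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (span != null) {
                        sender.accept(span);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "span-reporter");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Never blocks the caller: returns false (span dropped) when the queue is full.
    boolean report(String serializedSpan) {
        return queue.offer(serializedSpan);
    }

    // Stops accepting work and waits until the queued spans have been sent.
    void shutdown() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping spans when the queue is full is a design choice of this sketch: it keeps tracing from slowing down the traced application, at the cost of losing data under load.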

Collector and Datastore

On the backend side, there must be a service or a collector that will receive, process and store the data. This might be for example a queuing system, or a simple service that sends the data to the datastore.

User interface

The data has to be presented to the user in some way.

There are a few projects compatible with OpenTracing, such as Jaeger and Zipkin, and a large number of APM providers can nowadays visualize distributed tracing data.

Distributed tracing vocabulary

[Figure: example spans and traces (requestOffer, bookOffer, checkPayment)]

Span

A Span is an operation that was executed, or a unit of work. Typically it is the communication between two services. A Span carries all the information needed for further analysis. This includes, for example, the operation name (mandatory), start / end timestamps, error codes (in the response), headers, log entries, etc.

Spans can have parent spans. In the picture above we can see three spans, two root spans (requestOffer, bookOffer) and a child span (checkPayment as a child of bookOffer span).

A Span has a unique ID called the Span-ID, which is part of the Trace-Context.

Trace

A Trace is a request to the system as it passes through the services involved. It consists of a collection of spans. A Trace has a unique ID called the Trace-ID, which is part of the Trace-Context.

In the example above we can see two traces, one initiated by the offerRequest and one initiated by the bookingRequest.

Trace Context

The Trace-Context consists of the Trace-ID and the Span-ID. It must be propagated with every call between services as well as within a service itself.

Business Tracing

Business Tracing takes Distributed Tracing to the next level by focusing on the business aspects of your applications. It correlates and processes the communication of your business applications in a way that helps to track down business issues.

Conversation

Instead of tracking technical requests we are tracing processes from a business perspective.

To track the business process we need to add the Conversation-ID to the Trace-Context.

A Conversation is a collection of Traces. Typically the request initiator (Website or other UI) knows when a business process starts and when it ends and must generate proper `Conversation-ID`s.

In the example below the business process is a booking process. It starts with an offer request, goes through payment, ticketing, email sending etc. The UI knows that for example a new login starts a new business process and email sending may end a business process.

[Figure: a booking process traced as a conversation]

With the help of a Conversation we can trace the whole business process, together with message payloads, from the very beginning to the very end. Asynchronous batch processing can also be made visible as part of a business process if the Conversation-ID is known to the batch processor.
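
The relationship between conversations, traces and spans can be illustrated with a small data-model sketch. The classes `SpanRecord` and `ConversationView` are hypothetical and do not reflect Trasier's internal model; the sketch only shows how spans that share a Conversation-ID can be grouped into the traces of one business process.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical span record carrying the Trace-Context extended by a Conversation-ID.
final class SpanRecord {
    final String conversationId;
    final String traceId;
    final String spanId;
    final String operationName;

    SpanRecord(String conversationId, String traceId, String spanId, String operationName) {
        this.conversationId = conversationId;
        this.traceId = traceId;
        this.spanId = spanId;
        this.operationName = operationName;
    }
}

final class ConversationView {
    // Collects the spans of one conversation, grouped by Trace-ID in arrival order.
    static Map<String, List<SpanRecord>> tracesOf(String conversationId, List<SpanRecord> spans) {
        Map<String, List<SpanRecord>> traces = new LinkedHashMap<>();
        for (SpanRecord span : spans) {
            if (conversationId.equals(span.conversationId)) {
                traces.computeIfAbsent(span.traceId, id -> new ArrayList<>()).add(span);
            }
        }
        return traces;
    }
}
```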

Message payloads

Unlike Distributed Tracing tools, Trasier has the ability to store message payloads (incoming and outgoing) within the span. Storing message payloads means that Trasier does not sample requests like other distributed tracing tools do.

Note that not all messages must be intercepted and stored. Trasier offers the ability to configure the system to not collect sensitive data (like customer data or payment information).
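
As an illustration of keeping sensitive data out of stored payloads, the following sketch masks the content of selected XML elements before a payload is handed to a tracing client. It is a generic example and not Trasier's configuration mechanism or interceptor API: the class `PayloadMasker` and the element names used with it are hypothetical, and the simple pattern only handles elements without attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that blanks out the content of sensitive XML elements.
final class PayloadMasker {
    private final List<Pattern> patterns = new ArrayList<>();

    PayloadMasker(List<String> sensitiveElements) {
        for (String element : sensitiveElements) {
            String name = Pattern.quote(element);
            // Matches <name>...</name> and captures the opening and closing tags.
            patterns.add(Pattern.compile("(<" + name + ">).*?(</" + name + ">)", Pattern.DOTALL));
        }
    }

    // Returns a copy of the payload with the sensitive element content replaced by ***.
    String mask(String payload) {
        String result = payload;
        for (Pattern pattern : patterns) {
            result = pattern.matcher(result).replaceAll("$1***$2");
        }
        return result;
    }
}
```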

Use cases

Business Tracing

There are lots of use cases for Business Tracing.

Imagine a flight reservation system and a customer complaining that he booked a ticket to Sydney (Australia) but got a ticket to Sydney (Canada).

In flight reservation systems, the trip is selected at the very beginning of the business process and the ticket is produced at the very end. Was this a user error (the user didn't notice that he picked the wrong destination) or was it a system error, with incorrect data printed on the ticket?

Imagine another customer claims that they were offered a flight for $200 but were charged $300. How can this issue be analysed in a system where prices change dynamically? With Trasier - Business Tracing we would be able to find the conversation based on the customer's booking reference and check which offers were sent to their browser in the past.

Bug triage

Being able to quickly see where an error (technical or business) comes from makes it easier to assign a bug to the appropriate team.

Mocking and replaying

Having all the message payloads stored opens new possibilities for the development teams.

  • Mocking - instead of using fake objects or storing XML files, which are hard to maintain, one could use the conversation id of a business process to read all the responses from the tracing backend. If the system gets updated (new API), it is enough to replace the conversation id.

  • Replaying - instead of explaining the steps to reproduce a bug and manually re-entering all the details in some kind of UI (or creating tests to reproduce the error), one could use the conversation id to read the requests sent to the system, re-send them to a locally deployed instance and immediately start debugging.
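
The mocking idea can be sketched as follows, assuming the responses of a conversation have already been read from the tracing backend (the call that fetches them is not shown, since its API is not described here). The class `RecordedConversationMock` is a hypothetical name; it simply serves the recorded responses per operation in the order in which they were recorded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mock that answers with responses recorded in an earlier conversation.
final class RecordedConversationMock {
    private final Map<String, Deque<String>> responsesByOperation = new HashMap<>();

    // Registers a recorded response in the order it was originally observed.
    void record(String operationName, String responsePayload) {
        responsesByOperation
                .computeIfAbsent(operationName, key -> new ArrayDeque<>())
                .addLast(responsePayload);
    }

    // Returns the next recorded response for the operation.
    String respond(String operationName) {
        Deque<String> responses = responsesByOperation.get(operationName);
        if (responses == null || responses.isEmpty()) {
            throw new IllegalStateException("No recorded response for " + operationName);
        }
        return responses.pollFirst();
    }
}
```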

Predictive analysis

The data stored by the Trasier system is a solid foundation for extensive research and data analysis. Gathering the equivalent information from your existing data stores is expensive and may not even be possible. This is the chance to finally take the step to predictive analysis on virtual processes.

Possible use cases:

  • Anomaly detection

    • Higher number of errors

    • Slower response times

    • Revenue drop

    • Ticket prices suddenly offered at a very low price

  • Business values

    • How much users are typically willing to pay for service upgrades

  • On the fly verifications

    • Validate on the fly that the amount paid was sent to the user in the email confirmation (email service), to the SAP service and to the payment provider.

How does Trasier work

Trasier uses an instrumentation library as a Tracing Client, which intercepts the communication between services and takes care of the context propagation.

The data is sent to the Trasier backend. Trasier's backend uses a high performance processing pipeline to process, validate, index and store the data. The data is stored encrypted, but the indexes are not. That is why the user has to configure Trasier (using the provided configuration options or by writing an interceptor) not to send sensitive data to the backend.

The data sent to Trasier is indexed, meaning the user can search for phrases that occur in message payloads, operation names or message headers. Note that the data is optimized for indexing, so XML tags, Base64 encodings, images, etc. are stripped.

Registration

To register, go to https://trasier.com/#/register. After the registration form has been filled in, an activation email is sent to the given email address. During activation the user is asked to set up a new password.

After the account has been activated, an email with the account id is sent.

From now on it is possible to access the Trasier UI at https://ui.trasier.com.

Note
Either the email address or the account id can be used for logging in.

After logging in, the user will be asked to configure spaces.

Configuring spaces

A space defines a storage area for the data that is sent to the tracing backend.

A space is like an environment: for example, dev, test or production are reasonable space names.

If an organization has multiple distinct applications that are traced independently, the space name can be prefixed with the application name, for example bookingapp_dev or iottrace_dev.

Note that it is not possible to link data between spaces, i.e. data stored in space prod is not visible in space test.

Once the space has been created, its configuration details, such as the account id, client id and client secret, can be displayed. These are needed for authenticating the tracing client.

Note
It is possible to delete a space. This operation cannot be undone and all previously stored data in this space will be removed.

Trasier integration

Authentication

Trasier uses standard OAuth2 authentication with a client id and a client secret. Every space has its own configuration.

The Spring based Trasier Java Client takes care of the authentication automatically. Other projects can use the OAuthTokenSafe as a reference implementation.
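
For projects that cannot use the Spring based client, a generic OAuth2 client-credentials token request could look like the sketch below. It uses the `java.net.http` API (Java 11 or newer) and is not the OAuthTokenSafe implementation: the class `TokenRequestSketch` is hypothetical, the token URL has to be supplied by the caller, and sending the credentials in a Basic Authorization header is an assumption (some servers expect them in the form body instead).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that builds an OAuth2 client-credentials token request.
final class TokenRequestSketch {
    static HttpRequest build(String tokenUrl, String clientId, String clientSecret) {
        // Client id and secret of the space, encoded as HTTP Basic credentials.
        String credentials = clientId + ":" + clientSecret;
        String basic = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));

        return HttpRequest.newBuilder(URI.create(tokenUrl))
                .header("Authorization", "Basic " + basic)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString("grant_type=client_credentials"))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; a standard OAuth2 token response is a JSON document that contains the `access_token` to use for subsequent calls.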

Java Client

The Java client is a set of open source libraries hosted on GitHub.

Configuration

Trasier is built on top of OpenTracing. Both OpenTracing and Trasier must be enabled by a feature toggle:

opentracing.spring.web:
  enabled: ${TRASIER_ACTIVATED:true}
  client.enabled: ${TRASIER_ACTIVATED:true}
trasier:
  client:
    accountId: ${ACCOUNT_ID:1234}
    systemName: ${APP_NAME}
    activated: ${TRASIER_ACTIVATED:true}
    payloadTracingDisabled: ${TRASIER_PAYLOAD_TRACING_DISABLED:false}
    interceptor.sampling:
      url.skipPattern: .*/checkServlet|/callback/agent/log|/ws/stomp|/widgets/poll/WORKSTATIONCHECK

Additionally, one must configure the space to tell Trasier where to send the data:

trasier:
  client:
    spaceKey: ${SPACE_KEY}
    clientId: ${CLIENT_ID}
    clientSecret: ${CLIENT_SECRET}
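
To get a feeling for which requests the `url.skipPattern` value shown in the configuration above would exclude, the expression can be tried out with plain `java.util.regex`. The class `SkipPatternCheck` is hypothetical, and whether the Trasier client requires the whole path to match (as assumed here) or only a part of it is not stated in this document.

```java
import java.util.regex.Pattern;

// Hypothetical helper for experimenting with the skip pattern from the configuration.
final class SkipPatternCheck {
    // Same expression as the url.skipPattern value in the configuration.
    static final Pattern SKIP = Pattern.compile(
            ".*/checkServlet|/callback/agent/log|/ws/stomp|/widgets/poll/WORKSTATIONCHECK");

    // Assumption: the whole request path has to match one of the alternatives.
    static boolean shouldSkip(String path) {
        return SKIP.matcher(path).matches();
    }
}
```

Under this assumption only the first alternative, which starts with `.*`, tolerates a path prefix: `/app/checkServlet` is skipped, while `/api/ws/stomp` is not, because `/ws/stomp` has to match the full path.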

Integration with spring

Either the gRPC or the REST protocol can be used to send the data to the tracing backend.

For gRPC (recommended), add the following dependencies to your pom.xml file:

pom.xml
<!-- Trasier Dependencies -->
<dependency>
    <groupId>io.opentracing.contrib</groupId>
    <artifactId>opentracing-spring-web</artifactId>
    <version>0.3.3</version>
</dependency>
<dependency>
    <groupId>com.trasier</groupId>
    <artifactId>trasier-client-spring-grpc</artifactId>
    <version>1.4.2</version>
</dependency>
<dependency>
    <groupId>com.trasier</groupId>
    <artifactId>trasier-client-opentracing-spring-interceptor</artifactId>
    <version>1.4.2</version>
</dependency>

For REST, add the following dependencies to your pom.xml file:

pom.xml
<!-- Trasier Dependencies -->
<dependency>
    <groupId>io.opentracing.contrib</groupId>
    <artifactId>opentracing-spring-web</artifactId>
    <version>0.3.3</version>
</dependency>
<dependency>
    <groupId>com.trasier</groupId>
    <artifactId>trasier-client-spring-rest</artifactId>
    <version>1.4.2</version>
</dependency>
<dependency>
    <groupId>com.trasier</groupId>
    <artifactId>trasier-client-opentracing-spring-interceptor</artifactId>
    <version>1.4.2</version>
</dependency>

Add the following configuration to web.xml:

web.xml
<filter>
    <filter-name>TrasierBufferFilter</filter-name>
    <filter-class>com.trasier.opentracing.spring.interceptor.servlet.TrasierBufferFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>TrasierBufferFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
<filter>
    <filter-name>TracingFilter</filter-name>
    <filter-class>io.opentracing.contrib.web.servlet.filter.TracingFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>TracingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
<listener>
    <listener-class>com.trasier.opentracing.spring.interceptor.servlet.TrasierSpringConfigListener</listener-class>
</listener>

Add the following bean definitions:

<bean id="trasierClientConfiguration" class="com.trasier.client.configuration.TrasierClientConfiguration">
        <property name="accountId" value="${trasier.accountId}" />
        <property name="spaceKey" value="${trasier.spaceKey}" />
        <property name="clientId" value="${trasier.clientId}" />
        <property name="clientSecret" value="${trasier.clientSecret}" />
        <property name="systemName" value="${trasier.systemName}" />
        <property name="activated" value="${trasier.activated}" />
        <property name="payloadTracingDisabled" value="${trasier.payloadTracingDisabled}" />
</bean>

<bean id="trasierEndpointConfiguration" class="com.trasier.client.configuration.TrasierEndpointConfiguration" />

<bean id="trasierSampleByOperationConfiguration" class="com.trasier.client.spring.spancontrol.TrasierSampleByOperationConfiguration" />

<bean id="trasierScopeManager" class="com.trasier.client.opentracing.TrasierScopeManager">
</bean>

<bean id="trasierTracer" class="com.trasier.client.opentracing.TrasierTracer">
        <constructor-arg index="0" ref="trasierSpringCacheClient" />
        <constructor-arg index="1" ref="trasierClientConfiguration" />
        <constructor-arg index="2" ref="trasierScopeManager" />
</bean>

<context:component-scan base-package="com.trasier.client.spring"/>
<context:component-scan base-package="com.trasier.opentracing.spring.interceptor"/>

Integration with spring boot

To use the gRPC protocol to send the data to the tracing backend, add the following dependency to your pom.xml file:

pom.xml
<!-- Trasier -->
<dependency>
    <groupId>com.trasier</groupId>
    <artifactId>trasier-client-spring-grpc-starter</artifactId>
    <version>1.4.2</version>
</dependency>

To use the REST protocol to send the data to the tracing backend, add the following dependency to your pom.xml file:

pom.xml
<!-- Trasier -->
<dependency>
    <groupId>com.trasier</groupId>
    <artifactId>trasier-client-spring-rest-starter</artifactId>
    <version>1.4.2</version>
</dependency>

A reference implementation can be found at https://github.com/trasiercom/springboot-example (select the trasier branch).

Tracing messages manually

One can manually write any message at any point in the application to the Trasier backend. This is useful if someone has an interceptor that is not yet supported by the library. There are also cases when it is important to see an internal state of the application along with the messages exchanged by services (complicated algorithms, filter logic, etc).

The following snippet demonstrates how to do that:

Filter.java
@Autowired
private Tracer tracer;

private void traceableFilterMethod(Filter filter, List<String> data) {

    // Start a new active span; it is finished when the scope is closed.
    Scope scope = tracer.buildSpan("MY_OPERATION_NAME")
                .withTag(Tags.SPAN_KIND.getKey(), Tags.SPAN_KIND_CLIENT)
                .startActive(true);

    // Unwrap the underlying Trasier span if the Trasier tracer is in use.
    Span trasierSpan = null;

    if (scope instanceof TrasierScope) {
        trasierSpan = ((TrasierSpan) scope.span()).unwrap();
    }

    try {
        if (trasierSpan != null) {
            trasierSpan.setIncomingContentType(ContentType.TEXT);
            trasierSpan.setBeginProcessingTimestamp(System.currentTimeMillis());
            trasierSpan.setIncomingData(data.toString());
        }

        filter.apply(data); // error handling omitted for readability

        if (trasierSpan != null) {
            trasierSpan.setOutgoingContentType(ContentType.TEXT);
            trasierSpan.setFinishProcessingTimestamp(System.currentTimeMillis());
            trasierSpan.setOutgoingData(data.toString());
            trasierSpan.setStatus(TrasierConstants.STATUS_OK);
        }
    } finally {
        // Always close the scope, even if it is not a TrasierScope or the filter throws.
        scope.close();
    }

}

In a similar way we can implement a custom HTTP filter:

HttpFilter
@Autowired
private Tracer tracer;

// createCachedRequest, createCachedResponse, getRequestHeaders and extractHeaders
// are helper methods of this filter that are not shown here.
@Override
public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain)
        throws IOException, ServletException {

    CachedServletRequestWrapper request = createCachedRequest((HttpServletRequest) servletRequest);
    CachedServletResponseWrapper response = createCachedResponse((HttpServletResponse) servletResponse);

    Scope scope = tracer.buildSpan("MY_OPERATION_NAME")
                .withTag(Tags.SPAN_KIND.getKey(), Tags.SPAN_KIND_CLIENT)
                .startActive(true);

    Span trasierSpan = null;

    if (scope instanceof TrasierScope) {
        trasierSpan = ((TrasierSpan) scope.span()).unwrap();
    }

    try {
        if (trasierSpan != null) {
            trasierSpan.setIncomingData(new String(request.getContentAsByteArray()));
            trasierSpan.setIncomingHeader(getRequestHeaders(request));
            trasierSpan.setIncomingContentType(ContentType.XML);
            trasierSpan.setBeginProcessingTimestamp(System.currentTimeMillis());
        }

        filterChain.doFilter(request, response); // error handling omitted for readability

        if (trasierSpan != null) {
            trasierSpan.setOutgoingData(new String(response.getContentAsByteArray()));
            trasierSpan.setOutgoingHeader(extractHeaders(response));
            trasierSpan.setOutgoingContentType(ContentType.XML);
            trasierSpan.setFinishProcessingTimestamp(System.currentTimeMillis());
            trasierSpan.setStatus(TrasierConstants.STATUS_OK);
        }
    } finally {
        // Always close the scope, even if it is not a TrasierScope or the chain throws.
        scope.close();
    }
}

Integration without spring

At the moment the Java Client is Spring based only. Further implementations are planned. Until then, one has to handle the context propagation in the application on one's own. The API, interceptors and client interface can be taken from trasier-client-core and trasier-client-api. OAuth can be implemented in a similar way as in OAuthTokenSafe.

Integration in non-java application

At the moment there are no clients for languages other than Java. One must take care of the context propagation and the authentication, and send the data to the Trasier backend.