Rebuild hawkular metrics after cluster deploy on OpenShift 3.11.
All operations are performed on the management node.
Assume that a configuration change required running the deploy cluster playbook again.
$ ansible-playbook -i hosts playbooks/deploy_cluster.yml
After the playbook finishes, metrics are no longer available.
Log in to the OpenShift cluster.
$ oc login https://openshift-example.example.org:8443 --token=GWJqvER5-N3PBEGt14bwjx9K39ztqUqoUGQxki19kud -n openshift-infra
Logged into "https://openshift-example.example.org:8443" as "admin" using the token provided.

You have access to the following projects and can switch between them with 'oc project <projectname>':

    default
    development-milosz
    kube-public
    kube-service-catalog
    kube-system
    management-infra
    openshift
    openshift-ansible-service-broker
    openshift-console
    openshift-descheduler
  * openshift-infra
    openshift-logging
    openshift-metrics-server
    openshift-monitoring
    openshift-node
    openshift-sdn
    openshift-template-service-broker
    openshift-web-console
    ops-view

Using project "openshift-infra".
Notice that the hawkular-metrics and heapster pods are running but not ready (0/1).
$ oc get pods -n openshift-infra
NAME                            READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-kwgrf      1/1       Running   0          10m
hawkular-metrics-jxr42          0/1       Running   1          10m
hawkular-metrics-schema-psqgr   1/1       Running   0          10m
heapster-ckfpw                  0/1       Running   1          10m
Inspect the logs of each pod to see what is failing.
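The READY column in the listing above is what gives the problem away, since every pod reports Running. As a rough sketch, the not-ready pods can be filtered out of that output with awk. The function name `not_ready_pods` is hypothetical, and the sketch assumes the five-column layout shown above.

```shell
# List pods whose READY column shows fewer ready containers than expected.
# Reads `oc get pods` output on stdin; pods in the Completed state
# (finished jobs) are skipped because 0/1 is normal for them.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
    split($2, ready, "/")
    if (ready[1] != ready[2]) print $1
  }'
}

# Usage (assumes an active oc session):
#   oc get pods -n openshift-infra | not_ready_pods
```

An empty result means every listed pod is either ready or completed.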
$ oc logs --tail=10 hawkular-metrics-jxr42 -n openshift-infra
2020-04-15 10:17:38,031 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://127.0.0.1:9990
2020-04-15 10:17:38,031 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 11.0.0.Final (WildFly Core 3.0.8.Final) started in 9055ms - Started 343 of 593 services (340 services are lazy, passive or on-demand)
2020-04-15 10:17:47,406 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2020-04-15 10:17:47,406 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2020-04-15 10:17:57,410 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2020-04-15 10:17:57,411 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2020-04-15 10:18:07,414 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2020-04-15 10:18:07,414 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2020-04-15 10:18:17,417 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2020-04-15 10:18:17,417 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
$ oc logs --tail=15 hawkular-cassandra-1-kwgrf -n openshift-infra
Caused by: javax.net.ssl.SSLHandshakeException: null cert chain
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[na:1.8.0_181]
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666) ~[na:1.8.0_181]
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:330) ~[na:1.8.0_181]
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:318) ~[na:1.8.0_181]
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1935) ~[na:1.8.0_181]
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:237) ~[na:1.8.0_181]
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052) ~[na:1.8.0_181]
    at sun.security.ssl.Handshaker$1.run(Handshaker.java:992) ~[na:1.8.0_181]
    at sun.security.ssl.Handshaker$1.run(Handshaker.java:989) ~[na:1.8.0_181]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_181]
    at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1467) ~[na:1.8.0_181]
    at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1256) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1169) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
    ... 16 common frames omitted
$ oc logs --tail=10 hawkular-metrics-schema-psqgr -n openshift-infra
INFO 2020-04-15 10:22:06,740 [main] com.datastax.driver.core.ClockFactory:newInstance:52 - Using native clock to generate timestamps.
INFO 2020-04-15 10:22:06,746 [main] org.hawkular.metrics.schema.Installer:initSession:134 - Cassandra may not be up yet. Retrying in 5000 ms
INFO 2020-04-15 10:22:11,746 [main] com.datastax.driver.core.ClockFactory:newInstance:52 - Using native clock to generate timestamps.
INFO 2020-04-15 10:22:11,755 [main] org.hawkular.metrics.schema.Installer:initSession:134 - Cassandra may not be up yet. Retrying in 5000 ms
INFO 2020-04-15 10:22:16,755 [main] com.datastax.driver.core.ClockFactory:newInstance:52 - Using native clock to generate timestamps.
INFO 2020-04-15 10:22:16,761 [main] org.hawkular.metrics.schema.Installer:initSession:134 - Cassandra may not be up yet. Retrying in 5000 ms
INFO 2020-04-15 10:22:21,761 [main] com.datastax.driver.core.ClockFactory:newInstance:52 - Using native clock to generate timestamps.
INFO 2020-04-15 10:22:21,768 [main] org.hawkular.metrics.schema.Installer:initSession:134 - Cassandra may not be up yet. Retrying in 5000 ms
INFO 2020-04-15 10:22:26,769 [main] com.datastax.driver.core.ClockFactory:newInstance:52 - Using native clock to generate timestamps.
INFO 2020-04-15 10:22:26,775 [main] org.hawkular.metrics.schema.Installer:initSession:134 - Cassandra may not be up yet. Retrying in 5000 ms
$ oc logs --tail=10 heapster-ckfpw -n openshift-infra
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
The Hawkular metrics route returns 503 Service Unavailable.
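Each of the four log excerpts above repeats one recognisable message. As a rough sketch, those messages can be turned into a quick check that reads any pod log and prints which of the symptoms it contains. The function name `log_symptoms` and the labels it prints are hypothetical; only the matched strings come from the logs shown here.

```shell
# Print a short label for each known symptom found in log text on stdin.
# The patterns are copied from the log excerpts in this article; the
# labels are illustrative names, not anything OpenShift itself reports.
log_symptoms() {
  awk '
    /Keyspace hawkular_metrics does not exist/ { seen["schema-missing"] = 1 }
    /SSLHandshakeException/                    { seen["tls-handshake-failed"] = 1 }
    /Cassandra may not be up yet/              { seen["schema-job-retrying"] = 1 }
    /Curl exit code:? 7/                       { seen["metrics-endpoint-down"] = 1 }
    END { for (s in seen) print s }
  ' | sort
}

# Usage (assumes an active oc session):
#   oc logs --tail=50 hawkular-metrics-jxr42 -n openshift-infra | log_symptoms
```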
$ curl --head https://hawkular-metrics.openshift-example.example.org/hawkular/metrics
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html
To rebuild Hawkular metrics, delete the hawkular-metrics-schema job.
$ oc get jobs -n openshift-infra
NAME                      DESIRED   SUCCESSFUL   AGE
hawkular-metrics-schema   1         0            19m
$ oc delete jobs/hawkular-metrics-schema -n openshift-infra
job.batch "hawkular-metrics-schema" deleted
Execute the openshift-metrics/schema.yml playbook to recreate the job.
$ ansible-playbook -i hosts playbooks/openshift-metrics/schema.yml
[...]
PLAY RECAP *******************************************************************************************************************************************************
localhost                               : ok=12 changed=0 unreachable=0 failed=0 skipped=4  rescued=0 ignored=0
openshift-example-infra-1.example.org   : ok=0  changed=0 unreachable=0 failed=0 skipped=6  rescued=0 ignored=0
openshift-example-lb-1.example.org      : ok=1  changed=0 unreachable=0 failed=0 skipped=5  rescued=0 ignored=0
openshift-example-master-1.example.org  : ok=53 changed=2 unreachable=0 failed=0 skipped=37 rescued=0 ignored=0
openshift-example-node-1.example.org    : ok=0  changed=0 unreachable=0 failed=0 skipped=6  rescued=0 ignored=0
openshift-example-node-2.example.org    : ok=0  changed=0 unreachable=0 failed=0 skipped=6  rescued=0 ignored=0

INSTALLER STATUS *************************************************************************************************************************************************
Initialization : Complete (0:00:11)

Wednesday 15 April 2020 12:25:24 +0200 (0:00:00.143) 0:00:15.571 *******
===============================================================================
Gathering Facts ------------------------------------------------------------------------------------------------------------------------------------------- 1.75s
openshift_metrics : generate hawkular-metrics schema job -------------------------------------------------------------------------------------------------- 0.76s
Gather Cluster facts -------------------------------------------------------------------------------------------------------------------------------------- 0.64s
get openshift_current_version ----------------------------------------------------------------------------------------------------------------------------- 0.59s
openshift_metrics : Applying /tmp/openshift-metrics-ansible-7Eo6Im/templates/hawkular_metrics_schema_job.yaml --------------------------------------------- 0.54s
openshift_metrics : Checking generation of Job hawkular-metrics-schema ------------------------------------------------------------------------------------ 0.43s
openshift_metrics : Create temp directory for all our templates ------------------------------------------------------------------------------------------- 0.40s
openshift_metrics : Determine change status of Job hawkular-metrics-schema -------------------------------------------------------------------------------- 0.39s
openshift_control_plane : slurp --------------------------------------------------------------------------------------------------------------------------- 0.39s
openshift_metrics : list installed jobs ------------------------------------------------------------------------------------------------------------------- 0.38s
Detecting Operating System from ostree_booted ------------------------------------------------------------------------------------------------------------- 0.38s
openshift_metrics : Create temp directory for doing work in on target ------------------------------------------------------------------------------------- 0.36s
Initialize openshift.node.sdn_mtu ------------------------------------------------------------------------------------------------------------------------- 0.33s
openshift_metrics : Create temp directory local on control node ------------------------------------------------------------------------------------------- 0.25s
Fetch ca.crt from cluster if exists ----------------------------------------------------------------------------------------------------------------------- 0.21s
openshift_metrics : Copy the admin client config(s) ------------------------------------------------------------------------------------------------------- 0.20s
set_fact -------------------------------------------------------------------------------------------------------------------------------------------------- 0.19s
openshift_control_plane : stat ---------------------------------------------------------------------------------------------------------------------------- 0.18s
openshift_metrics : slurp --------------------------------------------------------------------------------------------------------------------------------- 0.18s
openshift_sanitize_inventory : Check for usage of deprecated variables ------------------------------------------------------------------------------------ 0.17s
Inspect the hawkular-metrics-schema job.
$ oc get jobs -n openshift-infra
NAME                      DESIRED   SUCCESSFUL   AGE
hawkular-metrics-schema   1         1            15s
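The job is done once the SUCCESSFUL count matches DESIRED, as in the listing above. A minimal sketch of that comparison, assuming the four-column layout printed by this OpenShift 3.11 client (other client versions may print different columns); `job_succeeded` is a hypothetical helper name.

```shell
# Exit 0 when the named job appears in `oc get jobs` output on stdin
# and its SUCCESSFUL count equals its DESIRED count; exit 1 otherwise.
job_succeeded() {
  awk -v job="$1" '
    NR > 1 && $1 == job { found = 1; ok = ($2 == $3) }
    END { if (found && ok) exit 0; exit 1 }
  '
}

# Usage (assumes an active oc session):
#   oc get jobs -n openshift-infra | job_succeeded hawkular-metrics-schema \
#     && echo "schema job finished"
```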
Inspect pods.
$ oc get pods -n openshift-infra
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-kwgrf      1/1       Running     0          20m
hawkular-metrics-jxr42          1/1       Running     3          20m
hawkular-metrics-schema-td29l   0/1       Completed   0          1m
heapster-ckfpw                  1/1       Running     2          20m
The Hawkular metrics route now returns 200 OK.
$ curl --head https://hawkular-metrics.openshift-example.example.org/hawkular/metrics
HTTP/1.1 200 OK
Cache-Control: no-cache
Vary: Origin,Accept-Encoding
X-Powered-By: Undertow/1
Server: WildFly/11
Content-Type: application/json
Content-Length: 132
Date: Wed, 15 Apr 2020 10:27:12 GMT
Set-Cookie: a054b5d9e987bf679f10c9d29be39478=3ce5579d1b00caa62afe078c982aca15; path=/; HttpOnly; Secure
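The route does not switch from 503 to 200 the instant the schema job completes, so it can be convenient to poll it instead of repeating the curl command by hand. A minimal sketch, where `wait_for_200` and its default attempt count and delay are assumptions chosen for illustration:

```shell
# Poll a URL until it answers 200 or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 10).
wait_for_200() {
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s hides progress output, -o discards the response,
    # -w prints only the HTTP status code (000 when the connection fails).
    code=$(curl --head -s -o /dev/null -w '%{http_code}' "$url") || true
    [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_200 https://hawkular-metrics.openshift-example.example.org/hawkular/metrics
```

The function returns a non-zero status when the route never answers 200, so it can gate later steps in a script.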
Inspect logs.
$ oc logs --tail=10 hawkular-metrics-jxr42 -n openshift-infra
2020-04-15 10:25:40,102 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-5) gc_grace_seconds for locks is set to 864000. Resetting to 0
2020-04-15 10:25:40,192 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-4) gc_grace_seconds for metrics_idx is set to 864000. Resetting to 0
2020-04-15 10:25:40,331 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-6) gc_grace_seconds for metrics_tags_idx is set to 864000. Resetting to 0
2020-04-15 10:25:40,332 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-6) gc_grace_seconds for retentions_idx is set to 864000. Resetting to 0
2020-04-15 10:25:40,391 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-8) gc_grace_seconds for scheduled_jobs_idx is set to 864000. Resetting to 0
2020-04-15 10:25:40,497 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-1) gc_grace_seconds for sys_config is set to 864000. Resetting to 0
2020-04-15 10:25:40,497 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-1) gc_grace_seconds for tasks is set to 864000. Resetting to 0
2020-04-15 10:25:40,543 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-3) gc_grace_seconds for tenants is set to 864000. Resetting to 0
2020-04-15 10:25:40,799 INFO [org.hawkular.metrics.core.util.GCGraceSecondsManager] (RxComputationScheduler-6) Finished gc_grace_seconds updates in 1344 ms
2020-04-15 10:27:12,678 WARN [org.jboss.resteasy.resteasy_jaxrs.i18n] (default task-23) RESTEASY002142: Multiple resource methods match request "HEAD /". Selecting one. Matching methods: [public javax.ws.rs.core.Response org.hawkular.metrics.api.jaxrs.handler.BaseHandler.baseJSON(), public void org.hawkular.metrics.api.jaxrs.handler.BaseHandler.baseHTML(javax.servlet.ServletContext) throws java.lang.Exception]
The schema exists again, all pods are ready, and the metrics route responds.